Guidelines for Autonomic Service Agents
School of Computer Science, University of Auckland, PB 92019, Auckland 1142, NZ, brian.e.carpenter@gmail.com
Rakuten Mobile, Paris, FR, laurent.ciavaglia@rakuten.com
Huawei Technologies Co., Ltd, Q14 Huawei Campus, 156 Beiqing Road, Hai-Dian District, Beijing 100095, CN, jiangsheng@huawei.com
Nokia, Villarceaux, 91460 Nozay, FR, pierre.peloso@nokia.com
Keywords: GRASP, autonomous, autonomic function, self-management, autonomic networking, autonomous operation, infrastructure, intent, autonomic control plane
Abstract
This document proposes guidelines for the design of Autonomic Service
Agents for autonomic networks. Autonomic Service Agents, together with
the Autonomic Network Infrastructure, the Autonomic Control Plane, and
the GeneRic Autonomic Signaling Protocol, constitute base elements of an
autonomic networking ecosystem.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are candidates for any level of Internet
Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Revised BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Revised BSD License.
Table of Contents
Introduction
Terminology
Logical Structure of an Autonomic Service Agent
Interaction with the Autonomic Networking Infrastructure
  Interaction with the Security Mechanisms
  Interaction with the Autonomic Control Plane
  Interaction with GRASP and its API
  Interaction with Policy Mechanisms
  Interaction with Non-autonomic Components and Systems
Design of GRASP Objectives
Life Cycle
  Installation Phase
    Installation Phase Inputs and Outputs
  Instantiation Phase
    Operator's Goal
    Instantiation Phase Inputs and Outputs
    Instantiation Phase Requirements
  Operation Phase
  Removal Phase
Coordination and Data Models
  Coordination between Autonomic Functions
  Coordination with Traditional Management Functions
  Data Models
Robustness
Security Considerations
IANA Considerations
References
  Normative References
  Informative References
Example Logic Flows
Acknowledgements
Authors' Addresses
Introduction
This document proposes guidelines for the design of Autonomic Service Agents
(ASAs) in the context of an Autonomic Network (AN) based on the Autonomic Network
Infrastructure (ANI) outlined in the autonomic networking reference model.
This infrastructure makes use of
the Autonomic Control Plane (ACP) and
the GeneRic Autonomic Signaling Protocol (GRASP).
A general introduction to this environment may be found at ,
which also includes explanatory diagrams,
and a summary of terminology is in .
This document is a contribution to the description of an autonomic
networking ecosystem, recognizing that a deployable autonomic network
needs more than just ACP and GRASP implementations. Such an autonomic
network must achieve management tasks that a Network Operations Center
(NOC) cannot readily achieve manually, such as continuous resource
optimization or automated fault detection and repair. These tasks, and
other management automation goals, are described at length in . The net result should be significant operational
improvement. To achieve this, the autonomic networking ecosystem must
include at least a library of ASAs and corresponding GRASP technical
objective definitions. A GRASP objective is a
data structure whose main contents are a name and a value. The value
consists of a single configurable parameter or a set of parameters of
some kind.
There must also be tools to deploy and oversee ASAs, and integration with
existing operational mechanisms. However, this document focuses
on the design of ASAs, with some reference to implementation and operational aspects.
There is considerable literature about autonomic agents with a variety of
proposals about how they should be characterized. Some examples are
,
,
, and
. However, for the present document,
the basic definitions and goals for autonomic networking given in
RFC 7575 apply. According to RFC 7575, an Autonomic Service Agent is
"An agent implemented
on an autonomic node that implements an autonomic function, either in part
(in the case of a distributed function) or whole."
ASAs must be distinguished from other forms of software
components. They are components of network or service management; they
do not in themselves provide services to end users. They do, however,
provide management services to network operators and administrators.
For example, the services envisaged for network function virtualization
(NFV) or for service function chaining (SFC) might be managed by an ASA rather than by traditional
configuration tools.
Another example is that an existing script running within a router to
locally monitor or configure functions or services could be upgraded to an
ASA that could communicate with peer scripts on neighboring or remote
routers. A high-level API will allow such upgraded scripts to take full
advantage of the secure ACP and the discovery, negotiation, and
synchronization features of GRASP. Familiar tasks such as configuring an
Interior Gateway Protocol (IGP) on neighboring routers or even exchanging
IGP security keys could be performed securely in this way. This document
mainly addresses issues affecting quite complex ASAs, but initially, the
most useful ASAs may in fact be rather simple evolutions of existing
scripts.
The reference model for autonomic networks
explains further the functionality of ASAs by adding the following:
[An ASA is] a process that makes use of the features provided by the
ANI to achieve its own goals, usually including interaction with other
ASAs via GRASP or otherwise. Of
course, it also interacts with the specific targets of its function,
using any suitable mechanism. Unless its function is very simple, the
ASA will need to handle overlapping asynchronous operations. It may
therefore be a quite complex piece of software in its own right, forming
part of the application layer above the ANI.
As mentioned, there will certainly be simple ASAs that manage a
single objective in a straightforward way and do not need asynchronous
operations. In nodes where computing power and memory space are limited,
ASAs should run at a much lower frequency than the primary workload, so
CPU load should not be a big issue, but memory footprint in a
constrained node is certainly a concern. ASAs installed in constrained
devices will have limited functionality. In such cases, many aspects of
the current document do not apply. However, in the general case, an ASA
may be a relatively complex software component that will in many cases
control and monitor simpler entities in the same or remote host(s). For
example, a device controller that manages tens or hundreds of simple
devices might contain a single ASA.
The remainder of this document offers guidance on the design of complex ASAs.
Some of the material may be familiar to those experienced in distributed
fault-tolerant and real-time control systems. Robustness and security
are of particular importance in autonomic networks and are discussed
in the "Robustness" and "Security Considerations" sections.
Terminology
This section summarizes various acronyms and terminology used in the
document. Where no other reference is given, please consult or .
Autonomic:
self-managing (self-configuring, self-protecting, self-healing,
self-optimizing), but allowing high-level guidance by a central entity
such as a NOC
Autonomic Function:
a function that adapts on its own to a changing environment
Autonomic Node:
a node that employs autonomic functions
ACP:
Autonomic Control Plane
AN:
Autonomic Network; a network of autonomic nodes, which interact directly with each other
ANI:
Autonomic Network Infrastructure
ASA:
Autonomic Service Agent; an agent installed on an autonomic node
that implements an autonomic function, either partially (in the case
of a distributed function) or completely
BRSKI:
Bootstrapping Remote Secure Key Infrastructure
CBOR:
Concise Binary Object Representation
GRASP:
GeneRic Autonomic Signaling Protocol
GRASP API:
GRASP Application Programming Interface
NOC:
Network Operations Center
Objective:
A GRASP technical objective is a data structure whose main
contents are a name and a value. The value consists of a single
configurable parameter or a set of parameters of some kind.
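As a rough illustration of this definition, the following Python sketch models an objective as a name paired with a value. The class name, the field names, and the two example objectives are assumptions made purely for illustration; they are not defined by GRASP or by this document.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Objective:
    """Hypothetical in-memory form of an objective: a name plus a value."""
    name: str   # unique objective name (illustrative)
    value: Any  # a single parameter or a set of parameters


# An objective whose value is a single configurable parameter.
mtu = Objective(name="EX_MTU", value=1500)

# An objective whose value is a set of parameters.
pool = Objective(name="EX_PrefixPool", value={"length": 56, "count": 4})
```

Keeping the value opaque (typed as Any here) mirrors the idea that the meaning of the value belongs to the objective's own specification rather than to a generic container.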
Logical Structure of an Autonomic Service Agent
As mentioned above, all but the simplest ASAs will need to support
asynchronous operations. Different programming environments support
asynchronicity in different ways. In this document, we use an explicit
multi-threading model to describe operations. This is illustrative, and
alternatives to multi-threading are discussed in detail in connection
with the GRASP API (see "Interaction with GRASP and its API").
A typical ASA will have a main thread that performs various initial
housekeeping actions such as:
*  obtain authorization credentials, if needed
*  register the ASA with GRASP
*  acquire relevant policy parameters
*  declare data structures for relevant GRASP objectives
*  register with GRASP those objectives that it will actively manage
*  launch a self-monitoring thread
*  enter its main loop
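The startup sequence above can be sketched as follows. This is a minimal Python illustration of the multi-threading model; StubGrasp and its methods are hypothetical stand-ins rather than the GRASP API, the policy and objective contents are invented, and the main loop is bounded only so that the example terminates.

```python
import threading
import time


class StubGrasp:
    """Hypothetical stand-in for a GRASP API binding; not a real library."""

    def __init__(self):
        self.asas = []
        self.objectives = []

    def register_asa(self, name):
        self.asas.append(name)
        return len(self.asas)  # illustrative ASA handle

    def register_objective(self, handle, objective):
        self.objectives.append((handle, objective["name"]))


def self_monitor(stop, heartbeat):
    """Self-monitoring thread: here it only records heartbeats until stopped."""
    while not stop.wait(0.01):
        heartbeat.append(time.monotonic())


def run_asa(grasp, max_iterations=3):
    credentials = None                                # obtain credentials, if needed (none here)
    handle = grasp.register_asa("ExampleASA")         # register the ASA
    policy = {"loop_interval": 0.01}                  # acquire policy parameters (defaults)
    objective = {"name": "EX_Objective", "value": 0}  # declare objective data structure
    grasp.register_objective(handle, objective)       # register the managed objective
    stop = threading.Event()
    heartbeat = []
    monitor = threading.Thread(target=self_monitor, args=(stop, heartbeat), daemon=True)
    monitor.start()                                   # launch the self-monitoring thread
    for _ in range(max_iterations):                   # main loop (bounded for the demo)
        objective["value"] += 1
        time.sleep(policy["loop_interval"])           # limit frequency of execution
    stop.set()
    monitor.join()
    return objective["value"]


grasp = StubGrasp()
final_value = run_asa(grasp)
```

A real ASA would replace the bounded loop with the logic of its autonomic function and would give the self-monitoring thread actual failure checks.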
The logic of the main loop will depend on the details of the autonomic function concerned.
Whenever asynchronous operations are required, extra threads may be launched.
Examples of such threads include:
*  repeatedly flood an objective to the AN so that any ASA can
   receive the objective's latest value
*  accept incoming synchronization requests for an objective managed by this ASA
*  accept incoming negotiation requests for an objective managed by this ASA,
   and then conduct the resulting negotiation with the counterpart ASA
*  manage subsidiary non-autonomic devices directly
These threads should all either exit after their job is done or
enter a wait state for new work to avoid wasting system resources.
According to the degree of parallelism needed by the application,
some of these threads might be launched in multiple instances. In
particular, if negotiation sessions with other ASAs are expected to be
long or to involve wait states, the ASA designer might allow for
multiple simultaneous negotiating threads, with appropriate use of
queues and synchronization primitives to maintain consistency.
The main loop itself could act as the initiator of synchronization
requests or negotiation requests when the ASA needs data or resources
from other ASAs. In particular, the main loop should watch for changes
in policy parameters that affect its operation and, if appropriate,
occasionally refresh authorization credentials. It should also do
whatever is required to avoid unnecessary resource consumption, for
example, by limiting its frequency of execution.
The self-monitoring thread is of considerable importance. Failure of
autonomic service agents is highly undesirable. To a large extent, this
depends on careful coding and testing, with no unhandled error returns
or exceptions, but if there is nevertheless some sort of failure, the
self-monitoring thread should detect it, fix it if possible, and, in the
worst case, restart the entire ASA.
The section "Example Logic Flows" presents some example logic flows in informal pseudocode.
Interaction with the Autonomic Networking Infrastructure
Interaction with the Security Mechanisms
An ASA by definition runs in an autonomic node. Before any normal
ASAs are started, such nodes must be bootstrapped into the autonomic
network's secure key infrastructure, typically in accordance with
. This key infrastructure will be used to
secure the ACP (next section) and may be used by ASAs to set up
additional secure interactions with their peers, if needed.
Note that the secure bootstrap process itself incorporates simple
special-purpose ASAs that use a restricted mode of GRASP.
Interaction with the Autonomic Control Plane
In a normal autonomic network, ASAs will run as clients of the ACP,
which will provide a fully secured network environment for all
communication with other ASAs, in most cases mediated by GRASP (next
section).
Note that the ACP formation process itself incorporates simple
special-purpose ASAs that use a restricted mode of GRASP.
Interaction with GRASP and its API
In a node where a significant number of ASAs are installed, GRASP
is likely to run as a separate process with
its API available in user space. Thus, ASAs
may operate without special privilege, unless they need it for other
reasons. The ASA's view of GRASP is built around GRASP objectives,
defined as data structures containing
administrative information such as the objective's unique name and
its current value. The format and size of the value are not restricted
by the protocol, except that it must be possible to serialize it for
transmission in Concise Binary Object Representation (CBOR), subject only to GRASP's maximum message size as
discussed in "Design of GRASP Objectives".
As discussed in "Logical Structure of an Autonomic Service Agent", GRASP is an asynchronous
protocol, and this document uses a multi-threading model to describe
operations. In many programming environments, an "event loop" model is
used instead, in which case each thread would be implemented as an event
handler called in turn by the main loop. For this case, the GRASP API
must provide non-blocking calls and possibly support callbacks. This
topic is discussed in more detail in , and other
asynchronicity models are also possible. Whenever necessary, the GRASP
session identifier will be used to distinguish simultaneous
operations.
The GRASP API should offer the following features:
*  Registration functions, so that an ASA can register itself and
   the objectives that it manages.
*  A discovery function, by which an ASA can discover other ASAs
   supporting a given objective.
*  A negotiation request function, by which an ASA can start
   negotiation of an objective with a counterpart ASA. With this,
   there is a corresponding listening function for an ASA that wishes
   to respond to negotiation requests and a set of functions to
   support negotiating steps. Once a negotiation starts, it is a
   symmetric process with both sides sending successive objective
   values to each other until agreement is reached (or the negotiation
   fails).
*  A synchronization function, by which an ASA can request the
   current value of an objective from a counterpart ASA. With this,
   there is a corresponding listening function for an ASA that wishes
   to respond to synchronization requests. Unlike negotiation,
   synchronization is an asymmetric process in which the listener sends
   a single objective value to the requester.
*  A flood function, by which an ASA can cause the current value of
   an objective to be flooded throughout the AN so that any ASA can
   receive it.
For further details and some additional housekeeping functions, see .
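To make the symmetric nature of negotiation more concrete, the following Python sketch models it as a pure function in which two sides exchange successive values until one accepts, one declines, or a loop count runs out. It does not use the GRASP API; the function name, its parameters, and the simple counter-offer strategy are assumptions chosen for illustration.

```python
def negotiate(requested, minimum, available, loop_count=6):
    """Toy model of a negotiation over an amount of some resource.

    The requester asks for `requested` and will settle for `minimum`;
    the responder can grant at most `available`. Returns the agreed
    value, or None if the negotiation fails.
    """
    proposal = requested
    while loop_count > 0:
        loop_count -= 1
        if proposal <= available:
            return proposal      # responder accepts: negotiation ends in success
        counter = available      # responder counter-proposes what it can offer
        if counter >= minimum:
            proposal = counter   # requester adopts the counter-offer and resends
        else:
            return None          # requester declines: negotiation ends in failure
    return None                  # loop count exhausted


agreed = negotiate(requested=10, minimum=4, available=6)
declined = negotiate(requested=10, minimum=8, available=6)
exhausted = negotiate(requested=10, minimum=4, available=6, loop_count=1)
```

Synchronization, by contrast, would reduce to a single value returned by the listener, with no exchange of proposals.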
The GRASP API is intended to support the various interactions
expected between most ASAs, such as the interactions outlined in . However, if ASAs require additional
communication between themselves, they can do so directly over the ACP
to benefit from its security. One option is to use GRASP discovery and
synchronization as a rendezvous mechanism between two ASAs, passing
communication parameters such as a TCP port number via GRASP. The use
of TLS over the ACP for such communications is advisable, as described
in .
Interaction with Policy Mechanisms
At the time of writing, the policy mechanisms for the ANI are
undefined. In particular, the use of declarative policies (aka
Intents) for the definition and management of an ASA's behaviors remains
a research topic. In the cases where ASAs are defined as closed control
loops, the specifications defined in
regarding imperative and declarative goal statements may be
applicable.
In the ANI, policy dissemination is expected to operate by
an information distribution mechanism (e.g., via GRASP) that can reach all autonomic nodes and
therefore every ASA. However, each ASA must be capable of
operating "out of the box" in the absence of locally defined
policy, so every ASA implementation must include carefully
chosen default values and settings for all policy
parameters.
Interaction with Non-autonomic Components and Systems
To have any external effects, an ASA must also interact with non-autonomic
components of the node where it is installed. For example, an ASA whose purpose
is to manage a resource must interact with that resource. An ASA managing
an entity that is also managed by local software must interact
with that software. For example, if such management is performed by NETCONF,
the ASA must interact with the NETCONF
server as an independent NETCONF client in the same node to avoid
any inconsistency between configuration changes delivered
via NETCONF and configuration changes made by the ASA.
In an environment where systems are virtualized and specialized using
techniques such as network function virtualization or network slicing,
there will be a design choice whether ASAs are deployed once per physical node
or once per virtual context. A related issue is whether the ANI as a whole
is deployed once on a physical network or whether several virtual ANIs
are deployed. This aspect needs to be considered by the ASA designer.
Design of GRASP Objectives
The design of an ASA will often require the design of a new GRASP
objective. The general rules for the format of GRASP objectives, their
names, and IANA registration are given in . Additionally, that document discusses various
general considerations for the design of objectives, which are not
repeated here. However, note that GRASP, like HTTP, does
not provide transactional integrity. In particular, steps in a GRASP
negotiation are not idempotent. The design of a GRASP objective and the
logic flow of the ASA should take this into account. One approach, which
should be used when possible, is to design objectives with idempotent
semantics. If this is not possible, typically if an ASA is allocating
part of a shared resource to other ASAs, it needs to ensure that the
same part of the resource is not allocated twice. The easiest way is to
run only one negotiation at a time. If an ASA is capable of overlapping
several negotiations, it must avoid interference between these
negotiations.
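One way to keep overlapping negotiations from allocating the same part of a shared resource twice is to make the reservation step atomic. The following Python sketch shows this with a lock-protected pool; the class and function names are hypothetical, and the "negotiation" is reduced to a bare reservation so that the locking pattern stays visible.

```python
import threading


class PoolAllocator:
    """Hands out blocks of a shared resource so that no block is granted twice."""

    def __init__(self, blocks):
        self._free = set(blocks)
        self._lock = threading.Lock()

    def reserve(self):
        """Atomically take one free block, or return None if none is left."""
        with self._lock:
            return self._free.pop() if self._free else None

    def release(self, block):
        """Return a block, e.g., when a negotiation fails after reserving it."""
        with self._lock:
            self._free.add(block)


allocator = PoolAllocator(range(100))
granted = []
granted_lock = threading.Lock()


def negotiation_thread():
    # Each simulated negotiation thread tries to obtain ten blocks.
    for _ in range(10):
        block = allocator.reserve()
        if block is not None:
            with granted_lock:
                granted.append(block)


threads = [threading.Thread(target=negotiation_thread) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Twenty threads request 200 blocks in total from a pool of 100, so half of the requests are refused, and no block appears twice in the granted list.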
Negotiations will always end, normally because one end or the other
declares success or failure. If this does not happen, either a timeout or
exhaustion of the loop count will occur. The definition of a GRASP
objective should describe a specific negotiation policy if it is not self-evident.
GRASP allows a "dry run" mode of negotiation, where a negotiation
session follows its normal course but is not committed at either end
until a subsequent live negotiation session. If dry run mode is
defined for the objective, its specification, and every implementation,
must consider what state needs to be saved following a dry run
negotiation, such that a subsequent live negotiation can be expected to
succeed. It must be clear how long this state is kept and what happens
if the live negotiation occurs after this state is deleted. An ASA that
requests a dry run negotiation must take account of the possibility that
a successful dry run is followed by a failed live negotiation. Because
of these complexities, the dry run mechanism should only be supported by
objectives and ASAs where there is a significant benefit from it.
The actual value field of an objective is limited by the GRASP
protocol definition to any data structure that can be expressed in
Concise Binary Object Representation (CBOR). For some objectives, a single data item will
suffice, for example, an integer, a floating point
number, a UTF-8 string, or an arbitrary byte string. For more complex
cases, a simple tuple structure such as [item1, item2, item3] could be
used. Since CBOR is closely linked to JSON, it is also rather easy to
define an objective whose value is a JSON structure. The formats
acceptable by the GRASP API will limit the options in practice. A
generic solution is for the API to accept and deliver the value field in
raw CBOR, with the ASA itself encoding and decoding it via a CBOR
library.
The maximum size of the value field of an objective is limited by the
GRASP maximum message size. If the default maximum size specified as
GRASP_DEF_MAX_SIZE by is not enough, the
specification of the objective must indicate the required maximum message
size for both unicast and multicast messages.
A mapping from YANG to CBOR is defined by . Subject to
the size limit defined for GRASP messages, nothing prevents objectives transporting YANG in this way.
The flexibility of CBOR implies that the value field of many objectives can be extended in service,
to add additional information or alternative content, especially if JSON-like structures are
used. This has consequences for the robustness of ASAs, as discussed in "Robustness".
Life Cycle
The ASA life cycle is discussed in ,
from which the following text was derived. It does not cover all details, and some
of the terms used would require precise definitions in a given implementation.
In simple cases, autonomic functions could be permanent, in the sense
that ASAs are shipped as part of a product and persist throughout the
product's life. However, in complex cases, a more likely situation is
that ASAs need to be installed or updated dynamically because of new
requirements or bugs. This section describes one approach to the
resulting life cycle of individual ASAs. It does not consider wider
issues such as updates of shared libraries.
Because continuity of service is fundamental to autonomic networking,
the process of seamlessly replacing a running instance of an ASA with a
new version needs to be part of the ASA's design. The implication of
service continuity on the design of ASAs can be illustrated along the
three main phases of the ASA life cycle, namely installation,
instantiation, and operation.
Installation Phase
We define "installation" to mean that a piece of software is loaded into
a device, along with any necessary libraries, but is not yet activated.
Before being able to instantiate and run ASAs, the operator will first provision the
infrastructure with the sets of ASA software corresponding to its needs and objectives.
Such software must be checked for integrity and authenticity before installation.
The provisioning of the infrastructure is realized in the installation phase and consists of
installing (or checking the availability of) the pieces of software of the different ASAs
in a set of Installation Hosts within the autonomic network.
There are three properties applicable to the installation of ASAs:
*  The dynamic installation property allows installing an ASA on demand,
   on any hosts compatible with the ASA.
*  The decoupling property allows an ASA on one machine to control
   resources in another machine (known as "decoupled mode").
*  The multiplicity property allows controlling multiple sets of
   resources from a single ASA.
These three properties are very important in the context of the installation
phase as their variations condition how the ASA could be installed on the infrastructure.
Installation Phase Inputs and Outputs
Inputs are:
*  [ASA_type]: specifies which ASA to install.
*  [Installation_target_infrastructure]: specifies the candidate installation Hosts.
*  [ASA_placement_function]: specifies how the installation phase will meet the operator's
   needs and objectives for the provision of the infrastructure. This
   function is only useful in the decoupled mode. It can be as simple
   as an explicit list of hosts on which the ASAs are to be
   installed, or it could consist of operator-defined criteria and
   constraints.
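The two styles of placement function mentioned above, an explicit host list and operator-defined criteria, might be sketched in Python as follows. Everything here is hypothetical: the host records, the memory criterion, and the helper plan_installation are invented for illustration and are not part of any defined installation interface.

```python
def explicit_placement(host_names):
    """Placement function given as an explicit list of host names."""
    wanted = set(host_names)
    return lambda candidates: [h for h in candidates if h["name"] in wanted]


def criteria_placement(min_memory_mb):
    """Placement function given as an operator-defined criterion."""
    return lambda candidates: [h for h in candidates if h["memory_mb"] >= min_memory_mb]


def plan_installation(list_of_asas, target_infrastructure, placement_function):
    """Map each ASA to the hosts selected by the placement function."""
    hosts = placement_function(target_infrastructure)
    return {asa: [h["name"] for h in hosts] for asa in list_of_asas}


# Candidate installation hosts (illustrative data).
candidates = [
    {"name": "r1", "memory_mb": 512},
    {"name": "r2", "memory_mb": 2048},
    {"name": "r3", "memory_mb": 4096},
]

by_list = plan_installation(["asa_a"], candidates, explicit_placement(["r1", "r3"]))
by_criteria = plan_installation(["asa_a", "asa_b"], candidates, criteria_placement(1024))
```

In both cases the result is a mapping from ASAs to hosts, which is one possible shape for the list of installed ASAs and hosts that the installation phase produces.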
The main output of the installation phase is a [List_of_ASAs] installed on
[List_of_hosts]. This output is also useful for the coordination function
where it acts as a static interaction map (see "Coordination between Autonomic Functions").
The condition to validate in order to pass to the next phase is to
ensure that [List_of_ASAs] are correctly installed on
[List_of_hosts]. A minimum set of primitives to support the
installation of ASAs could be the following: install (List_of_ASAs,
Installation_target_infrastructure, ASA_placement_function) and
uninstall (List_of_ASAs).
Instantiation Phase
We define "instantiation" as the operation of creating a single ASA instance
from the corresponding piece of installed software.
Once the ASAs are installed on the appropriate hosts in the
network, these ASAs may start to operate. From the operator
viewpoint, an operating ASA means the ASA manages the network
resources as per the objectives given. At the ASA local level,
operating means executing their control loop algorithm.
There are two aspects to take into consideration. First, having a
piece of code installed and available to run on a host is not the same
as having an agent based on this piece of code running inside the
host. Second, in a coupled case, determining which resources are
controlled by an ASA is straightforward (the ASA runs on the same
autonomic node as the resources it is controlling). In a decoupled
mode, determining this is a bit more complex: a starting agent will
have to either discover the set of resources it ought to control, or
such information has to be communicated to the ASA.
The instantiation phase of an ASA covers both these aspects:
starting the agent code (when this does not start automatically) and
determining which resources have to be controlled (when this is not
straightforward).
Operator's Goal
Through this phase, the operator wants to control its autonomic
network regarding at least two aspects:
*  determine the scope of autonomic functions by instructing which network
   resources have to be managed by which autonomic function (and more precisely
   by which release of the ASA software code, e.g., version number or provider).
*  determine how the autonomic functions are organized by instantiating a set
   of ASAs across one or more autonomic nodes and instructing them
   accordingly about the other ASAs in the set as necessary.
In this phase, the operator may also want to
set goals for autonomic functions, e.g., by
configuring GRASP objectives.
The operator's goal can be summarized in an instruction to the autonomic ecosystem matching the following format,
explained in detail in the next sub-section:
[Instances_of_ASA_type] ready to control [Instantiation_target_infrastructure] with [Instantiation_target_parameters]
Instantiation Phase Inputs and Outputs
Inputs are:
*  [Instances_of_ASA_type]: specifies which ASAs to instantiate
*  [Instantiation_target_infrastructure]: specifies which
   resources are to be managed by the autonomic function; this can be
   the whole network or a subset of it like a domain, a physical
   segment, or even a specific list of resources.
*  [Instantiation_target_parameters]: specifies which GRASP
   objectives are to be sent to ASAs (e.g., an optimization target)
Outputs are:
*  [Set_of_ASA_resources_relations]: describes which resources are managed by which ASA instances; this is
   not a formal message but a resulting configuration log for a set of ASAs.
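The inputs and output listed above can be pictured as plain data. The Python sketch below is only an assumption about one possible shape: the class, its field names, and the round-robin assignment of resources to instances are invented for illustration and do not define any message or data structure.

```python
from dataclasses import dataclass, field


@dataclass
class InstantiationInstruction:
    """Hypothetical container for the three instantiation inputs."""
    asa_type: str
    target_infrastructure: list
    target_parameters: dict = field(default_factory=dict)


def instantiate(instruction, instance_names):
    """Assign target resources to ASA instances round-robin (illustrative policy).

    Returns a mapping of instance name to managed resources, standing in
    for the set of ASA-to-resource relations.
    """
    relations = {name: [] for name in instance_names}
    for index, resource in enumerate(instruction.target_infrastructure):
        owner = instance_names[index % len(instance_names)]
        relations[owner].append(resource)
    return relations


instruction = InstantiationInstruction(
    asa_type="load_balancer",
    target_infrastructure=["link-a", "link-b", "link-c"],
    target_parameters={"EX_TargetUtilization": 0.7},
)
relations = instantiate(instruction, ["lb-1", "lb-2"])
```

In a coupled deployment the mapping would be trivial, since each instance manages the resources of its own node; the sketch is mainly relevant to the decoupled mode.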
Instantiation Phase Requirements
The instructions described above could be
either of the following:
*  Sent to a targeted ASA. In this case, the receiving Agent will have to manage the
   specified list of [Instantiation_target_infrastructure], with the
   [Instantiation_target_parameters].
*  Broadcast to all ASAs. In this case, the ASAs would determine from the list which
   ASAs would handle which [Instantiation_target_infrastructure],
   with the [Instantiation_target_parameters].
These instructions may be grouped as a specific data structure referred to
as an ASA Instance Mandate. The specification of such an ASA Instance Mandate
is beyond the scope of this document.
The conclusion of this instantiation phase is a set of ASA
instances ready to operate. These ASA instances are characterized
by the resources they manage, the metrics being monitored, and the
actions that can be executed (like modifying certain parameter
values). The description of the ASA instance may be defined in an
ASA Instance Manifest data structure. The specification of such an
ASA Instance Manifest is beyond the scope of this document.
The ASA Instance Manifest not only serves informational
purposes such as acknowledgement of successful instantiation to the
operator but is also necessary for further autonomic operations with:
*  coordinated entities (see "Coordination between Autonomic Functions")
*  collaborative entities with purposes such as to establish knowledge exchange
   (some ASAs may produce knowledge or monitor metrics that would be useful for other ASAs)
Operation Phase
During the operation phase, the operator can:
*  activate/deactivate ASAs: enable/disable their autonomic loops
*  modify ASA targets: set different technical objectives
*  modify ASAs' managed resources: update the Instance Mandate to specify a different set of resources to
   manage (only applicable to decoupled ASAs)
During the operation phase, running ASAs can interact with other ASAs:
*  in order to exchange knowledge (e.g., an ASA providing traffic
   predictions to a load balancing ASA)
*  in order to collaboratively reach an objective (e.g., ASAs
   pertaining to the same autonomic function will collaborate, e.g., in
   the case of a load balancing function, by modifying link metrics
   according to neighboring resource loads)
During the operation phase, running ASAs are expected to apply
coordination schemes as per "Coordination between Autonomic Functions".
Removal Phase
When an ASA is removed from service and uninstalled, the above steps
are reversed. It is important
that its data, especially any security key material, is purged.
Coordination and Data Models
Coordination between Autonomic Functions
Some autonomic functions will be completely independent of each
other. However, others are at risk of interfering with each other; for
example, two different optimization functions might both attempt to
modify the same underlying parameter in different ways. In a complete
system, a method is needed for identifying ASAs that might
interfere with each other and coordinating their actions when
necessary.
Coordination with Traditional Management Functions
Some ASAs will have functions that overlap with existing
configuration tools and network management mechanisms such as
command-line interfaces, DHCP, DHCPv6, SNMP, NETCONF, and RESTCONF.
This is, of course, an existing problem whenever multiple configuration
tools are in use by the NOC. Each ASA designer will need to consider
this issue and how to avoid clashes and inconsistencies in various
deployment scenarios. Some specific considerations for interaction with
OAM tools are given in . As another example,
describes how autonomic management of IPv6
prefixes can interact with prefix delegation via DHCPv6. The description
of a GRASP objective and of an ASA using it should include a discussion
of any such interactions.
Data Models
Management functions often include a shared data model, quite likely
to be expressed in a formal notation such as YANG. This aspect should
not be an afterthought in the design of an ASA. To the contrary, the
design of the ASA and of its GRASP objectives should match the data
model; as noted in "Design of GRASP Objectives", YANG serialized as CBOR may
be used directly as the value of a GRASP objective.
Robustness
It is of great importance that all components of an autonomic system
are highly robust. Although ASA designers should aim for their
component to never fail, it is more important to design the ASA to
assume that failures will happen and to gracefully recover from those
failures when they occur. Hence, this section lists various aspects of
robustness that ASA designers should consider:
1.  If despite all precautions, an ASA does encounter a fatal error,
    it should in any case restart automatically and try again. To
    mitigate a loop in case of persistent failure, a suitable pause should
    be inserted before such a restart. The length of the pause depends on
    the use case; randomization and exponential backoff should be
    considered.
2.  If a newly received or calculated value for a parameter falls out
    of bounds, the corresponding parameter should be either left unchanged
    or restored to a value known to be safe in all configurations.
3.  If a GRASP synchronization or negotiation session fails for any
    reason, it may be repeated after a suitable pause. The length of the
    pause depends on the use case; randomization and exponential backoff
    should be considered.
4.  If a session fails repeatedly, the ASA should consider that its
    peer has failed, and it should cause GRASP to flush its discovery cache and
    repeat peer discovery.
5.  In any case, it may be prudent to repeat discovery periodically,
    depending on the use case.
6.  Any received GRASP message should be checked. If it is wrongly
    formatted, it should be ignored. Within a unicast session, an Invalid
    message (M_INVALID) may be sent. This function may be provided by the
    GRASP implementation itself.
7.  Any received GRASP objective should be checked. Basic formatting
    errors like invalid CBOR will likely be detected by GRASP itself, but
    the ASA is responsible for checking the precise syntax and semantics
    of a received objective. If it is wrongly formatted, it should be
    ignored. Within a negotiation session, a Negotiation End message
    (M_END) with a Decline option (O_DECLINE) should be sent. An ASA may
    log such events for diagnostic purposes.
8.  On the other hand, the definitions of GRASP objectives are very
    likely to be extended, using the flexibility of CBOR or JSON.
    Therefore, ASAs should be able to deal gracefully with unknown components
    within the values of objectives. The specification of an objective should
    describe how unknown components are to be handled (ignored, logged and
    ignored, or rejected as an error).
9.  If an ASA receives either an Invalid message (M_INVALID) or a Negotiation End
    message (M_END) with a Decline option (O_DECLINE), one possible reason is that
    the peer ASA does not support a new feature of either GRASP or the objective
    in question. In such a case, the ASA may choose to repeat the operation concerned
    without using that new feature.
10. All other possible exceptions should be handled in an orderly
    way. There should be no such thing as an unhandled exception (but see
    point 1 above).
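The pause recommended for restarts and for repeated GRASP sessions can be sketched as follows. This is only an illustration in Python, not part of GRASP or of any GRASP API: the names backoff_delays and retry_session, the default timings, and the run_session callback are all assumptions made for this example.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0, jitter=0.5, rng=random):
    """Yield an endless series of randomized, exponentially growing pauses."""
    delay = base
    while True:
        # Spread each pause over (delay*(1-jitter), delay] so that peers
        # that failed together do not all retry at the same instant.
        yield delay * (1.0 - jitter * rng.random())
        delay = min(cap, delay * factor)


def retry_session(run_session, max_attempts=5, sleep=time.sleep, **backoff):
    """Call run_session() until it returns True or attempts run out.

    Returns True on success and False after max_attempts failures, at
    which point the caller might flush the discovery cache and repeat
    peer discovery.
    """
    delays = backoff_delays(**backoff)
    for attempt in range(max_attempts):
        try:
            if run_session():
                return True
        except Exception:
            pass  # a real ASA would log the exception here
        if attempt < max_attempts - 1:
            sleep(next(delays))
    return False
```

The cap keeps the pause bounded during a long outage, and the jitter term is one possible form of the randomization mentioned above; suitable values depend on the use case.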
At a slightly more general level, ASAs are not services in themselves,
but they automate services. This has a fundamental impact on how to design
robust ASAs. In general, when an ASA observes a particular state (1) of
operations of the services/resources it controls, it typically aims to
improve this state to a better state, say (2). Ideally, the ASA is built
so that it can ensure that any error encountered can still lead to
returning to (1) instead of a state (3), which is worse than (1). One
example instance of this principle is "make-before-break" used in
reconfiguration of routing protocols in manual operations. This principle
of operations can accordingly be coded into the operation of an ASA. The
GRASP dry run option is another tool
helpful for this ASA design goal of "test-before-make".
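The goal of returning to state (1) rather than ending in a worse state (3) might be coded along the following lines. This is a minimal Python sketch under stated assumptions: apply_with_rollback, the apply and verify callbacks, and the numeric bounds are hypothetical names introduced for illustration, and the sketch assumes that re-applying the previous value is itself safe.

```python
def apply_with_rollback(current, proposed, lower, upper, apply, verify):
    """Try to move from state (1) `current` to state (2) `proposed`.

    Returns the value in force afterwards: `proposed` on success,
    otherwise `current`, so that an error does not leave a worse state (3).
    """
    # Out-of-bounds values are rejected before anything is changed.
    if not (lower <= proposed <= upper):
        return current
    try:
        apply(proposed)
        if verify():
            return proposed
    except Exception:
        pass  # fall through to the rollback below
    apply(current)  # restore the last known good value
    return current
```

For example, `apply` could push a value to the managed resource and `verify` could check that the service is still healthy afterwards.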
Security Considerations
ASAs are intended to run in an environment that is protected by the
Autonomic Control Plane [RFC8994], admission to which
depends on an initial secure bootstrap process such as BRSKI [RFC8995].
Those documents describe security considerations
relating to the use of and properties provided by the ACP and BRSKI,
respectively. Such an ACP can provide keying material for mutual
authentication between ASAs as well as confidential communication
channels for messages between ASAs. In some deployments, a secure
partition of the link layer might be used instead. GRASP itself has
significant security considerations [RFC8990]. However,
this does not relieve ASAs of responsibility for security. When ASAs
configure or manage network elements outside the ACP, potentially in a
different physical node, they must interact with other non-autonomic
software components to perform their management functions. The details
are specific to each case, but this has an important security
implication. An ASA might act as a loophole by which the managed entity
could penetrate the security boundary of the ANI. Thus, ASAs must be
designed to avoid loopholes such as passing on executable code or
proxying unverified commands and should, if possible, operate in an
unprivileged mode. In particular, they must use secure coding
practices, e.g., carefully validate all incoming information and avoid
unnecessary elevation of privilege. This will apply in particular when
an ASA interacts with a management component such as a NETCONF
server.
A similar situation will arise if an ASA acts as a gateway between
two separate autonomic networks, i.e., it has access to two separate
ACPs. Such an ASA must also be designed to avoid loopholes and to
validate incoming information from both sides.
As a reminder, GRASP does not intrinsically provide transactional
integrity.
As appropriate to their specific functions, ASAs should take account
of relevant privacy considerations [RFC6973].
The initial version of the autonomic infrastructure assumes that all
autonomic nodes are trusted by virtue of their admission to the ACP.
ASAs are therefore trusted to manipulate any GRASP objective simply
because they are installed on a node that has successfully joined the
ACP. In the general case, a node may have multiple roles, and a role may
use multiple ASAs, each using multiple GRASP objectives. Additional
mechanisms for the fine-grained authorization of nodes and ASAs to
manipulate specific GRASP objectives could be designed. Meanwhile, we
repeat that ASAs should run without special privilege if possible.
Independently of this, interfaces between ASAs and the router
configuration and monitoring services of the node can be subject to
authentication that provides more fine-grained authorization for
specific services. These additional authentication parameters could be
passed to an ASA during its instantiation phase.
IANA Considerations
This document has no IANA actions.
References
Normative References
"Concise Binary Object Representation (CBOR)", RFC 8949.
"GeneRic Autonomic Signaling Protocol (GRASP)", RFC 8990.
"An Autonomic Control Plane (ACP)", RFC 8994.
"Bootstrapping Remote Secure Key Infrastructure (BRSKI)", RFC 8995.
Informative References
"A Day in the Life of an Autonomic Function", Work in Progress.
"CBOR Encoding of Data Modeled with YANG", Work in Progress.
"Towards an Agent Model for Future Autonomic Communications".
"Autonomic network engineering for the self-managing Future Internet (AFI); Generic Autonomic Network Architecture (An Architectural Reference Model for Autonomic Networking, Cognitive Networking and Self-Management)", ETSI GS AFI 002 V1.1.1.
"A survey of autonomic computing - degrees, models, and applications", ACM Computing Surveys (CSUR), Volume 40, Issue 3.
"Intent-Based Networking - Concepts and Definitions", Work in Progress.
"Autonomic Networking Gets Serious", The Internet Protocol Journal, Volume 24, Issue 3, Page(s) 2 - 18, ISSN 1944-1134.
"A Survey of Autonomic Network Architectures and Evaluation Criteria", IEEE Communications Surveys & Tutorials, Volume 14, Issue 2, Pages 464 - 490.
"Network Functions Virtualisation", ETSI, SDN and OpenFlow World Congress.
"Network Configuration Protocol (NETCONF)", RFC 6241.
"Privacy Considerations for Internet Protocols", RFC 6973.
"Autonomic Networking: Definitions and Design Goals", RFC 7575.
"Service Function Chaining (SFC) Architecture", RFC 7665.
"Using an Autonomic Control Plane for Stable Connectivity of Network Operations, Administration, and Maintenance (OAM)", RFC 8368.
"GeneRic Autonomic Signaling Protocol Application Program Interface (GRASP API)", RFC 8991.
"Autonomic IPv6 Edge Prefix Management in Large-Scale Networks", RFC 8992.
"A Reference Model for Autonomic Networking", RFC 8993.
"Zero-touch network and Service Management (ZSM); Closed-Loop Automation; Part 1: Enablers", ETSI GS ZSM 009-1, Version 1.1.1.
Example Logic Flows
This appendix describes generic logic flows that combine to act as an
Autonomic Service Agent (ASA) for resource management. Note that these
are illustrative examples and are in no sense requirements. As long as
the rules of GRASP are followed, a real implementation could be
different. The reader is assumed to be familiar with GRASP and its conceptual API [RFC8991].
A complete autonomic function for a distributed resource will consist
of a number of instances of the ASA placed at relevant points in a
network. Specific details will, of course, depend on the resource
concerned. One example is IP address prefix management, as specified in
[RFC8992]. In this case, an instance of the ASA will
exist in each delegating router.
An underlying assumption is that there is an initial source of the resource in
question, referred to here as an origin ASA. The other ASAs, known as
delegators, obtain supplies of the resource from the origin, delegate
quantities of the resource to consumers that request it, and recover it when
no longer needed.
Another assumption is that there is a set of network-wide policy parameters, which
the origin will provide to the delegators. These parameters will control how
the delegators decide how much resource to provide to consumers. Thus, the
ASA logic has two operating modes: origin and delegator. When running as an
origin, it starts by obtaining a quantity of the resource from the NOC, and it
acts as a source of policy parameters, via both GRASP flooding and GRASP
synchronization. (In some scenarios, flooding or synchronization alone might
be sufficient, but this example includes both.)
When running as a delegator, it starts with an empty resource pool,
acquires the policy parameters by GRASP synchronization, and delegates
quantities of the resource to consumers that request it. Both as an origin and as a delegator, when its pool is low,
it seeks quantities of the resource by
requesting GRASP negotiation with peer ASAs. When its pool is sufficient, it
hands out resource to peer ASAs in response to negotiation requests. Thus,
over time, the initial resource pool held by the origin will be shared among
all the delegators according to demand.
In theory, a network could include any number of origins and any number of
delegators, with the only condition being that each origin's initial resource
pool is unique. A realistic scenario is to have exactly one origin and as many
delegators as you like. A scenario with no origin is useless.
An implementation requirement is that resource pools are kept in stable storage. Otherwise, if a delegator exits for any reason, all the resources it has obtained or delegated are lost. If an origin exits, its entire spare pool is lost. The logic for using stable storage and for crash recovery is not included in the pseudocode below, which focuses on communication between ASAs. Since GRASP operations are not intrinsically idempotent, data integrity during failure scenarios is the responsibility of the ASA designer. This is a complex topic in its own right that is not discussed in the present document.
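Although crash recovery is out of scope here, the basic idea of keeping the pools in stable storage can be illustrated with a short Python sketch. It assumes that the pool contents are JSON-serializable lists and that a single local file is an acceptable store; save_pool, load_pool, and the file layout are hypothetical choices for this example, not requirements.

```python
import json
import os
import tempfile


def save_pool(path, resource_pool, delegated_list):
    """Write both lists to `path` as JSON, replacing the old file in one step."""
    state = {"resource_pool": resource_pool, "delegated_list": delegated_list}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".pool-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # the rename is atomic on POSIX systems
    except BaseException:
        os.unlink(tmp_path)  # do not leave a partial temporary file behind
        raise


def load_pool(path):
    """Return (resource_pool, delegated_list); both empty if nothing was saved."""
    try:
        with open(path) as f:
            state = json.load(f)
    except FileNotFoundError:
        return [], []
    return state["resource_pool"], state["delegated_list"]
```

Writing to a temporary file and renaming it means that a crash during the write leaves the previous copy intact. It does not by itself make a GRASP negotiation idempotent, which remains the designer's responsibility.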
The description below does not implement GRASP's dry run function. That would require temporarily marking any resource handed out in a dry run negotiation as reserved, until either the peer obtains it in a live run, or a suitable timeout occurs.
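To make the reservation idea concrete, the following Python sketch shows one way a dry run could be tracked. It is an assumption-laden illustration only: the Reservations class, the use of a peer identifier as the key, and the 30-second default timeout are all hypothetical and are not taken from GRASP.

```python
import time


class Reservations:
    """Track resources set aside during dry run negotiations."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self._held = {}  # peer -> (resource, expiry time)

    def reserve(self, peer, resource):
        """Mark `resource` as reserved for `peer` after a dry run."""
        self._held[peer] = (resource, self.clock() + self.timeout)

    def claim(self, peer):
        """Live run: hand over the reserved resource, or None."""
        entry = self._held.get(peer)
        if entry is None or entry[1] <= self.clock():
            return None  # unknown or expired; expire() will reclaim it
        del self._held[peer]
        return entry[0]

    def expire(self):
        """Remove timed-out reservations and return their resources."""
        now = self.clock()
        stale = [p for p, (_, expiry) in self._held.items() if expiry <= now]
        return [self._held.pop(p)[0] for p in stale]
```

The resources returned by `expire()` would go back into the resource pool, for example from the same periodic thread that performs garbage collection.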
The main data structures used in each instance of the ASA are:
resource_pool: for example, an ordered list of available resources. Depending on the
nature of the resource, units of resource are split when appropriate, and a
background garbage collector recombines split resources if they are returned
to the pool.
delegated_list: where a delegator stores the resources it has given to subsidiary devices.
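As an illustration of these data structures, the Python sketch below models a resource as a (start, size) block of integers, which is an assumption made purely for the example; a real resource such as an address prefix would have its own splitting and merging rules. The class and method names are hypothetical.

```python
import threading


class ResourcePool:
    """Ordered list of free (start, size) blocks, guarded by a lock."""

    def __init__(self, blocks=()):
        self._lock = threading.Lock()
        self._free = sorted(blocks)

    def take(self, amount):
        """Remove and return a block of exactly `amount`, or None."""
        with self._lock:
            for i, (start, size) in enumerate(self._free):
                if size >= amount:
                    # Split: keep the remainder in the pool, if any.
                    if size > amount:
                        self._free[i] = (start + amount, size - amount)
                    else:
                        del self._free[i]
                    return (start, amount)
        return None

    def give_back(self, block):
        """Return a block to the pool, keeping the list ordered."""
        with self._lock:
            self._free.append(block)
            self._free.sort()

    def merge_adjacent(self):
        """Garbage collection: recombine blocks that touch each other."""
        with self._lock:
            merged = []
            for start, size in self._free:
                if merged and merged[-1][0] + merged[-1][1] == start:
                    merged[-1] = (merged[-1][0], merged[-1][1] + size)
                else:
                    merged.append((start, size))
            self._free = merged

    def free_blocks(self):
        """Return a snapshot of the free blocks."""
        with self._lock:
            return list(self._free)
```

A delegated_list could be as simple as a dictionary from consumer to the blocks it holds, updated under the same lock so that delegation and recording form one atomic step.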
Possible main logic flows are below, using a threaded implementation model. As noted above, alternative approaches to asynchronous operations are possible. The transformation to an event loop model should be apparent; each thread would correspond to one event in the event loop.
The GRASP objectives are as follows:
["EX1.Resource", flags, loop_count, value], where the value depends on the resource concerned but will typically include its size and
identification.
["EX1.Params", flags, loop_count, value], where the value will be, for example, a JSON object defining the applicable parameters.
In the outline logic flows below, these objectives are represented simply by their names.
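For readers who prefer a concrete form, the two objectives might be built and checked as in the Python sketch below. Several points are assumptions: the flag bit positions are taken to be those defined for GRASP objective flags and should be checked against the GRASP specification, the parameter names max_grant and low_water_mark are invented for the example, and the objective is shown as a plain Python list before any CBOR encoding.

```python
import json

# Bit positions of the objective flags (assumed to follow the GRASP spec).
F_DISC, F_NEG, F_SYNCH, F_NEG_DRY = 0, 1, 2, 3

KNOWN_PARAMS = {"max_grant", "low_water_mark"}  # hypothetical parameter names


def make_objective(name, flag_bits, loop_count, value):
    """Build [name, flags, loop_count, value] as a plain Python list."""
    flags = 0
    for bit in flag_bits:
        flags |= 1 << bit
    return [name, flags, loop_count, value]


def parse_params(objective):
    """Return the known parameters and the names of any unknown ones."""
    name, _flags, _loop_count, value = objective
    if name != "EX1.Params":
        raise ValueError("unexpected objective: %r" % (name,))
    params = json.loads(value)
    if not isinstance(params, dict):
        raise ValueError("EX1.Params value must be a JSON object")
    known = {k: v for k, v in params.items() if k in KNOWN_PARAMS}
    unknown = sorted(set(params) - KNOWN_PARAMS)
    return known, unknown
```

Here unknown components are separated out rather than rejected, so the caller can log and ignore them; the specification of a real objective should state which behavior is intended.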
MAIN PROGRAM:

Create empty resource_pool (and an associated lock)
Create empty delegated_list
Determine whether to act as origin
if origin:
    Obtain initial resource_pool contents from NOC
    Obtain value of EX1.Params from NOC
Register ASA with GRASP
Register GRASP objectives EX1.Resource and EX1.Params
if origin:
    Start FLOODER thread to flood EX1.Params
    Start SYNCHRONIZER listener for EX1.Params
Start MAIN_NEGOTIATOR thread for EX1.Resource
if not origin:
    Obtain value of EX1.Params from GRASP flood or synchronization
    Start DELEGATOR thread
Start GARBAGE_COLLECTOR thread
good_peer = none
do forever:
    if resource_pool is low:
        Calculate amount A of resource needed
        Discover peers using GRASP M_DISCOVER / M_RESPONSE
        if good_peer in peers:
            peer = good_peer
        else:
            peer = #any choice among peers
        grasp.request_negotiate("EX1.Resource", peer)
        #i.e., send negotiation request
        Wait for response (M_NEGOTIATE, M_END or M_WAIT)
        if OK:
            if offered amount of resource sufficient:
                Send M_END + O_ACCEPT #negotiation succeeded
                Add resource to pool
                good_peer = peer #remember this choice
            else:
                Send M_END + O_DECLINE #negotiation failed
                good_peer = none #forget this choice
    sleep() #periodic timer suitable for application scenario

MAIN_NEGOTIATOR thread:

do forever:
    grasp.listen_negotiate("EX1.Resource")
    #i.e., wait for negotiation request
    Start a separate new NEGOTIATOR thread for requested amount A

NEGOTIATOR thread:

Request resource amount A from resource_pool
if not OK:
    while not OK and A > Amin:
        A = A-1
        Request resource amount A from resource_pool
if OK:
    Offer resource amount A to peer by GRASP M_NEGOTIATE
    if received M_END + O_ACCEPT:
        #negotiation succeeded
    elif received M_END + O_DECLINE or other error:
        #negotiation failed
        Return resource to resource_pool
else:
    Send M_END + O_DECLINE #negotiation failed
#thread exits

DELEGATOR thread:

do forever:
    Wait for request or release for resource amount A
    if request:
        Get resource amount A from resource_pool
        if OK:
            Delegate resource to consumer #atomic
            Record in delegated_list #operation
        else:
            Signal failure to consumer
            Signal main thread that resource_pool is low
    else:
        Delete resource from delegated_list
        Return resource amount A to resource_pool

SYNCHRONIZER thread:

do forever:
    Wait for M_REQ_SYN message for EX1.Params
    Reply with M_SYNCH message for EX1.Params

FLOODER thread:

do forever:
    Send M_FLOOD message for EX1.Params
    sleep() #periodic timer suitable for application scenario

GARBAGE_COLLECTOR thread:

do forever:
    Search resource_pool for adjacent resources
    Merge adjacent resources
    sleep() #periodic timer suitable for application scenario
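The amount-reduction step in the NEGOTIATOR thread can also be written as ordinary code. The Python sketch below covers only the local decision logic and sends no GRASP messages; choose_offer, finish_offer, and the try_take and give_back callbacks are hypothetical names standing in for whatever pool interface an implementation provides.

```python
def choose_offer(requested, a_min, try_take):
    """Shrink the requested amount until the pool can satisfy it.

    `try_take(amount)` removes `amount` from the pool and returns the
    resource, or returns None if the pool cannot supply it.
    Returns (amount, resource), or (None, None) if even a_min fails.
    """
    amount = requested
    resource = try_take(amount)
    while resource is None and amount > a_min:
        amount -= 1
        resource = try_take(amount)
    if resource is None:
        return None, None  # caller would send M_END + O_DECLINE
    return amount, resource  # caller would offer it with M_NEGOTIATE


def finish_offer(accepted, resource, give_back):
    """Return the resource to the pool unless the peer accepted it."""
    if not accepted:
        give_back(resource)
```

Reducing the amount one unit at a time mirrors the pseudocode; a resource that can only be split in fixed steps, such as an address prefix, would reduce the amount in those steps instead.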
Acknowledgements
Valuable comments were received from , , , ,
, ,
, ,
, , and other IESG members.
Authors' Addresses
School of Computer Science
University of Auckland
PB 92019
Auckland 1142
NZ
brian.e.carpenter@gmail.com

Rakuten Mobile
Paris
FR
laurent.ciavaglia@rakuten.com

Huawei Technologies Co., Ltd
Q14 Huawei Campus
156 Beiqing Road
Hai-Dian District
Beijing
100095
CN
jiangsheng@huawei.com

Nokia
Villarceaux
91460 Nozay
FR
pierre.peloso@nokia.com