requirement of iterative methods, allowing the
system to behave almost like a reactive system with
a reduced response time. Another relevant feature of
RL exploited by the LINKER Architecture is that it
explicitly considers the whole problem of a goal-
directed agent interacting with an uncertain
environment. This is in contrast with many
approaches that consider sub-problems without
addressing how they might fit into a larger picture.
This property is important for the LINKER
Architecture because it is a distributed control
architecture for LSS, where some of the control
variables overlap between sub-systems; this issue is
addressed in the next section.
This paper aims to explain how, using the
proposed learning techniques and the LINKER
Architecture, it is possible to integrate the agents of a
distributed system with LINKER agents trained with
the proposed planning techniques. Each LINKER
agent calculates the values of the variables shared
between overlapping sub-systems, seeking the global
optimum of the relation and coordinating its process
with the other agents of the system to obtain a good
overall performance. This work also proposes a
solution that makes it possible to obtain the benefits
of RL techniques in critical systems that cannot afford
to pay the learning curve of a learner agent. This is
achieved using a meaningful reinforcement provided
by the distributed agents, which try the actions in their
internal models during offline training. Once all the
learned functions have been evaluated and approved,
the LINKER agents use an online optimization
algorithm that can also have adaptation properties.
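To make this two-phase scheme concrete, the following is a minimal sketch of the offline-training / online-use pattern, assuming a hypothetical internal model object with reset() and step() methods and a simple tabular Q-learning agent; all names here are illustrative and not part of the LINKER Architecture.

    import random
    from collections import defaultdict

    ACTIONS = [0, 1, 2]      # e.g. discretized values of a shared control variable
    ALPHA, GAMMA = 0.1, 0.95

    def offline_training(model, episodes=1000):
        """Tabular Q-learning run entirely against the internal model,
        so the real (critical) system never sees exploratory actions."""
        q = defaultdict(lambda: [0.0] * len(ACTIONS))
        for _ in range(episodes):
            state, done = model.reset(), False
            while not done:
                action = random.randrange(len(ACTIONS))
                next_state, reward, done = model.step(action)
                target = reward + GAMMA * max(q[next_state])
                q[state][action] += ALPHA * (target - q[state][action])
                state = next_state
        return q

    def evaluate(q, model, episodes=50):
        """Approval gate: average return of the greedy policy on the
        internal model; only a good-enough policy goes online."""
        total = 0.0
        for _ in range(episodes):
            state, done = model.reset(), False
            while not done:
                action = max(ACTIONS, key=lambda a: q[state][a])
                state, reward, done = model.step(action)
                total += reward
        return total / episodes

    def online_action(q, state):
        """Online use of the approved value function: purely greedy,
        so no learning curve is paid on the real system."""
        return max(ACTIONS, key=lambda a: q[state][a])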
Another contribution of this paper is the
comparison of two learning techniques. In the first
one, the actions used in training are dictated by a
teacher, which in this case is a centralized MPC
(Model Predictive Control) controller. In the second
one, the actions are randomly selected: the LINKER
agent explores actions, trying and evaluating them
through the interaction with the agents that directly
control the model. An illustrative example is
developed using both techniques.
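The difference between the two techniques reduces to how the training action is selected at each step. A minimal sketch, assuming the same illustrative action set as above and a hypothetical teacher.suggest interface to the centralized MPC controller:

    import random

    ACTIONS = [0, 1, 2]   # same illustrative action set as in the sketch above

    def choose_training_action(state, mode, teacher=None):
        """Action selection during training.
        'instruction': a teacher (here, the centralized MPC controller)
                       dictates the action (planning by instruction).
        'exploration': the action is selected at random and evaluated
                       through interaction with the agents that control
                       the model (planning by exploration)."""
        if mode == "instruction":
            return teacher.suggest(state)   # hypothetical MPC interface
        return random.choice(ACTIONS)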
The structure of the paper is as follows: Section 2
introduces the problem statement. Section 3 presents
the model-driven control and the model-driven
integrated learning. Section 4 presents the planning
by instruction, while Section 5 presents the planning
by exploration. Section 6 uses an application case
study to illustrate the performance of the proposed
architecture and approaches. Finally, Section 7
summarizes the main conclusions and outlines the
future lines of research.
2 PROBLEM STATEMENT
In order to describe the learning techniques
mentioned above, it is necessary to explain the
underlying problem, which is the distributed control
problem that the LINKER architecture addresses.
This architecture is applied to an LSS.
In order to control an LSS in a distributed way,
some assumptions have to be made on its dynamics,
i.e. on the way the system behaves. Let us assume first
that the system can be decomposed into n sub-
systems, where each sub-system consists of a subset
of the system equations and the interconnections with
other sub-systems. The problem of determining the
partitions of the system is not addressed in this work.
The set of partitions should be complete. This means
that all system states and control variables should be
included in at least one of the partitions.
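As an illustration of this completeness condition, the following minimal sketch checks that the union of the variables assigned to the partitions covers all system states and control variables; the variable names are hypothetical.

    def partitions_complete(system_vars, partitions):
        """Completeness check: every system state and control variable
        must belong to at least one partition (overlap is allowed)."""
        covered = set().union(*partitions)
        return system_vars <= covered

    # Hypothetical example: states x1..x3 and control inputs u1, u2 split
    # into two overlapping partitions that share the control variable u1.
    system_vars = {"x1", "x2", "x3", "u1", "u2"}
    P1 = {"x1", "x2", "u1"}
    P2 = {"x2", "x3", "u1", "u2"}
    assert partitions_complete(system_vars, [P1, P2])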
Definition 1. System partitions. P is the set of
system partitions and is defined by

$$P = \{P_1, P_2, \ldots, P_n\},$$

where each system partition (subsystem) $P_i$,
$i = 1, \ldots, n$, is described by a model.
In this example, a deterministic linear time-invariant
(LTI) model is used to represent a drinking water
distribution network; this type of model can also be
used for other types of LSS in which there is a
network of connected nodes and an element flowing
through the network that must be distributed to fulfill
certain demands. This
model is expressed in discrete-time as follows:

$$x(k+1) = A\,x(k) + B_u\,u(k) + B_d\,d(k),$$
$$y(k) = C\,x(k) + D_u\,u(k) + D_d\,d(k).$$
The model describes the topology and dynamics
of the network. The variables x, y, u and d are the
state, output, input and disturbance vectors (in this
case, the demands) of appropriate dimensions,
respectively; A, B, C and D are the state, input, output
and direct matrices, respectively. The sub-indexes u
and d refer to the type of input that the matrices
model, either control inputs or exogenous inputs
(disturbances). Control variables are classified as
internal or shared variables.
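As an illustration, a minimal numpy sketch of one step of such a discrete-time LTI model is given below; the matrices and dimensions are arbitrary placeholders rather than values from the case study.

    import numpy as np

    def lti_step(x, u, d, A, Bu, Bd, C, Du, Dd):
        """One step of the discrete-time LTI model:
        x(k+1) = A x(k) + Bu u(k) + Bd d(k)
        y(k)   = C x(k) + Du u(k) + Dd d(k)"""
        x_next = A @ x + Bu @ u + Bd @ d
        y = C @ x + Du @ u + Dd @ d
        return x_next, y

    # Placeholder example: 2 states (e.g. tank volumes), 1 control input
    # (e.g. a pump flow) and 1 disturbance (a demand).
    A = np.eye(2)
    Bu = np.array([[0.5], [0.5]])
    Bd = np.array([[-1.0], [0.0]])
    C = np.eye(2)
    Du = np.zeros((2, 1))
    Dd = np.zeros((2, 1))

    x = np.array([3.0, 2.0])
    u = np.array([1.0])
    d = np.array([0.8])
    x_next, y = lti_step(x, u, d, A, Bu, Bd, C, Du, Dd)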