Authors:
Valeria Javalera-Rincon 1; Vicenc Puig Cayuela 2; Bernardo Morcego Seix 2 and Fernando Orduña-Cabrera 1
Affiliations:
1 Advanced Systems Analysis and Ecosystem Services and Management Programs, International Institute for Applied Systems Analysis, Schlossplatz 1, A-2361 Laxenburg, Austria; 2 Advanced Control Systems Group, Universitat Politècnica de Catalunya (UPC), Rambla Sant Nebridi, 10, 08222 Terrassa, Spain
Keyword(s):
Distributed Control, Intelligent Agents, Reinforcement Learning, Cooperative Agents.
Related Ontology Subjects/Areas/Topics:
Agent Models and Architectures; Agents; Artificial Intelligence; Computational Intelligence; Cooperation and Coordination; Distributed Problem Solving; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems
Abstract:
Reinforcement Learning (RL) systems are trial-and-error learners. This feature, together with delayed reward, makes RL flexible, powerful and widely accepted. However, RL may not be suitable for the control of critical systems, where learning control actions by trial and error is not an option. In the RL literature, the use of simulated experience generated by a model is called planning. In this paper, the planningByInstruction and planningByExploration techniques are introduced, implemented and compared to coordinate a heterogeneous multi-agent architecture for distributed Large Scale Systems (LSS). This architecture was proposed in (Javalera, 2016). The models used in this approach are part of a distributed architecture of agents and are used to simulate the behavior of the system when coordinated actions are applied. This experience is learned by the so-called LINKER agents during off-line training. An exploitation algorithm is then used online to coordinate and optimize, in a cooperative way, the values of the overlapping control variables of the agents in the distributed architecture. This paper also presents a technique that addresses the number of learning steps required to converge toward an optimal (or sub-optimal) policy for distributed control systems. An example illustrates the proposed approach, showing promising results regarding its applicability to real systems.
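The planning idea summarized above (learning from simulated experience generated by a model, rather than from trial and error on the real system) can be sketched in a generic Dyna-Q style. The environment, state space, and parameters below are illustrative assumptions, not the architecture or the LINKER agents from the paper: a hypothetical 5-state chain where only reaching the last state yields reward.

```python
import random

# Minimal Dyna-Q-style sketch: the agent updates its Q-values from real
# transitions and additionally "plans" by replaying simulated experience
# drawn from a learned model, reducing the real interaction needed.
# The 5-state chain environment is a hypothetical stand-in, not the
# distributed system from the paper.

N_STATES = 5
ACTIONS = [0, 1]  # 0 = move left, 1 = move right

def step(s, a):
    """Deterministic chain dynamics: moving right leads toward the goal."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0  # reward only on reaching the goal
    return s2, r

def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned deterministic model: (s, a) -> (s2, r)
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: q[(s, a_)])
            s2, r = step(s, a)
            # direct RL update from real experience
            best_next = max(q[(s2, a_)] for a_ in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            model[(s, a)] = (s2, r)
            # planning: extra updates from model-generated (simulated) experience
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                ps2, pr = model[(ps, pa)]
                best_sim = max(q[(ps2, a_)] for a_ in ACTIONS)
                q[(ps, pa)] += alpha * (pr + gamma * best_sim - q[(ps, pa)])
            s = s2
    return q

if __name__ == "__main__":
    q = dyna_q()
    # Greedy policy for each non-terminal state (1 = move right toward goal).
    policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
    print(policy)
```

The inner `planning_steps` loop is where simulated experience substitutes for real trials, which is the property the paper exploits for critical systems where online trial and error is not acceptable.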