2.4 The Task Management Agent
The task management agent (TMA) decides which car transports each passenger. Since the policy of the TMA greatly influences E, the designer of an agent system should design this policy carefully.
One traditional method for designing the optimal policy is reinforcement learning, with a percept space constructed as the combination of the situations of the cars and the passengers. However, this percept space becomes huge, which makes it difficult to acquire the optimal policy by reinforcement learning. Another method is to use a BDI reasoning engine, but then it is difficult for the designer to define inference rules that yield efficient transportation. In short, neither reinforcement learning alone nor a BDI reasoning engine alone makes it easy to design the optimal policy.
Fortunately, a designer often has some rough strategies for efficient transportation, such as “the car nearest to a passenger should transport that passenger.” In our implementation, we collect several such rough strategies, as sketched below, and design the TMA as a learning agent that learns which strategy is useful in the current situation. The details of the learning mechanism are discussed in Section 3.
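As a minimal sketch (illustrative only; the Car fields, the Strategy interface, and the least-loaded strategy are assumptions introduced for exposition, not our actual implementation), such rough strategies can be represented in Java as interchangeable policies among which the TMA chooses:

    import java.util.Comparator;
    import java.util.List;

    // Illustrative car state: current floor and number of passengers on board.
    record Car(int id, int floor, int load) {}

    // A rough strategy maps the current situation to the car that should
    // transport a newly arrived passenger.
    interface Strategy {
        Car choose(List<Car> cars, int passengerFloor);
    }

    // "The car nearest to the passenger should transport the passenger."
    class NearestCar implements Strategy {
        public Car choose(List<Car> cars, int passengerFloor) {
            return cars.stream()
                    .min(Comparator.comparingInt((Car c) -> Math.abs(c.floor() - passengerFloor)))
                    .orElseThrow();
        }
    }

    // Another conceivable rough strategy: the least-loaded car takes the passenger.
    class LeastLoaded implements Strategy {
        public Car choose(List<Car> cars, int passengerFloor) {
            return cars.stream().min(Comparator.comparingInt(Car::load)).orElseThrow();
        }
    }

In this view, learning which strategy is useful amounts to learning which of these policies to invoke in each situation (Section 3).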
2.5 The Car Agents
We designed the car agents (CAs) as rational agents, each controlling one car. Five kinds of plans are implemented for the CAs, as listed below.
Go Plan. The CA moves its car to a passenger's source or destination floor.
Call Plan. The CA evaluates the nearest source floor among passengers waiting in an elevator hall.
Board Plan. Passengers board the car.
Transport Plan. The CA evaluates the nearest destination floor among passengers on board.
Get off Plan. Passengers get off the car.
The CAs control their cars by switching among these plans; the Jadex BDI reasoning engine is used to select a plan. We define four inference rules (R1)–(R4), where i, j, k, and m denote passengers. The predicate call(i) indicates that the source floor of passenger i is the nearest to the car; transport(i) indicates that the destination floor of i is the nearest to the car; board(i) indicates that the car is stopped at the source floor of i and i can board it; get_off(i) indicates that the car is stopped at the destination floor of i. BEL(X) indicates that the CA believes X is true, GOAL(X) indicates that the CA has the goal of making X true, and U is the temporal operator “until”.
(R1) BEL(call(i)) ⊃ GOAL(call(i)) U (GOAL(transport(j)) ∨ GOAL(board(k)) ∨ GOAL(get_off(m)))
(R2) BEL(transport(i)) ⊃ GOAL(transport(i)) U (GOAL(board(j)) ∨ GOAL(get_off(k)))
(R3) BEL(board(i)) ⊃ GOAL(board(i)) U GOAL(get_off(j))
(R4) BEL(get_off(i)) ⊃ GOAL(get_off(i))
These inference rules give the first priority to passengers getting off the car, the second priority to passengers boarding the car, and the third priority to passengers on board waiting to be transported; passengers waiting in an elevator hall receive the lowest priority. As an exception, when the source floor of a waiting passenger lies on the car's current route, the car stops at that floor to pick the passenger up.
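The following Java fragment is a minimal sketch of this priority order; the boolean tests stand in for the BEL(·) conditions of (R1)–(R4) and are illustrative assumptions, not our Jadex implementation.

    // Illustrative plan selector encoding the priority order of (R1)-(R4).
    class PlanSelector {
        enum Plan { GET_OFF, BOARD, TRANSPORT, CALL, GO }

        static Plan selectPlan(boolean canGetOff, boolean canBoard,
                               boolean hasPassengerOnBoard, boolean hasHallCall) {
            if (canGetOff)           return Plan.GET_OFF;   // (R4): first priority
            if (canBoard)            return Plan.BOARD;     // (R3): second priority
            if (hasPassengerOnBoard) return Plan.TRANSPORT; // (R2): third priority
            if (hasHallCall)         return Plan.CALL;      // (R1): lowest priority
            return Plan.GO; // otherwise move toward the floor evaluated by Call or Transport
        }
    }

Within this sketch, the exceptional en-route stop corresponds to canBoard becoming true when the car halts at a source floor on its way.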
3 COOPERATIVE REINFORCEMENT LEARNING
When a passenger registers a destination floor with the controller, the TMA selects the CA that will transport the passenger, and that CA then infers a schedule for the transport. The elevator group control problem can thus be regarded as the problem of finding the optimal policy for assigning cars to passengers so as to minimize E. Finding this optimal policy requires considering an enormous number of situations. If we try to obtain the policy by reinforcement learning in the TMA alone, it is difficult to find the optimal policy efficiently; if we use a BDI reasoning engine for the TMA, it is difficult for the designer to give inference rules that induce the optimal policy. We therefore introduce cooperative learning between the TMA and the CAs.
3.1 Framework of Cooperative Learning
Figure 2 shows the framework of cooperative learning. The designer of the agent system often has some rough strategies for efficient transportation. In our framework, the TMA learns which strategy is useful in the current situation; reinforcement learning is used to acquire the TMA's policy, as in the sketch below.
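As a minimal sketch of such a learning step (illustrative only: the coarse situation key, the ε-greedy selection rule, and the reward signal, e.g. the negated service completion time, are simplifying assumptions rather than our exact implementation), the TMA can maintain a table of values Q(situation, strategy):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    // Illustrative TMA learner: a value table over (situation, strategy),
    // epsilon-greedy strategy selection, and an update from the observed reward.
    class TmaLearner {
        private final Map<String, double[]> q = new HashMap<>();
        private final int numStrategies;
        private final double alpha = 0.1;    // learning rate (assumed value)
        private final double epsilon = 0.1;  // exploration rate (assumed value)
        private final Random rng = new Random();

        TmaLearner(int numStrategies) { this.numStrategies = numStrategies; }

        // Select a strategy index for the given coarse situation key.
        int selectStrategy(String situation) {
            double[] v = q.computeIfAbsent(situation, s -> new double[numStrategies]);
            if (rng.nextDouble() < epsilon) return rng.nextInt(numStrategies);
            int best = 0;
            for (int i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
            return best;
        }

        // Update the value of the chosen strategy once the reward is known
        // (e.g., when the assigned passenger's transport completes).
        void update(String situation, int strategy, double reward) {
            double[] v = q.computeIfAbsent(situation, s -> new double[numStrategies]);
            v[strategy] += alpha * (reward - v[strategy]);
        }
    }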
When a passenger is created by the EA, the TMA evaluates the value of each strategy. Each CA also evaluates, on the basis of reinforcement learning, the value of assigning itself to the passenger. By using values