table is a look-up table that stores Q-values. The input
of the Q-table is the plant state, and the output is the
Q-value corresponding to that input. A movement is
selected with a probability that is calculated from
the magnitudes of the Q-values. Then, the Q-value
corresponding to the selected movement is updated
based on the result of the movement. The optimal
pattern of container movements can be obtained by
selecting the movement that has the largest Q-value
in each state, provided that the Q-values reflect
the number of container movements needed to achieve the
desired layout. However, a conventional Q-table has
to store evaluation values for all state-movement
pairs. Therefore, the conventional reinforcement
learning method, Q-learning, has great difficulty
solving the marshaling problem because of the huge
number of learning iterations and states required to obtain
an admissible operation of containers (Baum, 1999).
Recently, a Q-learning method that can generate a
marshaling plan has been proposed (Hirashima et al.,
1999). Although this method was effective in several
cases, the desired layout was not achieved in every
trial, so the early-phase performance of the learning
process could be degraded.
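The look-up, probabilistic selection, and update steps described above can be sketched as follows. This is a minimal illustration, not the method of the cited papers: the text only says that the selection probability is calculated from the magnitude of the Q-values, so the Boltzmann (softmax) rule, the standard one-step Q-learning update, and all names and parameter values (`q_table`, `temperature`, `alpha`, `gamma`) are assumptions made here.

```python
import math
import random
from collections import defaultdict

# Hypothetical Q-table: one entry per (state, movement) pair, default 0.0.
q_table = defaultdict(float)

def selection_probabilities(state, movements, temperature=0.1):
    """Boltzmann (softmax) probabilities over movements (assumed rule)."""
    q_values = [q_table[(state, m)] for m in movements]
    q_max = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def select_movement(state, movements, temperature=0.1, rng=random):
    """Draw one movement; larger Q-values are selected more often."""
    probs = selection_probabilities(state, movements, temperature)
    return rng.choices(movements, weights=probs, k=1)[0]

def update(state, movement, reward, next_state, next_movements,
           alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update (assumed, for illustration)."""
    best_next = max((q_table[(next_state, m)] for m in next_movements),
                    default=0.0)  # terminal state: no further movements
    key = (state, movement)
    q_table[key] += alpha * (reward + gamma * best_next - q_table[key])
```

Because the table needs one entry per state-movement pair, its size grows with the number of reachable bay states, which is the storage difficulty noted in the text.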
In this paper, a new reinforcement learning system
to generate a marshaling plan is proposed. The learning
process in the proposed method consists of
two stages:
1. determination of the rearrangement order,
2. selection of destinations for the containers to be removed.
The learning algorithms in these stages are independent
of each other, and the Q-values in one stage are referenced
by the other stage. That is, Q-values are discounted
according to the number of container movements, and the
Q-table for rearrangement is constructed by using the
Q-values for container movements, so that the Q-values
reflect the total number of container movements required
to obtain a desired layout. Moreover, at the
end of stage 1, the selected container is rearranged into
its desired position, so that every trial can achieve the
desired layout. In addition, in the proposed method,
each container has several desired positions in the final
layout, and this feature is taken into account in the
learning algorithm. Thus, the early-phase performance of
the learning process can be improved. Finally, the
effectiveness of the proposed method is shown by
computer simulations for several cases.
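To illustrate why discounting lets a Q-value reflect the total number of container movements, the following sketch assumes a single reward given when the desired layout is reached and a fixed discount factor applied once per movement. The reward of 1.0, the factor `gamma = 0.8`, and the function names are illustrative assumptions; the exact update rules are not given in this section.

```python
import math

def discounted_value(num_movements, reward=1.0, gamma=0.8):
    """Value of a plan that reaches the desired layout after
    `num_movements` movements, discounted once per movement."""
    return reward * gamma ** num_movements

def movements_from_value(value, reward=1.0, gamma=0.8):
    """Recover the number of movements encoded in a discounted value."""
    return round(math.log(value / reward) / math.log(gamma))
```

Under these assumptions a plan with fewer movements always has the larger value, so choosing the largest Q-value favours the shortest plan.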
2 PROBLEM DESCRIPTION
Fig. 1 shows an example of a container yard terminal.
The terminal consists of containers, yard areas, yard
transfer cranes, auto-guided vehicles, and a port crane.

Figure 1: Container terminal (labels: port crane, yard transfer crane, ship, container, yard area).

Containers are carried by trucks and each container is
stacked in a corresponding area called a bay, and a set of
bays constitutes a yard area. Each bay has $n_y$ stacks,
each of which can hold up to $m_y$ containers; the number of
containers in a bay is $k$, and the number of bays depends
on the number of containers. Each container is identified
by a unique name $c_i$ ($i = 1, \cdots, k$). The position
of each container is specified by discrete
position numbers $1, \cdots, n_y \cdot m_y$. The position
of container $c_i$ is then denoted by $x_i$ ($1 \le i \le k$,
$1 \le x_i \le m_y \cdot n_y$), and the state of a bay is
determined by the vector $x = [x_1, \cdots, x_k]$.
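The state vector defined above can be built as in the sketch below. The text does not say how position numbers are assigned to slots, so the numbering used here (tier by tier from the bottom, left to right within a tier) is an assumption, as are the function names and the list-of-stacks input format.

```python
from typing import List, Tuple

def position_number(stack: int, tier: int, n_y: int) -> int:
    """Position number of a slot (assumed numbering).
    `stack` runs from 1 to n_y, `tier` from 1 (bottom) to m_y."""
    return (tier - 1) * n_y + stack

def bay_state(layout: List[List[int]], n_y: int, m_y: int) -> Tuple[int, ...]:
    """Return x = (x_1, ..., x_k) for a bay.
    layout[s] lists the indices i of the containers c_i in stack s + 1,
    from bottom to top."""
    if len(layout) != n_y:
        raise ValueError("layout must describe exactly n_y stacks")
    positions = {}
    for s, stack in enumerate(layout, start=1):
        if len(stack) > m_y:
            raise ValueError("a stack holds at most m_y containers")
        for t, container in enumerate(stack, start=1):
            positions[container] = position_number(s, t, n_y)
    k = len(positions)
    # Order the positions by container index so that entry i - 1 is x_i.
    return tuple(positions[i] for i in range(1, k + 1))
```

Returning a tuple keeps the state hashable, so it can serve directly as a key in a look-up table.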
2.1 Grouping
The desired layout in a bay is generated based on the
loading order of containers that are moved from the
bay to a ship. In this case, the container to be loaded
into the ship can be anywhere in the bay if it is on top
of a stack. This feature yields several desired layouts
for the bay. In the addressed problem, when contain-
ers on different stacks are placed at the same height in
the bay, it is assumed that the positions of such
containers can be exchanged. Fig. 2 shows an example
of desired layouts, where $m_y = n_y = 3$ and $k = 9$. In the
figure, containers are loaded onto the ship in
descending order. Then, containers $c_7, c_8, c_9$ are in the same
group (Group 1), and their positions can be exchanged
because the loading order is kept unchanged after
the exchange of positions. In the same way, $c_4, c_5, c_6$
are in Group 2, and $c_1, c_2, c_3$ are in Group 3,
where the positions of containers can also be exchanged.
Consequently, several candidates for the desired layout of the
bay are generated from the original desired layout.
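The grouping can be illustrated with a short sketch that reads the groups off a desired layout and enumerates the candidate layouts obtained by exchanging containers at the same height. It assumes a completely filled rectangular bay and treats every permutation within a tier as a separate candidate; the placement of containers within each tier of the sample layout is also assumed, since Fig. 2 is not reproduced here. The function names are hypothetical.

```python
from itertools import permutations, product

def tier_groups(layout):
    """Groups of container indices, from the top tier downward.
    layout[s] lists one stack from bottom to top."""
    height = max(len(stack) for stack in layout)
    groups = []
    for tier in range(height - 1, -1, -1):
        groups.append([stack[tier] for stack in layout if len(stack) > tier])
    return groups

def candidate_layouts(layout):
    """All layouts obtained by permuting containers within each tier.
    Assumes every stack has the same height (full bay)."""
    tiers = list(zip(*layout))  # tiers[t] = containers at height t
    result = []
    for choice in product(*(permutations(t) for t in tiers)):
        # choice[t][s] is the container at tier t of stack s;
        # transpose back to a list of stacks.
        result.append([list(stack) for stack in zip(*choice)])
    return result

# Sample layout with m_y = n_y = 3, k = 9 (tier contents as in the text).
original = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
groups = tier_groups(original)           # [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
candidates = candidate_layouts(original)
```

With three containers per tier and three tiers, this enumeration yields 3! · 3! · 3! = 216 layouts, including the original one.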
In addition to the grouping explained above, a
“heap-shaped group” for the $n_y$ containers at the top of the
stacks in the original desired layout (Group 1) is
generated as follows:
1. The $n_y$ containers in Group 1 can be placed at any