PLANNING STACKING OPERATIONS WITH AN UNKNOWN
NUMBER OF OBJECTS
Lluis Trilla and Guillem Alenyà
Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Llorens i Artigas 4-6, 08028 Barcelona, Spain
Keywords:
Stack planning, Symbolic POMDP, Time-of-Flight camera.
Abstract:
A planning framework is proposed for the task of cleaning a table and stacking an unknown number of objects of different sizes on a tray. We propose to divide this problem into two parts and to combine two different planning algorithms. One plans hand motions in Euclidean space so that the hand can be moved in a noisy scenario, using a novel Time-of-Flight (ToF) camera to perceive the environment. The other chooses the strategy to effectively clean the table, considering the symbolic position of the objects as well as their size for stacking purposes. Our formulation does not use information about the number of objects present, and is therefore general in this sense. It can also deal with objects of different sizes, planning adequately to stack them. The particular definition of the possible actions allows a simple and elegant characterization of the problem, and is one of the key ingredients of the proposed solution. Experiments in simulated and real scenarios are provided that validate our approach.
1 INTRODUCTION
Planning algorithms that explicitly consider uncertainty have been widely used in the field of mobile robots (LaValle, 2004; Thrun et al., 2005), but are less common in robotic-arm manipulation and grasping (Hsiao et al., 2007). In that scenario uncertainty is especially important and should be carefully considered, because contact takes place between the robot and the world. In this interaction, the positions of the object and of the robot in the world cannot be precisely known, even more so when we also consider the uncertainty of the sensors used to perceive this world.
In this paper we explore object grasping and stacking tasks, as they are interesting and challenging skills (Kemp et al., 2007). To deal with these problems the partially observable Markov decision process (POMDP) paradigm will be used, specifically the discrete model-based POMDP. It provides the capacity to deal with uncertainty in both observations and actions: a robot usually has only an approximation of reality when it senses the environment and evaluates the results of the actions it has completed. POMDPs have been used before in the context of arm motion control for grasping; however, the perceptions used there were simpler than here, e.g. on/off signals from pressure sensors on the fingers of the hand (Glashan et al., 2007).
Figure 1: The robotic arm used in the experiments executing
the policy computed by the planner.
The system uncertainty is modeled by measuring it in the real system and providing the measured values to the model, so that it can take into account the various difficult situations it could face according to the chosen action. One interesting characteristic of our approach is that two different POMDPs are combined, and one of them can control the other and receive feedback from it. We will apply this approach to solve a real situation: cleaning a table and stacking an unknown number of objects of different sizes on a tray.
The first POMDP controls the robotic arm trajectory to prepare the grasping task, planning in the state space formed by the coordinates relative to the target state. This approach is extensively reported in (Trilla, 2009); here it is only briefly introduced.
A naive approach that avoids the complex POMDP machinery is a simple reactive algorithm. However, with such an approach it is difficult to take into account the uncertainty in the number of stacked objects and the probability of stacks falling down.
The second POMDP symbolically plans the strategy of the cleaning task and chooses the actions that set the target of the first POMDP. The objective of the second POMDP is either to completely clean the table or to fill the tray. The planning is symbolic because it does not rely on the coordinates of the objects or on their interaction with the world, but on an abstraction layer. Two main considerations are important. First, observations are partial: the number of objects on the table is unknown, e.g. because of possible occlusions, and some objects may already be stacked on the table, which is difficult to observe. Second, the tray surface is limited, so the robot has to stack objects; here the planning has to deal with objects of different sizes.
Perception in grasping applications is generally performed using artificial vision to recognize some object characteristics and then plan a correct grasp (Saxena et al., 2008). Here we will use a relatively new sensor, a Time-of-Flight (ToF) camera. This camera delivers 3D images at 25 fps, potentially allowing fast perception algorithms, and, contrary to stereo systems, it does not rely on texture or other object surface characteristics to compute depth. Depth information will be used to identify the position of the robot hand in space, and to easily separate the objects from the background.
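As an illustration of this last use, the following is a minimal sketch, under the assumption that the ToF camera provides a dense depth image and that the depth of the empty table is roughly known, of how objects could be separated from the background by simple depth thresholding; the function name, array layout and margin value are hypothetical and not part of the described system.

```python
import numpy as np

def segment_objects(depth_img, table_depth, margin=0.02):
    """Separate object pixels from the background in a ToF depth image.

    depth_img   -- HxW array of depths in meters (hypothetical input)
    table_depth -- HxW array (or scalar) with the depth of the empty table
    margin      -- tolerance in meters to absorb ToF noise (assumed value)
    """
    # A pixel belongs to an object if it is noticeably closer to the
    # camera than the empty table behind it.
    object_mask = depth_img < (table_depth - margin)
    # Discard invalid measurements (ToF cameras return noisy or zero depths).
    object_mask &= depth_img > 0
    return object_mask
```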
This article is structured as follows. POMDP background is introduced in Section 2. The planning strategy is presented in Section 3, and in particular the planning of the symbolic steps involved in the container-content manipulation (Section 3.1). In Section 4 some experiments are presented, validating our approach in a general stacking case, in the case of different object sizes, and with occlusions between objects. Finally, Section 5 is devoted to the conclusions and future work.
2 POMDP BACKGROUND
A POMDP models a sequence of events in discrete states and time in which the agent chooses the actions to perform. It is represented by the tuple (S, A, T, R, O), where S is the finite set of states and A is a discrete set of actions. The transition model T(s, a, s') describes the probability of a transition from a state s to a state s' when the action a is performed. The reward model R(s, a) defines the numeric reward given to the agent when it executes an action a while in state s. The observation model O(z, a, s) describes the probability of an observation z when the action a is performed and the state is s. A POMDP handles partially observable environments, in which there is only an indirect representation of the state of the world: the belief state b, a probability distribution over all states in the model. At each time step the belief state is updated by Bayesian forward-filtering.
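To make the update concrete, the following is a minimal sketch of one discrete Bayesian forward-filtering step under the definitions above; the array-based encoding of T, O and b is an assumption made for illustration, not the authors' implementation.

```python
import numpy as np

def update_belief(b, a, z, T, O):
    """One Bayesian forward-filtering step for a discrete POMDP.

    b -- belief, array of shape (|S|,), summing to 1
    a -- index of the executed action
    z -- index of the received observation
    T -- transition model, T[s, a, s2] = P(s2 | s, a)
    O -- observation model, O[z, a, s] = P(z | a, s)
    """
    # Prediction: propagate the belief through the transition model.
    predicted = b @ T[:, a, :]
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = O[z, a, :] * predicted
    # Normalize so the belief remains a probability distribution.
    return unnormalized / unnormalized.sum()
```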
The decision about which action is most applicable is given by the policy function, which encodes the best action to perform for any possible belief distribution. The policy balances the probabilities of the future sequences of events against the expected accumulated reward, which it has to maximize. Computing a policy with classic exact methods like value iteration or policy iteration is highly intractable. However, some recent work has been devoted to finding approximate solutions (Hsu et al., 2007); point-based methods such as point-based value iteration (Hsu et al., 2008), discrete Perseus or HSVI are quite fast and yield good results.
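Point-based solvers of this kind typically represent the approximate value function as a set of α-vectors, each associated with an action; the sketch below, with hypothetical names, shows how a policy in that form would be executed on the current belief.

```python
import numpy as np

def best_action(b, alpha_vectors, actions):
    """Select an action from an alpha-vector policy.

    b             -- current belief, array of shape (|S|,)
    alpha_vectors -- array of shape (n, |S|), value hyperplanes over beliefs
    actions       -- array of shape (n,), action associated with each vector
    """
    # The approximate value of belief b is the maximum of alpha_i . b;
    # the policy returns the action of the maximizing vector.
    values = alpha_vectors @ b
    return actions[np.argmax(values)]
```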
3 PLANNING AND EXECUTING
THE MANIPULATION
OPERATIONS
We divide the high-level task of cleaning a table of several objects into two different levels. First, it is important to decide which object to manipulate first. Then, the next issue is to know the exact position of that object and to effectively manipulate it. Here we present our development of the planning algorithm that deals symbolically with the problem and performs the high-level task.
The effective manipulation of the objects can be handled by means of a classical control algorithm. Alternatively, we have recently proposed to solve the low-level task by also defining a POMDP in the space of discretized hand positions (Trilla, 2009). With this approach we are able to deal with robots with low precision or repeatability, and also with mobile manipulators that are naturally not placed in exactly the same position in front of the table.
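As a rough illustration of such a low-level state space, and not of the exact discretization used in (Trilla, 2009), the hand position could be expressed relative to the grasping target and binned into a small grid; the bounds and resolution below are assumed values.

```python
import numpy as np

BOUNDS = (-0.3, 0.3)   # assumed workspace limits relative to the target (m)
CELLS_PER_AXIS = 7     # assumed grid resolution per axis

def discretize_relative_position(hand_xyz, target_xyz):
    """Map the hand position, relative to the target, to a discrete grid cell."""
    rel = np.asarray(hand_xyz, dtype=float) - np.asarray(target_xyz, dtype=float)
    lo, hi = BOUNDS
    # Clip to the workspace and quantize each axis into CELLS_PER_AXIS bins.
    rel = np.clip(rel, lo, hi)
    idx = ((rel - lo) / (hi - lo) * (CELLS_PER_AXIS - 1)).round().astype(int)
    return tuple(idx)   # e.g. (3, 3, 3) is the cell that contains the target
```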
3.1 Planning the Strategy
We define the problem as follows. The working area is divided into zones: one for the tray and the rest for the table (see Fig. 2). Each zone is divided into positions where the objects can be placed. The state space is defined as the tuple (zo, p, sn, t), where zo (zone) and p (position) identify the object location symbolically, and sn indicates the number of objects stacked in the same
purely by noise in the observations. Second, the solution to the problem implies stacking objects of different sizes with some restrictions, i.e., a big object cannot be stacked onto a small one.
Thanks to the proposed state space and action space, the codification of the different positions and sizes of the glasses lets the planner generate a policy able to deal with the container-contents problem, stacking the glasses in the best way and maximizing the free space on the tray. The solution we propose to handle the unknown number of objects is based on focusing the attention on only one of the objects present in each of the different regions of the table, and it has turned out to be particularly adequate and powerful.
5.1 Future Work
Here we have considered that the objects can already be stacked on the table, and that the robot can perform stacking actions on the table. However, this condition has hardly arisen in our experiments. One challenge we are facing now is to incentivize object stacking actions on the table before putting the objects on the tray. A new variable is needed here to balance the cost of two transportation actions against that of one stacking plus one transportation operation. In the computation of this cost the trajectory from the table to the tray in the transportation action becomes important, as we want to stack closer objects or, more importantly, objects that lie on the transportation trajectory to the tray. Our formulation is general in the number of defined zones on the table, so we treat this new condition as a natural extension of the presented algorithm.
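As a minimal sketch of that balance, let c_t denote the (hypothetical) cost of one transportation action and c_s the cost of one stacking action on the table. Stacking two objects before transporting them is only worthwhile when

c_s + c_t < 2 c_t, i.e. c_s < c_t,

where in practice c_t would also depend on the length of the trajectory from the table position to the tray.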
We have also considered adding an additional action for when the observations are not sufficient to decide on one action, e.g. in order to gather information about the number of glasses stacked at each position. A promising option is an action that asks an operator, whose answer could modify the agent's belief state (Armstrong-Crews and Veloso, 2007).
ACKNOWLEDGEMENTS
This work has been partially supported by the Spanish Ministry of Science and Innovation under project DPI2008-06022 and the Generalitat de Catalunya under the consolidated Robotics Group. G. Alenyà was supported by the CSIC under a JAE-Doc Fellowship.
REFERENCES
Armstrong-Crews, N. and Veloso, M. (2007). Oracular partially observable Markov decision processes: A very special case. In Proc. IEEE Int. Conf. Robot. Automat., Rome, pages 2477–2482.
Glashan, R., Hsiao, K., Kaelbling, L. P., and Lozano-Pérez, T. (2007). Grasping POMDPs: Theory and experiments. In RSS Workshop: Manip. for Human Env.
Hsiao, K., Kaelbling, L. P., and Lozano-Pérez, T. (2007). Grasping POMDPs. In Proc. IEEE Int. Conf. Robot. Automat., Rome, pages 4685–4692.
Hsu, D., Lee, W., and Rong, N. (2007). What makes some
POMDP problems easy to approximate? In Advances
in Neural Information Processing Systems (NIPS).
Hsu, D., Lee, W., and Rong, N. (2008). A point-based POMDP planner for target tracking. In Proc. IEEE Int. Conf. Robot. Automat., Pasadena.
Kahlmann, T., Remondino, F., and Ingensand, H. (2006). Calibration for increased accuracy of the range imaging camera SwissRanger™. In ISPRS Commission V Symposium, pp. 136–141, Dresden.
Kemp, C., Edsinger, A., and Torres-Jara, E. (2007). Challenges for robot manipulation in human environments. IEEE Robot. Automat. Mag., 14(1):20–2.
Kolb, A., Barth, E., and Koch, R. (2008). ToF-sensors:
New dimensions for realism and interactivity. In Proc.
IEEE CVPR Workshops, vol. 1-3, pp. 1518–1523.
Kuehnle, J. U., Xue, Z., Stotz, M., Zoellner, J. M., Verl,
A., and Dillmann, R. (2008). Grasping in depth maps
of time-of-flight cameras. In Proc. Int. Workshop
Robotic Sensors Environments, pp. 132–137.
LaValle, S. M. (2004). Planning Algorithms. Cambridge University Press.
Lindner, M. and Kolb, A. (2006). Lateral and depth calibration of PMD-distance sensors. In Proc. 2nd Int. Sym. Visual Computing, vol. 4292, pp. 524–533.
Saxena, A., Driemeyer, J., and Ng, A. Y. (2008). Robotic
grasping of novel objects using vision. Int. J. Robot.
Res., 27:157–173.
Thrun, S., Burgard, W., and Fox, D. (2005). Probabilistic
Robotics. MIT Press, Cambridge.
Trilla, L. (2009). Planificació de moviments en entorns amb incertesa per a manipulació d'objectes. Master's thesis, Universitat Politècnica de Catalunya.