PLANNING STACKING OPERATIONS WITH AN UNKNOWN
NUMBER OF OBJECTS
Lluis Trilla and Guillem Alenyà
Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Llorens i Artigas 4-6, 08028 Barcelona, Spain
Keywords:
Stack planning, Symbolic POMDP, Time-of-Flight camera.
Abstract:
A planning framework is proposed for the task of cleaning a table and stacking an unknown number of objects of different sizes on a tray. We propose to divide this problem into two parts and to combine two different planning algorithms. One plans hand motions in Euclidean space so that the hand can be moved in a noisy scenario, using a novel Time-of-Flight (ToF) camera to perceive the environment. The other chooses the strategy to effectively clean the table, considering the symbolic position of the objects as well as their size for stacking purposes. Our formulation does not use information about the number of objects present, and is therefore general in this sense. It can also deal with objects of different sizes, planning adequately to stack them. The particular definition of the possible actions allows a simple and elegant characterization of the problem, and is one of the key ingredients of the proposed solution. Experiments in simulated and real scenarios are provided that validate our approach.
1 INTRODUCTION
Planning algorithms that explicitly consider uncertainty have been widely used in the field of mobile robots (LaValle, 2004; Thrun et al., 2005), but are less common in robotic-arm manipulation and grasping (Hsiao et al., 2007). In that scenario uncertainty is especially important and should be carefully considered, because contact takes place between the robot and the world. In this interaction, the positions of the object and of the robot in the world cannot be precisely known, even more so when we also consider the uncertainty of the sensors used to perceive this world.
In this paper we explore object grasping and stacking tasks, as they are interesting and challenging skills (Kemp et al., 2007). To deal with these problems the partially observable Markov decision process (POMDP) paradigm will be used, specifically the discrete model-based POMDP. It provides the capacity to deal with uncertainty in both observations and actions: a robot usually has only an approximation of reality when it senses the environment and evaluates the results of the actions it has completed. POMDPs have been used before in the context of arm motion control for grasping; however, the perceptions used there were simpler than here, e.g. on/off signals from pressure sensors on the fingers of the hand (Glashan et al., 2007).
Figure 1: The robotic arm used in the experiments executing
the policy computed by the planner.
The system uncertainty is modeled by measuring it in the real system and providing the measured values to the model, so that it can take into account the various difficult situations it could face according to the chosen action. One interesting characteristic of our approach is that two different POMDPs are combined, and one of them can control the other and receive feedback from it. We will apply this approach to solve a real situation: cleaning a table and stacking an unknown number of objects of different sizes on a tray.
The first POMDP controls the robotic arm trajectory to prepare the grasping task, planning in the state space formed by the coordinates relative to the target state. This approach is extensively reported in (Trilla, 2009); here it is only briefly introduced.
A naive approach that avoids the complex POMDP machinery is a simple reactive algorithm. However, with such an approach it is difficult to take into account the uncertainty in the number of stacked objects and the probability of stacks falling down.
The second POMDP symbolically plans the strategy of the cleaning task and chooses the actions that set the target of the first POMDP. The objective of the second POMDP is either to completely clean the table or to fill the tray. The planning is symbolic because it does not rely on the coordinates of the objects or on their interaction with the world, but on an abstraction layer. Two main considerations are important. First, observations are partial: the number of objects on the table is unknown, e.g. because of possible occlusions, and some objects may already be stacked on the table, which is difficult to observe. Second, the tray surface is limited, so the robot has to stack objects; here the planning has to deal with objects of different sizes.
Perception in grasping applications is generally performed using artificial vision to recognize some object characteristics and then plan a correct grasp (Saxena et al., 2008). Here we will use a relatively new sensor, a Time-of-Flight (ToF) camera. This camera delivers 3D images at 25 fps, potentially allowing fast perception algorithms, and, contrary to stereo systems, it does not rely on texture or other object surface characteristics to compute depth. Depth information will be used to identify the position of the robot hand in space, and to easily separate the objects from the background.
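As an illustration of this last use, the following is a minimal sketch, under the assumption that the ToF camera provides a dense depth image and that the depth of the empty table is roughly known, of how objects could be separated from the background by simple depth thresholding; the function name, array layout and margin value are hypothetical and not part of the described system.

```python
import numpy as np

def segment_objects(depth_img, table_depth, margin=0.02):
    """Separate object pixels from the background in a ToF depth image.

    depth_img   -- HxW array of depths in meters (hypothetical input)
    table_depth -- HxW array (or scalar) with the depth of the empty table
    margin      -- tolerance in meters to absorb ToF noise (assumed value)
    """
    # A pixel belongs to an object if it is noticeably closer to the
    # camera than the empty table behind it.
    object_mask = depth_img < (table_depth - margin)
    # Discard invalid measurements (ToF cameras return noisy or zero depths).
    object_mask &= depth_img > 0
    return object_mask
```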
This article is structured as follows. POMDP background is introduced in Section 2. The planning strategy is presented in Section 3, and in particular the planning of the symbolic steps involved in the container-content manipulation (Section 3.1). In Section 4 some experiments are presented, validating our approach in a general stacking case, in the case of different object sizes, and with occlusions between objects. Finally, Section 5 is devoted to the conclusions and future work.
2 POMDP BACKGROUND
A POMDP models a sequence of events in discrete states and time in which the agent chooses the actions to perform. It is represented by the tuple (S, A, T, R, O), where S is the finite set of states and A is a discrete set of actions. The transition model T(s, a, s') describes the probability of a transition from a state s to a state s' when the action a is performed. The reward model R(s, a) defines the numeric reward given to the agent when it executes an action a while in state s. The observation model O(z, a, s) describes the probability of an observation z when the action a is performed and the state is s. A POMDP handles partially observable environments, in which there is only an indirect representation of the state of the world: the belief state b, a probability distribution over all states in the model. At each time step the belief state is updated by Bayesian forward-filtering.
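To make the update concrete, the following is a minimal sketch of one discrete Bayesian forward-filtering step under the definitions above; the array-based encoding of T, O and b is an assumption made for illustration, not the authors' implementation.

```python
import numpy as np

def update_belief(b, a, z, T, O):
    """One Bayesian forward-filtering step for a discrete POMDP.

    b -- belief, array of shape (|S|,), summing to 1
    a -- index of the executed action
    z -- index of the received observation
    T -- transition model, T[s, a, s2] = P(s2 | s, a)
    O -- observation model, O[z, a, s] = P(z | a, s)
    """
    # Prediction: propagate the belief through the transition model.
    predicted = b @ T[:, a, :]
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = O[z, a, :] * predicted
    # Normalize so the belief remains a probability distribution.
    return unnormalized / unnormalized.sum()
```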
The decision about which action is most applicable is given by the policy function, which encodes the best action to perform for any possible belief distribution. The policy balances the probabilities of the future sequences of events against the expected accumulated reward, which it has to maximize. Computing a policy with classic exact methods like value iteration or policy iteration is highly intractable. However, some recent work has been devoted to finding approximate solutions (Hsu et al., 2007); point-based methods such as point-based value iteration (Hsu et al., 2008), discrete Perseus or HSVI are quite fast and yield good results.
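Point-based solvers of this kind typically represent the approximate value function as a set of α-vectors, each associated with an action; the sketch below, with hypothetical names, shows how a policy in that form would be executed on the current belief.

```python
import numpy as np

def best_action(b, alpha_vectors, actions):
    """Select an action from an alpha-vector policy.

    b             -- current belief, array of shape (|S|,)
    alpha_vectors -- array of shape (n, |S|), value hyperplanes over beliefs
    actions       -- array of shape (n,), action associated with each vector
    """
    # The approximate value of belief b is the maximum of alpha_i . b;
    # the policy returns the action of the maximizing vector.
    values = alpha_vectors @ b
    return actions[np.argmax(values)]
```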
3 PLANNING AND EXECUTING
THE MANIPULATION
OPERATIONS
We divide the high-level task of cleaning a table of several objects into two different levels. First, it is important to decide which object to manipulate first. Then, the next issue is to know the exact position of that object and to effectively manipulate it. Here we present our development of the planning algorithm that deals symbolically with the problem and performs the high-level task.
The effective manipulation of the objects can be handled by means of a classical control algorithm. Alternatively, we have recently proposed to solve the low-level task by also defining a POMDP in the space of discretized hand positions (Trilla, 2009). With this approach we are able to deal with robots with low precision or repeatability, and also with mobile manipulators that are naturally not placed in exactly the same position in front of the table.
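As a rough illustration of such a low-level state space, and not of the exact discretization used in (Trilla, 2009), the hand position could be expressed relative to the grasping target and binned into a small grid; the bounds and resolution below are assumed values.

```python
import numpy as np

BOUNDS = (-0.3, 0.3)   # assumed workspace limits relative to the target (m)
CELLS_PER_AXIS = 7     # assumed grid resolution per axis

def discretize_relative_position(hand_xyz, target_xyz):
    """Map the hand position, relative to the target, to a discrete grid cell."""
    rel = np.asarray(hand_xyz, dtype=float) - np.asarray(target_xyz, dtype=float)
    lo, hi = BOUNDS
    # Clip to the workspace and quantize each axis into CELLS_PER_AXIS bins.
    rel = np.clip(rel, lo, hi)
    idx = ((rel - lo) / (hi - lo) * (CELLS_PER_AXIS - 1)).round().astype(int)
    return tuple(idx)   # e.g. (3, 3, 3) is the cell that contains the target
```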
3.1 Planning the Strategy
We define the problem as follows. The working area is divided into zones: one for the tray and the rest for the table (see Fig. 2). Each zone is divided into positions where the objects can be placed. The state space is defined as the tuple (zo, p, sn, t), where zo (zone) and p (position) identify the object location symbolically, and sn indicates the number of objects stacked in the same
purely by noise in the observations. Second, the solution to the problem implies stacking objects of different sizes with some restrictions, i.e., a big object cannot be stacked onto a small one.
Thanks to the proposed state space and action space, the codification of the different positions and sizes of the glasses lets the planner generate a policy able to deal with the container-contents problem, stacking the glasses in the best way and maximizing the free space on the tray. The solution we propose to handle the unknown number of objects is based on focusing the attention on only one of the objects present in each of the different regions of the table, and it has turned out to be particularly adequate and powerful.
5.1 Future Work
Here we have considered that the objects can already be stacked on the table, and that the robot can perform stacking actions on the table. However, this condition has hardly arisen in our experiments. One challenge we are facing now is to incentivize object stacking actions on the table before putting the objects on the tray. A new variable is needed here to balance the cost of two transportation actions against that of one stacking plus one transportation operation. In the computation of this cost the trajectory from the table to the tray in the transportation action becomes important, as we want to stack closer objects or, more importantly, objects that lie on the transportation trajectory to the tray. Our formulation is general in the number of defined zones on the table, so we treat this new condition as a natural extension of the presented algorithm.
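As a minimal sketch of that balance, let c_t denote the (hypothetical) cost of one transportation action and c_s the cost of one stacking action on the table. Stacking two objects before transporting them is only worthwhile when

c_s + c_t < 2 c_t, i.e. c_s < c_t,

where in practice c_t would also depend on the length of the trajectory from the table position to the tray.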
We have also considered adding an additional action for when the observations are not sufficient to decide on one action, e.g. in order to gather information about the number of glasses stacked at each position. A promising option is an action that asks an operator, whose answer could modify the agent's belief state (Armstrong-Crews and Veloso, 2007).
ACKNOWLEDGEMENTS
This work has been partially supported by the Spanish Ministry of Science and Innovation under project DPI2008-06022 and the Generalitat de Catalunya under the consolidated Robotics Group. G. Alenyà was supported by the CSIC under a JAE-Doc Fellowship.
REFERENCES
Armstrong-Crews, N. and Veloso, M. (2007). Oracular partially observable Markov decision processes: A very special case. In Proc. IEEE Int. Conf. Robot. Automat., Rome, pages 2477–2482.
Glashan, R., Hsiao, K., Kaelbling, L. P., and Lozano-Pérez, T. (2007). Grasping POMDPs: Theory and experiments. In RSS Workshop: Manip. for Human Env.
Hsiao, K., Kaelbling, L. P., and Lozano-Pérez, T. (2007). Grasping POMDPs. In Proc. IEEE Int. Conf. Robot. Automat., Rome, pages 4685–4692.
Hsu, D., Lee, W., and Rong, N. (2007). What makes some
POMDP problems easy to approximate? In Advances
in Neural Information Processing Systems (NIPS).
Hsu, D., Lee, W., and Rong, N. (2008). A point-based POMDP planner for target tracking. In Proc. IEEE Int. Conf. Robot. Automat., Pasadena.
Kahlmann, T., Remondino, F., and Ingensand, H. (2006). Calibration for increased accuracy of the range imaging camera SwissRanger™. In ISPRS Commission V Symposium, pp. 136–141, Dresden.
Kemp, C., Edsinger, A., and Torres-Jara, E. (2007). Challenges for robot manipulation in human environments. IEEE Robot. Automat. Mag., 14(1):20–2.
Kolb, A., Barth, E., and Koch, R. (2008). ToF-sensors:
New dimensions for realism and interactivity. In Proc.
IEEE CVPR Workshops, vol. 1-3, pp. 1518–1523.
Kuehnle, J. U., Xue, Z., Stotz, M., Zoellner, J. M., Verl,
A., and Dillmann, R. (2008). Grasping in depth maps
of time-of-flight cameras. In Proc. Int. Workshop
Robotic Sensors Environments, pp. 132–137.
LaValle, S. M. (2004). Planning Algorithms. Cambridge University Press.
Lindner, M. and Kolb, A. (2006). Lateral and depth calibration of PMD-distance sensors. In Proc. 2nd Int. Sym. Visual Computing, vol. 4292, pp. 524–533.
Saxena, A., Driemeyer, J., and Ng, A. Y. (2008). Robotic
grasping of novel objects using vision. Int. J. Robot.
Res., 27:157–173.
Thrun, S., Burgard, W., and Fox, D. (2005). Probabilistic
Robotics. MIT Press, Cambridge.
Trilla, L. (2009). Planificació de moviments en entorns amb incertesa per a manipulació d'objectes. Master's thesis, Universitat Politècnica de Catalunya.