ter sets can be chosen from simulation and executed
in the real world. A longer test should be conducted to
verify that the selected set of parameters is robust to
small changes occurring over the lifetime of the setup.
7 SUMMARY
In this paper we proposed an iterative learning method
that differs from other methods in that it takes part
variations and process uncertainty into account. This
property is important for finding action parameters that
succeed every time in precision-demanding assemblies
such as the Peg-in-Hole task discussed in this paper.
Experiments showed that our iterative learning method
is able to quickly reduce the parameter space using
Kernel Density Estimation (KDE), which takes the
neighbourhood region into account. This approach was
faster than both the neighbourhood-independent Wilson
Score representation and simple Naïve Sampling. It was
shown that KDE is able to find good sample points much
faster than a representation which individually estimates
the success probability of each of the sample points.
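The difference between the two kinds of representation compared above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments. The function names, the isotropic Gaussian kernel, the scalar `bandwidth` and the default `z = 1.96` are assumptions made for illustration, and a Nadaraya-Watson kernel regression stands in for the KDE-based representation. The first function scores each sample point in isolation with the lower bound of the Wilson score interval. The second pools the outcomes of neighbouring sample points, weighted by distance.

```python
import math


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower bound of the Wilson score interval for one sample point.

    Only the outcomes recorded at this point are used; neighbouring
    points contribute nothing (neighbourhood-independent).
    """
    if trials == 0:
        return 0.0
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = p_hat + z * z / (2.0 * trials)
    margin = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    return (centre - margin) / denom


def kernel_success_estimate(query, points, outcomes, bandwidth):
    """Kernel-weighted success estimate at `query`.

    `points` are parameter vectors, `outcomes` are 1 (success) or
    0 (failure). Outcomes of nearby points are weighted with an
    isotropic Gaussian kernel (an assumption of this sketch).
    """
    num = 0.0
    den = 0.0
    for x, y in zip(points, outcomes):
        sq_dist = sum((q - xi) ** 2 for q, xi in zip(query, x))
        w = math.exp(-0.5 * sq_dist / bandwidth ** 2)
        num += w * y
        den += w
    return num / den if den > 0.0 else 0.0
```

With the kernel-weighted estimate, a single trial also informs the estimate at nearby untested points. This is one plausible reason why a neighbourhood-aware representation can narrow the parameter space with fewer trials.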
In the conducted experiment, promising sets of parameters
were found in simulation by the iterative learning method
using KDE. From the knowledge obtained by the method,
a promising sample point was selected for a real-world
Peg-in-Hole experiment. The experiment was repeated 100
times with a success rate of 100%. Moreover, real-world
experiments also showed that the iterative learning method
converged successfully towards the promising region of the
parameter space.
Future work will study the effect of different bandwidth
matrices, which might speed up the iterative learning
method and potentially improve the quality of the points
found. The results of applying the iterative learning
method with KDE showed both over- and underestimation of
the regression values in certain spots. These peaks are
probably an effect of too narrow a kernel. An adaptive
kernel size should also be investigated, since the optimal
bandwidth matrix is known to change with the number of
samples.
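A common rule of thumb shows how the bandwidth depends on the number of samples. The sketch below uses Scott's rule with a diagonal bandwidth matrix. This is only one possible choice and is not stated to be the rule used in this work, and the function name `scott_bandwidths` is hypothetical. For n samples in d dimensions, the bandwidth along dimension j is the sample standard deviation of that dimension multiplied by n^(-1/(d+4)).

```python
import math


def scott_bandwidths(samples):
    """Per-dimension bandwidths from Scott's rule (diagonal matrix).

    `samples` is a list of equally long parameter vectors with at
    least two entries. Returns one bandwidth per dimension:
    h_j = sigma_j * n ** (-1 / (d + 4)).
    """
    n = len(samples)
    d = len(samples[0])
    factor = n ** (-1.0 / (d + 4))
    bandwidths = []
    for j in range(d):
        column = [s[j] for s in samples]
        mean = sum(column) / n
        # Unbiased sample variance of dimension j.
        var = sum((v - mean) ** 2 for v in column) / (n - 1)
        bandwidths.append(math.sqrt(var) * factor)
    return bandwidths
```

Because the factor n^(-1/(d+4)) decreases as n grows, the kernel narrows as more samples are collected. Recomputing the bandwidths after each learning iteration would be one simple form of adaptive kernel size.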
ACKNOWLEDGEMENTS
This work was supported by The Danish Council for
Strategic Research through the CARMEN project.
ICINCO 2016 - 13th International Conference on Informatics in Control, Automation and Robotics