MODEL-FREE LEARNING FROM DEMONSTRATION

Erik A. Billing, Thomas Hellström, Lars-Erik Janlert

2010

Abstract

A novel robot learning algorithm called Predictive Sequence Learning (PSL) is presented and evaluated. PSL is a model-free prediction algorithm inspired by the dynamic temporal difference algorithm S-Learning. While S-Learning has previously been applied as a reinforcement learning algorithm for robots, PSL is here applied to a Learning from Demonstration problem. The proposed algorithm is evaluated on four tasks using a Khepera II robot. PSL builds a model from demonstrated data, which is then used to repeat the demonstrated behavior. After training, PSL can control the robot by continually predicting the next action, based on the sequence of past sensor and motor events. PSL was able to successfully learn and repeat the first three (elementary) tasks, but it was unable to successfully repeat the fourth (composed) behavior. The results indicate that PSL is suitable for learning problems up to a certain complexity, while higher-level coordination is required for learning more complex behaviors.
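The prediction mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation: PSL's actual data structures and matching rules differ, and the class name, event encoding, and `max_order` parameter below are illustrative assumptions. The sketch only shows the core idea shared with variable-order Markov predictors: store which event followed each context in the demonstration, then predict from the longest stored context matching the recent history.

```python
# Illustrative sketch (NOT the published PSL algorithm): predict the next
# event from the longest context of past sensor/motor events that was also
# seen during demonstration, in the style of variable-order Markov models.
from collections import defaultdict

class SequencePredictor:
    def __init__(self, max_order=4):
        self.max_order = max_order
        # Maps a context (tuple of recent events) to counts of the event
        # that followed that context in the demonstration.
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, events):
        """Record which event follows each context of length 1..max_order."""
        for i in range(1, len(events)):
            for order in range(1, self.max_order + 1):
                if i - order < 0:
                    break
                context = tuple(events[i - order:i])
                self.counts[context][events[i]] += 1

    def predict(self, history):
        """Return the most frequent successor of the longest matching context."""
        for order in range(min(self.max_order, len(history)), 0, -1):
            context = tuple(history[-order:])
            if context in self.counts:
                followers = self.counts[context]
                return max(followers, key=followers.get)
        return None  # no matching context was seen during demonstration

# Usage: events interleave sensor readings and motor actions (labels are
# hypothetical), so predicting the next event after a sensor reading
# amounts to selecting the next action.
demo = ["near_wall", "turn_left", "clear", "forward", "near_wall", "turn_left"]
psl = SequencePredictor(max_order=3)
psl.train(demo)
print(psl.predict(["clear", "forward", "near_wall"]))  # -> turn_left
```

Preferring the longest matching context lets long, specific histories override short, ambiguous ones, which mirrors the paper's point that behavior is selected from the sequence of events rather than from the current state alone.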

References

  1. Arkin, R. C. (1998). Behavior-Based Robotics. MIT Press.
  2. Begleiter, R. and Yona, G. (2004). On prediction using variable order Markov models. Journal of Artificial Intelligence Research, 22:385-421.
  3. Billard, A., Calinon, S., Dillmann, R., and Schaal, S. (2008). Robot programming by demonstration. In Siciliano, B. and Khatib, O., editors, Handbook of Robotics. Springer.
  4. Billard, A., Epars, Y., Cheng, G., and Schaal, S. (2003). Discovering imitation strategies through categorization of multi-dimensional data. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 3, pages 2398-2403.
  5. Billard, A. and Mataric, M. J. (2001). Learning human arm movements by imitation: Evaluation of a biologically inspired connectionist architecture. Robotics and Autonomous Systems, 37(2-3):145-160.
  6. Billing, E. A. (2007). Representing behavior - distributed theories in a context of robotics. Technical report, UMINF 0725, Department of Computing Science, Umeå University.
  7. Billing, E. A. (2009). Cognition Reversed. http://www.cognitionreversed.com.
  8. Billing, E. A. and Hellström, T. (2008a). Behavior recognition for segmentation of demonstrated tasks. In IEEE SMC International Conference on Distributed Human-Machine Systems, pages 228-234, Athens, Greece.
  9. Billing, E. A. and Hellström, T. (2008b). Formalising learning from demonstration. Technical report, UMINF 0810, Department of Computing Science, Umeå University.
  10. Brooks, R. A. (1986). A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, RA-2(1):14-23.
  11. Brooks, R. A. (1990). Elephants don't play chess. Robotics and Autonomous Systems, 6:3-15.
  12. Brooks, R. A. (1991a). Intelligence without reason. In Proceedings of the 1991 International Joint Conference on Artificial Intelligence, pages 569-595.
  13. Brooks, R. A. (1991b). New approaches to robotics. Science, 253(13):1227-1232.
  14. Calinon, S., Guenter, F., and Billard, A. (2007). On learning, representing and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man and Cybernetics, Part B. Special issue on robot learning by observation, demonstration and imitation, 37(2):286-298.
  15. Delson, N. and West, H. (1994). Robot programming by human demonstration: The use of human inconsistency in improving 3D robot trajectories. In Proceedings of the IEEE/RSJ/GI International Conference on Intelligent Robots and Systems '94. Advanced Robotic Systems and the Real World, IROS '94, volume 2, pages 1248-1255, Munich, Germany.
  16. Demiris, J. and Hayes, G. R. (2002). Imitation as a dual-route process featuring predictive and learning components: a biologically plausible computational model. In Imitation in animals and artifacts, pages 327-361. MIT Press.
  17. Demiris, Y. and Johnson, M. (2003). Distributed, predictive perception of actions: a biologically inspired robotics architecture for imitation and learning. Connection Science, 15(4):231-243.
  18. Demiris, Y. and Simmons, G. (2006). Perceiving the unusual: Temporal properties of hierarchical motor representations for action perception. Neural Networks, 19(3):272-284.
  19. Feder, M. and Merhav, N. (1994). Relations between entropy and error probability. IEEE Transactions on Information Theory, 40(1):259-266.
  20. Friston, K. J. (2003). Learning and inference in the brain. Neural Networks: The Official Journal of the International Neural Network Society, 16(9):1325-52. PMID: 14622888.
  21. George, D. (2008). How the Brain might work: A Hierarchical and Temporal Model for Learning and Recognition. PhD thesis, Stanford University, Department of Electrical Engineering.
  22. George, D. and Hawkins, J. (2005). A hierarchical Bayesian model of invariant pattern recognition in the visual cortex. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, IJCNN '05, volume 3, pages 1812-1817.
  23. Guenter, F., Hersch, M., Calinon, S., and Billard, A. (2007). Reinforcement learning for imitating constrained reaching movements. RSJ Advanced Robotics, Special Issue on Imitative Robots, 21(13):1521-1544.
  24. Haruno, M., Wolpert, D. M., and Kawato, M. (2003). Hierarchical MOSAIC for movement generation. In International Congress Series 1250, pages 575-590. Elsevier Science B.V.
  25. Haruno, M., Wolpert, D. M., and Kawato, M. M. (2001). MOSAIC model for sensorimotor learning and control. Neural Comput., 13(10):2201-2220.
  26. Hawkins, J. and Blakeslee, S. (2002). On Intelligence. Times Books.
  27. Jordan, M. and Rumelhart, D. (1992). Forward models: Supervised learning with a distal teacher. Cognitive Science: A Multidisciplinary Journal, 16(3):307-354.
  28. K-Team (2007). Khepera robot. http://www.k-team.com.
  29. Kawato, M., Furukawa, K., and Suzuki, R. (1987). A hierarchical neural-network model for control and learning of voluntary movement. Biological Cybernetics, 57(3):169-185. PMID: 3676355.
  30. Lee, T. and Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. J Opt Soc Am A Opt Image Sci Vis, 20(7):1434-1448.
  31. Miall, R. C. and Wolpert, D. M. (1996). Forward models for physiological motor control. Neural Netw., 9(8):1265-1279.
  32. Nehaniv, C. L. and Dautenhahn, K. (2000). Of hummingbirds and helicopters: An algebraic framework for interdisciplinary studies of imitation and its applications. In Demiris, J. and Birk, A., editors, Learning Robots: An Interdisciplinary Approach, volume 24, pages 136-161. World Scientific Press.
  33. Pfeifer, R. and Scheier, C. (1997). Sensory-motor coordination: the metaphor and beyond. Robotics and Autonomous Systems, 20(2):157-178.
  34. Pfeifer, R. and Scheier, C. (2001). Understanding Intelligence. MIT Press, Cambridge, Massachusetts.
  35. Poggio, T. and Bizzi, E. (2004). Generalization in vision and motor control. Nature, 431(7010):768-774.
  36. Riesenhuber, M. and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019-25. PMID: 10526343.
  37. Rohrer, B. (2007). S-Learning: a biomimetic algorithm for learning, memory, and control in robots. In CNE '07, 3rd International IEEE/EMBS Conference on Neural Engineering, pages 148-151, Kohala Coast, Hawaii.
  38. Rohrer, B. (2009). S-learning: A model-free, case-based algorithm for robot learning and control. In Eighth International Conference on Case-Based Reasoning, Seattle, Washington.
  39. Rohrer, B., Bernard, M., Morrow, J. D., Rothganger, F., and Xavier, P. (2009). Model-free learning and control in a mobile robot. In Fifth International Conference on Natural Computation, Tianjin, China.
  40. Rohrer, B. and Hulet, S. (2006a). BECCA - a brain emulating cognition and control architecture. Technical report, Cybernetic Systems Integration Department, Sandia National Laboratories.
  41. Rohrer, B. and Hulet, S. (2006b). A learning and control approach based on the human neuromotor system. In Proceedings of Biomedical Robotics and Biomechatronics, BioRob.
  42. Schaal, S., Ijspeert, A., and Billard, A. (2003). Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society B: Biological Sciences, 358(1431):537-547. PMC1693137.
  43. Simon, H. A. (1969). The Sciences of the Artificial. MIT Press, Cambridge, Massachusetts.
  44. Wolpert, D. M. (2003). A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society of London B, 358:593-602.
  45. Wolpert, D. M. and Flanagan, J. R. (2001). Motor prediction. Current Biology, 11(18):729-732.
  46. Wolpert, D. M. and Ghahramani, Z. (2000). Computational principles of movement neuroscience. Nature Neuroscience, 3:1212-1217.
  47. Ziv, J. and Lempel, A. (1978). Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530-536.


Paper Citation


in Harvard Style

Billing, E. A., Hellström, T. and Janlert, L. (2010). MODEL-FREE LEARNING FROM DEMONSTRATION. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-674-022-1, pages 62-71. DOI: 10.5220/0002729500620071


in Bibtex Style

@conference{icaart10,
author={Erik A. Billing and Thomas Hellström and Lars-Erik Janlert},
title={MODEL-FREE LEARNING FROM DEMONSTRATION},
booktitle={Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2010},
pages={62-71},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002729500620071},
isbn={978-989-674-022-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - MODEL-FREE LEARNING FROM DEMONSTRATION
SN - 978-989-674-022-1
AU - Billing, E. A.
AU - Hellström, T.
AU - Janlert, L.
PY - 2010
SP - 62
EP - 71
DO - 10.5220/0002729500620071