LEARNING BY EXAMPLE - Reinforcement Learning Techniques for Real Autonomous Underwater Cable Tracking

Andres El-Fakdi; Marc Carreras; Javier Antich; Alberto Ortiz

doi:10.5220/0001490500610068

2008

Abstract

This paper proposes a field application of a high-level Reinforcement Learning (RL) control system for solving the action selection problem of an autonomous robot in a cable tracking task. The learning system uses a Direct Policy Search method to learn the internal state/action mapping. Policy-only algorithms may suffer from long convergence times when applied to real robots. To speed up the process, the learning phase was carried out in a simulated environment and, in a second step, the policy was transferred to and tested successfully on a real robot. Future work plans to continue the learning process on-line on the real robot while it performs the tracking task. We demonstrate the feasibility of the approach with real experiments on the underwater robot ICTINEU AUV.
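The abstract does not spell out the policy-search update itself. As a rough illustration of what direct policy search means in this setting (adjusting the parameters of a stochastic state/action mapping along an estimated gradient of expected reward, without learning a value function), the sketch below trains a linear softmax policy with the generic REINFORCE estimator on a toy one-dimensional tracking task. Everything in it is an assumption for illustration: the offset-only state, the three steering actions, the dynamics in `step`, the reward, and all hyperparameters are hypothetical, and REINFORCE is swapped in as a well-known estimator rather than presented as the authors' algorithm. Learning in a cheap simulated loop and then reusing the learned `theta` mirrors, in miniature, the simulate-then-transfer idea described in the abstract.

```python
import math
import random

# Hypothetical discrete steering commands: turn left, go straight, turn right.
ACTIONS = (-1.0, 0.0, 1.0)


def features(offset):
    """Hypothetical state features: a bias term and the cable's lateral offset."""
    return (1.0, offset)


def action_probs(theta, offset):
    """Linear softmax policy; theta holds one weight vector per action."""
    phi = features(offset)
    prefs = [sum(w * f for w, f in zip(theta[b], phi)) for b in range(len(ACTIONS))]
    top = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(offset, action, rng):
    """Toy simulator (an assumption): steering shifts the apparent cable offset."""
    new_offset = offset - 0.2 * ACTIONS[action] + rng.gauss(0.0, 0.02)
    new_offset = max(-1.0, min(1.0, new_offset))
    return new_offset, -abs(new_offset)  # reward is highest when centered on the cable


def run_episode(theta, rng, horizon=30):
    """Roll out the stochastic policy, recording grad log pi(a|s) and rewards."""
    offset = rng.uniform(-1.0, 1.0)
    grads, rewards = [], []
    for _ in range(horizon):
        probs = action_probs(theta, offset)
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        phi = features(offset)
        # d log pi(a|s) / d theta[b] = (1[b == a] - pi(b|s)) * phi(s)
        grads.append([[((1.0 if b == a else 0.0) - probs[b]) * f for f in phi]
                      for b in range(len(ACTIONS))])
        offset, reward = step(offset, a, rng)
        rewards.append(reward)
    return grads, rewards


def train(episodes=2000, alpha=0.01, gamma=0.95, seed=0):
    """Monte Carlo policy gradient (REINFORCE) with a running scalar baseline."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in ACTIONS]
    baseline = 0.0
    for _ in range(episodes):
        grads, rewards = run_episode(theta, rng)
        returns, g_t = [], 0.0
        for r in reversed(rewards):  # discounted return-to-go for each step
            g_t = r + gamma * g_t
            returns.append(g_t)
        returns.reverse()
        for grad, ret in zip(grads, returns):
            advantage = ret - baseline
            for b in range(len(ACTIONS)):
                for i, gi in enumerate(grad[b]):
                    theta[b][i] += alpha * advantage * gi  # gradient ascent step
        baseline += 0.05 * (sum(returns) / len(returns) - baseline)
    return theta
```

With a fixed seed the run is reproducible; after `theta = train()`, `action_probs(theta, 0.9)` should favor the right-turn action and `action_probs(theta, -0.9)` the left-turn action. The scalar running baseline only reduces variance: because it does not depend on the chosen action, subtracting it leaves the expected gradient unchanged.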

Paper Citation

in Harvard Style

El-Fakdi A., Carreras M., Antich J. and Ortiz A. (2008). LEARNING BY EXAMPLE - Reinforcement Learning Techniques for Real Autonomous Underwater Cable Tracking. In Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO, ISBN 978-989-8111-31-9, pages 61-68. DOI: 10.5220/0001490500610068

in Bibtex Style

@conference{icinco08,
author={Andres El-Fakdi and Marc Carreras and Javier Antich and Alberto Ortiz},
title={LEARNING BY EXAMPLE - Reinforcement Learning Techniques for Real Autonomous Underwater Cable Tracking},
booktitle={Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2008},
pages={61-68},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001490500610068},
isbn={978-989-8111-31-9},
}

in EndNote Style

TY  - CONF
JO  - Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI  - LEARNING BY EXAMPLE - Reinforcement Learning Techniques for Real Autonomous Underwater Cable Tracking
SN  - 978-989-8111-31-9
AU  - El-Fakdi A.
AU  - Carreras M.
AU  - Antich J.
AU  - Ortiz A.
PY  - 2008
SP  - 61
EP  - 68
DO  - 10.5220/0001490500610068
ER  - 