REALWORLD ROBOT NAVIGATION BY TWO DIMENSIONAL EVALUATION REINFORCEMENT LEARNING

Hiroyuki Okada

doi:10.5220/0001136602490255

2004

Abstract

The trade-off between exploration and exploitation is inherent in trial-and-error learning methods such as reinforcement learning. We have proposed a reinforcement learning algorithm, two-dimensional evaluation reinforcement learning (2D-RL), that uses reward and punishment as repulsive evaluations. In this algorithm, an appropriate balance between exploration and exploitation can be attained by using interest and utility. In this paper, we apply 2D-RL to a navigation learning task for a mobile robot; in the real world, the robot found a better path with 2D-RL than with a traditional actor-critic model.

Paper Citation

in Harvard Style

Okada H. (2004). REALWORLD ROBOT NAVIGATION BY TWO DIMENSIONAL EVALUATION REINFORCEMENT LEARNING. In Proceedings of the First International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO, ISBN 972-8865-12-0, pages 249-255. DOI: 10.5220/0001136602490255

in Bibtex Style

@conference{icinco04,
author={Hiroyuki Okada},
title={REALWORLD ROBOT NAVIGATION BY TWO DIMENSIONAL EVALUATION REINFORCEMENT LEARNING},
booktitle={Proceedings of the First International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2004},
pages={249-255},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001136602490255},
isbn={972-8865-12-0},
}

in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI - REALWORLD ROBOT NAVIGATION BY TWO DIMENSIONAL EVALUATION REINFORCEMENT LEARNING
SN - 972-8865-12-0
AU - Okada H.
PY - 2004
SP - 249
EP - 255
DO - 10.5220/0001136602490255