implemented on a mobile robot. Thus, the work
considered here improves on (Prescott et al., 2006) in
two aspects: the reinforcement learning process is
implemented on the Khepera II, and goal-directed
behaviour is realized. The task considered could
easily be extended to more complex scenarios.
Here the choices of the robot are determined only
by saliencies that depend on sensor data, so action
selection is driven by environmental inputs. In
(Schultz et al., 1997; Dayan, 2009) it is discussed
that action selection is also affected by the dopamine
level, which is shaped by emotional processes. Thus
the choices of the robot should also be determined by
the W_r parameter, and the adaptation of W_r could
be considered as a way to model the emotional
drives.
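As a rough illustration of how such an adaptation might look, the sketch below treats W_r as a scalar that scales the sensor-driven saliencies and is adapted by a dopamine-like reward prediction error, in the spirit of (Schultz et al., 1997). This is a minimal Python sketch only: the class name, the update rule, and the rates are assumptions for illustration, not the model's actual equations.

    import numpy as np

    # Hypothetical sketch: W_r scales the sensor-driven saliencies and is
    # adapted from a reward prediction error (a dopamine-like signal).
    class RewardModulatedSalience:
        def __init__(self, w_r=1.0, lr=0.1):
            self.w_r = w_r        # the W_r parameter (emotional drive)
            self.lr = lr          # adaptation rate (assumed value)
            self.expected = 0.0   # running estimate of the reward

        def saliencies(self, sensor_salience):
            # Saliencies remain sensor-driven, but are scaled by W_r.
            return self.w_r * np.asarray(sensor_salience, dtype=float)

        def adapt(self, reward):
            # delta > 0 for an unexpected reward: W_r grows.
            delta = reward - self.expected
            self.expected += self.lr * delta
            self.w_r = max(0.0, self.w_r + self.lr * delta)
            return delta

Under this sketch an unexpected reward raises W_r and hence the overall drive to act, while repeated omission of reward lowers it; a per-channel W_r would instead bias selection toward the rewarded channels.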
ACKNOWLEDGEMENTS
The mobile robot Khepera II used for the simulations
and implementations belongs to the I.T.U. Artificial
Intelligence and Robotics Laboratory. The authors
would like to thank the laboratory staff, and
especially the coordinator Sanem Sarel Talay, for
their guidance and for sharing their knowledge. This
work is partially supported by an I.T.U. BAP project.
REFERENCES
Gurney, K., Prescott, T. J., Redgrave, P., 2001.
Computational Model of Action Selection in the Basal
Ganglia I: A New Functional Anatomy. Biological
Cybernetics, vol.84, 401-410.
Taylor, J. G., Taylor, N. R., 2000. Analysis of Recurrent
Cortico-Basal Ganglia-Thalamic Loops for Working
Memory. Biological Cybernetics, vol.82, 415-432.
Schultz, W., Dayan, P., Montague, P. R., 1997. A Neural
Substrate of Prediction and Reward. Science, vol.275,
1593-1599.
Dayan, P., 2009. Dopamine, Reinforcement Learning, and
Addiction. Pharmacopsychiatry, vol.42, 56-65.
Gillies, A., Arbuthnott, G., 2000. Computational Models
of the Basal Ganglia. Movement Disorders, vol.15,
no.5, 762-770.
Sengor, N. S., Karabacak, O., Steinmetz, U., 2008. A
Computational Model of Cortico-Striato-Thalamic
Circuits in Goal-Directed Behavior. LNCS 5163,
Proceedings of ICANN, 328-337.
Gutkin, B. S., Dehaene, S., Changeux, J. P., 2006. A
Neurocomputational Hypothesis for Nicotine
Addiction. PNAS, vol.103, no.4, 1106-1111.
Saeb, S., Weber, C., Triesch, J., 2009. Goal-directed
learning of features and forward models. Neural
Networks, vol.22, 586-592.
Webb, B., 2000. What does robotics offer animal
behaviour? Animal Behaviour, vol.60, 545-558.
Fleischer, J. G., Edelman, G. M., 2009. Brain-based
devices. IEEE Robotics and Automation Magazine,
33-41.
Prescott, T. J., Montes-Gonzalez, F. M., Gurney, K.,
Humphries, M. D., Redgrave, P., 2006. A Robot Model
of the Basal Ganglia: Behaviour and Intrinsic
Processing. Neural Networks, 1-31.
Haber, S. N., 2010. The reward circuit: Linking primate
anatomy and human imaging. Neuropsychopharmacology
Reviews, vol.35, 4-26.
Alexander, G. E., Crutcher, M. D., DeLong, M. R., 1990.
Basal ganglia-thalamocortical circuits: Parallel
substrates for motor, oculomotor, “prefrontal” and
“limbic” functions. Progress in Brain Research, 85,
119-146.
Humphrys, M., 1997. Action Selection Methods Using
Reinforcement Learning. Ph.D. Thesis, Trinity Hall,
Cambridge.
APPENDIX
The algorithm corresponding to the model
considered is summarized as follows:
Begin
  SetCoefficients
  SetInitialCond
  GetSensorData
  ScaleSensorData
  1. ReinforcementLearning
     If ∀DistSen = 0 && grip = 0 && wheels = 0
        EvaluationOfEquation 6-10
        Update a11
     If DistSen2 && DistSen3 = 0
        EvaluationOfEquation 6-10
        Update a22
     If ∀LightSen != 0 && ∀DistSen != 0
        EvaluationOfEquation 6-10
        Update a33
  2. Saliencies
     S_i = Σ … ; i = 1, 2, 3
  3. Action Selection
     For IterationStep < 200
        EvaluationOfEquation 1-5
  4. RobotMotion
     If e1 > 0.67 && e2 < 0.67 && e3 < 0.67
        DoSalience1
     If e1 < 0.67 && e2 > 0.67 && e3 < 0.67
        DoSalience2
     If e1 < 0.67 && e2 < 0.67 && e3 > 0.67
        DoSalience3
     // e_i are the cortex output values
End.
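A compact executable rendering of this loop is sketched below in Python. It is structural only: Equations 1-10 of the paper are hidden behind placeholder functions, the robot interface (get_scaled_sensor_data, do_salience) is hypothetical rather than the actual Khepera II API, and the condition on DistSen2 and DistSen3 is read as both sensors being zero, which is one plausible reading of the pseudocode above.

    import numpy as np

    THRESH = 0.67  # cortex-output threshold from the pseudocode above

    def run_step(robot, eqs, a, max_iter=200):
        # `robot` and `eqs` are hypothetical interfaces: `eqs` wraps
        # Equations 1-10 of the paper, `robot` wraps sensing and motion.
        dist, light, grip, wheels = robot.get_scaled_sensor_data()

        # 1. ReinforcementLearning: update the a_ii of the matching context.
        if np.all(dist == 0) and grip == 0 and wheels == 0:
            a[0, 0] = eqs.evaluate_6_to_10(a[0, 0])
        if dist[2] == 0 and dist[3] == 0:   # one reading of DistSen2 && DistSen3 = 0
            a[1, 1] = eqs.evaluate_6_to_10(a[1, 1])
        if np.all(light != 0) and np.all(dist != 0):
            a[2, 2] = eqs.evaluate_6_to_10(a[2, 2])

        # 2. Saliencies: one weighted sum per channel (summand elided above).
        S = eqs.saliencies(dist, light, a)

        # 3. Action Selection: iterate Equations 1-5 until the loop settles.
        e = np.zeros(3)                     # e[i] are the cortex outputs
        for _ in range(max_iter):
            e = eqs.evaluate_1_to_5(S, e)

        # 4. RobotMotion: act only when exactly one channel clears the threshold.
        if np.sum(e > THRESH) == 1:
            robot.do_salience(int(np.argmax(e)) + 1)   # DoSalience1..3

The final test mirrors the three mutually exclusive conditions of step 4: a channel is executed only when its cortex output alone exceeds 0.67.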