
 
implemented on a mobile robot. Thus, the work considered here improves on (Prescott et al., 2006) in two aspects: the reinforcement learning process is implemented on the Khepera II, and goal-directed behaviour is realized. The task considered could easily be extended to more complex scenarios.
Here the choices of the robot are determined only by saliencies depending on sensor data, so the action selection is driven by environmental inputs. In (Schultz et al., 1997; Dayan, 2009) it has been discussed that action selection is also affected by the dopamine level, which is determined by emotional processes. Thus the choices of the robot should also be determined by the reward-related parameter W_r, and the adaptation of W_r could be considered as a way to model emotional drives.
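As an illustration only, the following is a minimal Python sketch of how a dopamine-like parameter W_r might modulate the saliencies before action selection. The modulation scheme and the prediction-error update rule are assumptions made for this sketch, not parts of the model above:

import numpy as np

def select_action(saliencies, w_r):
    # w_r scales the saliencies: w_r > 1 amplifies reward-driven
    # channels, w_r < 1 suppresses them (assumed modulation scheme)
    modulated = np.asarray(saliencies) * w_r
    return int(np.argmax(modulated))  # index of the winning channel

def update_w_r(w_r, reward, predicted_reward, lr=0.1):
    # Prediction-error update in the spirit of Schultz et al. (1997):
    # the dopamine-like signal is (actual reward - predicted reward)
    delta = reward - predicted_reward
    return w_r + lr * delta

# Usage with three salience channels, as in the robot task
w_r = 1.0
action = select_action([0.2, 0.8, 0.5], w_r)  # selects channel 1
w_r = update_w_r(w_r, reward=1.0, predicted_reward=0.4)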
ACKNOWLEDGEMENTS 
The mobile robot Khepera II used for the simulations and implementations belongs to the I.T.U. Artificial Intelligence and Robotics Laboratory. The authors would like to thank the laboratory staff, and especially the coordinator Sanem Sarel Talay, for their guidance and for sharing their knowledge. This work is partially supported by an I.T.U. BAP project.
REFERENCES 
Gurney, K., Prescott, T. J., Redgrave, P., 2001. A Computational Model of Action Selection in the Basal Ganglia I: A New Functional Anatomy. Biological Cybernetics, vol. 84, 401-410.
Taylor, J. G., Taylor, N. R., 2000. Analysis of Recurrent Cortico-Basal Ganglia-Thalamic Loops for Working Memory. Biological Cybernetics, vol. 82, 415-432.
Schultz, W., Dayan, P., Montague, P. R., 1997. A Neural Substrate of Prediction and Reward. Science, vol. 275, 1593-1599.
Dayan, P., 2009. Dopamine, Reinforcement Learning, and Addiction. Pharmacopsychiatry, vol. 42, 56-65.
Gillies, A., Arbuthnott, G., 2000. Computational Models of the Basal Ganglia. Movement Disorders, vol. 15, no. 5, 762-770.
Sengor, N. S., Karabacak, O., Steinmetz, U., 2008. A Computational Model of Cortico-Striato-Thalamic Circuits in Goal-Directed Behavior. LNCS 5163, Proceedings of ICANN, 328-337.
Gutkin, B. S., Dehaene, S., Changeux, J. P., 2006. A Neurocomputational Hypothesis for Nicotine Addiction. PNAS, vol. 103, no. 4, 1106-1111.
Saeb, S., Weber, C., Triesch, J., 2009. Goal-directed learning of features and forward models. Neural Networks, vol. 22, 586-592.
Webb, B., 2000. What does robotics offer animal behaviour? Animal Behaviour, vol. 60, 545-558.
Fleischer, J. G., Edelman, G. M., 2009. Brain-based devices. IEEE Robotics and Automation Magazine, 33-41.
Prescott, T. J., Montes-Gonzalez, F. M., Gurney, K., Humphries, M. D., Redgrave, P., 2006. A Robot Model of the Basal Ganglia: Behaviour and Intrinsic Processing. Neural Networks, vol. 19, 31-61.
Haber, S. N., 2010. The reward circuit: Linking primate anatomy and human imaging. Neuropsychopharmacology Reviews, vol. 35, 4-26.
Alexander, G. E., Crutcher, M. D., DeLong, M. R., 1990. Basal ganglia-thalamocortical circuits: Parallel substrates for motor, oculomotor, “prefrontal” and “limbic” functions. Progress in Brain Research, vol. 85, 119-146.
Humphrys, M., 1997. Action Selection Methods Using Reinforcement Learning. Ph.D. Thesis, Trinity Hall, University of Cambridge.
APPENDIX 
The algorithm corresponding to the model 
considered is summarized as follows: 
Begin
  SetCoefficients
  SetInitialCond
  GetSensorData
  ScaleSensorData
  1. ReinforcementLearning
     If ∀DistSen = 0 && grip = 0 && wheels = 0
        EvaluationOfEquations 6-10
        Update a_11
     If DistSen2 && DistSen3 = 0
        EvaluationOfEquations 6-10
        Update a_22
     If ∀LightSen != 0 && ∀DistSen != 0
        EvaluationOfEquations 6-10
        Update a_33
  2. Saliencies
     S_i = Σ(·), i = 1, 2, 3
  3. ActionSelection
     For IterationStep < 200
        EvaluationOfEquations 1-5
  4. RobotMotion
     If e_1 > 0.67 && e_2 < 0.67 && e_3 < 0.67
        DoSalience1
     If e_1 < 0.67 && e_2 > 0.67 && e_3 < 0.67
        DoSalience2
     If e_1 < 0.67 && e_2 < 0.67 && e_3 > 0.67
        DoSalience3
     // e_i are the cortex output values
End.
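For concreteness, a minimal Python sketch of one pass through this loop is given below. It is a sketch under assumptions rather than the implementation run on the robot: get_sensor_data, evaluate_eq_6_10, evaluate_eq_1_5 and do_salience are hypothetical placeholders standing in for the Khepera II API calls and the model equations, the salience form in step 2 is assumed, and the condition of the second learning rule is read as DistSen2 != 0 and DistSen3 = 0.

import numpy as np

THRESHOLD = 0.67  # cortex output threshold of step 4
MAX_ITER = 200    # iteration bound of step 3

def run_pass(get_sensor_data, evaluate_eq_6_10, evaluate_eq_1_5,
             do_salience, a):
    # a holds the learned weights [a_11, a_22, a_33]
    dist, light, grip, wheels = get_sensor_data()  # scaled sensor values

    # 1. Reinforcement learning: update one weight per triggering condition
    if all(d == 0 for d in dist) and grip == 0 and wheels == 0:
        a[0] = evaluate_eq_6_10(a[0])              # update a_11
    if dist[1] != 0 and dist[2] == 0:              # interpretation assumed
        a[1] = evaluate_eq_6_10(a[1])              # update a_22
    if all(l != 0 for l in light) and all(d != 0 for d in dist):
        a[2] = evaluate_eq_6_10(a[2])              # update a_33

    # 2. Saliencies: weighted sums of sensor inputs (exact form assumed)
    s = np.array(a) * np.array([sum(dist), dist[1], sum(light)])

    # 3. Action selection: iterate the model dynamics (Equations 1-5)
    e = np.zeros(3)
    for _ in range(MAX_ITER):
        e = evaluate_eq_1_5(e, s)                  # cortex outputs e_1..e_3

    # 4. Robot motion: act only when exactly one channel wins
    winners = [i for i in range(3) if e[i] > THRESHOLD]
    if len(winners) == 1:
        do_salience(winners[0])
    return a, e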
 