When a poster is recognized, R is normalized to obtain a discrete probability distribution, and the entropy H is calculated for this distribution. We consider three categories to evaluate the confidence of the poster classification: high, medium, and low, assigned according to threshold values of H determined experimentally.
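A minimal Python sketch of this confidence check is given below; the threshold values and the function name are illustrative assumptions, since the text only states that the thresholds were obtained experimentally.

```python
import numpy as np

# Hypothetical entropy thresholds; the paper determines its own values
# experimentally and does not report them in this section.
T_HIGH = 0.5   # at or below this entropy -> "high" confidence
T_LOW = 1.5    # at or below this entropy -> "medium" confidence

def poster_confidence(R):
    """Normalize the recognition scores R and classify confidence
    from the entropy H of the resulting distribution."""
    R = np.asarray(R, dtype=float)
    p = R / R.sum()                    # discrete probability distribution
    p = p[p > 0]                       # avoid log(0)
    H = -np.sum(p * np.log2(p))        # Shannon entropy in bits
    if H <= T_HIGH:
        return "high"
    elif H <= T_LOW:
        return "medium"
    return "low"

# Example: a strongly peaked score vector yields low entropy,
# hence a high-confidence classification.
print(poster_confidence([19.0, 0.5, 0.5]))   # -> "high"
```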
To evaluate the selection of a region we use the vector P described in Section 4.3. If the bin with the maximum number of image frames holds more than 15 but fewer than 30 frames, the robot warns the user that it is not completely sure about the selection. If that bin holds fewer than 15 frames, the robot tells the user that it could not identify a section and asks for another attempt.
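The following sketch illustrates this decision rule over the pointing histogram P; only the 15- and 30-frame thresholds come from the text, while the function name, the treatment of the boundary values, and the handling of ties are our own assumptions.

```python
# P[i] counts the image frames in which the pointing direction fell
# into region i (the vector P of Section 4.3).
def evaluate_selection(P):
    best_region = max(range(len(P)), key=lambda i: P[i])
    votes = P[best_region]
    if votes >= 30:
        return best_region, "accept"   # selection is trusted
    elif votes > 15:
        return best_region, "warn"     # robot says it is not completely sure
    else:
        return None, "retry"           # robot asks the user to point again

# Example: 22 frames vote for region 1, so the robot keeps the
# selection but warns the user about its uncertainty.
print(evaluate_selection([3, 22, 5]))  # -> (1, "warn")
```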
5.2 Results
We have tested our approach with five different people in several demonstrations in our laboratory. All of them are students or professors of our department. In all cases, the robot was able to correctly identify the desired poster, or to ask the user in case of doubt. Almost all users were able to select the desired section of the poster within a single trial, and they seemed satisfied with the corresponding explanations. However, we also observed that not all users pointed at the poster immediately after the robot emitted the alert sound, mainly because they had not yet decided which section to select. In these cases, the evaluation of the pointing output proved to be a useful mechanism for adding flexibility to our system. From initial usability tests performed with these users, we found that evaluating the confidence of the visual analysis considerably improves the perceived naturalness of the robot's spoken language.
6 CONCLUSIONS
In this paper we presented our work on the integration of pointing gestures into a spoken dialog system in Spanish for a conversational service robot. The dialog system is composed of a dialog manager that interprets a dialog model defining the spoken dialog and the robot actions, according to the user's intentions and the environment. We developed a tour–guide robot that navigates in its environment, visually identifies informational posters, and explains the sections of a poster that the user points at with his or her arm. The robot is able to qualify its confidence in its visual outcomes and to start error–prevention dialogs with the user. Our results showed the effectiveness of the overall approach and the suitability of our dialog system for modeling complex multimodal human–robot interactions.