MULTIMODAL COMMUNICATION ERROR DETECTION FOR DRIVER-CAR INTERACTION

Sy Bor Wang; David Demirdjian; Trevor Darrell; Hedvig Kjellström

doi:10.5220/0001637603650371

2007

Abstract

Speech recognition systems are now used in a wide variety of domains. They have recently been introduced in cars for hands-free control of radio, cell phone, and navigation applications. However, because of ambient noise in the car, recognition errors are relatively frequent. This paper tackles the problem of detecting, from the driver’s reaction, when such recognition errors occur. Automatic detection of communication errors in dialogue-based systems has been explored extensively in the speech community. The detection is most often based on prosodic cues such as intensity and pitch. However, recent perceptual studies indicate that detection can be improved significantly if both acoustic and visual modalities are taken into account. To this end, we present a framework for automatic audio-visual detection of communication errors.

Paper Citation

in Harvard Style

Bor Wang S., Demirdjian D., Darrell T. and Kjellström H. (2007). MULTIMODAL COMMUNICATION ERROR DETECTION FOR DRIVER-CAR INTERACTION. In Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics - Volume 2: IVCS, (ICINCO 2007), ISBN 978-972-8865-83-2, pages 365-371. DOI: 10.5220/0001637603650371

in Bibtex Style

@conference{ivcs07,
author={Sy Bor Wang and David Demirdjian and Trevor Darrell and Hedvig Kjellström},
title={MULTIMODAL COMMUNICATION ERROR DETECTION FOR DRIVER-CAR INTERACTION},
booktitle={Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics - Volume 2: IVCS, (ICINCO 2007)},
year={2007},
pages={365-371},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001637603650371},
isbn={978-972-8865-83-2},
}

in EndNote Style

TY - CONF
JO - Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics - Volume 2: IVCS, (ICINCO 2007)
TI - MULTIMODAL COMMUNICATION ERROR DETECTION FOR DRIVER-CAR INTERACTION
SN - 978-972-8865-83-2
AU - Bor Wang S.
AU - Demirdjian D.
AU - Darrell T.
AU - Kjellström H.
PY - 2007
SP - 365
EP - 371
DO - 10.5220/0001637603650371