MULTI-MODAL FUSION OF SPEECH-GESTURE USING INTEGRATED PROBABILITY DENSITY FUNCTION

Chi-Geun Lee; Mun-Sung Han; Chang-Seok Bae; Jin-Tae Kim

doi:10.5220/0001432402110215

2009

Abstract

Multi-modal recognition has recently become an active topic in ubiquitous computing, and speech and gesture are especially important modalities of human-machine interaction. Although speech recognition has been studied extensively and developed successfully, it still makes serious errors in noisy environments. In such cases, gestures, as a by-product of speech, can be used to help interpret it. In this paper, we propose a speech-gesture multi-modal fusion recognition method based on an integrated discrete probability density function (PDF) estimated from a histogram. The method is tested in a real-time experiment using a microphone and a 3-axis accelerometer. The test has two parts: a method that adds and accumulates the speech and gesture PDFs separately, and a more complex method that creates a new PDF by integrating the speech and gesture PDFs.
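The abstract names two fusion schemes but gives no formulas, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each recognizer (speech from the microphone, gesture from the accelerometer) emits one class label per frame or window, and that a histogram of those labels over K command classes serves as the discrete PDF. "Add-and-accumulate" is taken here as a weighted sum of the two PDFs, and the "integrated" PDF as their normalized product, which assumes speech and gesture are conditionally independent given the class. The function names, the smoothing constant `alpha`, and the weight `w` are hypothetical.

```python
from collections import Counter


def histogram_pdf(labels, num_classes, alpha=1.0):
    """Discrete PDF over classes estimated from a histogram of per-frame labels.

    alpha is an assumed additive (Laplace) smoothing constant; keeping it > 0
    ensures no class gets zero probability, which matters for the product rule.
    """
    counts = Counter(labels)
    smoothed = [counts.get(c, 0) + alpha for c in range(num_classes)]
    total = sum(smoothed)
    return [x / total for x in smoothed]


def _normalize(values):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def fuse_add(p_speech, p_gesture, w=0.5):
    """Sketch of scheme 1: weighted add-and-accumulate of the two PDFs."""
    return _normalize([w * s + (1.0 - w) * g for s, g in zip(p_speech, p_gesture)])


def fuse_integrated(p_speech, p_gesture):
    """Sketch of scheme 2: one integrated PDF as the normalized product
    (assumes conditional independence of the modalities given the class)."""
    return _normalize([s * g for s, g in zip(p_speech, p_gesture)])


def decide(pdf):
    """Return the index of the most probable class."""
    return max(range(len(pdf)), key=pdf.__getitem__)


if __name__ == "__main__":
    # Hypothetical 3-class example: noisy speech frames lean toward class 1,
    # while the gesture windows clearly indicate class 0.
    p_s = histogram_pdf([0, 1, 1, 2, 1], num_classes=3)
    p_g = histogram_pdf([0, 0, 0, 1, 0], num_classes=3)
    print("speech only:", decide(p_s))
    print("add-and-accumulate:", decide(fuse_add(p_s, p_g)))
    print("integrated PDF:", decide(fuse_integrated(p_s, p_g)))
```

In this toy example speech alone picks class 1, while both fusion rules pick class 0 because the gesture evidence outweighs the noisy speech. The product rule sharpens agreement between the modalities (here it yields 0.5/0.4/0.1 against 0.4375/0.375/0.1875 for the sum), but a single zero bin would veto a class outright, which is why the sketch smooths the histograms.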

Paper Citation

in Harvard Style

Lee C., Han M., Bae C. and Kim J. (2009). MULTI-MODAL FUSION OF SPEECH-GESTURE USING INTEGRATED PROBABILITY DENSITY FUNCTION. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009) ISBN 978-989-8111-65-4, pages 211-215. DOI: 10.5220/0001432402110215

in BibTeX Style

@conference{biosignals09,
author={Chi-Geun Lee and Mun-Sung Han and Chang-Seok Bae and Jin-Tae Kim},
title={MULTI-MODAL FUSION OF SPEECH-GESTURE USING INTEGRATED PROBABILITY DENSITY FUNCTION},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)},
year={2009},
pages={211-215},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001432402110215},
isbn={978-989-8111-65-4},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)
TI - MULTI-MODAL FUSION OF SPEECH-GESTURE USING INTEGRATED PROBABILITY DENSITY FUNCTION
SN - 978-989-8111-65-4
AU - Lee C.
AU - Han M.
AU - Bae C.
AU - Kim J.
PY - 2009
SP - 211
EP - 215
DO - 10.5220/0001432402110215