Recognition of Human Movements Using Hidden Markov Models - An Application to Visual Speech Recognition

Wai Chee Yau, Dinesh Kant Kumar, Hans Weghorn

Abstract

This paper presents a novel approach to recognizing lower facial movements using motion features and hidden Markov models (HMMs) for visual speech recognition applications. The proposed technique recognizes utterances from mouth videos without using the acoustic signal. The paper adopts a visual speech model that divides utterances into sequences of the smallest visually distinguishable units, known as visemes, and uses the viseme model of the Moving Picture Experts Group 4 (MPEG-4) standard. The facial movements in the video data are represented using 2D spatial-temporal templates (STTs). The technique combines the discrete stationary wavelet transform (SWT) and Zernike moments to extract rotation-invariant features from the STTs. Continuous HMMs are used as the speech classifier to model the English visemes. Preliminary results indicate that the proposed technique is suitable for classifying visemes with good accuracy.
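The idea of collapsing a mouth video into a single 2D spatial-temporal template can be illustrated with a minimal sketch. It assumes a motion-history-style construction in the spirit of temporal templates: consecutive frames are differenced, the difference is thresholded, and each motion pixel is stamped with a linear time ramp so that more recent movement appears brighter. The abstract does not specify the paper's exact formulation, so the function name `motion_history_image`, the `threshold` parameter, and the ramp weighting are all illustrative assumptions.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel intensities


def motion_history_image(frames: List[Frame], threshold: int = 10) -> List[List[float]]:
    """Collapse a frame sequence into one 2D template (hypothetical helper).

    Assumption: pixels whose intensity changes by more than `threshold`
    between consecutive frames count as motion, and later motion is
    weighted more heavily via a linear ramp.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    rows, cols = len(frames[0]), len(frames[0][0])
    n_diffs = len(frames) - 1
    template = [[0.0] * cols for _ in range(rows)]
    for t in range(1, len(frames)):
        weight = t / n_diffs  # linear ramp in (0, 1]; the last difference gets 1.0
        prev, curr = frames[t - 1], frames[t]
        for r in range(rows):
            for c in range(cols):
                if abs(curr[r][c] - prev[r][c]) > threshold:
                    # Weights grow with t, so max() keeps the most recent motion.
                    template[r][c] = max(template[r][c], weight)
    return template


if __name__ == "__main__":
    f0 = [[0, 0], [0, 0]]
    f1 = [[100, 0], [0, 0]]    # top-left pixel changes first
    f2 = [[100, 100], [0, 0]]  # top-right pixel changes last
    print(motion_history_image([f0, f1, f2]))
```

In a pipeline like the one described, a template of this kind would be the input to the feature extraction stage (SWT followed by Zernike moments), which is not sketched here.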



Paper Citation


in Harvard Style

Yau W. C., Kumar D. K. and Weghorn H. (2007). Recognition of Human Movements Using Hidden Markov Models - An Application to Visual Speech Recognition. In Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2007) ISBN 978-972-8865-93-1, pages 151-160. DOI: 10.5220/0002424901510160


in Bibtex Style

@conference{pris07,
author={Wai Chee Yau and Dinesh Kant Kumar and Hans Weghorn},
title={Recognition of Human Movements Using Hidden Markov Models - An Application to Visual Speech Recognition},
booktitle={Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2007)},
year={2007},
pages={151-160},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002424901510160},
isbn={978-972-8865-93-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2007)
TI - Recognition of Human Movements Using Hidden Markov Models - An Application to Visual Speech Recognition
SN - 978-972-8865-93-1
AU - Yau W. C.
AU - Kumar D. K.
AU - Weghorn H.
PY - 2007
SP - 151
EP - 160
DO - 10.5220/0002424901510160