A Machine-learning based Technique to Analyze the Dynamic Information for Visual Perception of Consonants

Wai Chee Yau, Dinesh Kant Kumar, Hans Weghorn

Abstract

This paper proposes a machine-learning-based technique for investigating the significance of dynamic information in the visual perception of consonants. Visual speech information can be described using static (facial appearance) or dynamic (movement) features. The aim of this research is to determine the saliency of the dynamic information represented by lower facial movement for visual speech perception. The experimental results indicate that, using the proposed approach, the facial movements of nine English consonants can be distinguished with a success rate of 85%. The results suggest that the time-varying information of visual speech contained in lower facial movements is useful for machine recognition of consonants and may be an essential cue for human perception of visual speech.



Paper Citation


in Harvard Style

Yau W. C., Kumar D. K. and Weghorn H. (2007). A Machine-learning based Technique to Analyze the Dynamic Information for Visual Perception of Consonants. In Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007) ISBN 978-972-8865-97-9, pages 119-128. DOI: 10.5220/0002424801190128


in Bibtex Style

@conference{nlpcs07,
author={Wai Chee Yau and Dinesh Kant Kumar and Hans Weghorn},
title={A Machine-learning based Technique to Analyze the Dynamic Information for Visual Perception of Consonants},
booktitle={Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007)},
year={2007},
pages={119-128},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002424801190128},
isbn={978-972-8865-97-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007)
TI - A Machine-learning based Technique to Analyze the Dynamic Information for Visual Perception of Consonants
SN - 978-972-8865-97-9
AU - Yau, W. C.
AU - Kumar, D. K.
AU - Weghorn, H.
PY - 2007
SP - 119
EP - 128
DO - 10.5220/0002424801190128