of visual information for automatic speech recognition. The major difficulty of a lip-reading system lies in extracting the visual speech descriptors, a task that requires automatic tracking of labial gestures. Lip tracking is itself a significant challenge: the system must cope with the considerable variability of lip movements for a single speaker, as well as the different lip configurations across speakers.
In this paper, we have presented ALiFE, our visual speech recognition system. ALiFE extracts visual speech features and models them for visual speech recognition. The system comprises three principal parts: lip localization and tracking, lip feature extraction, and viseme classification and recognition. It has been tested successfully on our audio-visual corpus, both for tracking characteristic points on the lip contours and for viseme recognition.
However, further work is needed to improve the effectiveness of our lip-reading system. As future work, we propose to add other robust features, and to enhance the recognition stage by defining adequate feature coefficients for each viseme (using principal component analysis, PCA). Finally, we plan to enlarge our audio-visual corpus to cover the remaining French visemes and, eventually, other languages.
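As a rough illustration of the PCA step proposed above, the following is a minimal sketch, not the paper's implementation: it assumes the visual speech features are stored as a NumPy array of shape (samples, dimensions), and the function name and toy data are hypothetical.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project feature vectors onto their top principal components.

    features: (n_samples, n_dims) array of visual speech features
    (hypothetical layout; the actual ALiFE descriptors may differ).
    """
    X = features - features.mean(axis=0)           # center the data
    cov = np.cov(X, rowvar=False)                  # covariance matrix
    vals, vecs = np.linalg.eigh(cov)               # eigendecomposition (ascending)
    order = np.argsort(vals)[::-1][:n_components]  # indices of top components
    return X @ vecs[:, order]                      # reduced-dimension features

# Toy example: 6 feature vectors of dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Z = pca_project(X, n_components=2)
print(Z.shape)  # (6, 2)
```

In such a scheme, the retained components would serve as the per-viseme feature coefficients, keeping the directions of greatest variance and discarding noisier dimensions.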
AUTOMATIC LIP LOCALIZATION AND FEATURE EXTRACTION FOR LIP-READING