labial gestures. Lip tracking is in itself a major difficulty: the system must cope with the considerable variability of lip movement for a single speaker as well as the differing lip configurations across speakers.
In this paper, we have presented ALiFE, our visual speech recognition system. ALiFE extracts visual speech features and models them for visual speech recognition. The system comprises three main parts: lip localization and tracking, lip feature extraction, and viseme classification and recognition. It has been successfully tested on our audio-visual corpus, both for tracking characteristic points on the lip contours and for viseme recognition.
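For illustration only, the sketch below mirrors this three-stage organization as a minimal processing pipeline. All class and function names (track_lips, extract_features, classify_viseme) are hypothetical placeholders chosen for this sketch and do not correspond to the actual ALiFE implementation.

    # Illustrative three-stage visual speech recognition pipeline:
    # (1) lip localization and tracking, (2) feature extraction,
    # (3) viseme classification. Names are placeholders, not ALiFE code.
    from dataclasses import dataclass
    from typing import List, Sequence, Tuple

    @dataclass
    class LipFeatures:
        """Per-frame geometric lip features (assumed set: width, height, area)."""
        width: float
        height: float
        area: float

    def track_lips(frames: Sequence) -> List[List[Tuple[float, float]]]:
        """Stage 1: locate the lips and track characteristic contour points
        in every frame. This stub returns four dummy points per frame."""
        return [[(0.0, 0.0)] * 4 for _ in frames]

    def extract_features(contours: List[List[Tuple[float, float]]]) -> List[LipFeatures]:
        """Stage 2: derive geometric features from the tracked contour points."""
        feats = []
        for points in contours:
            xs = [p[0] for p in points]
            ys = [p[1] for p in points]
            w, h = max(xs) - min(xs), max(ys) - min(ys)
            feats.append(LipFeatures(width=w, height=h, area=w * h))
        return feats

    def classify_viseme(feats: List[LipFeatures]) -> str:
        """Stage 3: map the feature sequence to a viseme label.
        A real system would use a trained classifier; this stub returns a constant."""
        return "unknown" if not feats else "viseme_A"

    def recognize(frames: Sequence) -> str:
        """Chain the three stages on a sequence of video frames."""
        return classify_viseme(extract_features(track_lips(frames)))

    if __name__ == "__main__":
        print(recognize(frames=range(10)))  # dummy frame sequence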
However, further work is needed to improve the accuracy of our lip-reading system. As future work, we propose adding other discriminative features to resolve the confusion between certain visemes. We also intend to enhance the recognition stage by defining appropriate feature coefficients for each viseme. Finally, we plan to enlarge our audio-visual corpus to cover the remaining French visemes and, eventually, to evaluate the system's performance on other languages.