Journal of Speech and Hearing Disorders, 40(4):481–492.
Ezzat, T. and Poggio, T. (1998). MikeTalk: A talking facial display based on morphing visemes. In Computer Animation '98 Proceedings, pages 96–102. IEEE.
Fisher, C. G. (1968). Confusions among visually perceived consonants. Journal of Speech, Language, and Hearing Research, 11(4):796–804.
Franklin, S. B., Gibson, D. J., Robertson, P. A., Pohlmann, J. T., and Fralish, J. S. (1995). Parallel analysis: a method for determining significant principal components. Journal of Vegetation Science, 6(1):99–106.
Frénay, B. and Verleysen, M. (2014). Classification in the presence of label noise: a survey. IEEE Transactions on Neural Networks and Learning Systems, 25(5):845–869.
Goldschen, A. J., Garcia, O. N., and Petajan, E. (1994). Continuous optical automatic speech recognition by lipreading. In 1994 Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, volume 1, pages 572–577. IEEE.
Hazen, T. J., Saenko, K., La, C.-H., and Glass, J. R. (2004). A segment-based audio-visual speech recognizer: Data collection, development, and initial experiments. In Proceedings of the 6th International Conference on Multimodal Interfaces, pages 235–242. ACM.
Hilder, S., Harvey, R., and Theobald, B.-J. (2009). Comparison of human and machine-based lip-reading. In AVSP, pages 86–89.
Jeffers, J. and Barley, M. (1980). Speechreading (lipreading). Charles C. Thomas Publisher.
Khoshgoftaar, T. M., Van Hulse, J., and Napolitano, A. (2011). Comparing boosting and bagging techniques with noisy and imbalanced data. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 41(3):552–568.
Lan, Y., Harvey, R., Theobald, B., Ong, E.-J., and Bowden, R. (2009). Comparing visual features for lipreading. In International Conference on Auditory-Visual Speech Processing 2009, pages 102–106.
Llisterri, J. and Mariño, J. B. (1993). Spanish adaptation of SAMPA and automatic phonetic transcription. Technical report, ESPRIT PROJECT 6819.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.
Luettin, J., Thacker, N. A., and Beet, S. W. (1996). Visual speech recognition using active shape models and hidden Markov models. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), volume 2, pages 817–820. IEEE.
McGurk, H. and MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264:746–748.
Moll, K. L. and Daniloff, R. G. (1971). Investigation of the timing of velar movements during speech. The Journal of the Acoustical Society of America, 50(2B):678–684.
Nefian, A. V., Liang, L., Pi, X., Xiaoxiang, L., Mao, C., and Murphy, K. (2002). A coupled HMM for audio-visual speech recognition. In 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages II–2013. IEEE.
Neti, C., Potamianos, G., Luettin, J., Matthews, I., Glotin, H., Vergyri, D., Sison, J., and Mashari, A. (2000). Audio visual speech recognition. Technical report, IDIAP.
Nettleton, D. F., Orriols-Puig, A., and Fornells, A. (2010). A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review, 33(4):275–306.
Ortega, A., Sukno, F., Lleida, E., Frangi, A. F., Miguel, A., Buera, L., and Zacur, E. (2004). AV@CAR: A Spanish multichannel multimodal corpus for in-vehicle automatic audio-visual speech recognition. In LREC.
Ortiz, I. d. l. R. R. (2008). Lipreading in the prelingually deaf: what makes a skilled speechreader? The Spanish Journal of Psychology, 11(2):488–502.
Pei, Y., Kim, T.-K., and Zha, H. (2013). Unsupervised random forest manifold alignment for lipreading. In Proceedings of the IEEE International Conference on Computer Vision, pages 129–136.
Petrushin, V. A. (2000). Hidden Markov models: Fundamentals and applications. In Online Symposium for Electronics Engineers.
Potamianos, G., Neti, C., Gravier, G., Garg, A., and Senior, A. W. (2003). Recent advances in the automatic recognition of audiovisual speech. Proceedings of the IEEE, 91(9):1306–1326.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286.
Ronquest, R. E., Levi, S. V., and Pisoni, D. B. (2010). Language identification from visual-only speech signals. Attention, Perception, & Psychophysics, 72(6):1601–1613.
Saenko, K., Livescu, K., Siracusa, M., Wilson, K., Glass, J., and Darrell, T. (2005). Visual speech recognition with loosely synchronized feature streams. In Tenth IEEE International Conference on Computer Vision (ICCV'05), volume 2, pages 1424–1431.
Sahu, V. and Sharma, M. (2013). Result based analysis of various lip tracking systems. In 2013 IEEE International Conference on Green High Performance Computing (ICGHPC), pages 1–7. IEEE.
Seymour, R., Stewart, D., and Ming, J. (2008). Comparison of image transform-based features for visual speech recognition in clean and corrupted videos. Journal on Image and Video Processing, 2008:14.
Sui, C., Bennamoun, M., and Togneri, R. (2015). Listening with your eyes: Towards a practical visual speech recognition system using deep Boltzmann machines. In Proceedings of the IEEE International Conference on Computer Vision, pages 154–162.
Sukno, F. M., Ordas, S., Butakoff, C., Cruz, S., and Frangi, A. F. (2007). Active shape models with invariant optimal features: application to facial analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7):1105–1117.