Authors: Adriana Fernandez-Lopez and Federico M. Sukno

Affiliation: Pompeu Fabra University, Spain

Keyword(s): Lip-reading, Speech Recognition, Visemes, Confusion Matrix.

Abstract: Speech is the most common communication method between humans and involves the perception of both auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, but it has been demonstrated that video can provide information that is complementary to the audio. Thus, the study of automatic lip-reading is important and is still an open problem. One of the key challenges is the definition of the visual elementary units (the visemes) and their vocabulary. Many researchers have analyzed the importance of the phoneme-to-viseme mapping and have proposed viseme vocabularies with lengths between 11 and 15 visemes. These viseme vocabularies have usually been defined manually according to their linguistic properties and, in some cases, by using decision trees or clustering techniques. In this work, we focus on the automatic construction of an optimal viseme vocabulary based on the association of phonemes with similar appearance. To this end, we construct an automatic system that uses local appearance descriptors to extract the main characteristics of the mouth region and HMMs to model the statistical relations of both viseme and phoneme sequences. To compare the performance of the system, different descriptors (PCA, DCT and SIFT) are analyzed. We test our system on a Spanish corpus of continuous speech. Our results indicate that we are able to recognize approximately 58% of the visemes, 47% of the phonemes and 23% of the words in a continuous speech scenario, and that the optimal viseme vocabulary for Spanish is composed of 20 visemes.
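
The abstract does not spell out how phonemes with similar appearance are associated into visemes. As a rough illustration only, the sketch below assumes that a phoneme recognizer's confusion matrix is available and greedily merges the groups of phonemes that are most often confused with each other until a target vocabulary size is reached. The function name `merge_phonemes`, the average-linkage affinity and the toy counts are hypothetical; this is not the authors' procedure or data.

```python
from itertools import combinations


def merge_phonemes(labels, confusion, num_visemes):
    """Hypothetical sketch: group phonemes into visemes by mutual confusion.

    confusion[i][j] is assumed to count how often phoneme labels[i]
    was recognized as labels[j]. This is an illustrative agglomerative
    scheme, not the method described in the paper.
    """
    n = len(labels)
    if not 1 <= num_visemes <= n:
        raise ValueError("num_visemes must be between 1 and len(labels)")

    # Row-normalize counts into confusion probabilities P(recognized j | true i).
    probs = []
    for row in confusion:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])

    def affinity(a, b):
        # Average symmetric confusion between members of groups a and b.
        s = sum(probs[i][j] + probs[j][i] for i in a for j in b)
        return s / (len(a) * len(b))

    # Start with one group (list of phoneme indices) per phoneme.
    groups = [[i] for i in range(n)]
    while len(groups) > num_visemes:
        # Pick the pair of groups that the recognizer confuses most often.
        x, y = max(combinations(range(len(groups)), 2),
                   key=lambda pair: affinity(groups[pair[0]], groups[pair[1]]))
        groups[x] = groups[x] + groups[y]
        del groups[y]  # safe: combinations yields x < y, so index x is unaffected
    return [sorted(labels[i] for i in g) for g in groups]


# Toy example with made-up counts: bilabials and labiodentals are confused.
labels = ["p", "b", "m", "f", "v", "a"]
confusion = [
    [5, 3, 2, 0, 0, 0],
    [3, 5, 2, 0, 0, 0],
    [2, 2, 6, 0, 0, 0],
    [0, 0, 0, 6, 4, 0],
    [0, 0, 0, 4, 6, 0],
    [0, 0, 0, 0, 0, 10],
]
print(merge_phonemes(labels, confusion, 3))
# [['b', 'm', 'p'], ['f', 'v'], ['a']]
```

Lowering `num_visemes` yields fewer, coarser visual classes at the cost of more ambiguity between the phonemes inside each class. The paper reports 20 visemes as optimal for Spanish; that figure comes from its own experiments, not from this sketch.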

CC BY-NC-ND 4.0

Paper citation in several formats:
Fernandez-Lopez, A. and Sukno, F. M. (2017). Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017) - Volume 5: VISAPP; ISBN 978-989-758-226-4; ISSN 2184-4321, SciTePress, pages 52-63. DOI: 10.5220/0006102100520063

@conference{visapp17,
author={Adriana Fernandez-Lopez and Federico M. Sukno},
title={Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017) - Volume 5: VISAPP},
year={2017},
pages={52-63},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006102100520063},
isbn={978-989-758-226-4},
issn={2184-4321},
}

TY - CONF

JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017) - Volume 5: VISAPP
TI - Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading
SN - 978-989-758-226-4
IS - 2184-4321
AU - Fernandez-Lopez, A.
AU - Sukno, F. M.
PY - 2017
SP - 52
EP - 63
DO - 10.5220/0006102100520063
PB - SciTePress