Authors: Luca Cappelletta and Naomi Harte

Affiliation: Trinity College Dublin, Ireland

Keyword(s): AVSR, Viseme, PCA, DCT, Optical flow.

Related Ontology Subjects/Areas/Topics: Applications ; Artificial Intelligence ; Audio and Speech Processing ; Cardiovascular Imaging and Cardiography ; Cardiovascular Technologies ; Digital Signal Processing ; Health Engineering and Technology Applications ; Knowledge Engineering and Ontology Development ; Knowledge-Based Systems ; Multimedia ; Multimedia Signal Processing ; Natural Language Processing ; Pattern Recognition ; Signal Processing ; Software Engineering ; Symbolic Systems ; Telecommunications

Abstract: Phonemes are the standard modelling unit in HMM-based continuous speech recognition systems. Visemes are the equivalent unit in the visual domain, but there is less agreement on precisely what visemes are, or how many to model on the visual side in audio-visual speech recognition systems. This paper compares the use of five viseme maps in a continuous speech recognition task. The study focuses on visual-only recognition in order to examine the choice of viseme map. All the maps are based on the phoneme-to-viseme approach and are created using either a linguistic method or a data-driven method. DCT, PCA and optical flow are used to derive the visual features. The best visual-only recognition on the VidTIMIT database is achieved using a linguistically motivated viseme set. These initial experiments demonstrate that the choice of visual unit requires more careful attention in audio-visual speech recognition system development.
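
The phoneme-to-viseme approach mentioned in the abstract is a many-to-one relabelling: several phonemes that look alike on the lips share one visual class. The following is a minimal sketch of that idea only. The class names, the tiny mapping table, and both function names are hypothetical and are not taken from any of the five maps compared in the paper; the only assumption about speech is the widely cited one that bilabials (/p/, /b/, /m/) and labiodentals (/f/, /v/) are each hard to tell apart visually.

```python
# Hypothetical, illustrative viseme classes. This is NOT one of the five
# maps evaluated in the paper; it only demonstrates the many-to-one idea.
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "sil": "V_silence",
}


def phonemes_to_visemes(phonemes, mapping=PHONEME_TO_VISEME, unknown="V_other"):
    """Relabel a phoneme sequence with viseme classes (many-to-one lookup).

    Phonemes missing from the mapping fall into a catch-all class.
    """
    return [mapping.get(p, unknown) for p in phonemes]


def collapse_repeats(labels):
    """Merge runs of identical consecutive labels into a single label."""
    collapsed = []
    for label in labels:
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return collapsed


if __name__ == "__main__":
    transcription = ["sil", "b", "m", "f", "aa"]
    visemes = phonemes_to_visemes(transcription)
    print(visemes)
    print(collapse_repeats(visemes))
```

Collapsing repeated labels is a design choice of this sketch: adjacent phonemes that map to the same class become visually indistinguishable, so a visual-only transcription may reasonably treat them as one unit.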

CC BY-NC-ND 4.0

Paper citation in several formats:
Cappelletta, L. and Harte, N. (2012). PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-8425-99-7; ISSN 2184-4313, SciTePress, pages 322-329. DOI: 10.5220/0003731903220329

@conference{icpram12,
author={Luca Cappelletta and Naomi Harte},
title={PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={322--329},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003731903220329},
isbn={978-989-8425-99-7},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION
SN - 978-989-8425-99-7
IS - 2184-4313
AU - Cappelletta, L.
AU - Harte, N.
PY - 2012
SP - 322
EP - 329
DO - 10.5220/0003731903220329
PB - SciTePress