Authors: Tsang-Long Pao and Wen-Yuan Liao
Affiliation: Tatung University, Taiwan
Keyword(s): Audio-visual database, Audio-visual speech recognition, Hidden Markov model.
Related Ontology Subjects/Areas/Topics: Applications; Computer Vision, Visualization and Computer Graphics; Feature Extraction; Features Extraction; Image and Video Analysis; Informatics in Control, Automation and Robotics; Pattern Recognition; Signal Processing, Sensors, Systems Modeling and Control; Software Engineering; Video Analysis
Abstract:
For the past several decades, visual speech signal processing has been an attractive research topic for overcoming certain problems of audio-only recognition. In recent years, many automatic speech-reading systems have been proposed that combine audio and visual speech features. The objective of all such audio-visual speech recognizers is to improve recognition accuracy, particularly under difficult acoustic conditions. In this paper, we focus on visual feature extraction for audio-visual recognition. We created a new audio-visual database recorded in two languages, English and Mandarin. Audio-visual recognition consists of two main steps: feature extraction and recognition. A front-end processing stage extracts visual motion features of the lips, and a hidden Markov model (HMM) is used for audio-visual speech recognition. We describe our audio-visual database and use it in our proposed system in some preliminary experiments.
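The recognition step can be illustrated with a minimal sketch of HMM-based classification: one HMM per word, scored with the forward algorithm, picking the word with the highest likelihood. The abstract does not specify the HMM topology, the emission model, or whether recognition is isolated-word, so everything here is an assumption for illustration and not the authors' system: observations are hypothetical vector-quantized lip-motion symbols (0 = still, 1 = opening, 2 = closing), the word models are hypothetical two-state left-to-right HMMs, and all probabilities are made up.

```python
import math
from typing import Dict, List, Sequence


def safe_log(p: float) -> float:
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over log-probabilities."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """HMM with discrete emissions (assumed: quantized lip-motion symbols)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]) -> None:
        self.n_states = len(start)
        self.log_start = [safe_log(p) for p in start]
        self.log_trans = [[safe_log(p) for p in row] for row in trans]
        self.log_emit = [[safe_log(p) for p in row] for row in emit]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm in log space: log P(obs | model)."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = self.n_states
        alpha = [self.log_start[s] + self.log_emit[s][obs[0]] for s in range(n)]
        for o in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                + self.log_emit[j][o]
                for j in range(n)
            ]
        return logsumexp(alpha)


def recognize(models: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Return the word whose HMM gives the observations the highest likelihood."""
    return max(models, key=lambda word: models[word].log_likelihood(obs))


# Hypothetical left-to-right word models (illustrative numbers only).
models = {
    "open_close": DiscreteHMM(
        start=[1.0, 0.0],
        trans=[[0.6, 0.4], [0.0, 1.0]],
        emit=[[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
    ),
    "still": DiscreteHMM(
        start=[1.0, 0.0],
        trans=[[0.5, 0.5], [0.0, 1.0]],
        emit=[[0.9, 0.05, 0.05], [0.9, 0.05, 0.05]],
    ),
}

if __name__ == "__main__":
    print(recognize(models, [1, 1, 2, 2]))
    print(recognize(models, [0, 0, 0, 0]))
```

Log-space arithmetic avoids underflow on long frame sequences. With continuous visual features (for example, raw motion vectors rather than quantized symbols), the discrete emission table would typically be replaced by Gaussian or Gaussian-mixture densities; the forward recursion itself stays the same.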