Efficient Multi-angle Audio-visual Speech Recognition using Parallel WaveGAN based Scene Classifier

Shinnosuke Isobe; Satoshi Tamura; Yuuto Gotoh; Masaki Nose

Topics: Audio and Speech Analysis; Classification and Clustering; Deep Learning and Neural Networks; Image and Video Analysis and Understanding; Signal Processing

In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods ICPRAM - Volume 1, 449-460, 2022

Authors: Shinnosuke Isobe ¹ ; Satoshi Tamura ² ; Yuuto Gotoh ³ and Masaki Nose ³

Affiliations: ¹ Graduate School of Natural Science and Technology, Gifu University, Gifu, Japan ; ² Faculty of Engineering, Gifu University, Gifu, Japan ; ³ Ricoh Company, Ltd., Kanagawa, Japan

Keyword(s): Scene Classification, Audio-visual Speech Recognition, Multi-angle Lipreading, Anomaly Detection, Neural Vocoder.

Abstract: Recently, Audio-Visual Speech Recognition (AVSR), one of the Automatic Speech Recognition (ASR) approaches that are robust against acoustic noise, has been widely researched. AVSR combines ASR and Visual Speech Recognition (VSR). For real applications, VSR must accept both frontal and non-frontal face images, and the computational time for image processing must be reduced. In this paper, we propose an efficient multi-angle AVSR method using a Parallel-WaveGAN-based scene classifier. The classifier estimates whether given speech data were recorded in a clean or a noisy environment. If the classifier detects a noisy environment, multi-angle AVSR is conducted to enhance recognition accuracy; if it predicts clean speech, only ASR is performed to avoid an increase in processing time. We evaluated our framework using two multi-angle audio-visual databases: OuluVS2, an English corpus with 5 views, and GAMVA, a Japanese phrase corpus with 12 views. Experimental results show that the scene classifier worked well and that multi-angle AVSR achieved higher recognition accuracy than ASR. In addition, our approach could save processing time by switching recognizers according to the noise condition.
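The recognizer switching described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Utterance`, `recognize`, `noise_score`, and `threshold` are hypothetical, and the toy score (mean squared amplitude) is only a placeholder for the Parallel-WaveGAN-based scene classifier, whose internals the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Utterance:
    """One recording: audio samples plus lip-region frames (any view angle)."""
    audio: Sequence[float]
    video_frames: Sequence[object]


def recognize(
    utt: Utterance,
    noise_score: Callable[[Sequence[float]], float],  # hypothetical scene score
    threshold: float,                                  # hypothetical decision threshold
    asr: Callable[[Sequence[float]], str],
    avsr: Callable[[Sequence[float], Sequence[object]], str],
) -> Tuple[str, str]:
    """Return (transcript, recognizer_used).

    If the scene is judged noisy, run the costlier audio-visual recognizer;
    otherwise run audio-only ASR and skip image processing entirely.
    """
    if noise_score(utt.audio) > threshold:
        return avsr(utt.audio, utt.video_frames), "avsr"
    return asr(utt.audio), "asr"


# --- Toy stand-ins so the sketch runs; none of these come from the paper. ---

def toy_noise_score(audio: Sequence[float]) -> float:
    """Placeholder score: mean squared amplitude (0.0 for empty audio)."""
    return sum(x * x for x in audio) / len(audio) if audio else 0.0


def toy_asr(audio: Sequence[float]) -> str:
    return "audio-only result"


def toy_avsr(audio: Sequence[float], frames: Sequence[object]) -> str:
    return f"audio-visual result ({len(frames)} frames)"


if __name__ == "__main__":
    clean = Utterance(audio=[0.0, 0.0, 0.0, 0.0], video_frames=[])
    noisy = Utterance(audio=[0.9, -0.8, 0.7, -0.9], video_frames=["f0", "f1"])
    for utt in (clean, noisy):
        print(recognize(utt, toy_noise_score, 0.1, toy_asr, toy_avsr))
```

The design point is that the video frames are touched only on the noisy branch, which is where the processing-time saving reported in the abstract would come from.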

CC BY-NC-ND 4.0

Paper citation in several formats:

Isobe, S.; Tamura, S.; Gotoh, Y. and Nose, M. (2022). Efficient Multi-angle Audio-visual Speech Recognition using Parallel WaveGAN based Scene Classifier. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-549-4; ISSN 2184-4313, SciTePress, pages 449-460. DOI: 10.5220/0010846000003122

@conference{icpram22,
author={Shinnosuke Isobe and Satoshi Tamura and Yuuto Gotoh and Masaki Nose},
title={Efficient Multi-angle Audio-visual Speech Recognition using Parallel WaveGAN based Scene Classifier},
booktitle={Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2022},
pages={449-460},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010846000003122},
isbn={978-989-758-549-4},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Efficient Multi-angle Audio-visual Speech Recognition using Parallel WaveGAN based Scene Classifier
SN - 978-989-758-549-4
IS - 2184-4313
AU - Isobe, S.
AU - Tamura, S.
AU - Gotoh, Y.
AU - Nose, M.
PY - 2022
SP - 449
EP - 460
DO - 10.5220/0010846000003122
PB - SciTePress