Language-Aware and Language-Agnostic Multilingual Speech Recognition with a Single Model

Karol Nowakowski, Michal Ptaszynski

2025

Abstract

In recent years, there has been increasing interest in multilingual speech recognition systems, in which a single model transcribes speech in multiple languages. An additional benefit of multilingual learning is that it allows for cross-lingual transfer, which often leads to better performance, especially in low-resource languages. On the other hand, multilingual models suffer from errors caused by confusion between languages. This problem can be mitigated by providing information about language identity as an additional input to the model. In this research, we carry out experiments using a state-of-the-art ASR system architecture based on a pretrained multilingual wav2vec 2.0 model and adapter modules trained for the downstream task, and confirm that multilingual supervised learning with language identifiers is a viable method for improving the system’s overall performance. Furthermore, we find that training with language identifiers still yields a model with better average performance than a model trained without such information, even if language identity is unknown at inference time.

Paper Citation


in Harvard Style

Nowakowski K. and Ptaszynski M. (2025). Language-Aware and Language-Agnostic Multilingual Speech Recognition with a Single Model. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-730-6, SciTePress, pages 808-813. DOI: 10.5220/0013319500003905


in Bibtex Style

@conference{icpram25,
author={Karol Nowakowski and Michal Ptaszynski},
title={Language-Aware and Language-Agnostic Multilingual Speech Recognition with a Single Model},
booktitle={Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2025},
pages={808--813},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013319500003905},
isbn={978-989-758-730-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Language-Aware and Language-Agnostic Multilingual Speech Recognition with a Single Model
SN - 978-989-758-730-6
AU - Nowakowski K.
AU - Ptaszynski M.
PY - 2025
SP - 808
EP - 813
DO - 10.5220/0013319500003905
PB - SciTePress