Speaker Verification Enhancement via Speaking Rate Dynamics in Persian Speechprints

Nina Hosseini-Kivanani; Homa Asadi; Christoph Schommer

doi:10.5220/0013189100003905

2025

Abstract

This paper investigates the impact of speaking rate variation on speaker verification using a hybrid feature approach that combines Mel-Frequency Cepstral Coefficients (MFCCs), their dynamic derivatives (delta and delta-delta), and vowel formants. To enhance system robustness, we also applied data augmentation techniques such as time-stretching, pitch-shifting, and noise addition. The dataset comprises recordings of Persian speakers at three distinct speaking rates: slow, normal, and fast. Our results show that the combined model integrating MFCCs, delta-delta features, and formant frequencies significantly outperforms individual feature sets, achieving an accuracy of 75% with augmentation, compared to 70% without augmentation. This highlights the benefit of leveraging both spectral and temporal features for speaker verification under varying speaking conditions. Furthermore, data augmentation improved the generalization of all models, particularly for the combined feature set, where precision, recall, and F1-score metrics showed substantial gains. These findings underscore the importance of feature fusion and augmentation in developing robust speaker verification systems. Our study contributes to advancing speaker identification methodologies, particularly in real-world applications where variability in speaking rate and environmental conditions presents a challenge.
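As a rough illustration of the feature fusion and augmentation described in the abstract, the following minimal sketch computes delta and delta-delta coefficients with the standard regression formula, concatenates them with MFCCs and formant values per frame, and adds Gaussian noise at a chosen signal-to-noise ratio. It is not the authors' implementation: the function names, the regression width of 2, the frame-aligned formant layout, and the SNR parameterization are all assumptions made for illustration, and time-stretching and pitch-shifting are omitted.

```python
import math
import random


def delta(frames, width=2):
    """Regression-based delta over a list of per-frame coefficient vectors.

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    Frames beyond either end are replaced by the nearest edge frame.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc, formants):
    """Concatenate MFCC, delta, delta-delta, and formant values frame by frame.

    Assumes (hypothetically) that formants are already aligned to MFCC frames.
    """
    d1 = delta(mfcc)   # first-order dynamics
    d2 = delta(d1)     # second-order dynamics (delta-delta)
    return [m + a + b + f for m, a, b, f in zip(mfcc, d1, d2, formants)]


def add_noise(signal, snr_db, seed=0):
    """Add zero-mean Gaussian noise so the expected SNR equals snr_db."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


if __name__ == "__main__":
    # Toy input: 6 frames, 2 "MFCC" coefficients, 2 formant values per frame.
    toy_mfcc = [[float(t), 2.0 * t] for t in range(6)]
    toy_formants = [[500.0, 1500.0] for _ in range(6)]
    fused = stack_features(toy_mfcc, toy_formants)
    print(len(fused), len(fused[0]))

    tone = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
    noisy = add_noise(tone, snr_db=20)
    print(len(noisy))
```

In practice the MFCCs and formants would come from a speech analysis toolkit; the toy inputs here only show how the per-frame vectors are assembled before classification.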

Paper Citation

in Harvard Style

Hosseini-Kivanani N., Asadi H. and Schommer C. (2025). Speaker Verification Enhancement via Speaking Rate Dynamics in Persian Speechprints. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-730-6, SciTePress, pages 665-672. DOI: 10.5220/0013189100003905

in Bibtex Style

@conference{icpram25,
author={Nina Hosseini-Kivanani and Homa Asadi and Christoph Schommer},
title={Speaker Verification Enhancement via Speaking Rate Dynamics in Persian Speechprints},
booktitle={Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2025},
pages={665-672},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013189100003905},
isbn={978-989-758-730-6},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Speaker Verification Enhancement via Speaking Rate Dynamics in Persian Speechprints
SN - 978-989-758-730-6
AU - Hosseini-Kivanani N.
AU - Asadi H.
AU - Schommer C.
PY - 2025
SP - 665
EP - 672
DO - 10.5220/0013189100003905
PB - SciTePress