Authors:
Dijana Petrovska-Delacrétaz (1) and Houssemeddine Khemiri (2)
Affiliations:
(1) Télécom SudParis, SAMOVAR CNRS and Université Paris-Saclay, France; (2) PW Consultants, France
Keyword(s):
Unsupervised Data-driven Modeling, Hidden Markov Models, Text-dependent Speaker Verification, Concurrent Scoring.
Related Ontology Subjects/Areas/Topics:
Applications; Biomedical Engineering; Biomedical Signal Processing; Biometrics; Biometrics and Pattern Recognition; Classification; Computer Vision, Visualization and Computer Graphics; Image and Video Analysis; Multimedia; Multimedia Signal Processing; Pattern Recognition; Software Engineering; Telecommunications; Theory and Methods; Video Analysis
Abstract:
We present a text-dependent speaker verification system based on unsupervised, data-driven Hidden Markov
Models (HMMs), which capture the temporal information in speech data. The originality of our proposal is
that the HMMs are trained on raw speech only, without transcriptions, and provide a pseudo-phonetic
segmentation of the speech data. The proposed system comprises the following steps. First, generic
unsupervised HMMs are trained. Then the enrollment speech of each target speaker is segmented with the
generic models and processed further to obtain speaker- and text-adapted HMMs that represent that speaker.
During the test phase, to verify the claimed identity, the test speech is segmented with both the generic
and the speaker-dependent HMMs. Finally, two approaches, based on the log-likelihood ratio and on
concurrent scoring, are proposed to compute the score between the test utterance and the speaker's model.
The system is evaluated on Part 1 of the RSR2015 database, using the Equal Error Rate (EER) on the
development set and the Half Total Error Rate (HTER) on the evaluation set. An average EER of 1.29% is
achieved on the development set, and an average HTER of 1.32% on the evaluation set.
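
The scoring and evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the log-likelihood ratio is the difference between the test utterance's log-likelihood under the speaker-dependent HMMs and under the generic HMMs, normalized by the number of frames, and that the HTER is computed on the evaluation set at the threshold that gives the EER on the development set (a common convention that the abstract does not state). All function names and the toy scores are hypothetical. Concurrent scoring is not sketched because the abstract does not define it.

```python
from typing import Sequence, Tuple


def llr_score(loglik_speaker: float, loglik_generic: float, n_frames: int) -> float:
    """Log-likelihood ratio between speaker and generic models.

    Dividing by the number of frames is an assumption made here so that
    utterances of different lengths yield comparable scores.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (loglik_speaker - loglik_generic) / n_frames


def error_rates(targets: Sequence[float], impostors: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a trial is accepted if score >= threshold."""
    far = sum(s >= threshold for s in impostors) / len(impostors)
    frr = sum(s < threshold for s in targets) / len(targets)
    return far, frr


def eer(targets: Sequence[float], impostors: Sequence[float]) -> Tuple[float, float]:
    """Approximate the EER by scanning every observed score as a threshold.

    Returns (eer, threshold) for the threshold where |FAR - FRR| is smallest;
    the EER is reported as the mean of FAR and FRR at that point.
    """
    best = None
    for t in sorted(set(targets) | set(impostors)):
        far, frr = error_rates(targets, impostors, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def hter(targets: Sequence[float], impostors: Sequence[float],
         threshold: float) -> float:
    """Half Total Error Rate: mean of FAR and FRR at a fixed threshold."""
    far, frr = error_rates(targets, impostors, threshold)
    return (far + frr) / 2


# Toy scores, invented purely for illustration.
dev_targets = [2.0, 1.5, 1.0, 0.2]
dev_impostors = [-1.0, -0.5, 0.0, 0.5]
eval_targets = [1.2, 0.4, 0.9, 1.8]
eval_impostors = [-0.3, 0.6, 0.1, -0.8]

dev_eer, dev_threshold = eer(dev_targets, dev_impostors)
eval_hter = hter(eval_targets, eval_impostors, dev_threshold)
```

Fixing the threshold on the development set and reusing it on the evaluation set keeps the HTER an a priori measure, which is why the two sets are reported with different metrics.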