UBM/GMM system needs 0.03 seconds to process
1 second of data in the adaptation phase and 0.01
seconds in the scoring phase. The proposed HMM
system, in contrast, needs 1.2 seconds to adapt on
1 second of data. For scoring, the LLR and concurrent
scoring methods require 0.07 seconds and 0.6 seconds,
respectively, to process 1 second of data.
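The timings above can be read as real-time factors (processing time divided by audio duration). The following minimal sketch only restates the reported figures and derives a few quantities from them. It assumes that cost scales linearly with utterance length, which the text does not state, and the 3-second utterance and all function names are hypothetical, chosen purely for illustration.

```python
# Processing time (in seconds) per 1 second of speech, as reported in the text.
RTF = {
    ("UBM/GMM", "adaptation"): 0.03,
    ("UBM/GMM", "scoring"): 0.01,
    ("HMM", "adaptation"): 1.2,
    ("HMM", "LLR scoring"): 0.07,
    ("HMM", "concurrent scoring"): 0.6,
}


def processing_time(system, phase, audio_seconds):
    """Estimated processing time in seconds.

    Assumption (not from the paper): cost grows linearly with duration.
    """
    return RTF[(system, phase)] * audio_seconds


def is_real_time(system, phase):
    """True when the phase runs faster than the audio it processes."""
    return RTF[(system, phase)] < 1.0


def cost_ratio(key_a, key_b):
    """How many times more expensive configuration key_a is than key_b."""
    return RTF[key_a] / RTF[key_b]


if __name__ == "__main__":
    utterance_seconds = 3.0  # hypothetical pass-phrase duration
    for (system, phase), rtf in RTF.items():
        t = processing_time(system, phase, utterance_seconds)
        flag = "real-time" if is_real_time(system, phase) else "slower than real-time"
        print(f"{system:8s} {phase:20s} RTF={rtf:<5} time={t:.2f}s ({flag})")

    ratio = cost_ratio(("HMM", "concurrent scoring"), ("HMM", "LLR scoring"))
    print(f"Concurrent scoring costs about {ratio:.1f}x the LLR scoring time")
```

Under these assumptions, only the HMM adaptation phase is slower than real time, while concurrent scoring remains under real time but is several times more expensive than LLR scoring.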
5 CONCLUSIONS AND PERSPECTIVES
In this paper, data-driven HMM modeling is proposed
for text-dependent speaker verification in order to
exploit the temporal information of speech data. The
data-driven models are trained on raw speech data
to obtain a set of generic HMMs. This set is then
adapted to the target speaker and to the lexical content
of the pass-phrase. Two systems, based on the
log-likelihood ratio and on concurrent scoring, are
introduced. The systems are evaluated on Part 1 of the
RSR2015 database. This evaluation shows that the
concurrent scoring system is more accurate than the one
based on the log-likelihood ratio. Moreover, the results
show the relevance of the proposed method when compared
with the UBM/GMM and HiLAM systems. Future work will be
dedicated to evaluating the proposed system on Parts 2
and 3 of the RSR2015 database. In addition, the
concurrent scoring method should be accelerated if it is
to be integrated on a mobile device.
ICPRAM 2017 - 6th International Conference on Pattern Recognition Applications and Methods