
SPEAKER VERIFICATION SYSTEM
Based on the stochastic modeling
Valiantsin Rakush, Rauf Kh. Sadykhov
Byelorusian State University of Informatics and Radioelectronics, 6, P. Brovka str., Minsk, Belarus
Keywords: Speaker verification, vector quantization, Gaussian mixture models
Abstract: In this paper we propose a new speaker verification system in which new training and classification
algorithms for vector quantization and Gaussian mixture models are introduced. The vector quantizer is
used to model sub-word speech components. Code books are created for both the training and the test
utterances. We propose new approaches to normalizing the distortion of the training and test code books. The test
code book is quantized over the training code book. The normalization techniques include assigning equal
distortion to the training and test code books, distortion normalization, and cluster weights. In addition, the LBG and
K-means algorithms usually employed for vector quantization are applied to train Gaussian mixture
models. Finally, we use the information provided by the two different models to increase verification
performance. The performance of the proposed system has been tested on the Speaker Recognition
database, which consists of telephone speech from 8 participants. Additional experiments have been
performed on the subset of the NIST 1996 Speaker Recognition database which include .
1 INTRODUCTION
Speaker verification systems have so far been
based on a variety of methods. One category of
algorithms uses back-end models to
facilitate the extraction of speaker traits (Roberts and
Wilmore, 1999) (Burton, 1987) (Pelecanos, 2000)
(Homayounpour and Challet, 1995). Neural
networks, vector quantization (VQ), and Gaussian
mixture models (GMM) are constructed directly or
indirectly to model subword or subspeech units.
Those units can then be compared to make a
verification decision. Another class of speaker
verification systems computes long-term statistics
over the speech phrase (Zilca, 2001)
(Moonsar and Venayagamorthy, 2001). Some
systems combine several methods to
improve performance. The methods can be
combined in two ways. The first is to use one model
to improve the performance of another (Hsu, 2003)
(Singh et al., 2003) (Sadykhov and Rakush, 2003).
The second is to fuse the recognition decisions of
both models into a final
score (Farrell et al., 1998) (Farrell et al., 1997).
Data fusion methods can be interpreted using
normalization and/or a Bayesian approach.
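As a hedged illustration of the normalization-based view of fusion, the sketch below standardizes each model's score against a set of impostor scores and combines the results with a weighted sum. The function names, the z-score normalization, the weight `alpha`, and the convention that larger scores favor the claimed speaker are all assumptions made for illustration; they are not taken from the cited systems.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_norm(score: float, impostor_scores: Sequence[float]) -> float:
    """Standardize a raw score with impostor statistics (assumed scheme)."""
    mu = mean(impostor_scores)
    sigma = pstdev(impostor_scores)
    # Guard against a degenerate impostor set with zero spread.
    return (score - mu) / sigma if sigma > 0 else score - mu


def fuse_scores(
    vq_score: float,
    gmm_score: float,
    vq_impostors: Sequence[float],
    gmm_impostors: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Weighted sum of the normalized VQ and GMM scores.

    alpha is a hypothetical fusion weight in [0, 1]. Both scores are assumed
    to be oriented so that larger values favor the claimed speaker, so a VQ
    distortion would have to be negated before being passed in.
    """
    return alpha * z_norm(vq_score, vq_impostors) + (1.0 - alpha) * z_norm(
        gmm_score, gmm_impostors
    )
```

The fused value would then be compared with a decision threshold; how that threshold is chosen is outside the scope of this sketch.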
Comparing units requires normalization.
In the case of VQ models, the test and
reference codebooks have different structure and
different distortion, and their distortions may be
expressed in different units of measure. To compare
two codebooks that were
created from different phrases, we need to normalize
the distortions and their units of measure. In (Rakush
and Sadykhov, 1999) the authors proposed creating
reference and test codebooks with equal distortion.
Here we investigate two additional approaches that
transform the distortions so that they can be compared.
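A minimal sketch of how a test codebook might be quantized over a reference codebook is given below. The squared Euclidean distance, the optional per-codeword weights, and the division by the reference codebook's own training distortion are assumptions chosen for illustration, not the normalization schemes investigated in this paper.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantization_distortion(
    test_codebook: List[Vector],
    ref_codebook: List[Vector],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean distance from each test codeword to its nearest
    reference codeword. Uniform weights are used when none are given."""
    if weights is None:
        weights = [1.0 / len(test_codebook)] * len(test_codebook)
    return sum(
        w * min(sq_dist(c, r) for r in ref_codebook)
        for c, w in zip(test_codebook, weights)
    )


def normalized_distortion(
    test_codebook: List[Vector],
    ref_codebook: List[Vector],
    ref_distortion: float,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Divide by the reference codebook's training distortion (assumption)
    so that scores from different phrases become unitless and comparable."""
    return quantization_distortion(test_codebook, ref_codebook, weights) / ref_distortion


# Hypothetical two-dimensional codebooks.
ref = [(0.0, 0.0), (4.0, 0.0)]
test = [(1.0, 0.0), (4.0, 2.0)]
raw = quantization_distortion(test, ref)
scaled = normalized_distortion(test, ref, ref_distortion=0.5)
```

Passing cluster weights, for example the fraction of test frames assigned to each codeword, lets densely populated clusters dominate the score.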
The GMM has a problem with
parameter initialization. We propose to solve that
problem either by initializing the GMM parameters
from a VQ codebook or by applying the LBG
algorithm to split the Gaussian mixture model, starting
from a single component.
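The two initialization ideas can be sketched as follows. This is an illustrative sketch under stated assumptions, not the algorithm developed in this paper: diagonal covariances, the variance floor `var_floor`, the perturbation size `eps`, and both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Params = Tuple[List[float], List[List[float]], List[List[float]]]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_gmm_from_codebook(
    vectors: List[Vector], codebook: List[Vector], var_floor: float = 1e-3
) -> Params:
    """Derive (weights, means, diagonal variances) from VQ cells."""
    cells: List[List[Vector]] = [[] for _ in codebook]
    for v in vectors:
        nearest = min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
        cells[nearest].append(v)
    weights, means, variances = [], [], []
    for members in cells:
        if not members:
            continue  # drop empty cells so the weights still sum to one
        n = len(members)
        columns = list(zip(*members))
        mu = [sum(col) / n for col in columns]
        # Floor the variance so a tight cell does not yield a singular Gaussian.
        var = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)
            for col, m in zip(columns, mu)
        ]
        weights.append(n / len(vectors))
        means.append(mu)
        variances.append(var)
    return weights, means, variances


def split_components(
    weights: List[float],
    means: List[List[float]],
    variances: List[List[float]],
    eps: float = 0.1,
) -> Params:
    """LBG-style split: each component becomes two, with means shifted by
    +/- eps standard deviations and the weight shared equally."""
    new_w, new_m, new_v = [], [], []
    for w, m, v in zip(weights, means, variances):
        offset = [eps * math.sqrt(x) for x in v]
        for sign in (1.0, -1.0):
            new_w.append(w / 2.0)
            new_m.append([mi + sign * oi for mi, oi in zip(m, offset)])
            new_v.append(list(v))
    return new_w, new_m, new_v
```

In either case the resulting parameters would serve only as a starting point; they would normally be refined by further re-estimation passes over the training data after initialization or after each split.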
This paper is organized as follows. The following
section describes the modelling approach using VQ and
GMM models and proposes new algorithms
combining VQ and GMM. We then discuss
several techniques for data normalization and fusion,
and describe the structure of the experimental
system, the speech corpus, and the performance measures.
Finally, we present our experimental results,
followed by a summary and conclusions.
Rakush V. and Sadykhov R. (2004).
SPEAKER VERIFICATION SYSTEM - Based on the stochastic modeling.
In Proceedings of the First International Conference on Informatics in Control, Automation and Robotics, pages 183-189
DOI: 10.5220/0001132901830189
Copyright © SciTePress