SPEAKER RECOGNITION USING DECISION FUSION
M. Chenafa, D. Istrate
RMSE, ESIGETEL, 1 Rue du Port de Valvins, 77215 Avon-Fontainebleau, France
V. Vrabie, M. Herbin
CReSTIC, Université de Reims Champagne-Ardenne, Chaussée du Port, 51000 Châlons-en-Champagne, France
Keywords:
Biometrics, Speaker recognition, Speech recognition, Decision fusion, GMM/UBM.
Abstract:
Biometric systems have gained in popularity for the automatic identification of persons. Using the voice
as a biometric characteristic offers several advantages: it is well accepted, it works with regular microphones,
and hardware costs are low. However, the performance of a voice-based biometric system easily degrades
in the presence of a mismatch between training and testing conditions due to various factors. This paper
presents a new speaker recognition system based on decision fusion. The fusion is based on two identification
systems: a speaker identification system (text-independent) and a keyword identification system
(speaker-independent). These systems calculate the likelihood ratios between the model of a test signal and the
different models of the database. The fusion uses these results to identify the speaker/password pair
corresponding to the test signal. A verification system is then applied to a second test signal in order to confirm
or reject the identification. The fusion step improves the false rejection rate (FRR) from 21.43% to 7.14% but also
increases the false acceptance rate (FAR) from 21.43% to 28.57%. The verification step, however, significantly
improves the FAR (from 28.57% to 14.28%) while keeping the FRR constant (at 7.14%).
1 INTRODUCTION
Biometric recognition systems, which identify a person
from his/her physical or behavioral characteristics
(voice, fingerprints, face, iris, etc.), have gained in
popularity among researchers in signal processing
in recent years. Biometric systems are also useful
in forensic work (where the task is to determine whether
a given biometric sample belongs to a given suspect) and
in law enforcement applications (Atkins, 2001). The use of
the voice as a biometric characteristic has the advantage
of being well accepted by users whatever their culture.
Voice-based biometric systems fall into two categories:
speaker verification and speaker identification.
In identification systems, an unknown speaker
is compared to the N known speakers stored in the
database and the best matching speaker is returned
as the recognition decision. In verification systems,
by contrast, a speaker claims an identity, and the
system compares the voice sample to the claimed
speaker's voice template. If the similarity exceeds a
predefined threshold, the speaker is accepted; otherwise,
the speaker is rejected. For each system two methods can be
distinguished: text-dependent and text-independent.
In the first case, the text pronounced by the speaker is
known beforehand by the system, while in the second
case the system does not have any information on the
pronounced text (Kinnunen, 2003).
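The difference between the two decision rules can be illustrated with a minimal Python sketch. It is not the system described in this paper: the function names, the speaker labels, the similarity scores and the threshold value are all hypothetical, and the scores stand in for whatever similarity measure a real system would compute.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Identification: return the enrolled speaker whose model matches best."""
    return max(scores, key=scores.get)


def verify(scores: Dict[str, float], claimed_id: str, threshold: float) -> bool:
    """Verification: accept the claimed identity only if its score exceeds the threshold."""
    return scores[claimed_id] > threshold


# Hypothetical similarity scores of one test utterance against N = 3 enrolled speakers.
scores = {"speaker_a": -1.2, "speaker_b": 0.8, "speaker_c": 0.1}

best_match = identify(scores)                             # best of the N known speakers
claim_accepted = verify(scores, "speaker_c", threshold=0.5)  # decision on one claimed identity
```

Identification always returns one of the N enrolled speakers, whereas verification only answers yes or no for the claimed identity, so its error rates depend on the chosen threshold.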
It is well known that the performance of voice-based
biometric systems easily degrades in the presence
of a mismatch between the training and testing
conditions (channel distortions, ambient noise, etc.).
One way to improve the performance of these systems
is to merge various kinds of information carried
by the speech signal. Several studies on information
fusion have been carried out to improve the performance
of automatic speaker recognition systems
(Higgins et al., 2001; Mami, 2003; Kinnunen et al.,
2004). However, the results are less successful than
those of biometric systems based on other modalities
(fingerprint, iris, face, etc.).
In this paper a new fusion approach is proposed
that uses two kinds of information contained in the
speech signal: the speaker (who spoke?) and the keyword
pronounced (what was said?). The aim of this
method is to use a first test signal to identify the
speaker/password pair corresponding to this signal. This
step is done by combining two identification systems
based on a likelihood ratio approach: a speaker
identification system (text-independent) and a speech iden-
Chenafa M., Istrate D., Vrabie V. and Herbin M. (2008).
SPEAKER RECOGNITION USING DECISION FUSION.
In Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing, pages 267-272
DOI: 10.5220/0001065502670272
Copyright © SciTePress