[Figure 4 plots, panels (a) to (c): Sensitivity [%] (axis from 60 to 100) versus Sequence Length [frames] (axis from 1 to 200), with curves "dim12" and "MFCC" for DB1, DB2, and DB3 respectively.]
Figure 4: Overall sensitivity of MFCC features and truncated DKLT for (a) DB1, (b) DB2, and (c) DB3 databases.
5 CONCLUSION
In this paper we have proposed a new speaker identification
approach based on a truncated DKLT representation that
performs better than conventional MFCC-based methods.
This is motivated by the fact that, although MFCCs have
proved particularly suitable for speech recognition, they
present some drawbacks for speaker recognition.
Several experimental results show that with short
sequences of speech frames, that is, with utterance
durations of less than 1 s, the performance of truncated
DKLT is always better than that of MFCC.
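As a rough illustration of the truncated DKLT representation referred to above, the following Python sketch estimates a Karhunen-Loève basis from the sample covariance of a set of feature frames and keeps only the leading components. It is a minimal sketch under stated assumptions, not the procedure used in the experiments: the function names `fit_truncated_dklt` and `project`, the synthetic 40-dimensional frames, and the choice of 12 retained components (suggested only by the "dim12" legend of Figure 4) are all hypothetical.

```python
import numpy as np


def fit_truncated_dklt(frames, n_components=12):
    """Estimate a truncated DKLT basis from training frames.

    frames: array of shape (n_frames, n_features), one feature vector per row.
    Returns the frame mean, the basis (n_features, n_components) and the
    corresponding eigenvalues in descending order.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Unbiased sample covariance of the centered frames.
    cov = centered.T @ centered / (len(frames) - 1)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]


def project(frames, mean, basis):
    """Map frames to their truncated DKLT coefficients."""
    return (np.asarray(frames, dtype=float) - mean) @ basis


# Synthetic, correlated 40-dimensional "frames" (illustrative data only).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 40)) @ rng.normal(size=(40, 40))

mean, basis, eigvals = fit_truncated_dklt(train, n_components=12)
coeffs = project(train, mean, basis)
```

In this sketch the retained coefficients are mutually uncorrelated on the training data and ordered by decreasing variance, which is the property that makes truncation a natural form of dimensionality reduction.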
Speaker Identification with Short Sequences of Speech Frames