speech waveforms are parameterized by a standard
Mel-Frequency Cepstral Coefficient (MFCC) front
end. The cepstral analysis uses a 25 msec Hamming
window with a frame shift of 10 msec. Each input
pattern X_i consists of the current frame of 12 MFCCs
and energy plus delta and acceleration coefficients,
and two context frames on each side, making a total
of (13 + 13 + 13) * 5 = 195 components. This
formulation was arrived at by experimentation with
varying numbers of context frames left and right of
the frame being classified. The training set has about
1.1 million frames and the test set has about 400
thousand frames. Each frame has an associated 1-of-
60 phonetic label derived from the TIMIT label files.
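The 195-dimensional input described above can be obtained by stacking each 39-dimensional frame (13 MFCCs including energy, plus delta and acceleration coefficients) with two context frames on each side. A minimal sketch of this stacking is given below; the function name, the edge-padding strategy (repeating the first/last frame), and the use of NumPy are illustrative assumptions, not details from the paper.

```python
import numpy as np

def stack_context(features, left=2, right=2):
    """Stack each frame with `left` and `right` neighboring context frames.

    `features` is a (num_frames, 39) array of MFCC + delta + acceleration
    coefficients. Edge frames are padded by repeating the first/last frame
    (an assumed choice; the paper does not specify edge handling).
    Returns a (num_frames, 39 * (left + 1 + right)) array, i.e. 195 dims
    for the default 2-frame context on each side.
    """
    padded = np.concatenate(
        [np.repeat(features[:1], left, axis=0),
         features,
         np.repeat(features[-1:], right, axis=0)],
        axis=0,
    )
    # For window offset i, padded[i : i + N] holds frame t + i - left
    # for every output frame t; offset i == left is the current frame.
    windows = [padded[i:i + len(features)] for i in range(left + right + 1)]
    return np.concatenate(windows, axis=1)
```

With the default arguments, each output row is (13 + 13 + 13) * 5 = 195 components, matching the dimensionality stated above.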
Due to the large amount of training data and the large
number of classes, the TIMIT data set is a suitable
task for evaluating the proposed classifier. Table 3
compares the framewise classification error rate of
our classifier on the TIMIT test set with those of
existing methods.
Table 3: Framewise phoneme classification error rate on
the TIMIT test set.

Classifier                                Error Rate
Recurrent Neural Nets (Schuster, 1997)    34.7%
Bidirectional LSTM (Graves, 2005)         29.8%
im-LDA + ad-NN                            28.7%
The results show that the proposed classifier
design outperforms previous work on framewise
classification of speech on the TIMIT task.
5 CONCLUSIONS
In this paper, a novel classifier design based on
the combination of an improved version of LDA and an
adaptive-distance NN was presented. LDA was used as a
preprocessing step to transform the input data into a
new feature space in which the different classes of
data have lower degrees of overlap. In the
classification step, a novel learning algorithm was
used to assign a weight to each stored instance;
these weights then contributed to the distance
measure, with the goal of improving the
generalization ability of the basic NN. In this way,
different weights were given to the transformed
samples by a learning scheme that optimized the
weights according to the classification error rate.
The proposed method was evaluated on various UCI ML
data sets, and the results showed that it improves
the generalization ability of the basic NN. Using the
TIMIT speech data set, the effectiveness of our
approach on real problems such as speech data
classification was also demonstrated.
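The classification step described above, in which each stored instance carries a learned weight that scales its distance to a query, can be sketched as follows. This is a minimal illustration of the prediction rule only: the paper's actual weight-learning algorithm (which optimizes the weights against the classification error rate) is not reproduced here, and all names are illustrative.

```python
import numpy as np

def weighted_nn_predict(X_train, y_train, weights, X_test):
    """1-NN prediction with a per-instance adaptive distance.

    The distance from a query to each stored instance is its Euclidean
    distance scaled by that instance's learned weight, so instances with
    smaller weights attract queries more strongly. `weights` is assumed
    to come from a separate training procedure (not shown).
    """
    preds = []
    for x in X_test:
        # Scaled distance to every stored instance.
        d = np.linalg.norm(X_train - x, axis=1) * weights
        preds.append(y_train[np.argmin(d)])
    return np.array(preds)
```

With uniform weights this reduces to the ordinary 1-NN rule; non-uniform weights can move the decision boundary away from unreliable stored instances.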
REFERENCES
Cover, T.M., Hart, P.E., 1967. Nearest Neighbor Pattern
Classification. IEEE Transactions on Information
Theory, 13: 21-27.
Friedman, J., 1994. Flexible metric nearest neighbor
classification. Technical Report 113, Stanford
University Statistics Department.
Hastie, T., Tibshirani, R., 1996. Discriminant adaptive
nearest neighbor classification. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 18: 607-
615.
Domeniconi, C., Peng, J., Gunopulos, D., 2002. Locally
adaptive metric nearest neighbor classification. IEEE
Transactions on Pattern Analysis and Machine
Intelligence, 24: 1281-1285.
Wang, J., Neskovic, P., Cooper, L.N., 2007. Improving
nearest neighbor rule with a simple adaptive distance
measure. Pattern Recognition Letters, 28: 207-213.
Fisher, R.A., 1936. The Use of Multiple Measurements in
Taxonomic Problems. Annals of Eugenics, 7: 179-188.
Duda, R.O., Hart, P.E., Stork, D., 2001. Pattern
Classification 2nd Edition. Wiley, New York.
Loog, M., Duin, R.P.W., Haeb-Umbach, R., 2001.
Multiclass linear dimension reduction by weighted
pairwise fisher criteria, IEEE Transactions on Pattern
Analysis and Machine Intelligence, 23: 762-766.
Jarchi, D., Boostani, R., 2006. A New Weighted LDA
Method in Comparison to Some Versions of LDA,
Transaction on Engineering and Computational
Technology, 18: 18-45.
Garofolo, J.S., 1988. Getting started with the DARPA
TIMIT CD-ROM: An acoustic phonetic continuous
speech database, National Institute of Standards and
Technology (NIST), Gaithersburg, MD.
Merz, C.J., Murphy, P.M., 1996. UCI Repository of
Machine Learning Databases. Irvine, CA: University
of California Irvine, Department of Information and
Computer Science. Internet:
http://www.ics.uci.edu/~mlearn/MLRepository.html
Schuster, M., Paliwal, K.K., 1997. Bidirectional recurrent
neural networks. IEEE Transactions on Signal
Processing, 45: 2673-2681.
Graves, A., Schmidhuber, J., 2005. Framewise Phoneme
Classification with Bidirectional LSTM and Other
Neural Network Architectures. International Joint
Conference on Neural Networks.