REFERENCES
Arous, N. and Ellouze, N. (2003). Cooperative supervised
and unsupervised learning algorithm for phoneme
recognition in continuous speech and speaker-
independent context. Neurocomputing, 51:225–235.
Coates, A., Lee, H., and Ng, A. Y. (2011). An analysis of single-layer networks in unsupervised feature learning. In Artificial Intelligence and Statistics (AISTATS), pages 215–223.
Davis, S. and Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28:357–366.
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and
Lin, C.-J. (2008). LIBLINEAR: A library for large
linear classification. Journal of Machine Learning Re-
search, 9:1871–1874.
Galliano, S., Geoffrois, E., Gravier, G., Bonastre, J., Mostefa, D., and Choukri, K. (2006). Corpus description of the ESTER evaluation campaign for the rich transcription of French broadcast news. In LREC, pages 315–320.
Gravier, G., Bonastre, J., Galliano, S., and Geoffrois, E. (2004). The ESTER evaluation campaign of rich transcription of French broadcast news. In LREC.
Hammer, B., Strickert, M., and Villmann, T. (2004). Relevance LVQ versus SVM. In Artificial Intelligence and Soft Computing, Lecture Notes in Artificial Intelligence, volume 3070, pages 592–597. Springer.
Hsieh, C.-J., Chang, K.-W., Lin, C.-J., and Keerthi, S. S. (2008). A dual coordinate descent method for large-scale linear SVM. In ICML.
Huang, X., Acero, A., and Hon, H. (2001). Spoken Lan-
guage Processing: A Guide to Theory, Algorithm and
System Development. Prentice Hall.
Hyvärinen, A. and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13:411–430.
Illina, I., Fohr, D., Mella, O., and Cerisara, C. (2004). The automatic news transcription system: ANTS, some real time experiments. In ICSLP, pages 377–380.
Joachims, T., Finley, T., and Yu, C.-N. (2009). Cutting-plane training of structural SVMs. Machine Learning, 77(1):27–59.
Lin, C.-J., Weng, R. C., and Keerthi, S. S. (2008). Trust region Newton method for logistic regression. Journal of Machine Learning Research, 9:627–650.
Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2009). On-
line dictionary learning for sparse coding. In Proceed-
ings of the 26th Annual International Conference on
Machine Learning, ICML ’09, pages 689–696, New
York, NY, USA. ACM.
Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2010). On-
line learning for matrix factorization and sparse cod-
ing. Journal of Machine Learning Research, 11:19–
60.
Mairal, J., Bach, F., Ponce, J., Sapiro, G., and Zisserman, A. (2008). Supervised dictionary learning. In Advances in Neural Information Processing Systems, pages 1033–1040.
Maji, S., Berg, A. C., and Malik, J. (2009). Classification
using intersection kernel support vector machines is
efficient. In CVPR.
Paris, S. (2011). Scenes/objects classification toolbox. http://www.mathworks.com/matlabcentral/fileexchange/29800-scenesobjects-classification-toolbox.
Rabiner, L. and Juang, B. (1993). Fundamentals of Speech
Recognition. Prentice Hall PTR.
Ranzato, M., Krizhevsky, A., and Hinton, G. (2010). Factored 3-way restricted Boltzmann machines for modeling natural images. In Artificial Intelligence and Statistics (AISTATS).
Razik, J., Mella, O., Fohr, D., and Haton, J.-P. (2011).
Frame-synchronous and local confidence measures
for automatic speech recognition. IJPRAI, 25(2):157–
182.
Rudin, C., Schapire, R. E., and Daubechies, I. (2007). Anal-
ysis of boosting algorithms using the smooth margin
function. The Annals of Statistics, 35(6):2723–2768.
Schölkopf, B., Platt, J. C., Shawe-Taylor, J. C., Smola, A. J., and Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13:1443–1471.
Shalev-Shwartz, S., Singer, Y., Srebro, N., and Cotter, A. (2007). Pegasos: Primal estimated sub-gradient solver for SVM. In ICML.
Sivaram, G., Nemala, S., Elhilali, M., Tran, T., and Hermansky, H. (2010). Sparse coding for speech recognition. In ICASSP, pages 4346–4349.
Smit, W. J. and Barnard, E. (2009). Continuous speech
recognition with sparse coding. Computer Speech and
Language, 23:200–219.
Tan, M., Wang, L., and Tsang, I. W. (2010). Learning sparse SVM for feature selection on very high dimensional datasets. In ICML, page 8.
Kinnunen, T. and Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, 52:12–40.
Vapnik, V. N. (1998). Statistical Learning Theory. Wiley-Interscience.
Vedaldi, A. and Zisserman, A. (2011). Efficient additive kernels via explicit feature maps. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., and Gong, Y. (2010). Locality-constrained linear coding for image classification. In CVPR.
Yang, J., Yu, K., Gong, Y., and Huang, T. S. (2009). Lin-
ear spatial pyramid matching using sparse coding for
image classification. In CVPR.
Young, S., Evermann, G., Kershaw, D., Moore, G., Odell,
J., Ollason, D., Valtchev, V., and Woodland, P. (1995).
The HTK Book. Entropic Ltd., Cambridge, England.
Yu, K., Lin, Y., and Lafferty, J. (2011). Learning image
representations from the pixel level via hierarchical
sparse coding. In CVPR, pages 1713–1720.
BROADCAST NEWS PHONEME RECOGNITION BY SPARSE CODING