Authors: Joseph Razik (1); Sébastien Paris (2) and Hervé Glotin (1)
Affiliations: (1) Université du Sud Toulon-Var, France; (2) Université Aix-Marseille, France
Keyword(s): MFCC, GMM, Sparse coding, Large-scale SVM, Explicit feature maps.
Related Ontology Subjects/Areas/Topics: Applications; Audio and Speech Processing; Digital Signal Processing; Learning in Process Automation; Multimedia; Multimedia Signal Processing; Pattern Recognition; Software Engineering; Telecommunications
Abstract: We present a novel approach to phoneme recognition, which we intend to extend to a full automatic speech recognition (ASR) system. Conventional ASR systems rely on a GMM-HMM combination, which is a fully generative approach. Current discriminative methods are intractable on large-scale data sets, especially with non-linear kernels. Our system introduces a new scheme that combines sparse coding with an approximate additive kernel for fast SVM training in phoneme recognition. On a broadcast news corpus, our system outperforms GMMs by around 2.5%, and its computational cost is linear in the number of samples.
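
The idea of replacing a non-linear additive kernel with an explicit feature map, so that a linear SVM can be trained in time linear in the number of samples, can be illustrated with a minimal sketch. The abstract does not say which additive kernel is used; the chi-squared kernel and the sampled (homogeneous kernel map style) approximation below are assumptions chosen for illustration, and the function names `chi2_kernel` and `chi2_feature_map` and the parameters `order` and `step` are hypothetical. The input vectors stand in for non-negative codes such as sparse-coding coefficients.

```python
import math

def chi2_kernel(x, y):
    """Exact additive chi-squared kernel: sum_i 2*x_i*y_i / (x_i + y_i)."""
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

def chi2_feature_map(x, order=2, step=0.5):
    """Approximate explicit feature map for the chi-squared kernel.

    Each non-negative input component is expanded into 2*order + 1
    features, so that the plain dot product of two mapped vectors
    approximates chi2_kernel on the original vectors.
    Assumption: this sampling scheme is illustrative, not the paper's.
    """
    feats = []
    for v in x:
        if v <= 0:
            # A zero component contributes nothing to the kernel.
            feats.extend([0.0] * (2 * order + 1))
            continue
        log_v = math.log(v)
        # Zero-frequency term; the chi2 spectrum sech(pi * 0) equals 1.
        feats.append(math.sqrt(v * step))
        for j in range(1, order + 1):
            lam = j * step
            # Amplitude from the sampled chi2 spectrum sech(pi * lam).
            amp = math.sqrt(2.0 * v * step / math.cosh(math.pi * lam))
            feats.append(amp * math.cos(lam * log_v))
            feats.append(amp * math.sin(lam * log_v))
    return feats

def dot(u, v):
    """Plain linear kernel on the mapped features."""
    return sum(a * b for a, b in zip(u, v))

if __name__ == "__main__":
    x = [0.2, 0.3, 0.5]
    y = [0.1, 0.6, 0.3]
    exact = chi2_kernel(x, y)
    approx = dot(chi2_feature_map(x), chi2_feature_map(y))
    print(f"exact={exact:.4f} approx={approx:.4f}")
```

Under these assumptions, each sample is mapped once, at a cost independent of the corpus size, and any linear SVM solver can then be trained on the mapped vectors. Increasing `order` and decreasing `step` tightens the approximation at the price of a longer feature vector.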