BIRTH-DEATH FREQUENCIES VARIANCE OF SINUSOIDAL MODEL - A New Feature for Audio Classification
Shahrokh Ghaemmaghami, Jalil Shirazi
2008
Abstract
This paper presents and evaluates a new feature set for audio classification based on sinusoidal modeling of audio signals. The variance of the birth-death frequencies in the sinusoidal model of a signal is used as a measure of harmony and compared with typical features as input to an audio classifier. The performance of this sinusoidal-model feature is evaluated by classifying audio into speech and music using both GMM and SVM classifiers. The classification results show that the proposed feature performs well in speech/music classification. Experimental comparisons with popular audio classification features, such as HZCRR and LSTER, are presented and discussed. Using a set of three features, we achieved 96.83% accuracy in classification based on one-second segments.
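The abstract does not give the exact definition of the feature, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a sinusoidal track is "born" when a spectral peak in the current frame has no counterpart in the previous frame, that it "dies" when a peak in the previous frame has no continuation, and that the feature is the variance of all such birth and death frequencies over a segment. The function names, frame length, hop size, peak threshold, and matching tolerance are all hypothetical choices made for illustration.

```python
import numpy as np

def frame_peaks(frame, sr, n_peaks=10, rel_thresh=0.1):
    """Frequencies (Hz) of the strongest spectral peaks in one frame.
    n_peaks and rel_thresh are illustrative assumptions."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # Interior bins that are strictly larger than both neighbours.
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    idx = np.where(is_peak)[0] + 1
    # Discard weak peaks (window side lobes, numerical noise).
    idx = idx[mag[idx] >= rel_thresh * mag.max()]
    strongest = idx[np.argsort(mag[idx])[::-1][:n_peaks]]
    return np.sort(freqs[strongest])

def birth_death_frequencies(prev, curr, max_jump_hz=50.0):
    """Greedy nearest-frequency matching between consecutive frames.
    Returns (births, deaths): unmatched peaks of curr and of prev."""
    unmatched_curr = list(curr)
    deaths = []
    for f in prev:
        if unmatched_curr:
            j = int(np.argmin(np.abs(np.array(unmatched_curr) - f)))
            if abs(unmatched_curr[j] - f) <= max_jump_hz:
                unmatched_curr.pop(j)  # track continues into curr
                continue
        deaths.append(f)               # no continuation: track dies
    return unmatched_curr, deaths      # leftovers in curr are births

def birth_death_variance(signal, sr, frame_len=1024, hop=512):
    """Variance of all birth and death frequencies in a segment
    (assumed feature definition; 0.0 if no track is born or dies)."""
    events, prev = [], None
    for start in range(0, len(signal) - frame_len + 1, hop):
        curr = frame_peaks(signal[start:start + frame_len], sr)
        if prev is not None:
            births, deaths = birth_death_frequencies(prev, curr)
            events.extend(births)
            events.extend(deaths)
        prev = curr
    return float(np.var(events)) if events else 0.0

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr  # one-second segment
    steady = np.sin(2 * np.pi * 440 * t)
    switching = np.where(t < 0.5, steady, np.sin(2 * np.pi * 2000 * t))
    print(birth_death_variance(steady, sr))
    print(birth_death_variance(switching, sr))
```

Under these assumptions, a steady tone produces no births or deaths within the segment, while a signal whose partials appear and disappear at widely separated frequencies yields a large variance. Feeding one such value per one-second segment to a GMM or SVM, as the abstract describes, would be a separate step not shown here.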
References
- El-Maleh, K., Klein, M., Petrucci, G., Kabal, P., 2000. Speech/music discrimination for multimedia applications. In Proc ICASSP-2000, pp. 2445-2448.
- Ajmera, J., McCowan, I., Bourlard, H., 2002. Robust HMM based speech/music segmentation. In Proc ICASSP-2002, pp. 297-300.
- Saunders, J., 1996. Real-time discrimination of broadcast speech/music. In Proc ICASSP-96, pp. 993-996.
- Scheirer, E., Slaney, M., 1997. Construction and evaluation of a robust multifeature speech/music discriminator. In Proc ICASSP-97, pp. 21-24.
- Lu, L., Zhang, H.-J., 2002. Content Analysis for Audio Classification and Segmentation. In IEEE Trans. Speech & Audio Proc., vol. 10, pp. 504-516.
- Li, S. Z., 2000. Content-based audio classification and retrieval using the nearest feature line method. In IEEE Trans. Speech & Audio Proc., vol. 8, pp. 619-625.
- McAulay, R., Quatieri, T., 1986. Speech analysis/synthesis based on a Sinusoidal representation. In IEEE Trans. Acous., Speech & Sig. Proc., Vol. ASSP-34, No.4, pp. 744-754.
- Smith, J. O., Serra, X., 1987. PARSHL: An analysis/synthesis program for non-harmonic sound based on Sinusoidal representation. In http://wwwccrma.stanford.edu/jos/parshl/parshl.pdf.
- Berenzweig, A. L., Ellis, D. P. W., 2001. Locating singing voice segments within music signals. In Proc IEEE WASPAA, Mohonk NY, pp. 119-122.
- Guo, G., Li, S. Z., 2003. Content-based audio classification and retrieval by support vector machines. In IEEE Trans. Neural Networks Proc., vol. 14, pp. 209-215.
Paper Citation
in Harvard Style
Ghaemmaghami S. and Shirazi J. (2008). BIRTH-DEATH FREQUENCIES VARIANCE OF SINUSOIDAL MODEL - A New Feature for Audio Classification. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008) ISBN 978-989-8111-60-9, pages 139-144. DOI: 10.5220/0001937601390144
in Bibtex Style
@conference{sigmap08,
author={Shahrokh Ghaemmaghami and Jalil Shirazi},
title={BIRTH-DEATH FREQUENCIES VARIANCE OF SINUSOIDAL MODEL - A New Feature for Audio Classification},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)},
year={2008},
pages={139-144},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001937601390144},
isbn={978-989-8111-60-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)
TI - BIRTH-DEATH FREQUENCIES VARIANCE OF SINUSOIDAL MODEL - A New Feature for Audio Classification
SN - 978-989-8111-60-9
AU - Ghaemmaghami S.
AU - Shirazi J.
PY - 2008
SP - 139
EP - 144
DO - 10.5220/0001937601390144