Automated Segmentation of Folk Songs Using Artificial Neural Networks

Andreas Neocleous; Nicolai Petkov; Christos N. Schizas

doi:10.5220/0005049101440151

2014

Abstract

Two systems are introduced that perform automated audio annotation and segmentation of Cypriot folk songs into meaningful musical information. The first system consists of three artificial neural networks (ANNs) that use low-level timbre features. The output of the three networks classifies an unknown song as “monophonic” or “polyphonic”. The second system employs a single ANN with the same feature set; it takes a polyphonic song as input and identifies the boundaries of its instrumental and vocal parts. For the “monophonic – polyphonic” classification, a precision of 0.88 and a recall of 0.78 were achieved. For the “vocal – instrumental” classification, a precision of 0.85 and a recall of 0.83 were achieved. From these results we conclude that the low-level timbre features captured the characteristics of the audio signals, and that the chosen ANN structures were suitable for these classification problems and outperformed classical statistical methods.
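
As a reading aid for the precision and recall figures quoted in the abstract, the sketch below computes both measures using their standard definitions (true positives divided by predicted positives, and true positives divided by actual positives). The function name `precision_recall` and the toy labels are hypothetical and invented for illustration; they are not the authors' evaluation code or data.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Tuple[float, float]:
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    # True positives: labelled positive and predicted positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but labelled otherwise.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: labelled positive but predicted otherwise.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels invented for illustration only; they are not data from the paper.
truth = ["polyphonic", "polyphonic", "monophonic", "polyphonic", "monophonic"]
guess = ["polyphonic", "monophonic", "monophonic", "polyphonic", "polyphonic"]
p, r = precision_recall(truth, guess, positive="polyphonic")
```

Which class the paper treats as the positive one is not stated in the abstract, so the choice of `"polyphonic"` here is an assumption.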

Paper Citation

in Harvard Style

Neocleous A., Petkov N. and Schizas C. N. (2014). Automated Segmentation of Folk Songs Using Artificial Neural Networks. In Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2014), ISBN 978-989-758-054-3, pages 144-151. DOI: 10.5220/0005049101440151

in Bibtex Style

@conference{ncta14,
author={Andreas Neocleous and Nicolai Petkov and Christos N. Schizas},
title={Automated Segmentation of Folk Songs Using Artificial Neural Networks},
booktitle={Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2014)},
year={2014},
pages={144-151},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005049101440151},
isbn={978-989-758-054-3},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2014)
TI - Automated Segmentation of Folk Songs Using Artificial Neural Networks
SN - 978-989-758-054-3
AU - Neocleous A.
AU - Petkov N.
AU - Schizas C. N.
PY - 2014
SP - 144
EP - 151
DO - 10.5220/0005049101440151