the literature). Thus, we study four different scenarios
in total (see Table 1) and compare the performance of
these scenarios.
Table 1: Scenarios based on usage of songs.

Scenario | Acoustic Analysis | Classification
---------|-------------------|----------------
1        | Full Song         | Full Song
2        | Full Song         | Part of a Song
3        | Part of a Song    | Full Song
4        | Part of a Song    | Part of a Song
The acoustic analysis in AcousticBrainz is
conducted both for the full versions of the songs and
for the intervals between their 30th and 60th seconds.
The aim is to extract acoustic features for all songs in
the dataset through a low-level analysis procedure.
These features are diverse, including loudness,
dynamic complexity, spectral energy, dissonance, and
so on. AcousticBrainz then provides a model and its
parameters as a result of the analysis: it trains the
SVM model with different parameter combinations
and searches for the best one, which completes the
training process in the system. The output of this
process is a history file that serves as the source for
further predictions.
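The training step described above can be sketched as follows. This is a hypothetical illustration, not the actual AcousticBrainz pipeline: the feature matrix is synthetic stand-in data for the low-level descriptors, the parameter grid is an assumed example, and the pickled model plays the role of the "history file".

```python
# Hypothetical sketch of the training step: grid-search an SVM over
# parameter combinations and persist the best model (the "history file").
import pickle
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Placeholder features standing in for the low-level descriptors
# (loudness, dynamic complexity, dissonance, ...) of 160 training songs.
X, y = make_classification(n_samples=160, n_features=20, n_classes=10,
                           n_informative=12, random_state=0)

# Assumed example grid; the real parameter combinations are not specified.
param_grid = {"C": [1, 10, 100], "gamma": ["scale", 0.01]}
search = GridSearchCV(SVC(probability=True), param_grid, cv=4)
search.fit(X, y)

# Persist the fitted best estimator as the source for further predictions.
with open("history.pkl", "wb") as f:
    pickle.dump(search.best_estimator_, f)
```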
In order to predict genre classification
probabilities for a second set of songs, we use the
history file containing the best combination of SVM
parameters. At this point, the test process starts. The
acoustic analysis of the songs is turned into a
high-level analysis via AcousticBrainz. The key
property of the high-level analysis is that it provides,
for each song, the probability of belonging to each
genre. For example, for a new song the probabilities
of being blues, rock, pop, etc. (for all 10 genres) are
listed, and these probabilities sum to 1. It is also
significant that the high-level data can be constructed
from the acoustic features of the songs alone, without
any need to access the original audio files. Therefore,
after conducting a low-level analysis on the training
dataset, a training solution is obtained and serves as
the source for further predictions.
The low-level data carries the acoustic
descriptors, which essentially characterize the
dynamics, loudness, and spectral information of a
sound/signal, as well as rhythmic information such
as beats per minute and tonal information such as
scales. In contrast, the high-level data contains
information about moods, vocals, and music types,
in particular genres, as in our case. Genres are
identified automatically by trained classifiers that
use the acoustic features as their input.
The genre classification process starts once a
training solution (history file) is obtained. With the
history file, the high-level analysis of new audio files
can begin. The acoustic analysis itself is no different
from before: each song gets its own acoustic features
via AcousticBrainz. However, the system now also
contains high-level data. Based on the best parameters
selected for the SVM model during training, the
genres of new songs can be classified. For each song,
a dedicated file (.yaml) is created, containing the
acoustic features and genre classifications of that
song. The outputs of the prediction process are thus
stored in these files. To inspect the classification
results clearly, the files are parsed and their full
contents stored. The stored data is then used to create
confusion matrices, visualizations that present all
results of the prediction phase in a clear format.
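The final step, summarizing the parsed predictions as a confusion matrix, can be sketched as below. The genre names and label pairs are invented for illustration; in the real pipeline the (true, predicted) pairs would come from parsing the per-song .yaml files.

```python
# Hypothetical sketch: once the .yaml outputs are parsed into
# (true genre, predicted genre) pairs, a confusion matrix summarizes
# all predictions: rows = true genre, columns = predicted genre.
from sklearn.metrics import confusion_matrix

# Placeholder genre names (the paper uses 10 genres).
genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

# Invented example pairs; really obtained from the parsed .yaml files.
y_true = ["blues", "rock", "pop", "rock"]
y_pred = ["blues", "pop", "pop", "rock"]

cm = confusion_matrix(y_true, y_pred, labels=genres)
print(cm.shape)  # (10, 10); off-diagonal cells are misclassifications
```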
4 RESULTS
The dataset of 200 selected songs is split into two
groups of 160 and 40 songs. The first group of 160
songs is used for training: 16 songs are selected
randomly from each genre. The second group of 40
songs is used for testing: similarly, 4 songs are
selected randomly from each of the 10 genres. The
200 songs of the dataset are initially available as .mp3
files, but using the .mp3 files without any change
leads to misclassification in the system. Therefore,
the song files need to be converted to .wav format in
order for the system to produce accurate results. This
conversion is applied to both the full versions and the
pieces. In addition to the file format, the sampling
frequency is a significant factor that has to be
adjusted to obtain better analyses and outputs. All
frequencies are set to 22050 Hz.
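The stratified random split described above can be sketched as follows; the genre and file names are placeholders, not the actual dataset.

```python
# Sketch of the split: from each of the 10 genres, 16 songs go to
# training and 4 to testing, chosen at random. Names are placeholders.
import random

random.seed(0)
genres = [f"genre_{i}" for i in range(10)]
songs = {g: [f"{g}_song_{j}.wav" for j in range(20)] for g in genres}

train, test = [], []
for g in genres:
    picks = random.sample(songs[g], 20)  # random permutation of the 20 songs
    train += picks[:16]                  # 16 per genre for training
    test += picks[16:]                   # 4 per genre for testing

print(len(train), len(test))  # 160 40
```

The .mp3 to .wav conversion and resampling themselves could be done with a tool such as ffmpeg, e.g. `ffmpeg -i song.mp3 -ar 22050 song.wav`.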
For the 160 songs, two different trainings are
run and two separate tests are studied. In the first
approach, the full versions of the songs are considered.
For these audio files, all steps explained in the
methodology part of the report are completed. As a
result of this work, a history file is obtained for the
full versions of these 160 songs. In addition to this
training strategy, breaking the songs into pieces is
considered as a second option. In this second strategy,
the interval between the 30th and 60th seconds of
every song is taken. Similarly but separately, the
methodology is applied to these 160 pieces.
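Cutting out the 30th-to-60th-second interval of a .wav file can be sketched with the standard-library wave module; the function name and file paths are illustrative, not part of the original pipeline.

```python
# Sketch (stdlib only) of extracting the 30th-60th-second piece of a
# .wav file. Assumes the source file is at least 60 seconds long.
import wave

def trim_wav(src, dst, start_s=30, end_s=60):
    with wave.open(src, "rb") as w:
        params = w.getparams()
        rate = w.getframerate()
        w.setpos(start_s * rate)                       # seek to second 30
        frames = w.readframes((end_s - start_s) * rate)  # read 30 seconds
    with wave.open(dst, "wb") as out:
        out.setparams(params)      # keep channels, sample width, rate
        out.writeframes(frames)    # frame count is fixed up on close
```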
The first training approach, which uses the full
versions of the 160 songs, is used to test the other
40 songs, as mentioned earlier. Testing is performed
for these 40 songs while considering not only their full