we are trying to achieve through this research is the
possibility of identifying music genres using
numerical features and computer algorithms, mainly
to distinguish Indonesian traditional music from
several other genres. From these, we chose Javanese,
Balinese and Sundanese music as examples of
traditional music genres, Keroncong as an example of
a contemporary local music genre, and two foreign
music genres: Classical and Latin.
Classical music is chosen because it comes from
another tradition, the so-called Western tradition,
which has been widespread throughout Europe from the
9th century up to the present 21st century, almost as a
common European identity. It has its roots in Christian
and orchestral music. In addition, every Classical
composition has its own prescribed notation, tempo,
meter, individual rhythm and expression, which limits
the room for the improvisation and ad libitum
ornamentation found in Asian traditional music, for
example Japanese and Indian traditional compositions.
Latin music (música latina) is music that resembles
the traditional music of Portugal and Spain. This is a
very broad “genre”, covering a variety of rhythms and
beats, and it may come from either the Iberian
Peninsula or “Ibero-America”, sung in either Spanish
or Portuguese. The American music industry (RIAA)
even uses the term “Latin” for any music or song
performed in Spanish and distributed in the U.S. Latin
music is chosen because it lacks particularly clear
defining features, because its biggest markets are
Spain, Brazil, Mexico and the United States, and
because it contains elements that are foreign to
Indonesian traditional music.
This research uses time-domain features classified
with a Nearest Centroid Classifier (NCC) and
k-Nearest Neighbour (k-NN) to identify six music
genres: Traditional Javanese, Traditional Balinese,
Traditional Sundanese, Keroncong, Classical and
Latin.
Pattern recognition (PR) may be characterized as an
information reduction, information mapping, or
information labeling process. An abstract view of the
PR classification/description problem is shown in
Figure 1. We postulate a mapping between
class-membership space, $C$, and pattern space, $P$.
This mapping is done via a relation, $G_i$, for each
class, and may be probabilistic. Each class, $\omega_i$,
generates a subset of ‘patterns’ in pattern space,
where the $i$th pattern is denoted $p_i$. Note,
however, that these subspaces overlap, allowing
patterns from different classes to share attributes.
Another relation, $M$, maps patterns from subspaces
of $P$ into observations, or measured patterns or
features, denoted $m_i$. Using this concept, many PR
problems can be characterized simply as follows:
given a measurement $m_i$, we desire a method to
identify and invert the mappings $M$ and $G_i$ for
all $i$.
Unfortunately, in practice, these mappings are not
functions. Even if they were, they are seldom
one-to-one, onto, or invertible. For example, Figure 1
shows that identical measurements or observations
may result from different $p_i$, which in turn
correspond to different underlying classes. This
suggests a potential problem with ambiguity.
Nevertheless, it seems reasonable to attempt to model
and understand these processes, in the hope that this
leads to better classification/description techniques
(Schalkoff, 1992).
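The ambiguity described above can be stated compactly. The following is our own sketch rather than Schalkoff's formulation: it treats $G_i$ and $M$ as relations (sets of pairs) instead of functions, and introduces the hypothetical symbol $\hat{\Omega}(m)$ for the set of classes that are consistent with an observed measurement $m$.

$$
\omega_i \xrightarrow{\;G_i\;} p \in P \xrightarrow{\;M\;} m,
\qquad
\hat{\Omega}(m) = \{\, \omega_i \in C : \exists\, p \in P,\ (\omega_i, p) \in G_i \ \wedge\ (p, m) \in M \,\}
$$

Inverting $M$ and $G_i$ for a given $m$ amounts to computing $\hat{\Omega}(m)$. Classification is unambiguous only when $|\hat{\Omega}(m)| = 1$; the overlapping subspaces in Figure 1 correspond to measurements with $|\hat{\Omega}(m)| > 1$.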
Figure 1: Mappings in an abstract representation of
pattern generation/classification/interpretation systems
NCC calculates the centroid of the feature vectors of
each genre class; the features obtained from the test
data are then compared, in terms of Euclidean
distance, with each class centroid, and the test data is
classified into the class with the nearest centroid.
k-NN uses the same distance measure, but not against
class centroids. Instead, distances are calculated from
the test data to every training sample, and the k
nearest samples vote; the new data belongs to the
class with the most votes. An odd k, e.g. k=3, k=5,
and so on, is usually chosen to avoid a tie, although
with more than two classes a tie can still occur. When
there is no dominant class, the data may be assigned a
random class, left unclassified (counted as not
correctly classified), or assigned the class of the
nearest member.
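The two decision rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this research: feature vectors are plain lists (for example [ZCR, E, SR]), the helper names `euclidean`, `centroids`, `ncc_predict` and `knn_predict` are hypothetical, and a k-NN tie is resolved by taking the nearest member among the tied classes, which is one of the three tie-handling options listed above.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroids(train):
    """Mean feature vector per class; train is a list of (vector, label) pairs."""
    sums, counts = {}, Counter()
    for vec, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def ncc_predict(cents, x):
    """Nearest Centroid Classifier: return the label of the closest class centroid."""
    return min(cents, key=lambda label: euclidean(cents[label], x))


def knn_predict(train, x, k=3):
    """k-NN majority vote over the k nearest training samples.

    Assumption: when several classes share the top vote count, the class of
    the nearest neighbour among those tied classes is returned.
    """
    neighbours = sorted(train, key=lambda item: euclidean(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    tied = {label for label, n in votes.items() if n == top}
    # Neighbours are sorted by distance, so the first tied label is the nearest one.
    for _, label in neighbours:
        if label in tied:
            return label


if __name__ == "__main__":
    # Toy [ZCR, E, SR] vectors for illustration only; not data from this research.
    train = [
        ([0.10, 0.50, 0.05], "Javanese"),
        ([0.12, 0.55, 0.04], "Javanese"),
        ([0.30, 0.20, 0.20], "Latin"),
        ([0.32, 0.22, 0.18], "Latin"),
    ]
    test = [0.11, 0.52, 0.05]
    print(ncc_predict(centroids(train), test))  # Javanese
    print(knn_predict(train, test, k=3))        # Javanese
```

Computing the centroids once and reusing them makes NCC cheap at test time (one distance per class), whereas k-NN computes a distance to every training sample for each test item.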
Many researchers have attempted to combine the
benefits of these two algorithms, for example the
k-Nearest Centroid Neighbor (KNCN) classification
and its variants studied by Li et al. (2017). Their
experimental results on twelve real data sets from the
UCI machine learning repository show that the new
classifiers are effective for classification tasks, owing
to their satisfactory classification performance and
robustness over a wide range of k (Li et al., 2017).
For practical reasons, this research employs k-NN and
NCC separately for classification.
Three time-domain features, represented in a vector
space model, are extracted from the music files: Zero
Crossing Rate (ZCR), Average Energy (E) and Silent
Ratio (SR). The audio files of the music used as
training and test data are in waveform audio format
(WAV) to maintain sound quality. The uncompressed