Table 4: Confusion Matrix. k-NN: all descriptors. Classification Rate: 71.8%.
Melody        23   5   3   8   2   0
Melody+Solo    2   6   5   5   0   0
Solo           2   5  15   0   0   0
Harmony        5   3   1  57   3   2
Bass           1   1   0   2  39   0
Drums          0   0   1   1   0  41
k-NN gave slightly better results than NN, but the difference
was not significant. Using all descriptors is a naive approach:
one descriptor may successfully separate two classes while
another separates the same classes in the opposite way, which
confuses the final classification.
Note the high scores for Harmony, Bass and Drums. This is mostly
because these tracks are quite different from each other, and
especially from the other three classes. Harmony tracks are
polyphonic, as are most Drum tracks, in contrast to melody or
solo tracks, which are monophonic. Bass is also monophonic but
usually has a lower pitch than melodies or solos, which makes it
easy for the classifier to distinguish. The real problem seems
to be classifying Melody and Solo, as these are quite similar
and may be confused with the Melody+Solo class.
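The per-class behaviour discussed above can be checked directly from Table 4. The following is a minimal sketch, assuming that each row of the table is the true class and that the columns follow the same class order as the rows (the column headers are not shown in the table as printed). The names `TABLE_4` and `per_class_recall` are illustrative only.

```python
# Rows of Table 4, copied as printed.
# Assumption: rows are true classes, columns follow the same class order.
CLASSES = ["Melody", "Melody+Solo", "Solo", "Harmony", "Bass", "Drums"]
TABLE_4 = [
    [23, 5, 3, 8, 2, 0],
    [2, 6, 5, 5, 0, 0],
    [2, 5, 15, 0, 0, 0],
    [5, 3, 1, 57, 3, 2],
    [1, 1, 0, 2, 39, 0],
    [0, 0, 1, 1, 0, 41],
]


def per_class_recall(matrix):
    """Return, for each row, the diagonal entry divided by the row sum."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


recalls = dict(zip(CLASSES, per_class_recall(TABLE_4)))
for name, value in recalls.items():
    print(f"{name:12s} {value:.3f}")
```

Under this reading of the table, Harmony, Bass and Drums have the three highest per-class rates, and Melody+Solo has the lowest.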
3.2 Track Selection: Single Descriptor Category
A different approach was used in the following experiments.
Instead of the full set of descriptors, six sets of descriptors
were used, corresponding to the descriptor categories, along
with some combinations of the best-scoring sets. Both
classifiers were used, with 80 hidden units for NN and k values
ranging from 1 to 29 for k-NN, keeping the best-performing
value. The results are shown in Table 5.
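The k-NN part of this procedure (restrict the data to one descriptor set, try every k from 1 to 29, keep the best) can be sketched as follows. This is only an illustration: the evaluation protocol (leave-one-out), the Euclidean distance, the toy data and all function names are assumptions, not details taken from the experiments.

```python
import math
from collections import Counter


def project(data, columns):
    """Keep only the chosen descriptor columns of each (features, label) pair."""
    return [(tuple(features[c] for c in columns), label) for features, label in data]


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_rate(data, k):
    """Fraction of points classified correctly when each is held out in turn."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def best_k(data, k_values=range(1, 30)):
    """Return (k, rate) with the highest rate; ties keep the smaller k."""
    rates = [(k, leave_one_out_rate(data, k)) for k in k_values]
    return max(rates, key=lambda pair: pair[1])


# Toy data (hypothetical): two well-separated groups with two descriptors each.
toy = [((i * 0.1, i * 0.2), "bass") for i in range(20)]
toy += [((10 + i * 0.1, 10 + i * 0.2), "melody") for i in range(20)]

k, rate = best_k(project(toy, [0, 1]))
print(k, rate)
```

Calling `project` with a different column list plays the role of switching descriptor sets, so the same sweep can be repeated once per set.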
Table 5: Single Categories Classification Rates.
Set     NN Rate (%)   k-NN Rate (%)
TI      71.8          63.8
P       56.7          53.9
PI      50.4          44.4
ND      50.4          42
SD      31.3          30.9
S       38.8          37.6
R       42            40.5
IH      47.2          36.9
TI+P    75.4          71.8
P+PI    63            48.8
With NN, surprisingly, the TI set alone provided better results
than the whole set of descriptors, and the TI+P set provided the
best results so far. All the other sets yielded worse results;
they appear to confuse the classifier and should not be used, at
least not in this naive way. The k-NN results using the TI set
were worse than those achieved with the full set of descriptors
(63.8% against 71.8%), but TI and P combined gave similar
results (71.8%). As with NN, all the other categories gave worse
results. These results show that the descriptors have to be
carefully selected, not only by combining categories but also by
combining single descriptors, in order to achieve the best
results.
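The comparisons drawn from Table 5 can be reproduced with a few lines of code. The sketch below copies the rates from Table 5 and the full-set k-NN rate from the caption of Table 4; the variable names are illustrative only.

```python
# (set name, NN rate %, k-NN rate %) copied from Table 5.
TABLE_5 = [
    ("TI", 71.8, 63.8), ("P", 56.7, 53.9), ("PI", 50.4, 44.4),
    ("ND", 50.4, 42.0), ("SD", 31.3, 30.9), ("S", 38.8, 37.6),
    ("R", 42.0, 40.5), ("IH", 47.2, 36.9),
    ("TI+P", 75.4, 71.8), ("P+PI", 63.0, 48.8),
]
KNN_ALL_DESCRIPTORS = 71.8  # k-NN rate with all descriptors (Table 4 caption)

# Sets ordered from best to worst NN rate.
by_nn = sorted(TABLE_5, key=lambda row: row[1], reverse=True)
best_nn_set = by_nn[0][0]

# Sets whose k-NN rate is at least as good as the full descriptor set.
knn_matching_full = [name for name, _, knn in TABLE_5 if knn >= KNN_ALL_DESCRIPTORS]

# Does NN beat k-NN on every set in the table?
nn_always_ahead = all(nn > knn for _, nn, knn in TABLE_5)

print(best_nn_set, knn_matching_full, nn_always_ahead)
```

With the values in the table, TI+P is the best NN set, it is the only set whose k-NN rate reaches the full-set rate, and NN scores higher than k-NN on every set.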
3.3 Track Selection: Best Descriptors
Using the algorithm described previously, a possible best
descriptor set was found. It is composed of descriptors
[1 2 4 5 7 8 9 15 18 24 31 34] (numbers correspond to Table 1)
for NN with 60 hidden units, and
[3 4 5 6 7 8 9 11 13 18 19 24 25 31 33] for k-NN with k = 5.
Clearly, using a whole category is not the best option; it is
better to use only the descriptors that work well together. For
NN, which is the main classifier, a significant 16% gain was
achieved compared with the full descriptor set.
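The selection algorithm itself is described earlier in the paper and is not reproduced here. As a generic stand-in, and not necessarily the method used in these experiments, the idea of picking individual descriptors that work well together can be illustrated with greedy forward selection. In the sketch below, `forward_select`, `toy_score` and the set `USEFUL` are hypothetical; in practice the score would be a classification rate measured with the chosen descriptors.

```python
def forward_select(candidates, score):
    """Greedily add the descriptor that most improves score(subset).

    Stops as soon as no remaining descriptor improves the best score so far.
    Returns the chosen descriptors (sorted) and the score they reach.
    """
    chosen = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        trials = [(score(chosen + [c]), c) for c in remaining]
        top_score, top = max(trials, key=lambda pair: pair[0])
        if top_score <= best:
            break  # nothing left that helps
        chosen.append(top)
        remaining.remove(top)
        best = top_score
    return sorted(chosen), best


# Toy score (hypothetical): useful descriptors add 1.0, the others cost 0.5,
# mimicking descriptors that confuse the classifier.
USEFUL = {1, 2, 4}


def toy_score(subset):
    return sum(1.0 if d in USEFUL else -0.5 for d in subset)


selected, reached = forward_select(range(1, 7), toy_score)
print(selected, reached)
```

On the toy score the procedure keeps exactly the useful descriptors and drops the rest, which is the behaviour one would hope for from any per-descriptor selection scheme.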
As we can see, descriptor #3 was the only one from the TI set
that was not chosen, which makes sense, as TI was the set that
provided the best results alone. From the P set, Highest Pitch
was not chosen, but Lowest Pitch was, as it is used for
classifying the bass tracks. From the PI set, only the Standard
Deviation was used, and from the ND set, only the Mean was
chosen. The SD set was ignored by the algorithm, which suggests
that the silences are not significant and could be discarded. S
was also chosen, meaning that rhythm is also an important
feature in distinguishing between classes. The descriptors
chosen from the IH set were “Perfect Fourth” and “Minor Sixth”.
According to music theory, these intervals are among the most
consonant, because they have simple pitch relationships, which
helps in distinguishing between, for example, a simple slow
Melody and a fast complicated Solo.
With respect to the confusion matrix, all the misclassified
tracks make sense. A melody track is similar to a bass track,
although it has higher pitches. In fact, the one misclassified
bass track has higher pitches as well. The Melody+Solo class is
the worst performing, mainly because a solo can be made of
several melodies, or even harmony at some point, and so be
misclassified. More training data would likely improve the
performance on this class.
ICEIS 2008 - International Conference on Enterprise Information Systems