provides these regions and additional information, such as predicted bounds or beat positions, to further processing stages. In the second pass of the lullaby, the melody lies between upper and lower tones and is not easy for human listeners to recognize. The framework detects this structure even without creating hypotheses for any of its parts.
How can we evaluate the overall performance of our hypothesis generator? Due to the high degree of possible musical alterations, an adequate evaluation is not an easy task. How can we rate improvisations? How should we rate missing repetitions of an already detected structure? How can medleys of refrains or partially detected structures be evaluated? We decided to consider only those regions of a music file for which sequential structures should ideally be found. We then count the structures that are actually detected, each normalized by a weight reflecting its duration within an instance. Additionally, each predicted or detected structure is examined for correct bounds. The result is a percentage of correct coverage.
This coverage measure has several advantages: parts of the music that inherently cannot be detected, such as free intros, intermezzi, improvisations, or endings, do not affect the rating. If a repetitive or composed structure cannot be correlated entirely but one or more of its higher-level components are detected, that structure is not rejected. Hence, each example from Figures 6 and 7 would obtain a coverage of 1.0, meaning that all detectable sections have been recognized correctly. Of course, each covered region additionally receives a similarity rating as illustrated in Section 3.2.
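To make the computation concrete, the following sketch shows one possible way to determine such a coverage score. The function names and the simple interval-overlap weighting are illustrative assumptions, not the exact procedure used in the framework.

```python
# Illustrative sketch only: duration-weighted coverage for one file.
# Regions are (start, end) pairs in beats or seconds.

def overlap(a, b):
    """Length of the overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def coverage(ideal_regions, detected_regions):
    """Fraction of the ideally detectable duration that is covered by
    detected structures with (approximately) correct bounds."""
    total = sum(end - start for start, end in ideal_regions)
    if total == 0.0:
        return 1.0  # nothing should be found, so nothing can be missed
    covered = 0.0
    for region in ideal_regions:
        best = max((overlap(region, d) for d in detected_regions), default=0.0)
        covered += best
    return covered / total

# Two ideal regions; one is detected with correct bounds, one is missed.
print(coverage([(0, 30), (60, 90)], [(0, 30)]))  # -> 0.5
```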
To automate the evaluation of our MIDI file collection, we manually identified all contained templates or high-level structures and their ranges, and added this information as metadata to the database. Across all 352 test files we obtained an average coverage of 0.55. This result shows a high deviation: many pieces had a coverage of 1.0, but many files showed no coverage at all (Table 1).
Table 1: Coverage of test files.

Kind of coverage    Number of songs   Percentage   Average coverage
Full coverage       119               34 %         1.0
Partial coverage    143               41 %         0.52
No coverage          90               25 %         0.0
Total               352               100 %        0.55
Due to the high degree of melodic alteration, jazz piano music in particular performed poorly. At the moment, our predictive algorithms are based on exact pattern matching techniques; therefore, the groundwork for successfully creating hypotheses was insufficient in some cases. For most of these unrecognized files, we could manually adjust some accuracy parameters and increase recall at the expense of precision.
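To illustrate why exact matching is brittle against such alterations, consider the following sketch. The representation of motifs as MIDI pitch sequences and the helper function are simplifying assumptions made for this example only.

```python
# Sketch only: exact pattern matching on MIDI pitch sequences.
# A single altered tone in the repetition hides the motif completely.

def find_exact(pattern, melody):
    """Return all start indices where `pattern` occurs verbatim in `melody`."""
    n, m = len(melody), len(pattern)
    return [i for i in range(n - m + 1) if melody[i:i + m] == pattern]

motif = [60, 62, 64, 62]                        # C D E  D
melody = [60, 62, 64, 62, 67, 60, 62, 63, 62]   # repeat with E flattened to E-flat

print(find_exact(motif, melody))  # -> [0]; the varied repetition at index 5 is missed
```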
To obtain more results, the required computation time and memory grow exponentially in most cases, so it would be pointless to broaden the search at this stage of development. The quality of the structure prediction will most likely improve once hypothesized components can be evaluated by future similarity processing stages. For now, we parameterized our system by optimizing the tradeoff between computation time and recall while keeping precision as high as possible. Indeed, for top-level prediction, precision equals one over the entire database.
Further work should first complete the framework conceptually, that is, integrate similarity measures for chords, chord progressions, and melodic alterations into the framework. Next, to improve pattern recognition, the present matching techniques should be extended to “fuzzy” matching. Using a more contour-oriented representation of melody would compensate for slight variations such as altered or missing tones, e.g. playing a motif in minor instead of major; see the sketch below. Finally, by implementing symbolic extraction algorithms, the framework should be able to guide even an audio identification process.
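As a small sketch of what such a contour-oriented representation might look like, the three-level quantization below (up, down, same) is an assumption made for illustration, not part of the current framework; it absorbs the altered tone of the minor variant from the earlier example.

```python
# Sketch only: a three-level melodic contour (up / down / same).
# The quantization to +1 / -1 / 0 is an illustrative assumption.

def contour(pitches):
    """Map a pitch sequence to +1 (up), -1 (down), or 0 (same) between neighbours."""
    return [(p2 > p1) - (p2 < p1) for p1, p2 in zip(pitches, pitches[1:])]

major_motif = [60, 62, 64, 62]   # C D E  D
minor_motif = [60, 62, 63, 62]   # C D Eb D

# Both variants collapse to the same contour, so a contour-based matcher
# would still recognize the minor variant as a repetition of the motif.
print(contour(major_motif) == contour(minor_motif))  # -> True
```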