audio. Peeters (2011) presents an approach that is
also based on audio. First, the onset positions are
evaluated by an energy function. Based on this
function, vector representations of rhythm
characteristics are computed. For classifying these
rhythms, four feature sets derived from these
vectors by applying the Discrete Fourier Transform
(DFT) and the autocorrelation function (ACF) are
studied. Next, various ratios of the local tempo are
applied to these vectors. Finally, a classification task
measures the ability of these periodicity
representations to describe the rhythm characteristics
of audio items.
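The periodicity analysis underlying such features can be sketched as follows; the onset-energy signal here is a toy impulse train, not data from the cited work:

```python
import numpy as np

def periodicity_features(energy, max_lag):
    """Compute ACF and DFT magnitudes of an onset-energy function.

    `energy` is a 1-D array sampled at a fixed frame rate; both
    representations expose the periodicities of the rhythm.
    """
    e = energy - energy.mean()                     # remove DC offset
    acf = np.correlate(e, e, mode="full")[len(e) - 1:]
    acf = acf[:max_lag] / acf[0]                   # normalise by lag-0 value
    dft = np.abs(np.fft.rfft(e))                   # magnitude spectrum
    return acf, dft

# Toy onset-energy function: impulses every 4 frames,
# so the ACF peaks at lags 4, 8, 12, ...
energy = np.zeros(64)
energy[::4] = 1.0
acf, dft = periodicity_features(energy, max_lag=16)
```

The ACF exposes periodicities as peaks over lag, while the DFT exposes them as peaks over frequency; a classifier can be fed either view.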
Pattern-based Approaches: Ellis and Arroyo
(2004) present an approach that uses Principal
Components Analysis (PCA) to classify drum
patterns. First, measure length and downbeat
position are estimated for each track of a collection
of 100 drum beat sequences given in General MIDI
files. From each of these input patterns, a short
sequence is passed to the PCA, resulting in a set of
basic patterns. A classification task performed with
these basic patterns yields about 20 % correctly
classified results. Murakami and Miura (2008) present an
approach to classify drum-rhythm patterns into
“basic rhythm” and “fill-in” patterns. Based on
symbolic representations of music, i.e. General
MIDI tracks, instruments are grouped by their
estimated importance for playing either
“basic rhythm” patterns, “fill-in” patterns, or both.
These three groups are used to model drum rhythm patterns.
Expecting a minimum input of one measure in 4/4
time, the classification is performed based on
neighbourhood comparison. They achieve
classification results of up to 76 %.
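The PCA step of the pattern-based approach can be sketched as follows; the corpus here is random binary drum patterns standing in for the 100 General MIDI sequences, and the dimensions are assumptions for illustration:

```python
import numpy as np

# Hypothetical corpus: binary drum patterns, one bar of 16 sixteenth-note
# slots for each of 3 instruments, flattened to 48-dimensional vectors.
rng = np.random.default_rng(0)
patterns = rng.integers(0, 2, size=(100, 48)).astype(float)

# PCA via SVD of the mean-centred data: the right singular vectors are
# the "basic patterns" (principal components) of the collection.
centred = patterns - patterns.mean(axis=0)
_, s, vt = np.linalg.svd(centred, full_matrices=False)
basic_patterns = vt[:8]                 # keep the 8 strongest components

# Projecting a pattern onto the components yields a compact descriptor
# that can feed a classifier.
coords = centred @ basic_patterns.T     # shape (100, 8)
```

Each input pattern is thus described by a few coordinates in the space spanned by the basic patterns rather than by its full 48-dimensional form.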
Source Separation based Approaches: Tsunoo,
Ono & Sagayama (2009) propose a method to
describe rhythm by classifying track spectrograms
based on audio. First, the percussive and harmonic
components of a track are separated by the
method described in Ono et al. (2008), followed by
clustering the percussive part using a combination of the
One-Pass Dynamic Programming algorithm and k-
means clustering. Finally, each frame of a track is
assigned to a cluster, and the corresponding
spectrogram is used to classify the rhythms. They
achieve accuracies of up to 97.8 % for House music.
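The k-means stage of such a scheme can be sketched as follows; the "spectrogram frames" are toy two-bin vectors, and the deterministic initialisation is a simplification (random initialisation is common):

```python
import numpy as np

def kmeans(frames, k, iters=20):
    """Minimal Lloyd's k-means over spectrogram frames (one row each)."""
    # deterministic initialisation for reproducibility of this sketch
    centroids = frames[:: max(1, len(frames) // k)][:k].copy()
    for _ in range(iters):
        # distance of every frame to every centroid
        d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)          # assign frames to nearest centroid
        for j in range(k):
            if np.any(labels == j):        # update non-empty clusters
                centroids[j] = frames[labels == j].mean(axis=0)
    return labels

# Toy "percussive" frames: two distinct spectral shapes
frames = np.vstack([np.tile([1.0, 0.0], (20, 1)),
                    np.tile([0.0, 1.0], (20, 1))])
labels = kmeans(frames, k=2)
```

The resulting frame-to-cluster assignment is the per-track representation that the spectrogram classification then operates on.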
Psychoacoustic-based Approach: Rauber,
Pampalk and Merkl (2002) propose a method to
automatically create a hierarchical organization of
music archives based on perceived sound similarity.
First, several pre-processing steps are applied. All
tracks of the archives are divided into segments of
fixed length, followed by the extraction of frequency
spectra based on the Bark scale in order to reproduce
human perception of frequency. Finally, the specific
loudness sensation in Sone is calculated. After these
pre-processing steps a time invariant representation
of each piece of music is generated. In the last step
of processing, these patterns are used for
classification via Self-Organizing Maps. The method
is based on audio.
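The Bark-scale mapping used in such pre-processing can be sketched with Zwicker and Terhardt's approximation; whether the cited work uses this exact formula or a critical-band filterbank is not stated here, so this is an illustration only:

```python
import math

def hz_to_bark(f):
    """Zwicker & Terhardt's approximation of the Bark scale, mapping
    frequency in Hz to critical-band rate (one Bark per critical band)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

# Grouping FFT bins into roughly 24 critical bands mimics the ear's
# frequency resolution before the loudness in Sone is computed.
```

For example, 1000 Hz maps to roughly 8.5 Bark, while the full audible range spans about 24 critical bands.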
Although approaches to the problem of
rhythm classification have already been presented,
the success rates can only be regarded as
satisfying for specific genres, e.g. Popular music or
House music (Ono et al., 2008) or ballroom dance
music (Peeters, 2011). Furthermore, the majority of
approaches (Paulus and Klapuri, 2002; Tzanetakis
and Cook, 2002; Peeters, 2011; Tsunoo, Ono and
Sagayama, 2009; Ono et al., 2008; Rauber, Pampalk
and Merkl, 2002) rely on audio. Thus, further effort
is required to improve classification methods that
address symbolic data.
3 CLASSIFYING RHYTHM
PATTERNS
In this paper we present an approach for the
classification of music rhythms that treats rhythm as
a sequence of N notes with a time difference
between the onsets of adjacent notes. Our method is
based on symbolic data in order to be able to access
all necessary information for each note directly.
Thus, by not using audio, we can exclude further
sources of error, e.g. detecting the onset positions of
notes. Although numerous onset detection
approaches are known, their reliability is still
inadequate for excluding them as a possible source
of error (Collins, 2005).
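The rhythm representation described above can be sketched directly; the onset times are hypothetical values standing in for data parsed from a symbolic source such as MIDI:

```python
# Onset times in beats (hypothetical symbolic input, e.g. parsed from MIDI)
onsets = [0.0, 0.5, 1.0, 1.75, 2.0]

# A rhythm of N notes is the sequence of time differences (inter-onset
# intervals) between the onsets of adjacent notes.
iois = [b - a for a, b in zip(onsets, onsets[1:])]
```

With symbolic input these intervals are exact, which is precisely the error source that onset detection on audio cannot yet eliminate.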
We compare and classify rhythms in four steps.
Step one covers all necessary preliminary
computations; in step two, all possible, i.e.
hypothetical, rhythm patterns are extracted; step
three reduces the number of rhythm hypotheses;
and finally, step four performs the classification task
utilizing a knowledge base. Fig. 1 illustrates this
concept.
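The four steps can be sketched as a pipeline; every helper below is a simplified placeholder (interval extraction, exhaustive subsequence enumeration, deduplication, dictionary lookup), not the actual computations of our method:

```python
def preprocess(onsets):
    # step 1: preliminary computations -- here, inter-onset intervals
    return tuple(b - a for a, b in zip(onsets, onsets[1:]))

def extract_patterns(iois, min_len=2):
    # step 2: every contiguous subsequence is a rhythm-pattern hypothesis
    return [iois[i:j] for i in range(len(iois))
            for j in range(i + min_len, len(iois) + 1)]

def reduce_hypotheses(hypotheses):
    # step 3: prune hypotheses -- here simply deduplication
    return list(dict.fromkeys(hypotheses))

def classify(candidates, knowledge_base):
    # step 4: label via a knowledge base of known patterns
    for c in candidates:
        if c in knowledge_base:
            return knowledge_base[c]
    return "unknown"

kb = {(0.5, 0.5, 1.0): "pattern-A"}   # toy knowledge base
onsets = [0.0, 0.5, 1.0, 2.0]
label = classify(reduce_hypotheses(extract_patterns(preprocess(onsets))), kb)
```

The point of the skeleton is the data flow: hypotheses are generated exhaustively, pruned, and only then matched against the knowledge base.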
However, to limit the number of possible sources
of error, we only focus on drum rhythm patterns and
limit our method to the use of temporal information
and accentuation as features for the classification
task. Furthermore, evaluated sequences are limited
to a length of 30 s in order to reduce computational
complexity.
ICPRAM 2014 - International Conference on Pattern Recognition Applications and Methods