First, a threshold is applied to the input image, so that the only non-null pixels are those of the guitar and finger markers. Then, using the contour detection algorithm and contour data structure provided by OpenCV, guitar and finger markers can be separated. Note that guitar fiducials and finger markers are, respectively, contours with and without a hole. Once the positions of the four guitar fiducials are known in the image, their actual positions in guitar fingerboard coordinates can be used to determine a projective transformation (homography), which is applied in order to "immobilize" the guitar and easily extract the ROI. The same homography is then applied to the north-most extreme of each finger rod, giving the approximate positions of the fingertips in fingerboard coordinates, since the distal phalanges are, in general, nearly perpendicular to the fingerboard.
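The following sketch illustrates this pipeline with OpenCV's Python bindings. It is only an outline under stated assumptions: the threshold value and the fingerboard coordinates of the fiducials (FIDUCIAL_BOARD_COORDS) are illustrative, and the matching of detected fiducials to their fingerboard coordinates is glossed over for brevity.

import cv2
import numpy as np

# Illustrative positions of the four guitar fiducials in fingerboard
# coordinates (hypothetical values, not taken from the paper).
FIDUCIAL_BOARD_COORDS = np.float32([[0, 0], [400, 0], [400, 100], [0, 100]])

def extract_fingertips(gray_frame):
    # 1. Threshold so that only guitar fiducials and finger markers remain
    #    (the threshold value is an assumption).
    _, binary = cv2.threshold(gray_frame, 200, 255, cv2.THRESH_BINARY)

    # 2. Contour detection with a two-level hierarchy: outer contours that
    #    own a child contour (a hole) are guitar fiducials, the others are
    #    finger markers.
    contours, hierarchy = cv2.findContours(binary, cv2.RETR_CCOMP,
                                           cv2.CHAIN_APPROX_SIMPLE)
    fiducials, fingers = [], []
    for i, cnt in enumerate(contours):
        if hierarchy[0][i][3] != -1:      # inner (hole) contour: skip
            continue
        if hierarchy[0][i][2] != -1:      # has a hole -> guitar fiducial
            fiducials.append(cnt)
        else:                             # no hole -> finger marker
            fingers.append(cnt)

    # 3. Homography from the fiducial centroids (assumed already matched to
    #    the order of FIDUCIAL_BOARD_COORDS) to fingerboard coordinates.
    centroids = np.float32([c.reshape(-1, 2).mean(axis=0) for c in fiducials])
    H, _ = cv2.findHomography(centroids, FIDUCIAL_BOARD_COORDS)

    # 4. Map the north-most point of each finger rod into fingerboard space.
    tips = np.float32([c.reshape(-1, 2)[c.reshape(-1, 2)[:, 1].argmin()]
                       for c in fingers])
    return cv2.perspectiveTransform(tips.reshape(-1, 1, 2), H).reshape(-1, 2)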
We use a supervised Machine Learning technique to train the system on the guitar chords we want it to identify. The chord a musician plays is viewed by the system as an eight-dimensional vector composed of the coordinates (after the projective transformation) of the four fingertips, from the little finger to the index finger. By analogy with the PCP, we call this eight-dimensional vector the Visual Pitch Class Profile (VPCP).
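As a minimal sketch, assuming the fingertips are already in fingerboard coordinates and that sorting by position along the neck yields the little-to-index order, the VPCP is simply the flattened 4x2 array of fingertip coordinates:

import numpy as np

def build_vpcp(tips_board):
    # tips_board: (4, 2) array of fingertip positions in fingerboard coordinates.
    # Ordering little-to-index by position along the neck is an assumption.
    ordered = tips_board[np.argsort(tips_board[:, 0])]
    return ordered.reshape(-1)   # eight-dimensional VPCP vector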
In summary, the proposed algorithm for real-time guitar chord detection has two phases. In the first (the training phase), the musician chooses the chords that must be identified and takes some samples of each one, where by sample we mean the eight-dimensional vector formed from the positions of the north-most extremes of the finger rods, i.e., the VPCP. In the second (the identification phase), the system receives the vector corresponding to the chord to be identified and classifies it using the K Nearest Neighbors algorithm.
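A minimal sketch of the two phases is given below, using scikit-learn's KNeighborsClassifier as a stand-in (the paper does not state which KNN implementation is used; k = 20 matches the value used in the comparison of the next section, but any k could be chosen).

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class ChordIdentifier:
    def __init__(self, k=20):
        self.samples, self.labels = [], []
        self.knn = KNeighborsClassifier(n_neighbors=k)

    def add_training_sample(self, vpcp, chord_name):
        # Training phase: store one eight-dimensional VPCP sample per frame.
        self.samples.append(vpcp)
        self.labels.append(chord_name)

    def train(self):
        self.knn.fit(np.vstack(self.samples), self.labels)

    def identify(self, vpcp):
        # Identification phase: classify the incoming VPCP by K nearest neighbors.
        return self.knn.predict(vpcp.reshape(1, -1))[0]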
5 COMPARISON AND DISCUSSION
Before turning to quantitative comparisons, let us address some theoretical aspects. Video methods, even knowledge-based ones, are immune to wrong tuning of the instrument. Although playing an out-of-tune instrument is not desirable, this robustness is useful for beginners, who are often unable to keep their guitar precisely tuned. On the other hand, it can be argued that knowledge-based methods only work properly when trained by the end user, since the shape of a given chord differs slightly from person to person. This is true, but knowledge-based techniques using audio data face the same problem, since different instruments, with different strings, produce slightly different sounds for the same chord shape.
For a quantitative comparison, we took 100 samples of each of the 14 major and minor chords in the keys of C, D, E, F, G, A and B, choosing just one shape per chord (on the guitar there are many realizations of the same chord). The video samples were taken by holding a given chord and, while slightly moving the guitar, waiting until 100 samples had been saved. For the audio samples, we recorded, for each chord, nearly 10 seconds of a track consisting of strumming in some rhythm while keeping the chord fixed. The audio data was then pre-processed to remove the parts corresponding to strumming (where the noise is high). Then, at regular intervals of about 12 milliseconds, an audio chunk of about 45 milliseconds was processed to obtain its Pitch Class Profile, as described in Section 3.
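A rough sketch of this framing is given below, with a simple FFT-magnitude folding standing in for the Section 3 PCP computation; the reference frequency and the weighting scheme are simplifications, not the paper's exact formulation.

import numpy as np

def pcp(chunk, sr, f_ref=261.63):
    # Fold FFT magnitudes onto the 12 pitch classes (simplified PCP).
    spectrum = np.abs(np.fft.rfft(chunk))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sr)
    profile = np.zeros(12)
    for f, mag in zip(freqs[1:], spectrum[1:]):          # skip the DC bin
        pitch_class = int(round(12 * np.log2(f / f_ref))) % 12
        profile[pitch_class] += mag ** 2
    return profile / (profile.sum() + 1e-12)             # normalized 12-d PCP

def pcp_frames(signal, sr, hop_s=0.012, win_s=0.045):
    # One ~45 ms chunk every ~12 ms, as in the comparison setup.
    hop, win = int(hop_s * sr), int(win_s * sr)
    return np.array([pcp(signal[i:i + win], sr)
                     for i in range(0, len(signal) - win, hop)])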
These audio and video samples tend to form clusters in $\mathbb{R}^{12}$ and $\mathbb{R}^{8}$, respectively. Figure 3 provides some analysis of them. Note that in both cases the samples are placed very close to the mean of the respective cluster, but there are more outliers in the audio data.
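For illustration, the kind of per-cluster statistic summarized in Figure 3 could be computed as follows; the outlier threshold of three times the mean distance is an assumption, not a value taken from the paper.

import numpy as np

def cluster_spread(samples_by_chord, outlier_factor=3.0):
    # samples_by_chord: dict mapping chord name -> list of PCP or VPCP vectors.
    stats = {}
    for chord, samples in samples_by_chord.items():
        X = np.asarray(samples)
        dists = np.linalg.norm(X - X.mean(axis=0), axis=1)
        stats[chord] = {
            "mean_dist": dists.mean(),
            "outliers": int((dists > outlier_factor * dists.mean()).sum()),
        }
    return stats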
Regarding classification performance, both methods behaved similarly in the tests we conducted. The difference is that the audio-based algorithm is sensitive to the noise caused by strumming, while the video-based method is unaffected by it. This is illustrated in Figure 4, where the same chord sequence (played twice) was performed and analyzed by the two methods, using 20 Nearest Neighbors for classification. Note how much more stable the video-based method is. It can also be seen that both algorithms have problems with chord transitions.
6 CONCLUSIONS AND FUTURE WORK
We have seen that both methods have similar classification performance, but the VPCP algorithm is more stable in the sense that (1) the clusters formed in the training phase are better defined and (2) the visual method is not sensitive to the noise caused by strumming.
Given the high similarity between the classical audio-based method and our proposed video-based algorithm, a natural direction of research is to combine both classifiers using some data fusion technique.
There are also some issues with the VPCP method that have to be addressed. The first is to eliminate the need for the middle-phalanx gloves. Although they