
complexity and performance are not clearly correlated when using k-NN. These
results support the learning hypothesis for the nets.
Event detection errors are concentrated in the transitions, at both ends of the
note activations. This fact also hinders the detection of very short notes, one
or two events long. These situations can be corrected by a post-processing
stage applied to the net outputs along time. In a music score, not every moment is
equally probable: the onsets of the notes occur at times conditioned by
the musical tempo, which determines the position in time of the rhythm beats, so
a note in a score is more likely to start at a multiple of the beat duration (quarter
note) or at some fraction of it (eighth note, sixteenth note, etc.). The procedure
that imposes tight temporal constraints on the duration and starting point
of the notes is usually called quantization. From the tempo value (which can be
extracted from the MIDI file), a set of preferred points in time can be defined to
which the beginnings and endings of notes are assigned. This transformation from
STFT timing to musical timing should correct most of these errors.
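As an illustration, what follows is a minimal sketch of such a quantization step in Python, assuming a constant tempo and a sixteenth-note grid; the function name, the grid resolution and the zero-length filter are our assumptions, not details given above.

    def quantize_notes(notes, tempo_bpm, subdivision=4):
        # notes: list of (onset_s, offset_s) pairs in seconds (e.g. STFT
        # frame times); subdivision=4 gives a sixteenth-note grid.
        beat_s = 60.0 / tempo_bpm        # quarter-note duration
        grid_s = beat_s / subdivision    # smallest allowed time unit

        def snap(t):
            return round(t / grid_s) * grid_s

        quantized = [(snap(on), snap(off)) for on, off in notes]
        # Spurious one- or two-event notes collapse onto a single grid
        # point and can be discarded here.
        return [(on, off) for on, off in quantized if off > on]

For instance, at 120 bpm the grid step is 0.125 s, so an onset detected at 0.61 s is moved to 0.625 s, while a spurious 20 ms activation collapses to zero length and is removed.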
False note positives and negatives are harder to prevent, and this should be
done using a model of melody. This is a complex issue. Using stochastic models,
a probability can be assigned to each note in order to remove those that are really
unlikely. For example, in a given melodic line it is very unlikely that a non-diatonic
note appears two octaves higher or lower than its neighbours.
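As a rough illustration of this idea, the sketch below flags notes that are both non-diatonic and far from their neighbours; the key, the window size and the two-octave threshold are assumptions of ours, and a real system would replace this hand-written rule with a trained stochastic model.

    import statistics

    C_MAJOR = {0, 2, 4, 5, 7, 9, 11}   # pitch classes of the assumed key

    def is_plausible(pitch, neighbours, scale=C_MAJOR, max_leap=24):
        # Keep a note if it is diatonic, or if it stays within two
        # octaves (24 semitones) of the median of its neighbours.
        diatonic = pitch % 12 in scale
        leap = abs(pitch - statistics.median(neighbours))
        return diatonic or leap <= max_leap

    def filter_melody(pitches, window=2):
        kept = []
        for i, p in enumerate(pitches):
            left = pitches[max(0, i - window):i]
            right = pitches[i + 1:i + 1 + window]
            neighbours = left + right
            if not neighbours or is_plausible(p, neighbours):
                kept.append(p)
        return kept

With MIDI pitches [60, 62, 64, 97, 65], the spurious 97 (a non-diatonic note more than two octaves above its neighbours) is removed and the rest of the line is kept.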
Building these models and compiling extensive training sets are two
challenging problems to be addressed in future work.
Acknowledgements
This work has been funded by the Spanish CICYT project TIRIG, code
TIC2003–08496–C04. The authors wish to thank Dr. Juan Carlos Pérez-Cortés for
his valuable ideas and Francisco Moreno-Seco for his help and support.