
 
Although at this point the results favour 35 MFCCs, which give the highest percent accuracy (92.46% in Table 4), we chose 40 for two reasons:
 
1.  To make sure that the parameterisation of the system lies within the zone where the information provided by the MFCCs has saturated.
2.  Although this number of coefficients does not provide the best accuracy rate, it yields the fewest substitution errors (0.46%). This kind of error goes down significantly as the number of MFCCs increases.
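The trade-off behind this choice can be read directly from Table 4. The following Python sketch is only an illustration built on the table's own values (the dictionary and function names are hypothetical): it picks the coefficient count with the highest percent accuracy and the one with the lowest substitution rate.

```python
# Table 4 values: number of MFCCs -> (% substituted notes, % accuracy)
TABLE_4 = {
    14: (12.88, 81.37),
    20: (5.77, 88.19),
    25: (3.36, 90.48),
    30: (1.58, 91.99),
    35: (0.64, 92.46),
    40: (0.46, 91.60),
    45: (0.55, 91.22),
}


def best_by_accuracy(table):
    """Return the MFCC count with the highest percent accuracy."""
    return max(table, key=lambda n: table[n][1])


def best_by_substitution(table):
    """Return the MFCC count with the lowest substitution rate."""
    return min(table, key=lambda n: table[n][0])


if __name__ == "__main__":
    print("Highest accuracy:", best_by_accuracy(TABLE_4))         # 35
    print("Fewest substitutions:", best_by_substitution(TABLE_4))  # 40
```

The two criteria disagree: 35 coefficients maximise accuracy, while 40 minimise substitutions, which is the second reason given above.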
7.3 Third Stage 
The high insertion rates of the system at this point may be due to the different temporal evolution of notes played on different instruments. This motivates the following tests, which use several window sizes with different overlaps between consecutive windows.
The window sizes used in the experiment range from 30 to 90 ms, with overlaps between 50 and 80% of the window size.
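To make the relation between window width, window shift and overlap concrete, here is a minimal Python sketch of fixed-length framing. It assumes a 16 kHz sample rate and plain rectangular frames with no tapering window; the sample rate and the function names are illustrative assumptions, not the system's actual front end. The overlap is the part of each window not covered by the shift, so a 60 ms window shifted by 12 ms shares 48 ms, i.e. 80%, with the next one.

```python
from typing import List, Sequence


def overlap_percent(window_ms: float, shift_ms: float) -> float:
    """Share of each window (in %) that overlaps the next window."""
    return 100.0 * (window_ms - shift_ms) / window_ms


def frame_signal(samples: Sequence[float], sample_rate: int,
                 window_ms: float, shift_ms: float) -> List[List[float]]:
    """Split a signal into overlapping rectangular frames.

    Trailing samples that do not fill a whole frame are dropped (no padding).
    """
    win = int(round(sample_rate * window_ms / 1000.0))  # samples per frame
    hop = int(round(sample_rate * shift_ms / 1000.0))   # samples between frame starts
    if win <= 0 or hop <= 0:
        raise ValueError("window and shift must span at least one sample")
    if len(samples) < win:
        return []
    n_frames = 1 + (len(samples) - win) // hop
    return [list(samples[i * hop:i * hop + win]) for i in range(n_frames)]


if __name__ == "__main__":
    sr = 16000                   # assumed sample rate (not given in this section)
    signal = [0.0] * sr          # placeholder: one second of silence
    frames = frame_signal(signal, sr, window_ms=60, shift_ms=12)
    print(len(frames), len(frames[0]))   # 79 960
    print(overlap_percent(60, 12))       # 80.0
```

With these assumptions, the 30 ms / 15 ms configuration gives the 50% end of the tested range and every 60 ms / 12 ms frame spans 960 samples.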
Table 5 shows the experimental results. The optimum is obtained with 60 ms windows displaced by 12 ms (80% overlap), which gives a percent accuracy of 98.26% with an insertion rate of only 0.17%. Furthermore, these results confirm the validity of the HMM topology for note recognition.
Figure 5 shows how the percent accuracy of the recognition system evolves through the successive parameterisation improvements made in the experiments.
Finally, we point out that the accuracy rate obtained by Durey’s system in multi-instrument recognition conditions is 71.7%. This value is lower than any accuracy obtained by the proposed system in any experiment of the three stages (Table 5).
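The "Percent Accuracy" figures in Tables 4 and 5 are consistent with the accuracy measure reported by HTK's HResults tool, which penalises insertions on top of deletions and substitutions: Accuracy = 100 (N - D - S - I) / N = %Correct - %Inserted, where N is the number of reference notes (for the best row, 98.43 - 0.17 = 98.26). This section does not state the formula, so the Python sketch below treats it as an assumption; it evaluates the definition on hypothetical counts and then checks the Table 5 values against the 71.7% quoted for Durey's system.

```python
# Accuracy of Durey's system in multi-instrument conditions, as quoted above.
DUREY_ACCURACY = 71.7

# "Percent Accuracy" column of Table 5, keyed by (window width ms, window shift ms).
TABLE_5_ACCURACY = {
    (30, 6): 93.28, (30, 7.5): 91.43, (30, 10): 89.85, (30, 15): 92.97,
    (60, 12): 98.26, (60, 15): 98.06, (60, 20): 97.15, (60, 30): 95.95,
    (90, 18): 97.86, (90, 22): 97.10, (90, 30): 96.08, (90, 45): 94.14,
}


def percent_correct(n: int, deletions: int, substitutions: int) -> float:
    """(N - D - S) / N: insertions are ignored."""
    return 100.0 * (n - deletions - substitutions) / n


def percent_accuracy(n: int, deletions: int, substitutions: int,
                     insertions: int) -> float:
    """(N - D - S - I) / N: insertions are penalised as well (assumed definition)."""
    return 100.0 * (n - deletions - substitutions - insertions) / n


if __name__ == "__main__":
    # Hypothetical counts: 100 reference notes, 2 deleted, 1 substituted, 3 inserted.
    print(percent_correct(100, 2, 1))        # 97.0
    print(percent_accuracy(100, 2, 1, 3))    # 94.0

    best = max(TABLE_5_ACCURACY, key=TABLE_5_ACCURACY.get)
    worst = min(TABLE_5_ACCURACY.values())
    print(best, worst, worst > DUREY_ACCURACY)   # (60, 12) 89.85 True
```

Because insertions are subtracted, a configuration can have a high correct-note rate and still a modest accuracy, which is what the 30 ms rows of Table 5 show.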
8 CONCLUSIONS 
The present work presents a study of a suitable set of features extracted from the signal for musical note recognition. Likewise, it shows that Hidden Markov Models are powerful enough to be applied to musical note recognition.
 
Table 4: Recognition and error rates varying the number of MFCCs.

Number of   % correct   % deleted   % substituted   % inserted   Percent
MFCCs       notes       notes       notes           notes        Accuracy
14          85,16       1,96        12,88           3,79         81,37
20          92,49       1,75        5,77            4,29         88,19
25          95,14       1,50        3,36            4,66         90,48
30          96,72       1,70        1,58            4,73         91,99
35          97,39       1,97        0,64            4,93         92,46
40          97,43       2,10        0,46            5,84         91,60
45          97,27       2,18        0,55            6,05         91,22

Table 5: Recognition and error rates varying the window width and the window shift (overlap).

Window       Window       % correct   % deleted   % substituted   % inserted   Percent
width (ms)   shift (ms)   notes       notes       notes           notes        Accuracy
30           6            98,23       1,55        0,22            4,95         93,28
30           7,5          98,29       1,54        0,17            6,86         91,43
30           10           98,16       1,62        0,22            8,31         89,85
30           15           96,49       3,12        0,39            3,59         92,97
60           12           98,43       1,25        0,32            0,17         98,26
60           15           98,11       1,65        0,24            0,05         98,06
60           20           97,16       2,61        0,23            0,01         97,15
60           30           95,94       3,63        0,42            0            95,95
90           18           97,90       1,91        0,19            0,04         97,86
90           22           97,10       2,65        0,25            0            97,10
90           30           96,10       3,61        0,30            0,01         96,08
90           45           94,14       5,49        0,37            0            94,14
 
SIGMAP 2007 - International Conference on Signal Processing and Multimedia Applications