switch leads per cardiac cycle (Zhang et al., 2005). In 
order to take this difference in lead selection into 
account, the BB and BL values were also calculated.  
The BB values were computed according to the 
method first adopted in (Martínez et al., 2004) and 
later in (Zhang et al., 2005; Vázquez-Seisdedos et al., 
2011). This method defines the T wave end per beat 
by selecting the lead in which the detection error 
between the automatically and manually annotated T 
wave end is minimal. The BL method selects the 
ECG lead that contains the most T wave ends 
selected by the previously described method. If an 
equal number of T wave ends was selected in both 
leads, the first lead was selected. From the viewpoint 
of a human operator, this is a more realistic 
procedure. The results of both methods can be found 
in Table 2. 
Table 2: Comparison of the overall mean and standard 
deviation (sd) of the differences, in ms, between the 
automatically and manually annotated T wave ends for all 
methods with the supplementary BB and BL protocols. 

          SEMI           TAN            TRA            INT
Protocol  mean   sd      mean   sd      mean   sd      mean   sd
BB        -4.9   15.1    -8.0   16.3    11.8   29.7     2.2   20.0
BL        -6.2   17.6    -7.6   19.0    14.3   37.5     3.9   22.9
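
The BB and BL selection rules described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the input layout (one row of signed errors per beat, one column per lead) are assumptions, and the tie-break within a single beat (first lead wins) is also an assumption, since the text only specifies the tie-break for the BL protocol.

```python
from collections import Counter


def best_lead_per_beat(errors):
    """BB sketch: per beat, pick the lead with the smallest absolute error.

    errors: list of beats, each a list of signed errors in ms
            (automatic minus manual T wave end), one entry per lead.
    Returns (selected lead index per beat, selected signed error per beat).
    """
    lead_idx, selected = [], []
    for beat in errors:
        # min() returns the first minimum, so ties go to the first lead
        # (assumption: the text does not define a per-beat tie-break).
        i = min(range(len(beat)), key=lambda k: abs(beat[k]))
        lead_idx.append(i)
        selected.append(beat[i])
    return lead_idx, selected


def best_lead_per_record(errors):
    """BL sketch: pick the lead that wins the BB selection most often.

    If two leads win equally often, the first lead is selected,
    as stated in the text.
    """
    lead_idx, _ = best_lead_per_beat(errors)
    counts = Counter(lead_idx)
    n_leads = len(errors[0])
    # Highest count wins; on equal counts the lower lead index wins.
    lead = max(range(n_leads), key=lambda k: (counts[k], -k))
    return lead, [beat[lead] for beat in errors]


if __name__ == "__main__":
    # Hypothetical errors (ms) for four beats in two leads.
    demo = [[-4, 10], [6, -2], [3, 3], [-8, 1]]
    print(best_lead_per_beat(demo))
    print(best_lead_per_record(demo))
```

Because BL commits to a single lead for the whole record, its errors can never be smaller per beat than those of BB, which is consistent with the larger sd values in the BL row of Table 2.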
 
When applying the BB protocol, it was observed 
that the overall sd obtained by each of the methods 
was lower than the sd obtained for lead I and 
II. This was expected, since the lead in which the 
detection error is minimal was selected per beat. This 
protocol is most in accordance with the annotation 
method of the cardiologists, since they made their 
annotation by examining both leads and based their 
decision on the best lead (Martínez et al., 2004).  
In clinical practice, the best lead can be selected 
after the ECG recorder is set up. Hence, the BL results 
are clinically the most relevant for everyday 
T wave end detection. We demonstrated that the TRA 
method is the least repeatable of all methods tested 
(sd = 37.5 ms), whilst the SEMI method is the most 
repeatable one (sd = 17.6 ms). The INT method 
scores best in terms of accuracy (mean = 3.9 ms). 
It should be noted that the mean and sd calculation 
was simplified: one value was computed per record, 
and the overall mean and sd were computed as the 
average of these values. This method does not take 
the number of annotated T wave ends in each record 
into account. Therefore, we opted to generate 
Bland-Altman plots. These allow a direct comparison 
between all manual annotations and the T wave end 
selections of the four algorithms. Only the BB values 
were taken into account, since this protocol is most in 
accordance with the annotation method of the 
cardiologists. Based on the Bland-Altman plots of the 
respective QT intervals, Q being manually annotated 
by the cardiologists, an evaluation of the agreement 
of the methods was performed. The results of the 
evaluation are depicted in Figure 4. 
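
The limitation of the simplified calculation can be illustrated with a short sketch. It assumes that "one value per record" means a per-record mean and a per-record sd, which are then averaged across records. The function names and the sample data are hypothetical.

```python
from statistics import mean, stdev


def per_record_summary(records):
    """Simplified calculation: average the per-record means and sds.

    records: list of records, each a list of detection errors in ms.
    Every record counts equally, regardless of how many beats it holds.
    Each record needs at least two errors for stdev() to be defined.
    """
    means = [mean(r) for r in records]
    sds = [stdev(r) for r in records]
    return mean(means), mean(sds)


def pooled_summary(records):
    """Alternative: pool all errors so that every annotated beat counts equally."""
    pooled = [e for r in records for e in r]
    return mean(pooled), stdev(pooled)


if __name__ == "__main__":
    # Hypothetical data: a short record with large errors and a long,
    # error-free record.
    records = [[10.0, 20.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
    print(per_record_summary(records))
    print(pooled_summary(records))
```

In this made-up example the short record dominates the per-record average, whereas the pooled summary weights each annotated T wave end equally.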
The TRA method shows the 
widest limits of agreement (-109.10/87.27 ms). This 
result supports the previous findings. 
Accordingly, the best agreement was found for 
the SEMI method (-75.69/85.01 ms), although the 
agreement of the INT method was only slightly worse 
(-84.48/80.70 ms). The obtained biases are in the 
range of those reported earlier (Panicker, Karnad, 
Natekar, et al., 2009; Vázquez-Seisdedos et al., 2011). 
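
For reference, the quantities behind a Bland-Altman comparison can be computed as in the sketch below. It assumes the conventional limits of agreement of bias ± 1.96 sd of the paired differences. The text does not state which multiplier was used, so this is an assumption, as are the function name and the sample values.

```python
from statistics import mean, stdev


def bland_altman(auto_qt, manual_qt, k=1.96):
    """Return bias, lower and upper limits of agreement, and plot coordinates.

    auto_qt, manual_qt: paired QT intervals in ms (same beats, same order).
    k: multiplier for the limits of agreement (1.96 is the usual convention;
       assumed here, not taken from the text).
    """
    if len(auto_qt) != len(manual_qt):
        raise ValueError("inputs must be paired")
    diffs = [a - m for a, m in zip(auto_qt, manual_qt)]        # y-axis
    avgs = [(a + m) / 2 for a, m in zip(auto_qt, manual_qt)]   # x-axis
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - k * sd, bias + k * sd, avgs, diffs


if __name__ == "__main__":
    # Hypothetical QT intervals (ms) for four beats.
    auto = [400, 410, 390, 420]
    manual = [395, 415, 385, 410]
    bias, lower, upper, _, _ = bland_altman(auto, manual)
    print(f"bias={bias:.2f} ms, limits=({lower:.2f}, {upper:.2f}) ms")
```

The width of the interval between the lower and upper limit is what distinguishes the methods above: a narrower interval indicates better agreement with the manual annotations.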
The results of the SEMI method could be 
explained by the small influence of baseline wander 
and U waves on the detection of the T wave end. 
Because of their low amplitude, U waves have 
significantly less influence on the sum of the squared 
differences compared to the T waves. However, it 
should be noted that the method is highly operator 
dependent. This is highlighted by the clustering 
of the difference points in the Bland-Altman plot. All 
QT intervals computed per record will be biased 
according to the difference between the end point 
selected on the template T wave and the manually 
annotated T wave end. This leaves the QT variability 
relatively unaffected, but alters the QT lengths. 
This operator dependency should be taken into 
account when using this method in QT interval 
analysis. 
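
The effect of such an operator-dependent offset can be checked numerically. The sketch below assumes the simplest case, in which the template end point shifts every QT interval in a record by the same constant. The function name, the QT values, and the offset are hypothetical.

```python
from statistics import mean, stdev


def apply_operator_offset(qt_ms, offset_ms):
    """Shift every QT interval of a record by one constant offset (ms)."""
    return [q + offset_ms for q in qt_ms]


if __name__ == "__main__":
    qt = [380.0, 392.0, 401.0, 388.0]        # hypothetical QT intervals (ms)
    shifted = apply_operator_offset(qt, 12.0)  # hypothetical template offset
    # The mean QT length moves by the offset; the sd (variability) does not.
    print(mean(qt), mean(shifted))
    print(stdev(qt), stdev(shifted))
```

A constant shift changes the mean of the intervals but not their standard deviation, which matches the observation that QT variability is largely preserved while the QT lengths are altered.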
Besides the agreement intervals, the biggest 
difference between the algorithms could be observed 
for the cloud on the right. This cloud contains the 
longest QT intervals, including biphasic T waves and 
fusions with the U wave. Both the TAN and TRA 
methods were outperformed by the INT method for the 
detection of the actual ends of these QT intervals. 
This is probably because the TAN and TRA 
methods rely on the detection of the T wave peak, 
which makes it harder to handle more complex biphasic T 
waves or fused T and U waves.