4.2 Procedures and Results
The individual algorithm performance tests,
described in Section 3, involved 400 annotations.
Implementing and testing the proposed multi-
algorithm majority agreement method took the
following additional steps:
1. Obtaining four-algorithm agreement annotations
for every possible parameter combination. With
10 parameter values per algorithm, this yielded
10^4 annotations for each sound file (10^5 in total).
In spite of their high number, the computational
cost of these annotations was relatively low, since
they could be derived from the original 400 using
a simple agreement script, with no need for
additional detection algorithm runs.
2. Calculating the SE, PPV and F indices for each
of the 10^5 annotations and the corresponding
averages across the repository: <SE>, <PPV> and
<F>. To facilitate 3D-chart visualisation (see
Figure 6), the parameters of algorithms A and B
were represented on the x axis and those of
algorithms C and D on the y axis, with each sequence
of parameter pairs ordered so that only one parameter
changes between consecutive array elements along an axis.
The average values <SE>, <PPV> and <F> were
stored in three 100-by-100 arrays organised
accordingly.
3. Determining the point of optimal performance,
i.e. the peak of the average index <F> (see Table 5).
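Step 1 can be illustrated with a minimal sketch of an agreement script. The agreement criterion itself is defined in Section 4.1 and is not reproduced here, so the rule below is an assumption: an event is kept when at least `min_votes` of the four algorithms mark a crackle within `tol` seconds of each other. The function names, the representation of an annotation as a list of event times, and the default values are all hypothetical.

```python
from itertools import product

def agree(annotations, min_votes=3, tol=0.005):
    """Merge several annotations (lists of event times, in seconds).

    Assumed rule: events closer than `tol` to the first event of a group are
    grouped together; a group is kept when at least `min_votes` distinct
    algorithms contributed to it. The kept event time is the group mean.
    """
    events = sorted((t, k) for k, ann in enumerate(annotations) for t in ann)
    merged, group = [], []

    def flush():
        if group and len({alg for _, alg in group}) >= min_votes:
            merged.append(sum(t for t, _ in group) / len(group))

    for t, k in events:
        if group and t - group[0][0] > tol:
            flush()
            group.clear()
        group.append((t, k))
    flush()
    return merged

def all_agreements(per_algorithm, **kwargs):
    """Build one agreement annotation per parameter combination.

    per_algorithm: one list per algorithm, each holding the annotation
    obtained for each parameter value (10 values -> 10^4 combinations).
    No detection algorithm is re-run; only stored annotations are combined.
    """
    index_ranges = [range(len(runs)) for runs in per_algorithm]
    return {
        idx: agree([runs[i] for runs, i in zip(per_algorithm, idx)], **kwargs)
        for idx in product(*index_ranges)
    }
```

With four algorithms and 10 stored annotations each, `all_agreements` returns 10^4 entries for a sound file, keyed by the four parameter indices.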
Figure 6: <F> curve using multi-algorithm agreement.
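The axis arrangement used for the 100-by-100 arrays can be met, for instance, by a serpentine ordering of the parameter index pairs. The paper does not state which ordering was actually used, so the sketch below is only one possibility; `snake_pairs` and `peak` are hypothetical names, and the grid is assumed to be indexed as `grid[x][y]`.

```python
def snake_pairs(n=10):
    """Order all (i, j) index pairs so that neighbours differ in one index only.

    Even rows run j = 0..n-1, odd rows run j = n-1..0, so moving to the next
    element changes either i or j by one step, never both.
    """
    pairs = []
    for i in range(n):
        js = range(n) if i % 2 == 0 else range(n - 1, -1, -1)
        pairs.extend((i, j) for j in js)
    return pairs

def peak(grid, x_pairs, y_pairs):
    """Return (best value, (iA, iB, iC, iD)) for a grid indexed as grid[x][y]."""
    value, x, y = max(
        (grid[x][y], x, y)
        for x in range(len(x_pairs))
        for y in range(len(y_pairs))
    )
    return value, x_pairs[x] + y_pairs[y]
```

Applying `peak` to the array of average F values gives the optimal performance point of step 3 together with the four parameter indices that produced it.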
Table 5: Optimal multi-algorithm performance results.
Parameter settings: 3.5(A), 0.024(B), 0.84(C) and 0.66(D).
File     True   Alg.    TP   FP   FN   SE     PPV    F
         count  count                  (%)    (%)    (%)
1          51     58    51    7    0   100    88.0   93.6
2          81     83    71   12   10   87.7   85.5   86.6
3          75     89    70   19    5   93.3   78.7   85.4
4         131    107   102    5   29   77.9   95.3   85.7
5          49     56    49    7    0   100    87.5   93.3
6          46     43    39    4    7   84.8   90.7   87.6
7          38     51    34   17    4   89.5   66.7   76.4
8          47     54    42   12    5   89.4   77.8   83.2
9          14     21    14    7    0   100    66.7   80
10         23     21    21    0    2   91.3   100    95.5
Average     -      -     -    -    -   91.4   83.7   86.7
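The SE, PPV and F columns of Table 5 can be recomputed from the TP, FP and FN counts. The sketch below assumes the usual definitions (sensitivity, positive predictive value and their harmonic mean), which are taken to match those given in Section 3; the function name `indices` is hypothetical.

```python
def indices(tp, fp, fn):
    """Return (SE, PPV, F) in percent, rounded to one decimal place."""
    se = tp / (tp + fn) if tp + fn else 0.0    # sensitivity
    ppv = tp / (tp + fp) if tp + fp else 0.0   # positive predictive value
    f = 2 * se * ppv / (se + ppv) if se + ppv else 0.0
    return tuple(round(100 * v, 1) for v in (se, ppv, f))

print(indices(71, 12, 10))   # file 2 -> (87.7, 85.5, 86.6)
print(indices(102, 5, 29))   # file 4 -> (77.9, 95.3, 85.7)
```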
The average indices at the optimal performance
point (<SE>=91.4%, <PPV>=83.7% and
<F>=86.7%, as shown in Table 5) should be
compared to those of the four algorithms considered
individually, shown in Table 3. While multi-
algorithm sensitivity is on a par with the best
individual algorithm results, precision is about 11%
higher, resulting in a relative performance improvement
of about 7% over the best individual algorithm (B), as
measured by <F> (86.7% vs. 81%).
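Taking the 7% figure as a relative gain in <F>, which is the reading consistent with the two values quoted, the arithmetic is as follows (the subscripts are notation introduced here for illustration):

```latex
\[
\frac{\langle F \rangle_{\mathrm{multi}} - \langle F \rangle_{B}}{\langle F \rangle_{B}}
  = \frac{86.7 - 81}{81} \approx 0.070,
\qquad
\langle F \rangle_{\mathrm{multi}} - \langle F \rangle_{B} = 5.7\ \text{percentage points}.
\]
```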
It is worth noting that this optimal multi-
algorithm performance point does not correspond to
the optimal parameter settings of each individual
algorithm, which are 3 for algorithm A, 0.024 for B,
0.75 for C and 0.75 for D. The average <F> obtained
with these settings would be only 84.5%.
5 DISCUSSION AND FUTURE WORK
Replicating the algorithms proposed in the literature
poses serious difficulties, mainly due to the lack of
public access to the sound files and reference annotations
used for validation tests. The creation of an
open Web platform to stimulate the development
and sharing of respiratory sound and annotation
repositories, annotation tools, gold standards,
agreement metrics and criteria, as well as detection
algorithms, is essential to advance research in this
area.
While relative performances followed the
expected trend, with FD-based algorithms
outperforming the time-domain approach of
algorithm A, the performance indices of the
algorithms implemented were generally below the
published claims for those on which they were based.
The characteristics of the repository used here
(longer files, more varied pathologies…) may
partially explain this difference, but the main factor
is probably the use of gold standards obtained
through multi-annotation using a majority agreement
criterion, which is likely to attenuate annotation bias.
The multi-algorithm agreement technique
proposed here clearly deserves further investigation,
as the initial test results – a 7% improvement over
the performance of the best individual algorithm
involved – are extremely encouraging. The previous
considerations on the absolute performance of the
individual algorithms do not weaken this conclusion.
Moreover, the algorithms were not chosen to suit
this technique; in view of the considerations
presented in Section 4.1, its potential is likely to be