establish the disease, its severity and area affected. As
Figure 1 suggests, the results can only be validated
against ground-truth data obtained through more
reliable diagnostic means (e.g. medical imaging) or
post-mortem examination.
2 AUTOMATIC ALS DETECTION
Carried out in the traditional way (i.e. by humans),
and despite constant progress towards standardisation
and sophistication of auscultation training methods
and technology (see, for instance, Ward and Wattier’s
2011 review), the signal analysis process depicted in
Figure 1 is rather subjective; obviously, it is also
restricted to the human audible frequency range.
Computer-aided auscultation is potentially much
more objective, reliable and efficient. With the advent
of digital stethoscopy, its development became a real
prospect (reflected, for example, in the 1997 review
by Pasterkamp et al.). The EU-funded project
Computerised Respiratory Sound Analysis (CORSA),
involving a multinational task force of the European
Respiratory Society (Sovijärvi et al. 2000), marked a
research boom in this area. Naturally inspired by the
human auscultation process, depicted in Figure 1,
research efforts were primarily directed at automating
its first step – ALS detection.
The literature shows intense work on the
development of algorithms applying pattern
recognition techniques to detect and classify the
various ALS types. Taking the example of crackle
detection (arguably the most important and certainly
one of the most challenging, given the discontinuous,
non-stationary nature of crackles), a wide variety of
signal processing techniques have been proposed,
including digital filters (Ono et al. 1989), spectrogram
analysis (Kaisla et al. 1991), auto-regressive models
(Hadjileontiadis 1996), time-domain analysis
(Vannuccini et al. 1998), fuzzy filters (Mastorocostas
et al. 2000), wavelet and wavelet-packet transform
methods (Kahya et al. 2001; Hadjileontiadis 2005; Lu
and Bahoura 2006; Lu and Bahoura 2008), fractal
dimension (FD) filtering (Hadjileontiadis and
Rekanos 2003), Hilbert transform analysis (Li and Du
2005) and empirical mode decomposition (EMD)
(Charleston-Villalobos et al. 2007; Hadjileontiadis
2007). This list is by no means exhaustive and similar
efforts have gone into the development of detection
algorithms for other ALS types, especially wheezes.
However, by and large, research publications in
this area reveal a serious imbalance between
development and validation work, with insufficient
attention paid to the latter. To better characterise this
problem and support the practical solution proposed
for it in section 4, the next section discusses ALS
detection algorithm validation and its specific
requirements.
3 VALIDATION ISSUES
ALS waveforms can be characterised qualitatively,
but establishing completely objective definitions is
not possible (if it were, developing an algorithm with
100% detection accuracy would be a simple task).
The performance of automatic ALS detection
algorithms can thus only be assessed by comparing
the annotations they generate with human expert
annotations of the same sound files, as illustrated in
Figure 2. In this context, the term annotation refers to
a complete record of the ALS of a given type
occurring in the sound file under analysis.
Figure 2: Validation of automatic ALS detection algorithm.
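The comparison illustrated in Figure 2 can be sketched in a few lines of code. This is a minimal sketch, not the procedure used in any of the works cited here. It assumes that each annotation is reduced to a list of event times in seconds (a natural fit for short events such as crackles). It also assumes that a detected event counts as correct when it lies within a hypothetical tolerance `tol` of a still-unmatched reference event, and that performance is summarised by sensitivity and positive predictive value (PPV). The function names `match_events` and `performance` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def match_events(detected: Sequence[float], reference: Sequence[float],
                 tol: float = 0.01) -> Tuple[int, int, int]:
    """Greedy one-to-one matching of event times (seconds) within +/- tol.

    Returns (true positives, false positives, false negatives).
    The tolerance value is an assumption, not taken from the source.
    """
    ref_sorted: List[float] = sorted(reference)
    used = [False] * len(ref_sorted)
    tp = 0
    for t in sorted(detected):
        # Pick the closest reference event that is still unmatched and within tol.
        best = None
        best_dist = tol
        for i, r in enumerate(ref_sorted):
            if not used[i] and abs(t - r) <= best_dist:
                best, best_dist = i, abs(t - r)
        if best is not None:
            used[best] = True
            tp += 1
    fp = len(detected) - tp   # detections with no matching reference event
    fn = len(reference) - tp  # reference events the algorithm missed
    return tp, fp, fn


def performance(detected: Sequence[float], reference: Sequence[float],
                tol: float = 0.01) -> Dict[str, float]:
    """Compare an algorithm's annotation with a reference (gold-standard) one."""
    tp, fp, fn = match_events(detected, reference, tol)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "ppv": ppv}
```

For example, `performance([0.10, 0.52, 1.30], [0.105, 0.50, 0.90])` matches only the first detection. That gives one true positive, two false positives and two false negatives. Longer events such as wheezes would more naturally be represented as intervals and compared by overlap, which this sketch does not cover.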
Given the subjectivity of human annotation, pointed
out in the previous section, it is essential to take
measures to minimise bias. For this reason, validation
references should be obtained by combining multiple
annotations of the same sound file, each carried out
independently by a different human expert, into a
single gold-standard annotation. The criteria
governing this combination or agreement process
must be explicit. For instance, the pilot study by
Quintas et al. (2013) used agreement by majority, but
other approaches can and should be explored.
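Agreement by majority can be sketched as follows. The details of the procedure used by Quintas et al. (2013) are not given here, so this is a hypothetical illustration. It assumes that each expert's annotation is a list of event times in seconds, and that marks from different experts lying within a tolerance `tol` of one another refer to the same ALS. An event enters the gold standard only when a strict majority of experts marked it, and its time is taken as the mean of the pooled marks. The function name `majority_gold_standard` and the default `tol` are illustrative assumptions.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def majority_gold_standard(annotations: Sequence[Sequence[float]],
                           tol: float = 0.01) -> List[float]:
    """Merge independent expert annotations into one gold standard by majority.

    annotations[k] holds the event times (seconds) marked by expert k.
    The clustering rule and tol are assumptions for illustration only.
    """
    n = len(annotations)
    # Pool all marks, remembering which expert made each one.
    pooled: List[Tuple[float, int]] = sorted(
        (t, k) for k, ann in enumerate(annotations) for t in ann)
    clusters: List[List[Tuple[float, int]]] = []
    for t, k in pooled:
        # Join the current cluster if within tol of its first mark; else start a new one.
        if clusters and t - clusters[-1][0][0] <= tol:
            clusters[-1].append((t, k))
        else:
            clusters.append([(t, k)])
    gold: List[float] = []
    for cluster in clusters:
        experts = {k for _, k in cluster}  # count each expert at most once
        if len(experts) > n / 2:           # strict majority of the panel
            gold.append(mean(t for t, _ in cluster))
    return gold
```

Making `tol` and the majority threshold explicit parameters is one way of stating the combination criteria that, as argued above, should accompany any reported validation result.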
Performance tests reported in the literature are
very often based on annotations by a single expert,
thus lacking credibility. In the rare instances of multi-
annotation, the criteria used for generating gold
standards are normally not clarified.
For statistical significance, both the panels of
expert annotators and the sets of annotated sound files
should be as large and diverse as possible. The
development of pattern recognition algorithms often
On the Validation of Computerised Lung Auscultation
655