atones the reverberation effect from sound reflections
around a person’s body. Afterwards, $p_{lips}$ was
converted to airflow at the lips ($u_{lips}$), using a Pressure
to Flow Conversion Model. Both models were devel-
oped in similar fashion to (Larson et al., 2012).
4.1.3 Envelope Generation
The third stage employed several methods to calculate
the signal envelopes, each targeting different sound
characteristics so that the feature extraction would
be robust. The algorithms’ input consisted of both the
segmented audio and the two signals resulting from
the pre-processing stage, as all of them can be
considered roughly proportional to air flow.
Generic Envelope Extraction. To obtain an envelope
with a time-domain approach, two methods were used:
the Hilbert Transform and Shannon curves
(Liang et al., 1997). The first approach consists of
calculating the signal’s harmonic conjugate with the
Hilbert Transform and adding it back to the signal as
the imaginary part of the analytic signal, whose
magnitude is the envelope. The second approach
involves calculating the Shannon Entropy and Energy
envelopes of the signal. They act as non-linear
transformations that focus on either the higher (Energy)
or the lower (Entropy) intensities of the signal. Both
approaches output highly noisy curves that need
subsequent smoothing.
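The two time-domain envelopes can be illustrated with a minimal sketch. It assumes NumPy and SciPy, a per-sample evaluation of the Shannon curves on the amplitude-normalized signal, and a small `eps` guard against log(0); the function names `hilbert_envelope` and `shannon_envelopes` are hypothetical and not taken from the original implementation.

```python
import numpy as np
from scipy.signal import hilbert


def hilbert_envelope(x):
    """Magnitude of the analytic signal x + j*H{x}."""
    return np.abs(hilbert(np.asarray(x, dtype=float)))


def shannon_envelopes(x, eps=1e-12):
    """Per-sample Shannon energy and entropy of the normalized signal.

    Assumption: the signal is scaled to [-1, 1] before the transform,
    and eps only avoids log(0).
    """
    x = np.asarray(x, dtype=float)
    xn = x / (np.max(np.abs(x)) + eps)
    energy = -(xn ** 2) * np.log(xn ** 2 + eps)        # -x^2 log(x^2)
    entropy = -np.abs(xn) * np.log(np.abs(xn) + eps)   # -|x| log|x|
    return energy, entropy
```

Both outputs are sample-rate curves and, as stated above, would still need smoothing before any parameter is read from them.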
Linear Predictive Coding. The audio input was
segmented into windows of 31.25 ms, with 50% overlap.
The white noise variance, or power, is obtained from
the LPC model outputs. While the LPC filters can
approximate the vocal tract (Wakita, 1973), the
succession of power values should be proportional to the
exhalation power at the respective time and constitute
a sampled envelope of the signal. The implementation
included models of degrees 2, 4, 8, 16 and 32, which
represent increasing vocal complexity.
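A sketch of how such a power envelope could be computed is given here, assuming the autocorrelation (Yule-Walker) method for the LPC fit and a Hamming window on each frame; neither choice is stated in the text. The function name `lpc_power_envelope` and the `fs` argument are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_toeplitz


def lpc_power_envelope(x, fs, order=8, frame_ms=31.25, overlap=0.5):
    """Prediction-error power of an LPC model, one value per frame."""
    x = np.asarray(x, dtype=float)
    n = int(round(fs * frame_ms / 1000.0))          # samples per frame
    hop = max(1, int(round(n * (1.0 - overlap))))   # 50% overlap by default
    window = np.hamming(n)                          # assumption: Hamming taper
    powers = []
    for start in range(0, len(x) - n + 1, hop):
        frame = x[start:start + n] * window
        # Biased autocorrelation for lags 0..order.
        r = np.correlate(frame, frame, mode="full")[n - 1:n + order] / n
        if r[0] <= 1e-12:                           # silent frame
            powers.append(0.0)
            continue
        r[0] *= 1.0 + 1e-9                          # tiny ridge for stability
        # Yule-Walker equations: Toeplitz(r[0..p-1]) a = r[1..p].
        a = solve_toeplitz(r[:order], r[1:order + 1])
        # White-noise (residual) variance of the fitted model.
        powers.append(max(r[0] - np.dot(a, r[1:order + 1]), 0.0))
    return np.array(powers)
```

Calling the function once per model degree (2, 4, 8, 16, 32) would yield one sampled envelope per degree.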
Mean of Resonances. Similarly to LPC, the signal
was buffered into 31.25 ms frames, with 50% overlap.
Each frame underwent a 256-point FFT operation
using a Hamming window, producing a spectrogram.
All spectrogram values lower than 20% of the
respective frame’s maximum were considered noise
and were consequently discarded. Resonances lasting
over 250 ms within a 2-bin neighborhood of the
respective frequency were kept, preserving only
relatively strong and long-lasting frequencies while
accounting for the naturally occurring frequency shift.
The envelope was obtained by averaging the saved
resonances of each frame.
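One possible reading of this procedure is sketched here. It is an interpretation rather than the original implementation: persistence is tracked per frequency bin, a bin counts as present when any bin within ±2 of it passes the 20% threshold, and frames are assumed to be no longer than the FFT size. The function name `mean_of_resonances` and all parameter names are hypothetical.

```python
import numpy as np


def mean_of_resonances(x, fs, frame_ms=31.25, overlap=0.5, nfft=256,
                       rel_threshold=0.2, min_duration_ms=250.0, bin_tol=2):
    x = np.asarray(x, dtype=float)
    n = int(round(fs * frame_ms / 1000.0))
    hop = max(1, int(round(n * (1.0 - overlap))))
    if n > nfft or len(x) < n:
        raise ValueError("need one full frame that fits into nfft samples")
    starts = range(0, len(x) - n + 1, hop)
    frames = np.array([x[s:s + n] for s in starts]) * np.hamming(n)
    spec = np.abs(np.fft.rfft(frames, n=nfft, axis=1))   # (frames, bins)

    # Discard values below 20% of each frame's maximum.
    peak = spec.max(axis=1, keepdims=True)
    active = (spec >= rel_threshold * peak) & (spec > 0)

    # Tolerate a frequency drift of +/- bin_tol bins.
    present = active.copy()
    for k in range(1, bin_tol + 1):
        present[:, k:] |= active[:, :-k]
        present[:, :-k] |= active[:, k:]

    # Keep only runs along time that last at least min_duration_ms.
    min_frames = int(np.ceil(min_duration_ms / 1000.0 * fs / hop))
    keep = np.zeros_like(active)
    for b in range(spec.shape[1]):
        col, run_start = present[:, b], None
        for i in range(len(col) + 1):
            on = i < len(col) and col[i]
            if on and run_start is None:
                run_start = i
            elif not on and run_start is not None:
                if i - run_start >= min_frames:
                    keep[run_start:i, b] = True
                run_start = None
    keep &= active

    # Envelope: mean magnitude of the kept resonances in each frame.
    counts = keep.sum(axis=1)
    sums = (spec * keep).sum(axis=1)
    return np.divide(sums, counts, out=np.zeros(len(counts)), where=counts > 0)
```

Under this reading a sustained tone contributes to the envelope, whereas a burst shorter than 250 ms is discarded and leaves the envelope at zero.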
4.1.4 Envelope Post-processing
The resulting envelopes were processed using
different settings in order to find the best combination
for the application. The envelopes were smoothed by
either a regular low pass filter (LPF) or a moving
average (MA) and, in parallel, were also approximated by
a 4th-order polynomial. To obtain the same sampling
rate as the buffered methods, the Hilbert Transform
and Shannon envelopes were downsampled
accordingly. The non-approximated envelopes were
further processed using a Savitzky-Golay filter (SG)
of order 3 and size 11 (Savitzky and Golay, 1964),
as depicted in Figure 1.
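The two parallel branches can be sketched as follows, assuming SciPy's `savgol_filter` for the SG step and a moving average as the smoother. The MA width of 5 samples is an arbitrary placeholder, since the text does not give it, and the LPF alternative is omitted because its cut-off is not stated. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def moving_average(env, width=5):
    """Simple MA smoother; width is an assumed value."""
    kernel = np.ones(width) / width
    return np.convolve(env, kernel, mode="same")


def smooth_branch(env, width=5):
    """MA smoothing followed by Savitzky-Golay (order 3, size 11)."""
    return savgol_filter(moving_average(env, width),
                         window_length=11, polyorder=3)


def polynomial_branch(env, degree=4):
    """Least-squares 4th-order polynomial approximation of the envelope."""
    t = np.linspace(-1.0, 1.0, len(env))   # scaled axis for conditioning
    return np.polyval(np.polyfit(t, env, degree), t)
```

Each raw envelope would thus yield two candidates, one smoothed and one polynomial, for the parameter extraction that follows.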
4.2 Parameter Extraction
For each recording, the spirometry parameters were
calculated from each of the final envelopes. The
measurements extracted were PEF, FVC, $\mathrm{FEV}_1$,
$\mathrm{FEV}_1/\mathrm{FVC}$, $\mathrm{FEF}_{25\%-75\%}$,
$\mathrm{FEF}_{25\%}$, $\mathrm{FEF}_{50\%}$, $\mathrm{FEF}_{75\%}$
and a custom parameter proposed in (van Stein,
2013). The envelopes are viewed as Flow-Time
curves, typical of spirometer reports.
PEF is defined as the Peak Expiratory Flow, or
the global maximum of the audio envelope. By
integrating the envelope with respect to time, the
Volume-Time curve can be obtained. FVC is defined
as the total volume expired during a FEM.
$\mathrm{FEV}_1$ is the total volume expired during the
first second. $\mathrm{FEF}_{25\%-75\%}$ corresponds to
$\frac{1}{2}\mathrm{FVC} / (t_{75\%} - t_{25\%})$, in
which $t_{x\%}$ is the time at which the volume corresponds
to x% of the FVC. $\mathrm{FEF}_{x\%}$ is the instantaneous
flow value at x% of the total volume. Due to the highly
noisy nature of the recordings, these last measurements
were calculated as the average flow during an
interval of 5% of the total sound’s duration, around the
corresponding time instant.
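The definitions above can be turned into a short sketch. It assumes a non-negative envelope sampled at a uniform rate, trapezoidal integration for the Volume-Time curve, and a 5% window centered on each time instant; the custom parameter of van Stein (2013) is not covered. The function name `spirometry_parameters` and the `rate` argument are hypothetical.

```python
import numpy as np


def spirometry_parameters(flow, rate):
    """Spirometry-style measures from a Flow-Time envelope.

    flow: non-negative envelope samples; rate: envelope samples per second.
    """
    flow = np.asarray(flow, dtype=float)
    t = np.arange(len(flow)) / rate
    # Cumulative trapezoidal integral gives the Volume-Time curve.
    steps = (flow[1:] + flow[:-1]) / (2.0 * rate)
    volume = np.concatenate(([0.0], np.cumsum(steps)))
    fvc = volume[-1]

    def time_at(fraction):
        # First instant at which the volume reaches fraction * FVC.
        idx = min(int(np.searchsorted(volume, fraction * fvc)), len(t) - 1)
        return t[idx]

    def mean_flow_around(t_center):
        # Average over an interval of 5% of the total duration.
        half = 0.025 * t[-1]
        return flow[np.abs(t - t_center) <= half].mean()

    t25, t50, t75 = time_at(0.25), time_at(0.50), time_at(0.75)
    fev1 = float(np.interp(1.0, t, volume))
    return {
        "PEF": flow.max(),
        "FVC": fvc,
        "FEV1": fev1,
        "FEV1/FVC": fev1 / fvc,
        "FEF25-75": 0.5 * fvc / (t75 - t25),
        "FEF25": mean_flow_around(t25),
        "FEF50": mean_flow_around(t50),
        "FEF75": mean_flow_around(t75),
    }
```

Since the envelope is only roughly proportional to air flow, the values obtained this way are in arbitrary units until they are mapped to clinical values by the regression stage.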
4.3 Machine Learning
The system’s machine learning pipeline can be di-
vided into two stages: the parameter regression and
the classification. The first uses the parameters ex-
tracted from the curves to obtain an estimation of
the respective clinical values as given by spirometers.
The second devises models that can discern between
the possible illness states, initially distinguishing
normal from abnormal lung function and then normal
lung function from three types of pathology.
4.3.1 Regression Stage
Every recording produces several envelopes and each
one is used to extract clinical measurements. This in-