
tion and diagnosis, outperforming conventional ma-
chine learning approaches.
Real-time monitoring of respiratory diseases is
readily achievable with the above-mentioned methods
now that telehealth has moved towards wearables
and generic, readily available sensors that
serve many different applications. Recent advances
in smartphone and smartwatch technology allow the
use of these everyday devices as smart systems for
cough monitoring. These devices can easily acquire
the audio signal for analysis as well as incorporate
other sensors to capture movement associated with
a cough episode. However, in order to use a smart-
phone as a cough monitor, all the other features of
the device need to be kept functional without being
compromised by the monitoring application. No user
would want to use an application that reduces battery
life to 2 or 3 hours. The complexity of the opera-
tions required for cough detection and analysis must
therefore be taken into account, especially when deep
learning is involved. Existing mobile solutions to date
have not focused on efficient implementations to re-
duce battery consumption and do not guarantee con-
tinuous real-time monitoring.
This paper proposes an efficient cough detector
designed for real-time monitoring on low-end computational
devices, including smartphones. Explainable
Artificial Intelligence (XAI) is first employed to
identify salient regions in audio spectrograms that a
convolutional neural network (CNN) considers meaningful
for cough detection. An optimized
CNN is then designed that takes these salient regions as
input. Results show that the detection performance
achieved by the optimized models is comparable to
that of the non-optimized ones, while computation
times and memory footprint are significantly reduced.
The remainder of the paper is organized as follows:
Section 2 presents the materials employed in
the study. Section 3 explains the methodology used
for cough identification and the different optimizations
applied. Results are presented and discussed in
Section 4. Finally, Section 5 summarizes the conclusions
drawn from the study.
2 MATERIALS
Our group of subjects consists of 20 patients aged be-
tween 23 and 87 years (9 women, 11 men) with the
following respiratory pathologies: acute respiratory
disease (ARD, 3), pneumonia (4), chronic obstructive
pulmonary disease (COPD, 6), lung cancer (3),
and others such as asthma, bronchiectasis or sarcoidosis
(remaining patients). An observational study of
cough evolution during 24 hours of a patient’s nor-
mal life was carried out. Twenty-four hours of audio
from ambulatory patients in the Palencia Health Area
(Spain) were prospectively recorded. The database
consists of approximately 15,000 cough events cor-
responding to the subjects mentioned above. A Sony
Xperia Z2 Android smartphone was used to collect
the data using 16-bit WAV format at 44.1 kHz. Each
patient was instructed to store the device as they
normally would, so that samples were captured in a real
environment where noise may be encountered. These
noisy signals were used as non-cough events for comparison
with the recorded coughs, yielding two separate
sets. The study was carried out in accordance
with the Declaration of Helsinki and was approved by
the Área de Salud de Palencia Research Ethics Committee
(REC number: 2023/027). Subjects provided
their informed consent before the recordings.
3 METHODS
Cough detection was initially carried out using a non-
optimized CNN in two steps:
1. Audio Signal Preprocessing: The 44.1 kHz au-
dio signal collected by the smartphone was
transformed into spectrograms to obtain a time-
frequency representation of the signal. The fol-
lowing process was carried out for this purpose:
• The audio signal is first downsampled by a factor
of 5 to 8.82 kHz. Then, the power spectral density
(PSD) is calculated using 10 ms non-overlapping
windows with Hanning weighting.
• Once these PSDs are obtained, they are concatenated
over 1 s intervals, forming a set of
45 × 100 spectrograms.
• Finally, these time-frequency representations
undergo logarithmic normalization.
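The preprocessing steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the 10 ms window is taken as 88 samples (88.2 truncated), which yields 45 frequency bins per window and is consistent with the 45 × 100 spectrogram size; the anti-aliasing step is a crude block average; the PSD is an unnormalized periodogram; and the logarithmic normalization is assumed to be a base-10 logarithm with a small floor. The names `downsample`, `log_spectrograms` and `EPS` are hypothetical.

```python
import numpy as np

FS_IN = 44_100        # smartphone sampling rate in Hz (from the text)
DECIM = 5             # downsampling factor (from the text)
FS = FS_IN // DECIM   # 8820 Hz after downsampling
WIN = FS // 100       # 10 ms window: 88 samples (assumption: 88.2 truncated)
FRAMES = 100          # number of 10 ms windows in each 1 s spectrogram
EPS = 1e-10           # assumed floor that avoids log(0)


def downsample(x, factor=DECIM):
    """Average each block of `factor` samples (crude low-pass plus decimation).

    Assumption: the source does not specify the anti-aliasing filter.
    """
    n = len(x) // factor * factor
    return x[:n].reshape(-1, factor).mean(axis=1)


def log_spectrograms(audio):
    """Return an array of shape (n_seconds, 45, 100) of log-PSD spectrograms."""
    x = downsample(np.asarray(audio, dtype=np.float64))
    n_sec = len(x) // FS
    # Split into 1 s segments; keep the first 100 full windows of each segment.
    segs = x[:n_sec * FS].reshape(n_sec, FS)[:, :FRAMES * WIN]
    # Non-overlapping 10 ms frames, each weighted by a Hanning window.
    frames = segs.reshape(n_sec, FRAMES, WIN) * np.hanning(WIN)
    # Unnormalized periodogram: 88-sample frames give 88 // 2 + 1 = 45 bins.
    psd = np.abs(np.fft.rfft(frames, axis=-1)) ** 2
    # Put frequency on the first axis and time on the second, then take the log.
    return np.log10(psd.transpose(0, 2, 1) + EPS)
```

Calling `log_spectrograms` on a 24-hour recording would return one 45 × 100 array per second of audio, each of which can then be passed to the classifier.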
2. Cough Window Identification: To distinguish
spectrograms corresponding to cough events, we
devised a custom CNN from scratch. This net-
work begins with a convolutional layer contain-
ing 32 filters, each with 2 × 2 kernels, and uti-
lizes a ReLU activation function. Following this
is a 2 × 2 Max-Pool layer to reduce dimension-
ality, accompanied by a dropout layer to mitigate
overfitting. This architecture is then repeated with
the number of filters doubled. The final sequence
of layers before the output consists of a convo-
lutional layer with 128 filters, a dropout layer,
another convolutional layer with 256 filters, and
a concluding Max-Pool layer. The output from
this setup is resized to fit the output architecture,
HEALTHINF 2025 - 18th International Conference on Health Informatics