Optimization of a Deep-Learning-Based Cough Detector Using

eXplainable Artiﬁcial Intelligence for Implementation on Mobile Devices

P. Amado-Caballero

1 a

, I. Varona-Pe

, B. Guti

errez-Garc

ıa

J. M. Aguiar-P

erez

, M. Rodriguez-Cayetano

, J. G

omez-Gil

, J. R. Garmendia-Leiza

2 b

and P. Casaseca-De-la-higuera

1 c

Departamento de Teor

ıa de la Se

nal y Comunicaciones e Ingenier

ıa Telem

atica, E.T.S. Ingenieros de Telecomunicaci

on,

Universidad de Valladolid, Valladolid, Spain

Centro de Salud Los Jardinillos, SACYL, Palencia, Spain

Keywords:

Respiratory Diseases, Cough, Audio Analysis, CNN, XAI, Occlusion Maps, Optimization.

Abstract:

Respiratory diseases, including COPD and cancer, are among the leading causes of mortality worldwide, of-

ten resulting in prolonged dependency and impairment. Telemedicine offers immense potential for managing

respiratory diseases, but its effectiveness is hindered by the lack of reliable objective measures for symptoms.

Recent advances in deep learning have signiﬁcantly enhanced the detection and analysis of coughing episodes,

a key symptom of respiratory conditions, by leveraging audio signals and pattern recognition techniques. This

paper introduces an efﬁcient cough detection system tailored for real-time monitoring on low-end compu-

tational devices, such as smartphones. By integrating Explainable Artiﬁcial Intelligence (XAI), we identify

salient regions in audio spectrograms that are crucial for cough detection, enabling the design of an optimized

Convolutional Neural Network (CNN). The optimized CNN maintains high detection performance while sig-

niﬁcantly reducing computation time and memory usage.

1 INTRODUCTION

Respiratory diseases, such as COPD and cancer, rank

among the leading causes of death worldwide (World

Health Organisation, 2017). These chronic illnesses

frequently result in long-term dependence and impair-

ment. Recent research (Belli et al., 2020) emphasize

the necessity for monitoring and support at home post

COVID-19 regardless of whether the individual was

hospitalized. The interaction between COVID-19 and

chronic respiratory diseases like COPD and cancer

signiﬁcantly increases the likelihood of hospitaliza-

tion and mortality (World Health Organisation, 2021).

Therefore, continuous monitoring of respiratory con-

ditions is essential for detecting and managing exac-

erbations in these individuals.

The European Commission’s study on

telemedicine (European Commission, 2018) con-

cluded on its high potential in managing respiratory

diseases, yet underscored a signiﬁcant gap in research

https://orcid.org/0000-0002-8773-3156

https://orcid.org/0000-0002-9573-4263

https://orcid.org/0000-0003-1565-0842

in this domain. Although telemedicine has shown

to be promising for cost-effective monitoring of

respiratory conditions (Audit Scotland, 2011), it

has not lived up to expectations due to the absence

of reliable objective measures for symptoms. The

effectiveness of remote consultations strongly relies

on being able to acquire such metrics, facilitating

early diagnoses and real-time monitoring for patients

with respiratory conditions (H. Pinnock et al., 2013).

Automatic detection of coughing episodes has

been investigated before, and there are proposals of

systems with acceptable sensitivity and speciﬁcity

values. The very initial systems (Birring et al., 2008;

Vizel et al., 2010; Drugman et al., 2013; Amrul-

loh et al., 2015) were based on classical machine

learning. Thus, they implemented a pattern recogni-

tion module to classify a set of time and/or spectral

domain features extracted from audio signals. Re-

cently, there has been a notable increase in employ-

ing deep learning techniques to address the challenges

of cough detection and analysis for diagnostic pur-

poses. Methods such as those detailed in (Laguarta

et al., 2020; Mingyu You et al., 2022; Tena et al.,

2022) showed promising advancements in both detec-

Amado-Caballero, P., Varona-Peña, I., Gutiérrez-García, B., Aguiar-Pérez, J. M., Rodriguez-Cayetano, M., Gomez-Gil, J., Garmendia-Leiza, J. R. and Casaseca-De-la-higuera, P.

Optimization of a Deep-Learning-Based Cough Detector Using eXplainable Artiﬁcial Intelligence for Implementation on Mobile Devices.

DOI: 10.5220/0013141500003911

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 2: HEALTHINF, pages 491-498

ISBN: 978-989-758-731-3; ISSN: 2184-4305

491

tion and diagnosis, outperforming conventional ma-

chine learning approaches.

Real-time monitoring of respiratory diseases can

be easily achievable with the above-mentioned meth-

ods at a time when telehealth has moved towards

generic readily available sensors, available for dif-

ferent applications, and wearables. Recent advances

in smartphone and smartwatch technology allow the

use of these everyday devices as smart systems for

cough monitoring. These devices can easily acquire

the audio signal for analysis as well as incorporate

other sensors to capture movement associated with

a cough episode. However, in order to use a smart-

phone as a cough monitor, all the other features of

the device need to be kept functional without being

compromised by the monitoring application. No user

would want to use an application that reduces battery

life to 2 or 3 hours. The complexity of the opera-

tions required for cough detection and analysis must

therefore be taken into account, especially when deep

learning is involved. Existing mobile solutions to date

have not focused on efﬁcient implementations to re-

duce battery consumption and do not guarantee con-

tinuous real-time monitoring.

This paper proposes an efﬁcient cough detector

designed for real-time monitoring on low-end compu-

tational devices, including smartphones. Explainable

Artiﬁcial Intelligence (XAI) is initially employed to

identify salient regions in audio spectrograms that a

convolutional neural network (CNN) considers mean-

ingful for cough detection. After that, an optimized

CNN is designed based on these salient regions as

inputs. Results show that the detection performance

achieved by the optimized models is comparable to

that of the non-optimized ones, while computation

times and memory footprint are signiﬁcantly reduced.

The structure of the paper is organized as fol-

lows: Section 2 presents the materials employed in

the study. section 3 explains the methodology applied

for cough identiﬁcation and the different optimiza-

tions applied Results are presented and discussed in

section 4. Finally, section 5 summarizes the conclu-

sions extracted from the study.

2 MATERIALS

Our group of subjects consists of 20 patients aged be-

tween 23 and 87 years (9 women, 11 men) with the

following respiratory pathologies: Acute respiratory

disease (ARD, 3), pneumonia (4), chronic obstruc-

tive pulmonary disease (COPD,6), lung cancer (3),

and others such as asthma, bronchiectasis or sarcoido-

sis (remaining patients). An observational study of

cough evolution during 24 hours of a patient’s nor-

mal life was carried out. Twenty-four hours of audio

from ambulatory patients in the Palencia Health Area

(Spain) were prospectively recorded. The database

consists of approximately 15,000 cough events cor-

responding to the subjects mentioned above. A Sony

Xperia Z2 Android smartphone was used to collect

the data using 16-bit WAV format at 44.1 kHz. Each

patient was instructed to store the device as they

would normally do to capture samples in a real en-

vironment where noise may be encountered. These

noisy signals were used as non-cough events for com-

parison with the recorded coughs, getting two sepa-

rate sets. The study was carried out in accordance

with the Declaration of Helsinki and was approved by

the

Area de Salud de Palencia Research Ethics Com-

mittee (REC number: 2023/027). Subjects provided

their informed consent before the recordings

3 METHODS

Cough detection was initially carried out using a non-

optimized CNN in two steps:

1. Audio Signal Preprocessing: The 44.1 kHz au-

dio signal collected by the smartphone was

transformed into spectrograms to obtain a time-

frequency representation of the signal. The fol-

lowing process was carried out for this purpose:

• The audio signal is ﬁrst 5x downsampled to

8.82 kHz. Then, the power spectral den-

sity (PSD) is calculated, using 10 ms non-

overlapping windows with Hanning weighting.

• Once these PSDs are obtained, they are con-

catenated over 1s intervals, forming a set of

45x100 spectrograms.

• Finally, these time-frequency representations

undergo logarithmic normalization.

2. Cough Window Identiﬁcation: To distinguish

spectrograms corresponding to cough events, we

devised a custom CNN from scratch. This net-

work begins with a convolutional layer contain-

ing 32 ﬁlters, each with 2 × 2 kernels, and uti-

lizes a ReLU activation function. Following this

is a 2 × 2 Max-Pool layer to reduce dimension-

ality, accompanied by a dropout layer to mitigate

overﬁtting. This architecture is then repeated with

the number of ﬁlters doubled. The ﬁnal sequence

of layers before the output consists of a convo-

lutional layer with 128 ﬁlters, a dropout layer,

another convolutional layer with 256 ﬁlters, and

a concluding Max-Pool layer. The output from

this setup is resized to ﬁt the output architecture,

HEALTHINF 2025 - 18th International Conference on Health Informatics

492

45x100x1

45x100x32

11x25x128

22x50x64

11x25x256

5x12x256

512x1

2x1

CONVOLUTIONAL LAYER. ACTIVATION:RELU FILTER SIZE: 2X2

MAX-POOL 2X2

DROPOUT 10%

FULLY CONNECTED LAYER.

ACTIVATION: RELU

SOFTMAX LAYER

INPUT SPECTROGRAM

Figure 1: CNN architecture for cough detection.

which comprises two Fully-Connected layers: the

ﬁrst with 512 neurons followed by a ReLU activa-

tion, and the second with two neurons and a soft-

max activation. Training was carried out using the

AdaMax optimizer (α = 0.002) (Kingma and Ba,

2015), batch size= 128 and 50 epochs. To avoid

overﬁtting, we employed a validation set compris-

ing 20% of the training dataset, which itself con-

stituted 80% of the entire audio clips collection.

This procedure is repeated on 5 cross-validation

folds. Figure 1 illustrates a block diagram of the

employed CNN.

To enable the CNN detector to function directly on

smartphones, we utilized TensorFlow Lite (TFLite), a

framework that converts TensorFlow models trained

in desktop or cloud environments into a smaller, op-

timized format suitable for mobile and embedded de-

vices. This conversion process reduces the model’s

size but often results in decreased performance in

terms of accuracy, sensitivity, and speciﬁcity. To mit-

igate this loss, we implemented the following opti-

mizations:

1. Using Explainable AI (XAI) to adjust the network

input.

2. Fine-tune the network’s hyperparameters.

3. Applying TFLite’s own optimization techniques

after the initial optimizations were completed.

3.1 Optimizing the CNN Input Using

XAI

Once the initial CNN was trained, we analyzed the

network’s behavior to identify signiﬁcant regions for

cough detection in the spectrograms. For this pur-

pose, we employed the methodology proposed in

(Amado-Caballero et al., 2023) for ADHD diagnosis.

We selected those spectrograms in the validation set

for which the original network output identiﬁed cough

with conﬁdence levels (soft output, posterior proba-

bility) higher than 90%.For each spectrogram meeting

this criterion, we computed its occlusion map.(Zeiler

and Fergus, 2014). The process of generating the oc-

clusion map involves deﬁning a mask which is placed

over the input spectrogram hiding part of their infor-

mation to the CNN. The class probability of the CNN

fed by this input allows estimating the importance of

the hidden area for classiﬁcation (higher probability

means lower importance). The process is repeated

until the whole spectrogram is covered. The output

values are stored in a matrix that is resized to have the

same dimensions as the original input. Once we have

these oclussion maps, we used them to adapt the input

set to improve the performance of the network. This

process of adaptation was deﬁned as follows :

• First, each spectrogram is weighted with its cor-

responding occlusion map. Those areas in the

weighted spectrogram whose values were higher

than 0.7, deﬁned a mask with the relevant spec-

trogram information. Some examples of these

weighted spectrograms are shown in ﬁgure 3.

• Once we have all the masks, we pixel-averaged

them to obtain a ﬁnal mask highlighting the rel-

evant spectrogram information for all patients.

This average mask, highlights where the highest

concentration of useful information for cough de-

tection is located.

The obtained average mask can be applied to re-

duce the dimensionality of the input spectrogram and

consequently simplify the network. The masks in ﬁg-

ures 2 and 3 show relevant areas with meaningful en-

ergy at different time stamps. We deﬁned two regions

with different energy content. In order to do that, we

Optimization of a Deep-Learning-Based Cough Detector Using eXplainable Artiﬁcial Intelligence for Implementation on Mobile Devices

493

90%

Frequency

(KHz)

Time (s)

70%

Figure 2: Illustration of the process to obtain weighted spectrograms.

Figure 3: Examples of weighted spectrograms.

set two thresholds for the weighted spectrogram val-

ues, which respectively deﬁned two timestamps af-

ter which, all the values were below those thresholds.

Speciﬁcally, after t = 0.6s, no energy above 0.3937

was present. After t = 0.8s, all the values were be-

low 0.1087. This process led to two sets of spectro-

grams with respective sizes 45x76 (trim1) and 45x59

(trim2). This process is explained in ﬁgure 4.

3.2 Fine-Tuning of Hyperparameters

This is a well-known technique in deep and machine

learning that consists of optimizing the hyperparam-

eter values of a model to improve its performance on

a speciﬁc task. Unlike model parameters, which are

learned from training data, hyperparameters are pre-

deﬁned settings that inﬂuence the overall behavior of

the model.

In this scenario, we aim at improving the perfor-

mance of the network in terms of accuracy and pro-

cessing time, in view of the implementation of Ten-

sorFlow Lite. For this purpose, we have focused on

optimizing the dropout rate, the number of ﬁlters of

the convolutional layers, the learning rate, and the

number of dense units.

To achieve this, a grid search strategy was em-

ployed to explore a predeﬁned range of hyperparam-

eters systematically. Speciﬁcally, for the ﬁrst convo-

lutional layer, the number of ﬁlters was tested with

values of 32 and 64, and the kernel size was evalu-

ated with dimensions of (3, 3) and (5, 5), alongside

dropout rates of 0.1 and 0.2. For the second convolu-

tional layer, ﬁlters ranged from 64 to 128, kernel sizes

were (3, 3) and (5, 5), and dropout rates were 0.1 and

0.2. Similarly, the third convolutional layer tested ﬁl-

ter values of 128 and 256, kernel sizes of (3, 3) and

(5, 5), and dropout rates of 0.1 and 0.3. Fully con-

nected layers were ﬁne-tuned with units of 128 and

256, and the optimizers ’adam’ and ’rmsprop’ were

considered with learning rates of 0.001 and 0.01. The

optimal combination of hyperparameters was deter-

mined based on the best validation accuracy, ensuring

that the network could generalize effectively while

maintaining efﬁcient processing for deployment.

3.3 TFLite

TFLite, short for TensorFlow Lite, is a lightweight

version of TensorFlow (Abadi et al., 2015) designed

speciﬁcally for mobile and embedded devices. It is

used to deploy machine learning models on devices

with limited computational resources, such as smart-

phones, IoT devices, and microcontrollers. TFLite

models are optimized for efﬁciency, allowing them to

HEALTHINF 2025 - 18th International Conference on Health Informatics

494

Masks

Averaged Mask

Trim 1

Trim 2

Pixel-Averaged

Oclusion Map

Figure 4: Diagram of the procedure to obtain trimmed data.

run inference tasks quickly and with minimal memory

footprint.

TFLite offers several optimization options to

achieve efﬁcient model deployment. One common

optimization is quantization, which reduces the preci-

sion of model weights and activations from ﬂoating-

point numbers to integers, thereby reducing memory

usage and improving inference speed. Another opti-

mization technique is model pruning, which involves

removing unnecessary weights and connections from

the model to reduce its size while preserving accu-

racy. Additionally, TFLite provides optimizations for

default performance metrics, minimizing model size,

and reducing inference latency, further enhancing the

efﬁciency of model deployment.

Before optimizing a model with TFLite, it is typi-

cally optimized in its original format. Once the model

is optimized and trained, it can be converted to the

TFLite format using tools provided by TensorFlow.

During the conversion process, various optimizations

can be applied to further improve the model’s efﬁ-

ciency such as default, size, and latency optimiza-

tions. Default optimizations aim to balance accu-

racy and efﬁciency, providing a good compromise be-

tween model size and inference speed. Size optimiza-

tions focus on reducing the model’s memory footprint

by applying techniques like quantization and prun-

ing. Latency optimizations prioritize reducing infer-

ence time, often at the expense of model accuracy,

by employing techniques such as model architecture

modiﬁcations and hardware acceleration utilization.

4 RESULTS AND DISCUSSION

In this section, we present the ﬁndings in tabular

format, corresponding to the optimized models,

their hyperparameters, and their performance in

terms of sensitivity, speciﬁcity, accuracy, processing

time, model size -useful for mobile application

deployment-, and area under the receiver operating

characteristic curve (AUC). These tables provide a

comprehensive overview of the models’ effectiveness

and efﬁciency, highlighting the key metrics that

indicate their suitability for mobile solutions without

losing quality properties.

In table 1, the initial CNN model described in Sec-

tion 3 is compared with two optimized models for

which, only the XAI-enabled optimization was per-

formed. The model format remains consistent; only

the input data change. The non-optimized option em-

ploys the original dataset, whereas Trim1 and Trim2

models respectively use 45 × 76 and 45 × 59 spectro-

grams. The models’ training and inference calcula-

tions were conducted using 5-fold cross-validation on

a NVIDIA RTX A5000 GPU with 24 GB of RAM.

As observed, accuracy values remain quite

consistent, while the memory footprint decreases

considerably. Furthermore, the processing time is

signiﬁcantly reduced (to less than a half for trim1.

This reduction in processing time is particularly

valuable as the aim is to integrate the model into

a mobile application. However, it is worth noting

that when converting the model to TFLite for mobile

deployment, the processing time tends to increase.

Therefore, the primary objective is to minimize

processing time to ensure that the overall system

processing time remains manageable once the model

is converted to TFLite and integrated into the mobile

application. Additionally, while there is a slight

decrease in sensitivity, both speciﬁcity and AUC

remain stable.

Optimization of a Deep-Learning-Based Cough Detector Using eXplainable Artiﬁcial Intelligence for Implementation on Mobile Devices

495

Table 1: Results after XAI optimization.

Model Accuracy MBytes Processing

time(ms)

Sensitivity Speciﬁcity AUC

Not optimized 0.899881 9.030396 0.385252803 0.901223 0.898539 0.963786

Trim 1 0.8907 6.780409 0.1172742286 0.8876 0.893853 0.9580

Trim 2 0.8967 5.280396 0.210223381 0.889677 0.9038 0.9610

Table 2: Results for optimized models after XAI and hyperparameter optimization.

Model Accuracy MBytes Processing time(ms) Sensitivity Speciﬁcity AUC

T1G 0.888663077 1.120727539 0.043483994 0.87236741 0.904958761 0.954158359

T2G 0.89151423 1.062339783 0.032692982 0.876838733 0.906189746 0.957366015

Table 2, presents results for hyperparameter op-

timization using 5-fold cross-validation on the same

RTX A5000 GPU. The table’s construction involved

utilizing Keras Tuner to ﬁne-tune hyperparameters.

The 5 models underwent training across 5 data folds,

followed by performance evaluation, leading to the

calculation of mean performance metrics for different

hyperparameter sets. Models were trained using

data from Trim1 (coded as T1G in table) and Trim2

(coded as T2G ). In this optimization process, nearly

all the performance ﬁgures remain constant except

for the model’s memory footprint and processing

time, which have both decreased drastically, yielding

highly favorable results. Among the two trims, Trim2

appears to be the better option. Given these ﬁndings,

it is logical to leverage data trims for model optimiza-

tion, with Trim2, being smaller and yielding superior

results, chosen for further optimization efforts. Table

3 summarizes the ﬁnally selected hyperparameters

for the best performing models.

Table 4 shows performance metrics after TFLite

optimization of the best Trim2 model. Metrics were

obtained using a desktop computer with Intel(R)

Core(TM) i7-7700 CPU @ 3.60GHz and 16 GB

RAM. It can be seen that the most effective approach

for optimizing the TFLite model appears to be the

one without any TFLite-speciﬁc optimizations. This

method involves optimizing the model in HDF5

format ﬁrst and then converting it to TFLite. The op-

timized models (default, size, and latency) do achieve

substantial reductions in model size, decreasing from

about 0.51 MB to approximately 0.13 MB. Although

there is a slight decrease in accuracy, which is barely

noticeable, it signiﬁcantly improves processing time

compared to other optimization methods. Specif-

ically, the “none” approach results in the fastest

processing time (0.14 ms), which is notably quicker

than the optimized models that take around 0.25

seconds. This faster processing time is critical in

applications where speed is a priority, even if it

means maintaining a larger model size. Additionally,

it is noteworthy that all optimization approaches yield

very similar values for speciﬁcity, sensitivity and

AUC. This minimal impact on accuracy and other

performance metrics across different optimizations

implies that the intrinsic quality of the model remains

robust regardless of the TFLite-speciﬁc optimizations

applied.

As discussed above, optimizing hyperparame-

ters in convolutional neural networks (CNNs) can

signiﬁcantly impact their performance. Comparing

the default conﬁguration with the optimized one

reveals key changes that enhance the network’s

efﬁciency and adaptability. The default setup uses

the Adam optimizer with a dropout rate of 0.1, no

regularization, and a higher number of ﬁlters in

the convolutional layers, ranging from 32 to 128.

In contrast, the optimized conﬁguration employs

RMSprop, which dynamically adjusts the learning

rate, and reduces the dropout rate to 0.001, allowing

more information to be retained during training.

Additionally, this setup reduces the number of ﬁlters

to 16 in the initial layers and slightly increases

them in the ﬁnal layer, decreasing computational

complexity while maintaining the ability to capture

essential features. The kernel size is consistently kept

at 2x2 in the optimized setup, aiding in capturing ﬁne

local details. These adjustments make the network

more efﬁcient, faster to converge, and less prone to

overﬁtting compared to the default conﬁguration,

making it better suited for our speciﬁc application.

Table 5, shows the performance of the previously

mentioned TFlite models (Default, T1G, and T2G)

tested in Android Studio. We observe the same be-

havior that has been explained throughout this sec-

tion. While it is true that the execution time is higher

on the emulator compared to the computer, this is

due to the computer being much more powerful than

the Android Studio emulator. The accuracy decreases

HEALTHINF 2025 - 18th International Conference on Health Informatics

496

Table 3: Model Parameters.

Model Filters1 Kernel Size1 Dropout Rate1 Filters2 Kernel Size2 Dropout Rate2 Filters3

modelT1G 16 2 0.001 16 2 0.001 16

modelT2G 16 2 0.001 16 2 0.001 16

Model Kernel Size3 Dropout Rate3 Filters4 Kernel Size4 Units Optimizer Learning Rate

modelT1G 2 0.001 32 2 32 rmsprop 0.001

modelT2G 2 0.001 32 2 224 rmsprop 0.001

Table 4: Results after XAI, hyperparameters, and TFLite Optimization.

Name Accuracy Mbytes Proc Time(ms) Sensitivity Speciﬁcity AUC

None 0.891436808 0.5086212 0.143704203 0.876648823 0.905753472 0.95733939

Default 0.891511336 0.1345244 0.256185187 0.877022051 0.90552434 0.957337036

Size 0.891511336 0.1345244 0.25754078 0.877022051 0.90552434 0.957337036

Latency 0.891511336 0.1345244 0.254893658 0.877022051 0.90552434 0.957337036

Table 5: Accuracy and computation times for Android Stu-

dio.

Model Time (s) Accuracy

TFlite, not optimized 0.00381101875 0.857818278

T1G 0.00330349268 0.864615958

T2G 0.00296955678 0.867987304

slightly, but this reduction is not very signiﬁcant. By

using the Proﬁler tool from Android Studio, which

is located inside the emulator, it can be seen that the

memory used for our model, Native memory, occu-

pies approximately 170 MB of the 256 MB used by

the application. Furthermore, the model utilizes about

38% of the CPU. A consistent observation is that the

Trim2 (T2G) model, representing the second trim,

performs best. It outperforms both the Trim1 (T1G)

model and the original model directly converted to

TFlite (Default). This indicates that the trimming pro-

cess yields positive results, both on a powerful com-

puter and on the Android Studio emulator.

5 CONCLUSIONS

This study demonstrates the effectiveness of using

XAI techniques to enhance the performance of neu-

ral networks. By highlighting the most relevant spec-

tral regions for cough detection, we created a more

focused dataset that improved processing time and re-

duced data size. Coupled with a thorough optimiza-

tion of the CNN’s hyperparameters, these techniques

facilitated the deployment of the network on devices

with limited hardware, such as smartphones, with-

out compromising efﬁciency. Although preliminary

tests on smartphone emulators are promising, further

live testing is necessary to evaluate performance un-

der real-world conditions.

ACKNOWLEDGEMENTS

This work is part of projects TED2021-

131536B-I00, funded by Spanish

MCIN/AEI/10.13039/501100011033 and EU’s

“NextGenerationEU”/PRTR and GRS 2837/C/2023

funded by Gerencia Regional de Salud, Junta de

Castilla y Le

on, Spain. The work was also partly

supported by the EU Horizon 2020 Research and

Innovation Programme under the Marie Sklodowska-

Curie grant agreement No 101008297. This article

reﬂects only the authors’ view. The European Union

Commission is not responsible for any use that may

be made of the information it contains.

REFERENCES

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z.,

Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin,

M., et al. (2015). TensorFlow: Large-scale machine

learning on heterogeneous systems. Software avail-

able from tensorﬂow.org.

Amado-Caballero, P., de-la Higuera, P. C., Alberola-L

øpez,

S. ., de Llano, J. M. A., s, J. A. L.-V., and Alberola-

øpez, C. (2023). Insight into ADHD diagnosis with

deep learning on actimetry: Quantitative interpreta-

tion of occlusion maps in age and gender subgroups.

Artiﬁcial Intelligence in Medicine, 143:102630.

Amrulloh, Y. A., Abeyratne, U. R., Swarnkar, V., Triasih,

R., and Setyati, A. (2015). Automatic cough segmen-

tation from non-contact sound recordings in pediatric

wards. Biomedical Signal Processing and Control,

21(Supplement C):126 – 136.

Audit Scotland (2011). A review of Telehealth in Scotland.

Technical report, Audit Scotland.

Belli, S., Balbi, B., Prince, I., Cattaneo, D., Masocco, F.,

Zaccaria, S., Bertalli, L., Cattini, F., Lomazzo, A.,

Dal Negro, F., et al. (2020). Low physical functioning

and impaired performance of activities of daily life in

Optimization of a Deep-Learning-Based Cough Detector Using eXplainable Artiﬁcial Intelligence for Implementation on Mobile Devices

497

COVID-19 patients who survived hospitalisation. Eu-

ropean Respiratory Journal, 56(4).

Birring, S. S., Fleming, T., Matos, S., Raj, A. A., Evans,

D. H., and Pavord, I. D. (2008). The leicester

cough monitor: preliminary validation of an auto-

mated cough detection system in chronic cough. Eu-

ropean Respiratory Journal, 31(5):1013–1018.

Drugman, T., Urbain, J., Bauwens, N., Chessini, R., Valder-

rama, C., Lebecque, P., and Dutoit, T. (2013). Ob-

jective study of sensor relevance for automatic cough

detection. IEEE Journal of Biomedical and Health In-

formatics, 17(3):699–707.

European Commission (2018). Market study on

telemedicine. Technical report, European Union.

H. Pinnock et al. (2013). Effectiveness of telemonitoring in-

tegrated into existing clinical services on hospital ad-

mission for exacerbation of chronic obstructive pul-

monary disease: researcher blind, multicentre, ran-

domised controlled trial. BMJ, 347.

Kingma, D. P. and Ba, J. (2015). Adam: A method for

stochastic optimization. In Bengio, Y. and LeCun,

Y., editors, 3rd International Conference on Learn-

ing Representations, ICLR 2015, San Diego, CA, USA,

May 7-9, 2015, Conference Track Proceedings.

Laguarta, J., Hueto, F., and Subirana, B. (2020). COVID-

19 artiﬁcial intelligence diagnosis using only cough

recordings. IEEE Open Journal of Engineering in

Medicine and Biology, 1:275–281.

Mingyu You et al. (2022). Automatic cough detection

from realistic audio recordings using C-BiLSTM with

boundary regression. Biomedical Signal Processing

and Control, 72:103304.

Tena, A., Claria, F., and Solsona, F. (2022). Automated

detection of covid-19 cough. Biomedical Signal Pro-

cessing and Control, 71:103175.

Vizel, E., Yigla, M., Goryachev, Y., Dekel, E., Felis, V.,

Levi, H., Kroin, I., Godfrey, S., and Gavriely, N.

(2010). Validation of an ambulatory cough detection

and counting application using voluntary cough under

different conditions. Cough, 6(1):3.

World Health Organisation (2017). The Global Impact of

Respiratory Disease. European Respiratory Society.

World Health Organisation (2021). WHO Coronavirus

(COVID-19) Dashboard. https://covid19.who.int/.

Zeiler, M. D. and Fergus, R. (2014). Visualizing and under-

standing convolutional networks. In European confer-

ence on computer vision, pages 818–833. Springer.

HEALTHINF 2025 - 18th International Conference on Health Informatics

498