Assessing Signal Noise Effects on Machine Learning Models for ECG-Based Cardiac Diagnosis

Emanuela Guglielmi, Davide Donato Russo, Pasquale Trinchese, Gennaro Laudato, Simone Scalabrino, Gianluca Testa and Rocco Oliveto

Department of Bioscience and Territory, University of Molise, Italy

Department of Medicine and Health Sciences, University of Molise, Italy

Keywords:

Signal Noise, Machine Learning, Empirical Study, Atrial Fibrillation, Ventricular Tachycardia.

Abstract:

The Internet of Medical Things (IoMT) plays a vital role in healthcare by enhancing preventive care and chronic disease management through continuous monitoring with smart sensors and wearable devices. However, the reliability of IoMT systems can be compromised by noise in the acquired vital signals, which can degrade the accuracy of the Machine Learning (ML) models used for anomaly detection. This study evaluates the impact of various disturbances on the performance of ML models that predict cardiac conditions, with a focus on the reliability and effectiveness of these systems in real-world applications. We investigated the effects of three types of noise (baseline wander, muscle artifact noise, and electrode motion artifact) on the performance of two advanced ML models designed to predict cardiac conditions, specifically atrial fibrillation (AF) and ventricular tachycardia (VT). Our analysis centered on how different noise intensities (i.e., the “loudness” of the noise) and durations (i.e., the length of time the noise persists) affected the classification performance of these models. The VT detection model showed robust performance, with minimal impact even under intense and prolonged noise conditions. In contrast, AF detection was affected by all types of noise, with classification accuracy decreasing by up to ∼59% in the most challenging scenarios.

1 INTRODUCTION

The Internet of Things (IoT) represents a major technological advancement in computing and communication, fueled by ongoing developments in wireless sensor technology and nanotechnology. Among its various applications, healthcare stands out as a critical area of impact.

The Internet of Medical Things (IoMT) is instrumental in preventive care and chronic disease management, enabling the early detection of symptoms and health risks through continuous data collection from patients. Smart sensors have significantly improved the efficiency of healthcare management, while wearable devices that monitor various physiological parameters have become increasingly user-friendly and require no specialized training. These devices can track metrics such as blood oxygen levels, insulin levels, blood pressure, temperature, and chemical balances.

Numerous studies have investigated the role of IoMT devices, which are central to telemedicine. For instance, Balestrieri et al. (2019) introduced ATTICUS, an IoMT system that continuously monitors vital signs such as ECG, respiration, and temperature. These signals are analyzed by a Decision Support System (DSS) that employs Machine Learning (ML) models trained to detect anomalies and sends alerts to medical staff when anomalies are detected.

In real-world applications, IoMT systems like ATTICUS frequently face challenges related to noise in the acquired vital signals. For example, patient movement, such as walking or running, can cause friction between electrodes and skin, leading to measurement artifacts. Common types of noise in ECG signals include: (1) baseline wander, characterized by low-frequency fluctuations from movement; (2) muscle artifact noise, caused by high-frequency myogenic activity; and (3) electrode motion artifacts, marked by high-intensity spikes due to temporary electrode detachment.

These artifacts are particularly problematic in continuous anomaly detection, where noise increases the likelihood of ML models generating false positives (i.e., detecting non-existent anomalies). This not only burdens medical personnel with filtering out false alarms but also reduces the efficiency of monitoring systems.

Although prior research has proposed noise removal methods Hamil et al. (2022); Sadr et al. (2018), most studies have focused on specific noise types or detectors Christiansen et al. (1996); Oster and Clifford (2015). A comprehensive analysis of how different real-world noise types affect ML model accuracy is still needed, as it is critical to improving the reliability and effectiveness of IoMT systems in practice.

Guglielmi, E., Russo, D. D., Trinchese, P., Laudato, G., Scalabrino, S., Testa, G. and Oliveto, R.
Assessing Signal Noise Effects on Machine Learning Models for ECG-Based Cardiac Diagnosis.
DOI: 10.5220/0013126800003911
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 2: HEALTHINF, pages 458-466
ISBN: 978-989-758-731-3; ISSN: 2184-4305

In this paper, we evaluate the impact of different noise types on the performance of two ML models for predicting cardiac conditions: Atrial Fibrillation (AF) and Ventricular Tachycardia (VT). Specifically, we analyze baseline wander, muscle artifact noise, and electrode motion artifacts. Our experiments were conducted using datasets from the PhysioNet repository, including the MIT-BIH Atrial Fibrillation Database (AFDB) for AF detection and the MIT-BIH Normal Sinus Rhythm (NSRDB) and Malignant Ventricular Ectopy (VFDB) databases for VT detection.

To simulate real-world noise conditions, we began with clean signals from the databases and artificially introduced three types of noise using the MIT-BIH Noise Stress Test Database (NSTDB). We tested noise intensities ranging from -12 dB to 12 dB and applied noise over three different durations (25%, 50%, and 75% of the signal length).

Additionally, to assess the practical impact of noise, we applied standard noise-removal techniques after introducing the artificial noise, mimicking the typical preprocessing steps used in IoMT signal analysis. We then evaluated the performance of two state-of-the-art ML models: one for AF detection Zhou et al. (2015) and one for VT detection Mohammad-Taheri et al. (2016).

Our findings show that the accuracy of the AF detection model gradually decreases as both the intensity and the duration of noise increase. This pattern was consistent across all noise types, with electrode motion artifacts having the most significant negative impact on the AF detector’s performance. In contrast, the VT detection model demonstrated exceptional robustness, showing little to no degradation from any type of noise, regardless of its intensity.

These results suggest that, while the current VT detection model Mohammad-Taheri et al. (2016) is highly resilient, further research should prioritize improving the noise robustness of AF detection models to enhance their reliability in IoMT devices.

2 BACKGROUND AND RELATED WORK

Over the years, many researchers have leveraged electrocardiogram (ECG) signals to diagnose cardiac abnormalities such as VT and AF Kaplan Berkaya et al. (2018); Guandalini et al. (2019); Ramkumar et al. (2018); Mandala and Di (2017). Among the techniques used for ECG signal analysis, machine learning Strik et al. (2023) and deep learning Murat et al. (2021) have gained significant attention. Additionally, there is growing interest in using not only specialized medical devices but also general-purpose devices such as smartwatches Burke et al. (2020) and smartphones Chong et al. (2018) for this purpose.

2.1 Ventricular Tachycardia

Ventricular Tachycardia (VT) is a type of tachycardia that originates from rapid, abnormal electrical activity in the ventricles Wellens (2001). Since VT is a defibrillable rhythm, it can be corrected and restored to normal sinus rhythm if cardiopulmonary resuscitation (CPR) and defibrillation are promptly administered. Consequently, the reliable and rapid detection of VT is critical for any medical device or system managing cardiac arrhythmias. Several approaches have been proposed for the detection and classification of VT from ECG signals. Rajeshwari and Kavitha (2021) provide a comprehensive review of state-of-the-art feature extraction techniques and classification algorithms used in ventricular fibrillation (VF) detection. Ramakrishnan et al. (2017) integrated these techniques into an Automated External Defibrillator (AED), which delivers a shock when VF or rapid VT is detected. Their algorithm extracts features from the slope, time, and frequency domains to classify the rhythms and determine whether a shock is necessary. Aparna et al. (2017); Aparna and Sharma (2020) proposed an algorithm that detects VT by analyzing the morphological features of ECG signals and classifying them using support vector machines (SVM).

Among the various methods for detecting VT, one of the most effective is the approach proposed by Mohammad-Taheri et al. (2016). They developed three algorithms based on analyzing the slope histogram, slope count, and slope complexity of ECG signals. These methods analyze all slope values within an 8-second window, comparing the distributions of normal sinus rhythm (NSR) and VT signals. NSR typically exhibits regular patterns with abrupt changes in slope in the QRS segments, resulting in a super-Gaussian distribution, whereas VT signals display a broader range of slopes, often appearing Gaussian or sub-Gaussian. The slope count method proved to be the most effective, achieving an overall accuracy of over 96.5%. The ATTICUS platform uses this algorithm, and the authors suggest a discriminative threshold count of 91 for distinguishing VT from NSR Laudato et al. (2021b).
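To make the super-Gaussian versus Gaussian/sub-Gaussian distinction concrete, the sketch below computes the excess kurtosis of the sample-to-sample slopes (first differences) of a signal: positive values indicate a heavy-tailed, super-Gaussian slope distribution, and negative values a flatter, sub-Gaussian one. This statistic is our own illustrative choice, not the slope histogram, slope count, or slope complexity algorithm of Mohammad-Taheri et al. (2016); the function name and the synthetic signals are hypothetical stand-ins, not ECG data.

```python
import numpy as np


def slope_excess_kurtosis(signal: np.ndarray) -> float:
    """Excess kurtosis of the first-difference (slope) distribution of a signal.

    > 0: super-Gaussian (mostly flat, with occasional very steep slopes)
    ~ 0: Gaussian
    < 0: sub-Gaussian (slopes spread broadly and evenly)
    """
    slopes = np.diff(np.asarray(signal, dtype=float))
    centered = slopes - slopes.mean()
    variance = np.mean(centered**2)
    if variance == 0.0:
        raise ValueError("constant slopes: kurtosis is undefined")
    return float(np.mean(centered**4) / variance**2 - 3.0)


if __name__ == "__main__":
    fs = 250                      # sampling rate in Hz (assumed)
    n = 8 * fs                    # one 8-second window
    t = np.arange(n) / fs

    # Hypothetical NSR-like stand-in: flat baseline with one sharp deflection per second.
    spiky = np.zeros(n)
    spiky[::fs] = 1.0
    # Hypothetical VT-like stand-in: a smooth, broad 3 Hz oscillation.
    smooth = np.sin(2 * np.pi * 3.0 * t)

    print(f"spiky:  {slope_excess_kurtosis(spiky):+.1f}")   # large positive
    print(f"smooth: {slope_excess_kurtosis(smooth):+.1f}")  # close to -1.5
```

A pure sinusoid has an excess kurtosis of -1.5, which is why the smooth stand-in lands on the sub-Gaussian side, while the sparse deflections produce a strongly super-Gaussian slope distribution.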

2.2 Atrial Fibrillation

Atrial Fibrillation (AF) is a prevalent and dangerous cardiac condition and the leading cause of cardioembolic stroke Migdady et al. (2021). Because AF affects millions of people globally, its early identification is crucial. AF is associated with an increased risk of stroke, heart failure, and mortality, significantly affecting both quality of life and life expectancy. Although treatment strategies for AF exist, the paroxysmal and often minimally symptomatic nature of AF, especially in its early stages, presents a significant challenge for clinicians and researchers Pritchett (1992). There is therefore a critical need for automated and continuous methods for accurate AF detection.

Over the years, various automated AF detection methods have been proposed, showing promising results through heart rhythm analysis Colloca et al. (2013), support vector machines Mohebbi and Ghassemian (2008), machine learning Sepulveda-Suescun et al. (2017), neural networks Xiong et al. (2017), and deep learning Yuan et al. (2016). One of the most accurate AF detection methods on the MIT-BIH AF database is the approach proposed by Zhou et al. (2015). Their AF detection algorithm consists of three main steps. First, the heart rate sequence is transformed into a symbolic sequence over a set interval. Then, a probability distribution is created from this sequence, and a simplified version of Shannon entropy is applied to measure its information content. Finally, AF episodes are distinguished using a predefined threshold. After trial and error, a threshold of 0.639 provided the best results, achieving 97.83% sensitivity, 99.68% specificity, and an overall accuracy of 88.51%.
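The three steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not Zhou et al.'s exact formulation: the binning of heart rate into symbols (`bin_width`, `n_symbols`), the word length of three, and the normalization of the entropy by the largest value attainable in the window are hypothetical choices. The 0.639 threshold reported for the original method is therefore reused only as a placeholder and would need re-tuning for this variant.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def symbolize(hr_bpm: Sequence[float], bin_width: float = 5.0, n_symbols: int = 64) -> List[int]:
    """Map each instantaneous heart rate (bpm) to an integer symbol (hypothetical binning)."""
    return [min(int(hr // bin_width), n_symbols - 1) for hr in hr_bpm]


def normalized_word_entropy(symbols: Sequence[int], word_len: int = 3) -> float:
    """Shannon entropy of overlapping symbol words, scaled to [0, 1]."""
    words = [tuple(symbols[i:i + word_len]) for i in range(len(symbols) - word_len + 1)]
    if len(words) < 2:
        raise ValueError("need at least word_len + 1 symbols")
    total = len(words)
    probs = [count / total for count in Counter(words).values()]
    entropy = sum(p * math.log2(1.0 / p) for p in probs)
    # The entropy is largest when every word in the window is distinct.
    return entropy / math.log2(total)


def is_af(hr_bpm: Sequence[float], threshold: float = 0.639) -> bool:
    """Flag a window as AF when the normalized entropy exceeds the (placeholder) threshold."""
    return normalized_word_entropy(symbolize(hr_bpm)) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    regular = [72.0] * 120                                       # steady, sinus-like rhythm
    irregular = [rng.uniform(60.0, 160.0) for _ in range(120)]   # erratic, AF-like rhythm
    print(is_af(regular), is_af(irregular))
```

A perfectly regular rhythm yields a single repeated word and an entropy of zero, whereas an erratic rhythm produces mostly distinct words and an entropy close to one; the threshold separates the two regimes.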

2.3 Impact of Noise on the Detection of VT and AF

Noise in ECG signals has long been a significant challenge in signal analysis, prompting researchers to develop various strategies to mitigate its effects. Some approaches detect and remove noise before analysis Hamil et al. (2022); Sadr et al. (2018), while others focus on creating robust detection algorithms that can tolerate specific noise types at reasonable levels and intensities Chong et al. (2018).

To our knowledge, no comprehensive study has systematically analyzed the impact of real-world noise types on both AF and VT detection techniques. The closest related works are those by Christiansen et al. (1996) and Oster and Clifford (2015). Christiansen et al. (1996) examined the effect of residual noise levels on identifying patients with sustained monomorphic ventricular tachycardia (VT) after myocardial infarction. They analyzed ECGs from 16 patients with documented VT and 41 patients without VT, using noise levels of 0.2 µV and 0.4 µV. Their findings showed that signal-averaged ECGs performed better at a noise level of 0.4 µV than at 0.2 µV for identifying VT patients. While lowering the noise level increased sensitivity, it significantly reduced specificity. Unlike their work, our research focuses on more impactful noise types (baseline wander, muscle artifact, and electrode motion) rather than residual noise.

Oster and Clifford (2015) conducted an in-depth analysis of AF detection algorithms under different noise conditions and levels of QRS detection accuracy. They found a linear decrease in AF detection accuracy as the SNR decreased, and demonstrated that an automatic signal quality index (SQI) could maintain AF detection accuracy above 95% when analyzing segments with a median SQI above 0.8. While their study primarily addressed muscle artifact noise and AF detection, our research expands the analysis to multiple noise types and detection methods, including VT detection.

3 EMPIRICAL STUDY DESIGN

The goal of our study is to assess the impact of various types of noise on the performance of machine learning models designed to detect atrial fibrillation (AF) and ventricular tachycardia (VT). Specifically, our investigation is guided by the following research questions (RQs):

• RQ1: To what extent does noise affect the performance of a state-of-the-art atrial fibrillation detection model?
• RQ2: To what extent does noise impact the detection accuracy of a state-of-the-art ventricular tachycardia detection model?

RQ1 aims to evaluate the robustness of highly accurate rhythm detection methods, particularly those relying solely on R-peak information, such as the atrial fibrillation detector. In contrast, RQ2 seeks to assess the robustness of a precise ventricular tachycardia detector based on the ECG signal slopes.

3.1 Experimental Context

The context of our study involves ECG signals collected from both healthy individuals and patients diagnosed with atrial fibrillation and ventricular tachycardia. For this purpose, we used three datasets: the MIT-BIH Atrial Fibrillation Database (AFDB) for AF detection, the MIT-BIH Malignant Ventricular Ectopy Database (VFDB) for VT detection, and the MIT-BIH Normal Sinus Rhythm Database (NSRDB) as a baseline dataset representing healthy individuals.

AFDB contains 25 long-term ECG recordings from individuals with atrial fibrillation, each lasting 10 hours and sampled at 250 Hz. VFDB includes 22 half-hour ECG recordings from individuals who experienced episodes of sustained ventricular tachycardia, ventricular flutter, and ventricular fibrillation. NSRDB consists of 18 long-term ECG recordings from individuals with no significant arrhythmia, including 5 men (aged 26-45) and 13 women (aged 20-50). Finally, the MIT-BIH Noise Stress Test Database (NSTDB) provides 12 half-hour ECG recordings and 3 half-hour noise recordings simulating common noise sources in ambulatory ECG recordings.

3.2 Experimental Procedure

For atrial fibrillation (AF) detection, we adopted the methodology proposed by Zhou et al. (2015). This method includes three key steps:

• Converting the heart rate (HR) sequence into a symbolic sequence within a fixed interval.
• Constructing a probability distribution from the word sequence derived from the symbolic sequence.
• Using a simplified version of Shannon entropy to quantify the information content of the HR sequence, allowing binary classification of ECG signals as AF or non-AF.

This approach was selected for its strong performance, low computational complexity, and presumed robustness due to its focus on R-peaks, which are less susceptible to signal noise than other ECG features such as the P-wave Laudato et al. (2021a).

For ventricular tachycardia (VT) detection, we implemented the methodology proposed by Mohammad-Taheri et al. (2016). This approach involves three algorithms based on analyzing the slope of the ECG signals:

• Analyzing the number of QRS peaks that exceed a predefined threshold within an 8-second window.
• Counting these peaks to establish a boundary value that correctly classifies more than 96% of the signals.
• Using a threshold of 91 peaks, as identified in the ATTICUS platform, to discriminate between VT and normal sinus rhythm (NSR) Laudato et al. (2021b).
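The slope-count idea can be sketched as follows, assuming a single-lead ECG window of 8 s sampled at `fs` Hz. The counting rule is hypothetical: here a sample counts when its absolute slope exceeds a fixed fraction (`slope_fraction`) of the largest absolute slope in the window, and the window is labeled VT when the count exceeds the boundary of 91. The decision direction is our inference from the slope-distribution argument in Section 2.1 (NSR concentrates steep slopes in brief QRS complexes, whereas VT spreads them across the window), and whether the boundary of 91 transfers to this simplified count is untested.

```python
import numpy as np

WINDOW_SECONDS = 8
VT_BOUNDARY = 91  # boundary count reported for the ATTICUS platform


def slope_count(window: np.ndarray, slope_fraction: float = 0.1) -> int:
    """Number of samples whose |slope| exceeds slope_fraction * max |slope| (hypothetical rule)."""
    slopes = np.abs(np.diff(np.asarray(window, dtype=float)))
    peak = slopes.max()
    if peak == 0.0:
        return 0  # flat line: no steep slopes at all
    return int(np.count_nonzero(slopes > slope_fraction * peak))


def classify_window(window: np.ndarray, fs: int, boundary: int = VT_BOUNDARY) -> str:
    """Label an 8-second window 'VT' or 'NSR' from its slope count."""
    if len(window) != WINDOW_SECONDS * fs:
        raise ValueError(f"expected {WINDOW_SECONDS * fs} samples, got {len(window)}")
    return "VT" if slope_count(window) > boundary else "NSR"


if __name__ == "__main__":
    fs = 250
    t = np.arange(WINDOW_SECONDS * fs) / fs
    nsr_like = np.zeros_like(t)
    nsr_like[::fs] = 1.0                     # sparse, sharp deflections
    vt_like = np.sin(2 * np.pi * 3.0 * t)    # broad, continuous slopes
    print(classify_window(nsr_like, fs), classify_window(vt_like, fs))
```

Normalizing by the window's own maximum slope makes the count insensitive to overall signal gain, which is a design choice of this sketch rather than a documented property of the original detector.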

In the context of our study, we analyzed the effects of three distinct types of noise:

• Baseline Wander (BW): caused primarily by respiration and body movements.
• Muscle Artifact (MA): resulting from the electrical activity of muscles.
• Electrode Motion Artifact (EM): induced by temporary loss of electrode adhesion to the skin.

To address our research questions, we injected these noise types into ECG sequences before processing them with the respective detectors. We manipulated two parameters for each noise type: (i) duration (the percentage of the ECG signal affected by noise); and (ii) intensity (the magnitude of the disturbance).

We tested three duration levels for each noise/methodology combination: 25%, 50%, and 75%. For negative instances (healthy signals), noise was uniformly distributed throughout the signal. For positive instances (signals with AF or VT), noise was introduced starting from a randomly chosen point within the segment where the AF/VT event occurs. This approach helps determine whether the noise can obscure the presence of an AF/VT event when the detector is applied. To ensure robust results, each analysis was repeated ten times, and the average results are reported.

To inject the noise into the signals, we used the noise recordings for the three noise types from the Noise Stress Test Database. The noise signal was resampled to match the desired window length, scaled according to a specified signal-to-noise ratio (SNR) to simulate realistic conditions, and then added to the corresponding segment of the ECG signal. For intensity, we used the SNR scale from Mohd Apandi et al. (2020), which includes the values [-12, -6, -3, 0, +3, +6, +12] dB. We evaluated each detection approach (with and without noise) at the specified noise durations and intensities, and measured sensitivity and specificity for each model/noise combination. Figure 1 illustrates an example of a 2-second ECG signal fragment from the NSRDB dataset, showing the original signal and the signal affected by BW, MA, and EM noise types.


Figure 1: Original ECG signal and the version of the ECG affected by Baseline Wander (BW), Muscle Artifact (MA), and Electrode Motion (EM) noise types at -12 dB intensity and 25% duration.
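The injection step described above can be sketched as follows, assuming NumPy arrays for a single ECG lead and a single NSTDB noise channel (for example, the `bw`, `ma`, or `em` record, loaded by other means). Several details are our assumptions rather than the study's documented procedure: the SNR is computed from the mean power of the ECG and of the noise over the affected segment, the noise is resampled by linear interpolation, and "uniformly distributed" is read as a uniformly random start position for negative instances. The function names are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def scale_to_snr(segment: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale noise so that 10*log10(P_segment / P_noise) equals snr_db (mean-power SNR)."""
    p_signal = np.mean(segment**2)
    p_noise = np.mean(noise**2)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return gain * noise


def choose_start(rng: np.random.Generator, n_samples: int, length: int,
                 event: Optional[Tuple[int, int]] = None) -> int:
    """Random start index; restricted to an annotated event (start, end) for positive instances."""
    if event is None:
        return int(rng.integers(0, n_samples - length + 1))
    lo, hi = event
    upper = min(hi, n_samples - length)
    if upper < lo:
        raise ValueError("noise window does not fit after the event start")
    return int(rng.integers(lo, upper + 1))


def inject_noise(ecg: np.ndarray, noise_record: np.ndarray, snr_db: float,
                 duration_frac: float, start: int) -> np.ndarray:
    """Return a copy of ecg with scaled noise added to duration_frac of its samples."""
    ecg = np.asarray(ecg, dtype=float)
    length = int(round(duration_frac * len(ecg)))
    if length == 0:
        return ecg.copy()
    if start < 0 or start + length > len(ecg):
        raise ValueError("noise window exceeds the signal")
    src = np.asarray(noise_record, dtype=float)
    # Resample the whole noise record to exactly `length` samples (linear interpolation).
    noise = np.interp(np.linspace(0, len(src) - 1, length), np.arange(len(src)), src)
    out = ecg.copy()
    segment = ecg[start:start + length]
    out[start:start + length] = segment + scale_to_snr(segment, noise, snr_db)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ecg = np.sin(2 * np.pi * 1.2 * np.arange(2500) / 250)  # toy 10-second signal at 250 Hz
    noise = rng.standard_normal(5000)                      # toy stand-in for an NSTDB record
    length = int(round(0.25 * len(ecg)))
    start = choose_start(rng, len(ecg), length, event=(1000, 2000))
    noisy = inject_noise(ecg, noise, snr_db=-6.0, duration_frac=0.25, start=start)
    added = noisy[start:start + length] - ecg[start:start + length]
    print(10 * np.log10(np.mean(ecg[start:start + length] ** 2) / np.mean(added**2)))
```

Computing the SNR over the affected segment only (rather than the whole record) keeps the nominal dB level meaningful regardless of how much of the signal is corrupted; other SNR definitions, such as ones based on QRS amplitude, would yield different noise gains.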

For each combination of detector (AF/VT), noise type, duration, and intensity, we report the following metrics, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives:

• Sensitivity or Recall (True Positive Rate): the proportion of actual positives correctly identified by the model, calculated as
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
High sensitivity indicates effective detection of true positive cases.
• Specificity (True Negative Rate): the proportion of actual negatives correctly identified,
$$\text{Specificity} = \frac{TN}{TN + FP}$$
High specificity means accurate identification of true negatives.
• Precision (Positive Predictive Value): the proportion of positive identifications that are actually correct,
$$\text{Precision} = \frac{TP}{TP + FP}$$
High precision indicates a low rate of false positives among predicted positives, reducing the burden on medical teams.
• F1-Score: the harmonic mean of precision and sensitivity, balancing the trade-off between them,
$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$
A high F1-score reflects a good balance between precision and sensitivity, minimizing false positives and false negatives.
• Accuracy: the overall correctness of the model, calculated as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Accuracy provides a general measure of model performance across both positive and negative cases.
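The five metrics can be computed directly from confusion-matrix counts. The sketch below is a plain-Python illustration; the counts in the usage example are hypothetical and not taken from the study, and returning NaN for an empty denominator (and 0 for F1 when precision and sensitivity are both zero) is our own convention.

```python
import math
from typing import Dict


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else math.nan


def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, precision, accuracy, and F1 from confusion counts."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    if math.isnan(precision) or math.isnan(sensitivity):
        f1 = math.nan
    elif precision + sensitivity == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts: 83 of 100 AF windows detected, no false alarms on 100 non-AF windows.
    for name, value in detection_metrics(tp=83, tn=100, fp=0, fn=17).items():
        print(f"{name:>11}: {value:.3f}")
```

Note how a detector with perfect precision and specificity can still have a modest F1-score when sensitivity is low, which is the pattern the AF detector exhibits under heavy electrode motion noise.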

It is worth noting that, after introducing the artificial noise, we applied standard noise-removal methods Kher (2019). Specifically, our preprocessing procedure consists of the following steps:

• First Stage: a high-pass filter with a cutoff frequency of 1 Hz removes low-frequency components typical of baseline wander (around 0.5 Hz) and reduces signal drift.
• Second Stage: a low-pass filter with a cutoff frequency of 30 Hz eliminates high-frequency noise, such as interference and muscle noise.

In addition to being a standard procedure for ECG analysis, this preprocessing is the same method used in the VT detection approach Amann et al. (2005). We applied the same noise-removal preprocessing for both AF and VT detection. This consistency was necessary because the baseline VT approach incorporates this preprocessing as a core component, whereas the baseline AF approach, which starts from a heart rate sequence, does not specify any particular noise-removal method.
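The two-stage filter can be sketched with SciPy as follows. The study specifies only the 1 Hz and 30 Hz cutoffs; the Butterworth design, the fourth order, and zero-phase (forward-backward) filtering with `sosfiltfilt` are our assumptions, chosen because zero-phase filtering avoids shifting R-peaks in time, which matters for an R-R-interval-based detector. The function name `preprocess_ecg` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def preprocess_ecg(ecg: np.ndarray, fs: float, hp_cutoff: float = 1.0,
                   lp_cutoff: float = 30.0, order: int = 4) -> np.ndarray:
    """Two-stage denoising: high-pass (baseline wander), then low-pass (high-frequency noise)."""
    x = np.asarray(ecg, dtype=float)
    # First stage: remove slow drift below about 1 Hz.
    hp = butter(order, hp_cutoff, btype="highpass", fs=fs, output="sos")
    x = sosfiltfilt(hp, x)
    # Second stage: remove content above about 30 Hz (e.g., muscle noise).
    lp = butter(order, lp_cutoff, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(lp, x)


if __name__ == "__main__":
    fs = 250.0
    t = np.arange(0, 20, 1 / fs)
    wander = np.sin(2 * np.pi * 0.5 * t)       # baseline-wander-like component
    in_band = np.sin(2 * np.pi * 10.0 * t)     # component inside the 1-30 Hz passband
    high_freq = np.sin(2 * np.pi * 60.0 * t)   # component above the low-pass cutoff
    cleaned = preprocess_ecg(wander + in_band + high_freq, fs)
    mid = slice(int(5 * fs), int(15 * fs))     # ignore edge transients
    print(np.max(np.abs(cleaned[mid] - in_band[mid])))
```

Only the in-band component should survive, so the printed residual is small. Because a 1 Hz high-pass still passes part of the spectrum of real baseline wander and a 30 Hz low-pass leaves in-band muscle noise untouched, filtering of this kind cannot fully undo the injected artifacts, which is consistent with the degradation reported below.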

4 ANALYSIS OF THE RESULTS

This section presents the results obtained from the experiments described in the previous section.

4.1 RQ1: Atrial Fibrillation Robustness

Table 1 shows the results for AF detection across varying noise intensities and durations for all three noise types. The 12 dB level is the mildest intensity tested and yields essentially the same performance as the model without any noise, while -12 dB corresponds to the highest noise intensity. Without noise, the AF detection approach achieved, on the datasets used, an accuracy of 0.90, a sensitivity of 0.83, and a specificity and precision of 1.00. The results indicate that the noise types and their associated variables (intensity and duration) affect the AF detection model to varying degrees. Both noise intensity and, more critically, noise duration significantly decrease the effectiveness of the model: as expected, increasing the duration of any noise type markedly reduces performance. The most pronounced degradation occurs with the highest noise intensity (-12 dB) and the longest duration (75%), particularly with electrode motion (EM) noise. In this scenario, accuracy drops by 48.8% and the F1 score by 90%, which suggests that the AF detection approach by Zhou et al. (2015) is highly ineffective when signals are heavily affected by EM artifacts. For baseline wander (BW) and muscle artifact (MA) noise, the accuracy reductions (38.8% and 33.3%, respectively) and F1 score reductions (21.1% and 18.8%) are more moderate.

Table 1: Atrial Fibrillation detection performance across varying noise intensities, durations, and types. Each noise-type column lists Specificity, Precision, Sensitivity, Accuracy, and F1, in that order.

Duration = 25%
Intensity | Baseline Wander | Muscle Artifact | Electrode Motion
-12 dB | 0.99 1.00 0.83 0.90 0.91 | 0.99 1.00 0.80 0.89 0.89 | 1.00 1.00 0.77 0.87 0.87
-6 dB | 0.91 0.93 0.92 0.91 0.92 | 0.90 0.92 0.89 0.90 0.91 | 1.00 1.00 0.77 0.87 0.87
-3 dB | 0.84 0.88 0.88 0.87 0.88 | 0.94 0.95 0.89 0.91 0.92 | 0.91 0.91 0.77 0.83 0.83
0 dB | 0.94 0.95 0.88 0.90 0.91 | 0.97 0.98 0.87 0.91 0.92 | 0.87 0.89 0.77 0.82 0.83
3 dB | 0.98 0.99 0.88 0.92 0.93 | 0.99 1.00 0.88 0.93 0.94 | 0.93 0.94 0.80 0.86 0.87
6 dB | 0.99 0.99 0.84 0.91 0.91 | 1.00 1.00 0.86 0.92 0.92 | 0.99 0.99 0.83 0.90 0.90
12 dB | 1.00 1.00 0.83 0.90 0.90 | 1.00 1.00 0.83 0.90 0.90 | 1.00 1.00 0.83 0.90 0.90

Duration = 50%
Intensity | Baseline Wander | Muscle Artifact | Electrode Motion
-12 dB | 0.17 0.61 0.99 0.63 0.75 | 0.73 0.83 1.00 0.88 0.90 | 1.00 1.00 0.53 0.74 0.70
-6 dB | 0.43 0.67 0.91 0.70 0.77 | 0.20 0.61 0.98 0.64 0.75 | 1.00 1.00 0.64 0.80 0.78
-3 dB | 0.74 0.82 0.91 0.84 0.86 | 0.49 0.70 0.95 0.75 0.81 | 1.00 1.00 0.67 0.81 0.80
0 dB | 0.94 0.95 0.87 0.90 0.91 | 0.82 0.87 0.91 0.87 0.89 | 0.90 0.90 0.70 0.79 0.79
3 dB | 0.99 0.99 0.83 0.90 0.90 | 0.99 0.99 0.89 0.93 0.94 | 0.95 0.95 0.75 0.83 0.83
6 dB | 1.00 1.00 0.83 0.90 0.90 | 1.00 1.00 0.85 0.92 0.92 | 0.94 0.94 0.81 0.87 0.87
12 dB | 1.00 1.00 0.83 0.90 0.90 | 1.00 1.00 0.83 0.90 0.90 | 1.00 1.00 0.83 0.90 0.90

Duration = 75%
Intensity | Baseline Wander | Muscle Artifact | Electrode Motion
-12 dB | 0.01 0.56 0.97 0.55 0.71 | 0.11 0.59 0.98 0.60 0.73 | 1.00 1.00 0.03 0.46 0.09
-6 dB | 0.70 0.80 0.91 0.82 0.85 | 0.16 0.60 0.99 0.62 0.75 | 1.00 1.00 0.13 0.51 0.22
-3 dB | 0.93 0.94 0.87 0.90 0.90 | 0.60 0.75 0.91 0.78 0.82 | 1.00 1.00 0.23 0.57 0.37
0 dB | 0.96 0.97 0.83 0.89 0.89 | 0.84 0.88 0.91 0.88 0.90 | 0.75 0.67 0.39 0.55 0.49
3 dB | 1.00 1.00 0.83 0.90 0.90 | 0.98 0.99 0.87 0.92 0.92 | 0.86 0.84 0.55 0.69 0.66
6 dB | 1.00 1.00 0.83 0.90 0.90 | 1.00 1.00 0.83 0.90 0.91 | 0.94 0.94 0.78 0.85 0.85
12 dB | 1.00 1.00 0.83 0.90 0.90 | 1.00 1.00 0.83 0.90 0.90 | 1.00 1.00 0.81 0.90 0.90
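The relative reductions quoted in the first paragraph of this subsection can be recomputed from the -12 dB, 75% row of Table 1, taking 0.90 as the noise-free value of both accuracy and F1 (the 12 dB entries). With exact decimal arithmetic, the reported percentages are reproduced when the result is truncated, rather than rounded, to one decimal place; this is an observation about the published numbers, not a statement of how they were originally computed.

```python
from decimal import Decimal, ROUND_DOWN

NOISE_FREE = Decimal("0.90")  # accuracy and F1 at 12 dB in Table 1


def relative_drop_pct(noisy: str, baseline: Decimal = NOISE_FREE) -> Decimal:
    """Relative decrease in percent, truncated to one decimal place."""
    drop = (baseline - Decimal(noisy)) / baseline * 100
    return drop.quantize(Decimal("0.1"), rounding=ROUND_DOWN)


# Table 1, Duration = 75%, Intensity = -12 dB: (accuracy, F1) per noise type.
WORST_CASE = {"BW": ("0.55", "0.71"), "MA": ("0.60", "0.73"), "EM": ("0.46", "0.09")}

for noise_type, (accuracy, f1) in WORST_CASE.items():
    print(f"{noise_type}: accuracy -{relative_drop_pct(accuracy)}%, F1 -{relative_drop_pct(f1)}%")
```

Running it gives 38.8% and 21.1% for BW, 33.3% and 18.8% for MA, and 48.8% and 90.0% for EM, matching the text; rounding half-up instead would give 38.9%, 18.9%, and 48.9% for the BW accuracy, MA F1, and EM accuracy drops.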

A deeper analysis reveals that for BW and MA noise, precision decreases while sensitivity (recall) increases as noise levels rise. This implies that the model tends to classify more instances as AF-positive, albeit with an increasing number of false positives. This behavior likely occurs because the AF detection algorithm primarily identifies anomalies in R-R intervals, and both BW and MA noise may introduce peaks that are mistaken for R-peaks, leading the model to detect false anomalies. However, the number of true positives remains stable (or even increases slightly) as noise increases. This suggests that, although BW and MA noise lead to higher false positive rates, the model still maintains a good ability to detect actual AF cases. Conversely, with EM noise, the opposite trend is observed: precision increases while sensitivity decreases as noise levels rise. This indicates that the model tends to classify most instances as AF-negative, making the algorithm unreliable, as it may fail to detect true AF events.

4.2 RQ2: Ventricular Tachycardia Robustness

Table 2 presents the results for VT detection. Our findings indicate that increasing the intensity and duration of baseline wander (BW), muscle artifact (MA), and electrode motion (EM) noise does not significantly affect the reliability of the VT detector, which suggests a high level of robustness to noisy signals.

Only minor performance drops occur under extreme noise conditions (e.g., 75% duration with -6 dB BW noise, where accuracy and F1-score fall from 0.90 to 0.89, i.e., about 1.1%), and these are negligible in most practical scenarios. In conclusion, noise perturbation does not significantly impact the performance of the state-of-the-art VT detector Mohammad-Taheri et al. (2016), regardless of the intensity level, duration, or type of noise.

Table 2: Ventricular Tachycardia detection performance across varying noise intensities, durations, and types. Each noise-type column lists Specificity, Precision, Sensitivity, Accuracy, and F1, in that order.

Duration = 25%
Intensity | Baseline Wander | Muscle Artifact | Electrode Motion
-12 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
-6 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
-3 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
0 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
3 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
6 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
12 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90

Duration = 50%
Intensity | Baseline Wander | Muscle Artifact | Electrode Motion
-12 dB | 0.84 0.84 0.95 0.89 0.89 | 0.85 0.85 0.95 0.90 0.90 | 0.84 0.85 0.95 0.89 0.89
-6 dB | 0.85 0.85 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.85 0.85 0.95 0.90 0.90
-3 dB | 0.86 0.86 0.95 0.90 0.90 | 0.85 0.85 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
0 dB | 0.86 0.86 0.95 0.90 0.90 | 0.85 0.85 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
3 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
6 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
12 dB | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90

Duration = 75%
Intensity | Baseline Wander | Muscle Artifact | Electrode Motion
-12 dB | 0.84 0.85 0.95 0.90 0.90 | 0.83 0.84 0.95 0.89 0.89 | 0.82 0.83 0.95 0.89 0.89
-6 dB | 0.83 0.83 0.95 0.89 0.89 | 0.83 0.84 0.95 0.89 0.89 | 0.85 0.85 0.96 0.90 0.90
-3 dB | 0.84 0.84 0.95 0.89 0.89 | 0.85 0.85 0.95 0.90 0.90 | 0.84 0.84 0.95 0.89 0.89
0 dB | 0.85 0.85 0.95 0.90 0.90 | 0.85 0.85 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
3 dB | 0.85 0.85 0.95 0.90 0.90 | 0.85 0.85 0.95 0.90 0.90 | 0.85 0.85 0.95 0.90 0.90
6 dB | 0.85 0.85 0.95 0.90 0.90 | 0.85 0.85 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90
12 dB | 0.85 0.85 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90 | 0.86 0.86 0.95 0.90 0.90

4.3 Threats to Validity

Although we used carefully annotated datasets, some noise may still be present in the original recordings, and the automated noise generation may not capture all real-world variations. To enhance reliability, experiments were repeated ten times, and average results were reported. Moreover, the study’s results are constrained by the specific AF and VT detection algorithms used, which may not generalize to other algorithms or populations.

5 CONCLUSION AND FUTURE WORK

This study examined the impact of common noise types in signals acquired through IoMT devices, namely baseline wander (BW), muscle artifact (MA), and electrode motion (EM), on the detection of two critical conditions: Atrial Fibrillation (AF) and Ventricular Tachycardia (VT). Our findings show that VT detection remained robust across the noise conditions tested. In contrast, AF detection was more vulnerable: its precision and sensitivity were significantly affected by all three noise types. Practitioners should therefore be cautious when deploying AF detection systems in noise-prone environments, especially those affected by EM noise. Future work should focus on developing more noise-resistant AF detection algorithms and on extending the analysis to a wider range of noise types and patient populations, to ensure generalizability and accuracy in real-world settings.
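To make the intensity and duration parameters in the tables concrete, the following minimal Python/NumPy sketch corrupts a clean record with a noise record at a target SNR over a fraction of its length. It assumes, for illustration only, that intensity is an SNR defined as 10·log10(P_signal/P_noise) with both powers measured over the corrupted window, and that the noise covers a single contiguous window; the authors' exact procedure may differ. The functions `inject_noise` and `achieved_snr_db` are hypothetical, and the synthetic sine and Gaussian noise stand in for real ECG and BW/MA/EM noise records.

```python
import numpy as np


def inject_noise(ecg, noise, snr_db, duration_fraction, start=0):
    """Return a copy of `ecg` with scaled `noise` added over a contiguous window.

    Hypothetical helper. Assumptions: SNR = 10*log10(P_signal / P_noise), with
    both powers measured over the corrupted window; the window begins at
    sample `start` and spans `duration_fraction` of the record.
    """
    ecg = np.asarray(ecg, dtype=float)
    noise = np.asarray(noise, dtype=float)
    length = int(round(duration_fraction * len(ecg)))
    end = start + length
    if end > len(ecg) or length > len(noise):
        raise ValueError("noise window does not fit in the ECG or noise record")
    noisy = ecg.copy()
    if length == 0:
        return noisy
    p_signal = np.mean(ecg[start:end] ** 2)
    p_noise = np.mean(noise[:length] ** 2)
    if p_noise == 0:
        raise ValueError("noise record has zero power")
    # Gain chosen so that p_signal / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    noisy[start:end] += gain * noise[:length]
    return noisy


def achieved_snr_db(clean, noisy, start, end):
    """SNR actually obtained over [start, end), using the same power definition."""
    added = noisy[start:end] - clean[start:end]
    return 10 * np.log10(np.mean(clean[start:end] ** 2) / np.mean(added ** 2))


if __name__ == "__main__":
    fs = 360  # sampling rate in Hz (illustrative)
    t = np.arange(10 * fs) / fs
    clean = np.sin(2 * np.pi * 1.2 * t)  # synthetic stand-in for an ECG record
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(t))  # stand-in for a BW/MA/EM noise record
    end = int(round(0.75 * len(clean)))  # Duration = 75%
    for snr in (-12, -6, -3, 0, 3, 6, 12):
        noisy = inject_noise(clean, noise, snr_db=snr, duration_fraction=0.75)
        print(snr, round(achieved_snr_db(clean, noisy, 0, end), 6))
```

Measuring power over the corrupted window makes the requested SNR hold exactly where noise is present; measuring it over the whole record is an equally plausible convention that would scale the noise differently.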

ACKNOWLEDGMENTS

This study was conducted within the "Age-It - Ageing Well in an Ageing Society" project (PE0000015), funded by Next Generation EU, National Recovery and Resilience Plan (NRRP) - PE8 - Mission 4, C2, Intervention 1.3. The views and opinions expressed are only those of the authors and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them. The authors would like to thank Aldo Palombo for his support in the initial phase of this work.

REFERENCES

Amann, A., Tratnig, R., and Unterkofler, K. (2005). Reliability of old and new ventricular fibrillation detection algorithms for automated external defibrillators. Biomedical Engineering Online, 4:1–15.

Aparna, P., Mirajkar, P., and Prabhu, R. (2017). Detection and classification of ventricular tachycardia using SVM. In International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering, National Conference on Advances in Electrical Engineering, volume 5, pages 116–120.

Aparna, P. and Sharma, K. M. (2020). Detection of a fib and its classification using SVM. In 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pages 116–120.

Balestrieri, E., Boldi, F., Colavita, A. R., De Vito, L., Laudato, G., Oliveto, R., Picariello, F., Rivaldi, S., Scalabrino, S., Torchitti, P., et al. (2019). The architecture of an innovative smart T-shirt based on the Internet of Medical Things paradigm. pages 1–6.

Burke, J., Haigney, M. C., Borne, R., and Krantz, M. J. (2020). Smartwatch detection of ventricular tachycardia: Case series. HeartRhythm Case Reports, 6(10):800–804.

Chong, J. W., Cho, C. H., Tabei, F., Le-Anh, D., Esa, N., McManus, D. D., and Chon, K. H. (2018). Motion and noise artifact-resilient atrial fibrillation detection using a smartphone. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 8(2):230–239.

Christiansen, E. H., Frost, L., MØlgaar, H., Nielsen, T. T., and Pedersen, A. K. (1996). Noise in the signal-averaged electrocardiogram and accuracy for identification of patients with sustained monomorphic ventricular tachycardia after myocardial infarction. European Heart Journal, 17(6):911–916.

Colloca, R., Johnson, A. E., Mainardi, L., and Clifford, G. D. (2013). A support vector machine approach for reliable detection of atrial fibrillation events. In Computing in Cardiology 2013, pages 1047–1050. IEEE.

Guandalini, G. S., Liang, J. J., and Marchlinski, F. E. (2019). Ventricular tachycardia ablation. JACC: Clinical Electrophysiology, 5(12):1363–1383.

Hamil, H., Zidelmal, Z., Azzaz, M. S., Sakhi, S., Kaibou, R., Djilali, S., and Ould Abdeslam, D. (2022). Design of a secured telehealth system based on multiple biosignals diagnosis and classification for IoT application. Expert Systems, 39(4):e12765.

Kaplan Berkaya, S., Uysal, A. K., Sora Gunal, E., Ergin, S., Gunal, S., and Gulmezoglu, M. B. (2018). A survey on ECG analysis. Biomedical Signal Processing and Control, 43:216–235.

Kher, R. (2019). Signal processing techniques for removing noise from ECG signals. Journal of Biomedical Engineering and Research.

Laudato, G., Boldi, F., Colavita, A. R., Rosa, G., Scalabrino, S., Lazich, A., and Oliveto, R. (2021a). Combining rhythmic and morphological ECG features for automatic detection of atrial fibrillation: local and global prediction models. In Biomedical Engineering Systems and Technologies: 13th International Joint Conference, BIOSTEC 2020, Valletta, Malta, February 24–26, 2020, Revised Selected Papers 13, pages 425–441. Springer.

Laudato, G., Scalabrino, S., Colavita, A. R., Chiacchiari, Q., D’Orazio, R., Donadelli, R., De Vito, L., Picariello, F., Tudosa, I., Malatesta, R., et al. (2021b). Atticus: Ambient-intelligent tele-monitoring and telemetry for incepting and catering over human sustainability. Frontiers in Human Dynamics, 3:614309.

Mandala, S. and Di, T. C. (2017). ECG parameters for malignant ventricular arrhythmias: a comprehensive review. Journal of Medical and Biological Engineering, 37(4):441–453.

Migdady, I., Russman, A., and Buletko, A. B. (2021). Atrial fibrillation and ischemic stroke: a clinical review. In Seminars in Neurology, volume 41, pages 348–364. Thieme Medical Publishers, Inc.

Mohammad-Taheri, S., Shirazi, M.-A. M., and Rafiezade, A. (2016). Slope analysis based methods for detection of ventricular fibrillation and ventricular tachycardia. In 2016 24th Iranian Conference on Electrical Engineering (ICEE), pages 1100–1103. IEEE.

Mohd Apandi, Z. F., Ikeura, R., Hayakawa, S., and Tsutsumi, S. (2020). An analysis of the effects of noisy electrocardiogram signal on heartbeat detection performance. Bioengineering, 7(2):53.

Mohebbi, M. and Ghassemian, H. (2008). Detection of atrial fibrillation episodes using SVM. In 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 177–180. IEEE.

Murat, F., Sadak, F., Yildirim, O., Talo, M., Murat, E., Karabatak, M., Demir, Y., Tan, R.-S., and Acharya, U. R. (2021). Review of deep learning-based atrial fibrillation detection studies. International Journal of Environmental Research and Public Health, 18(21).

Oster, J. and Clifford, G. D. (2015). Impact of the presence of noise on RR interval-based atrial fibrillation detection. Journal of Electrocardiology, 48(6):947–951.

Pritchett, E. L. (1992). Management of atrial fibrillation. New England Journal of Medicine, 326(19):1264–1271.

Rajeshwari, M. and Kavitha, K. (2021). A review of feature extraction from ECG signals and classification/detection for ventricular arrhythmias. Rec. Advan. Comp. Sci. Commun, 14(1):192–200.

Ramakrishnan, S., Akshaya, V., Kishor, S., and Thyagarajan, T. (2017). Real time implementation of arrhythmia classification algorithm using statistical methods. In 2017 Trends in Industrial Measurement and Automation (TIMA), pages 1–4.

Ramkumar, S., Nerlekar, N., D’Souza, D., Pol, D. J., Kalman, J. M., and Marwick, T. H. (2018). Atrial fibrillation detection using single lead portable electrocardiographic monitoring: a systematic review and meta-analysis. BMJ Open, 8(9):e024178.

Sadr, N., Jayawardhana, M., Pham, T. T., Tang, R., Balaei, A. T., and de Chazal, P. (2018). A low-complexity algorithm for detection of atrial fibrillation using an ECG. Physiological Measurement, 39(6):064003.

Sepulveda-Suescun, J., Murillo-Escobar, J., Urda-Benitez, R., Orrego-Metaute, D., and Orozco-Duque, A. (2017). Atrial fibrillation detection through heart rate variability using a machine learning approach and Poincaré plot features. In VII Latin American Congress on Biomedical Engineering CLAIB 2016, Bucaramanga, Santander, Colombia, October 26th-28th, 2016, pages 565–568. Springer.

Strik, M., Sacristan, B., Bordachar, P., Duchateau, J., Eschalier, R., Mondoly, P., Laborderie, J., Gassa, N., Zemzemi, N., Laborde, M., Garrido, J., Matencio Perabla, C., Jimenez-Perez, G., Camara, O., Haïssaguerre, M., Dubois, R., and Ploux, S. (2023). Artificial intelligence for detection of ventricular oversensing: Machine learning approaches for noise detection within nonsustained ventricular tachycardia episodes remotely transmitted by pacemakers and implantable cardioverter-defibrillators. Heart Rhythm, 20(10):1378–1384. Focus Issue: Sudden Death.

Wellens, H. J. (2001). Ventricular tachycardia: diagnosis of broad QRS complex tachycardia. Heart, 86(5):579–585.

Xiong, Z., Stiles, M. K., and Zhao, J. (2017). Robust ECG signal classification for detection of atrial fibrillation using a novel neural network. In 2017 Computing in Cardiology (CinC), pages 1–4. IEEE.


Yuan, C., Yan, Y., Zhou, L., Bai, J., and Wang, L. (2016). Automated atrial fibrillation detection based on deep learning network. In 2016 IEEE International Conference on Information and Automation (ICIA), pages 1159–1164. IEEE.

Zhou, X., Ding, H., Wu, W., and Zhang, Y. (2015). A real-time atrial fibrillation detection algorithm based on the instantaneous state of heart rate. PLoS ONE, 10(9):e0136544.
