A Comparative Study on Cloud-based and Edge-Based Digital Twin

Frameworks for Prediction of Cardiovascular Disease

Havvanur Dervişoğlu, Burak Ülver, Rabia Arkan Yurtoğlu, Ruşen Halepmollası and Mehmet Haklıdır

TÜBİTAK Informatics and Information Security Research Center, Kocaeli, Turkey

Keywords:

Cloud Computing, Edge Computing, Digital Twin, Healthcare.

Abstract:

Digital Twins, which can integrate with related technologies such as artificial intelligence, optimization, mobile communication systems, edge computing, fog computing and cloud computing, are virtual representations of physical objects and reflect their real-time status through streaming data. In this study, we provide two Digital Twin frameworks, one cloud-based and one edge-based, and compare them in terms of scalability, flexibility, latency and security. We demonstrated those frameworks by developing a case study to predict cardiac patients, continuously monitor the risks related to heart disease, and report the risks to both healthcare professionals and users in real time. We extracted features from electrocardiogram signals and applied popular machine learning algorithms. We employed feature binning and feature selection methods to increase the robustness of the prediction model and, in total, we built 20 models. We presented an empirical analysis on a publicly available dataset based on the PTB Diagnostic ECG Database and evaluated the results in terms of accuracy, precision, recall and F-score. When predicting cardiac patients, Logistic Regression outperformed the other classifiers with accuracy and F-score rates of 86% and 92%, respectively. This model also has the highest recall rate (98%), which is vital in predicting diseases. Meanwhile, Gradient Boosted Tree with binning, the mRMR feature selection method and random oversampling achieved high precision (91%).

1 INTRODUCTION

Industry 4.0 is one of the key initiatives of utiliz-

ing a wide range of advanced technologies, such as

cloud computing, big data, Digital Twin (DT), Ma-

chine Learning (ML), Deep Learning (DL) and virtual

reality. The Internet of Things (IoT), which is another

aspect of Industry 4.0, has a signiﬁcant role in the

digitalization of data for contributing value to many

sectors including health, energy, education and

industry. Also, IoT and digitalization represent a novel

paradigm by enabling the creation of DTs through the

capability to collect and communicate data.

DT was introduced as a concept underlying product lifecycle management by Grieves (2002) and offered under different names such as mirrored spaces model (Grieves, 2005), information mirroring model, and virtual twin (Githens, 2007). NASA used the DT idea for the Apollo project, in which two identical space vehicles were created to simulate space status during flight training. Later, John Vickers coined the name "DT" for the model in the 2010 NASA Roadmap Report (Piascik et al., 2012). In this context, the DT concept is a model that can separate information about a physical system from the system itself and subsequently mirror or twin that system (Grieves, 2019).

https://orcid.org/0000-0002-2122-6944

https://orcid.org/0000-0002-5790-0590

https://orcid.org/0000-0003-3837-8052

https://orcid.org/0000-0002-9941-2712

https://orcid.org/0000-0003-4985-1116

Industry 4.0 also elevates healthcare to novel and

advanced levels on the basis of digitization, IoT, AI,

cloud/fog/edge computing and 5G networks. Further-

more, those technologies make possible the collection

and analysis of data from anyone, anywhere and any-

time and dramatically impact the healthcare systems

by connecting them to patients’ personal devices to

capture data and to notify patients, doctors, or patient

relatives in real time (Ge et al., 2019). Moreover, the emergence of smart healthcare services through digitalization has rapidly expanded the body of research on, and the implementation of, methods that focus on improving the quality of healthcare services and human welfare while reducing healthcare costs and the death rate (Huang et al., 2021; Dritsas et al., 2022). In this context, DT promises a new era for healthcare by changing the smart health concept and taking medicine to an unprecedented level.

Dervişoğlu, H., Ülver, B., Yurtoğlu, R., Halepmollası, R. and Haklıdır, M.

A Comparative Study on Cloud-based and Edge-Based Digital Twin Frameworks for Prediction of Cardiovascular Disease.

DOI: 10.5220/0011859400003476

In Proceedings of the 9th International Conference on Information and Communication Technologies for Ageing Well and e-Health (ICT4AWE 2023), pages 159-169

ISBN: 978-989-758-645-3; ISSN: 2184-4984

© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

In recent years, researchers from both academia

and industry have expressed considerable interest in

the improvement of DT technologies. Therefore, re-

search studies on DTs and their applications have

been prominent in a wide range of domains includ-

ing healthcare. DTs, which can combine other core

technologies including AI, big data, 5G/6G, cloud

computing and edge computing, are virtual represen-

tations of physical assets and can express the real

time situation through streaming data (Fuller et al.,

2020). Moreover, edge computing and cloud comput-

ing, through supplementary capabilities, offer new as-

pects when implementing DTs on the different levels

where they have several requirements such as scala-

bility, latency, reliability, centralization, etc. to pro-

cess, analyse and transmit data. There exist many

studies investigating edge-based DT and cloud-based

DT, separately. However, there is a gap in studies im-

plementing both cloud-based DT and edge-based DT

and comparing their performances. Although Khan

et al. (2022) compared cloud-based against edge-

based DT, they did not provide any empirical evidence.

To the best of our knowledge, there is no

study implementing and comparing both edge-based

DT and cloud-based DT frameworks.

In this study, we aim to offer both an edge-based DT and a cloud-based DT and to compare their performance. Hence, we present a cloud-based DT framework and an edge-based DT framework and contrast them in terms of latency, scalability, mobility, and centralization. To represent the frameworks on this basis, we implemented a case study to predict cardiac patients and

monitor the risks of heart diseases, reporting those

to users and healthcare providers in real time. To

this end, we trained common ML classiﬁers with ex-

tracted features over electrocardiogram signals. To

improve the prediction results, we applied data pre-

processing and used various feature binning and fea-

ture selection techniques. We presented an empirical

study on a dataset based on an open-source database,

namely, PTB Diagnostic ECG Database (Bousseljot

et al., 1995; Goldberger et al., 2000). The dataset con-

sists of 549 samples collected from 290 persons (209

men and 81 women). While 69 of the samples are la-

belled as healthy, 378 of them are labelled as patients.

We applied various sampling methods with differ-

ent parameters due to the imbalanced structure of the

dataset. We compared the ML model results in terms

of accuracy, precision, recall and F-score. According

to our results, we outperformed a benchmark study

(Cardio Twin) (Martinez-Velazquez et al., 2019) on

the same dataset.

Structure of the Paper. Section 2 summarizes pre-

vious related works on cloud-based DTs, edge-based

DTs and DT applications in healthcare. In Section 3,

we describe DT frameworks for cloud and edge, sep-

arately. Section 4 provides a case study in which we

explain the dataset and also present data preprocess-

ing, classiﬁcation methods and evaluation metrics. In

Section 5, the detail of obtained results is reported and

discussed. Finally, we conclude the paper and present

the future work in Section 6.

2 RELATED WORK

Researchers have widely studied DT, which uses

other popular technologies including cloud comput-

ing, edge computing, AI and IoT, in various ﬁelds

such as manufacturing, energy, aerospace, construc-

tion and healthcare. In this section, we discuss the

studies in the literature which utilize DTs on cloud,

DTs on edge and DTs in healthcare, respectively.

2.1 Digital Twins on Cloud

The utilization of DT with cloud computing technol-

ogy which has many capabilities like unlimited stor-

age capacity, dynamic scaling, and high availability

allows considerable advancements to be experienced

in several ﬁelds (Liu et al., 2019; Wang et al., 2022,

2020). Liu et al. (2019) presented a cloud-based

DT Health (CloudDTH) reference framework includ-

ing key technologies, i.e., cloud computing, health

IoT and DT, to create effective solutions for elderly

health services. CloudDTH, which has an eight-layer

architecture, is based on the DT Healthcare (DTH)

conceptual model. Moreover, they conducted a case

study that made drug recommendations to patients or

healthcare professionals using online ECG data (Liu

et al., 2019).

In (Wang et al., 2022), a Battery Management

System, which has a 4-layer architecture and utilizes

the power of cloud and DT technologies, was pro-

vided. The architecture processes big data using the

capabilities of cloud technology such as high stor-

age and processing power capacity, while also using

the capabilities of DT technology to digitize the be-

havior and real processes of the battery. Authors ar-

gue that real time optimization studies can be imple-

mented throughout the whole life cycle of batteries for

more complicated and intelligent battery management

with BMS, or various analyses and predictions can be

made by obtaining insights from the data using histor-

ical data and AI models. Besides, they stated that sen-

sitive management processes of systems with more

complex structures, such as large-capacity lithium-

ion battery packs, can be done effectively with DT

and cloud technologies.

Wang et al. (2020) proposed a DT framework

for connected vehicles. The proposed DT frame-

work uses the V2C (vehicle-to-cloud) communication

based Advanced Driver Assistance System to twin the

connected vehicles in the cloud. According to their

results, the proposed system can beneﬁt transporta-

tion with acceptable communication delays.

2.2 Digital Twins on Edge

In literature, to utilize the advantages of edge comput-

ing such as latency and mobility, there are also studies

that present DT on edge in various ﬁelds (Martinez-

Velazquez et al., 2019; Bellavista et al., 2021).

Martinez-Velazquez et al. (2019) presented an

edge-based DT architecture, namely Cardio Twin,

working at the edge for monitoring the heart con-

ditions of patients. Cardio Twin consists of three

layers, i.e. Data Source, AI-Inference Engine, and

Multimodal Interaction. Besides, as a PoC study,

they implemented the AI-Inference Engine based on

those layers and built a CNN model using PhysioNet’s

”PTB Diagnostic ECG Database” dataset. They ob-

tained data on the mobile phones used as edge devices

during the real time test phases. The performance

results of the model obtained at the end of the PoC

study are accuracy 85.7%, precision 95.5% and recall

86.3%. The authors emphasized that edge-based DT

architecture takes the advantages of edge computing

and prevents the latency caused by cloud (Martinez-

Velazquez et al., 2019).

Bellavista et al. (2021) argue that manual conﬁg-

urations of networks in industrial environments are

time-consuming and error-prone. To this end, au-

thors presented the Application-Driven DT Network-

ing (ADTN) middleware. ADTN middleware consists

of semantically enriching simple DTs, namely SDT,

deployed to edge nodes and composed DTs, namely,

CDT, that perform flexible arrangements. According to their results, ADTN middleware is feasible and efficient; however, there are issues to be investigated in the future to promote its use in real industrial environments.

In (Glatt et al., 2021), the edge-based DT concept

was introduced to ensure sustainability through the

assessment of ecological conditions in cross-company

production networks. The presented concept consists

of two levels, namely the Network level and the Com-

pany level. While the Network level includes the pro-

cesses and activities of several companies, the Com-

pany level focuses on individual processes in detail.

The authors mentioned that the difﬁculties that may

be encountered with the application of this concept in

an industrial environment can be determined and the

effect of the presented approach on performance can

be examined (Glatt et al., 2021).

2.3 Digital Twins in Healthcare

Problems relating to an inability to access patients’ historical data, corrupted or missing health data, and missed or delayed diagnoses cause thousands of deaths

each year (Tyagi et al., 2016). On the other hand,

technologies facilitating the digitization of medicine,

in general, allow ML methods to be trained on sufficiently large datasets and achieve the clinical accuracy that is vital in medicine (Halepmollası et al., 2021). Thus, those technologies lead to a new paradigm that reduces costs while improving the quality of health services. According to a recent research report (Kalis

et al., 2022), almost 80% of healthcare executives

stated that their organizations’ usage of IoT/Edge

devices had increased enormously during the pre-

vious three years. Also, nearly half of them be-

lieve that DT technology will make a breakthrough

in the future by building a bridge between the digi-

tal and physical worlds and have a positive impact on

healthcare (Kalis et al., 2022). Moreover, there exist

many studies presenting DT applications in health-

care based on Human/Patient (Kamel Boulos and

Zhang, 2021; Shengli, 2021), Hospital/Health Insti-

tutions (Hassani et al., 2022; Singh et al., 2022) and

Medical (Björnsson et al., 2020).

Personalized and holistic healthcare can be pro-

vided for the whole life cycle of humans through

Human/Patient-oriented DT research. Those health

services include monitoring persons’ health state,

early detecting and diagnosing diseases, applying

personalized treatment methods according to ge-

netic, physiological, and other characteristics of per-

sons, and examining the effect of the therapies used

(Kamel Boulos and Zhang, 2021; Singh et al., 2022).

DT after life can benefit organ transplant procedures such as recipient-donor matching and organ transplant (Hassani et al., 2022). Shengli (2021) presented a Human DT (HDT) that is based on the Augmented DT conceptual model to provide the lifecycle management of a human. To represent a human in cyberspace, the proposed model overcomes challenges such as the complexity of humans, social and ethical issues, and safety. Thus, the author states that DT is an important technology that provides the interaction between physical space and cyberspace (Shengli, 2021).

Although human DTs have the aforementioned

beneﬁts, they also have several challenges that must

be overcome to create them holistically and compre-

hensively. For instance, collecting health data such

as blood analysis and X-ray required to create hu-

man DTs can be time consuming and costly (Shengli,

2021). Also, the collected data can be person-based

and diverse, and it raises security and privacy concerns

(Shengli, 2021; Jimenez et al., 2020).

The services provided by the various organiza-

tional structure components of the health institutes

can be utilized effectively through Hospital-oriented

DT research. For example, to improve patient care

and treatment, health services can be improved by

scheduling further studies in all processes (Singh

et al., 2022). Besides, monitoring electronic devices

in terms of maintenance and repairs can save money

and time while also ensuring that patients receive non-

stop service and preventing the breakdown of devices.

Hospitals that increase staff productivity, treatment

success rate, patient satisfaction, and effective execu-

tion of operations can be designed and constructed via

DT technology (Hassani et al., 2022; Karakra et al.,

2019). For instance, Siemens Healthineers use DT

techniques to streamline hospital operations in the ra-

diology department of a hospital in Dublin, Ireland

(Scharff, 2018). Moreover, Hassani et al. (2022) and Tao et al. (2022) stated that medical-oriented DTs offer several benefits: the effects of a surgical intervention can be observed and the best intervention method determined by rehearsing it on a DT beforehand, experiments can be performed on DTs during the development of new medical equipment and drugs, and healthcare education can be carried out practically on DTs.

3 DIGITAL TWIN

In this study, we aim to compare the performance of

two different DT frameworks in predicting cardiac pa-

tients. While the ﬁrst framework is based on cloud

computing, the second framework is based on edge

computing (Figure 1).

We offered a DT framework on cloud that pro-

vides scalability, resource sharing, and service-on-

demand. As illustrated in Figure 1a, the framework

consists of three layers:

Edge Layer. In this layer, there exist devices (e.g.

cell phone or smart watch) used to create digital

copies of the physical assets. The ECG signals of the

patients are sent from this layer to the cloud layer.

Cloud Layer. The digital copy of the physical en-

tity is created on this layer that contains the storage

and AI modules. The storage module contains two

databases, one holding the historical ECG signals of

the patients in the system and the other holding prediction results. As shown in Figure 2, the AI module has two main tasks: (i) to preprocess the historical data and build ML models; and (ii) to predict in real time, through the deployed ML model, whether a person is a cardiac patient and to send the prediction results to the database in the storage module.

Application Layer. The status of the physical asset

can be continuously monitored in this layer. Thus, the

layer includes screens on which prediction results can

be visualized and also rules that send notiﬁcations to

the expert based on prediction results.

3.1 Digital Twin Framework on Edge

We also provided a DT framework on edge that deals

with latency issues and allows mobility. As shown

in Figure 1b, the edge-based DT framework also in-

cludes three layers:

Edge Layer. In this layer, there are devices used to

create digital copies of assets. Also, in the edge-based

DT framework, ML models that were trained on the

cloud are deployed to those devices on edge.

Cloud Layer. As in the cloud-based DT framework, the cloud layer includes Storage and AI modules. Here, however, ML models are trained on the cloud layer and used on the edge layer to predict cardiac patients.

Application Layer. In edge-based DT framework,

this layer is not only available on the cloud, but also

on the edge. Thus, users can monitor their own status

in real time and notiﬁcations can be sent to experts

according to the prediction results.
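The difference between the two frameworks is mainly where the prediction step runs. The following is a minimal sketch of that difference, not the implementation used in the study: `CloudLayer`, `cloud_based_step`, `edge_based_step` and `toy_model` are hypothetical names, and the classifier is a stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical model type: maps a feature vector to 1 (cardiac) or 0 (healthy).
Model = Callable[[List[float]], int]


@dataclass
class CloudLayer:
    """Storage and AI modules; the model is trained here in both frameworks."""
    model: Model
    ecg_history: List[List[float]] = field(default_factory=list)
    predictions: List[int] = field(default_factory=list)

    def store(self, features: List[float], prediction: int) -> None:
        self.ecg_history.append(features)
        self.predictions.append(prediction)


def cloud_based_step(features: List[float], cloud: CloudLayer) -> Tuple[int, str]:
    """Cloud-based DT: the edge device only forwards data; the cloud predicts."""
    prediction = cloud.model(features)
    cloud.store(features, prediction)
    return prediction, "cloud"


def edge_based_step(features: List[float], deployed_model: Model,
                    cloud: CloudLayer) -> Tuple[int, str]:
    """Edge-based DT: the cloud-trained model runs on the device itself."""
    prediction = deployed_model(features)  # no network round trip before the result
    cloud.store(features, prediction)      # the record is still kept in cloud storage
    return prediction, "edge"


def toy_model(features: List[float]) -> int:
    """Stand-in classifier (an assumption): flags a large first feature."""
    return 1 if features[0] > 0.5 else 0


cloud = CloudLayer(model=toy_model)
print(cloud_based_step([0.9, 0.1], cloud))
print(edge_based_step([0.2, 0.4], cloud.model, cloud))
```

In the edge-based variant the user can see a result before any data leaves the device, which is the latency and mobility argument made above; the cloud-based variant keeps all computation in one place, which favours scalability and centralization.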

4 CASE STUDY

In this section, we deﬁne the problem statement,

present the details of the dataset obtained from PTB

Diagnostic ECG Database (PTBD) (Bousseljot et al.,

1995; Goldberger et al., 2000) and explain the details

of AI module in which ML models were constructed.

Also, we describe the evaluation metrics used to com-

pare the prediction results of ML models.

(a) Cloud-based Digital Twin Framework (b) Edge-based Digital Twin Framework

Figure 1: Cloud-based and Edge-based Digital Twin Framework Structures and Components.

4.1 Problem Statement

It is crucial to continuously monitor cardiac patients to reduce the rate of sudden death that heart diseases can result in (WHO, 2020). Therefore, personal health monitoring tools, such as mobile apps or built-in sensors, can continuously monitor key health indicators of a user (e.g. ECG, blood pressure, heart rate, etc.) and reduce the risk of incorrect data entry. Meanwhile, anonymous data can be captured and

transferred to the cloud by those devices and com-

pared with historical data to detect any disease or no-

tify the appropriate health personnel. In this context,

monitoring heart health indicators enables quick inter-

vention in emergency situations and provides early di-

agnosis of diseases by predicting possible risks. Thus,

healthcare services and patients’ quality of life can

improve.

Creating a virtual representation of each patient

could be one of the best ways for healthcare systems

in monitoring key health indicators, increasing con-

trol over health, and enhancing healthcare services.

To this end, in this study, we offer two different DT

frameworks that allow continuous monitoring of patients’ heart health indicators and compare them in terms of latency, scalability, etc. Also, we build

an ML model to predict in real time whether people

whose heart health data are monitored are healthy or

have heart disease. For this purpose, we combine DT

with data analytics, ML, cloud computing, edge com-

puting and IoT technologies in both cloud-based DT

and edge-based DT frameworks.

4.2 Dataset

We used a publicly available dataset obtained from

PhysioNet’s ”PTB Diagnostic ECG Database” (Bous-

seljot et al., 1995; Goldberger et al., 2000). In this

dataset, the ECGs were recorded using an experimen-

tal, non-commercial PTB recorder that satisﬁed the

various requirements including 16 input channels, in-

put voltage, input resistance, resolution, bandwidth,

noise voltage, online recording of skin resistance, and

noise level recording during signal collection. The

details of requirements are explained in (Goldberger

et al., 2000).

The dataset contains 549 records from 290 people,

209 men and 81 women. The registered individuals

range in age from 17 to 87 and mean age of men is

55.5 while mean age of women is 61.6. Also, some

people may have 5 records, while others only have

one record. Each record contains 15 signals measured

simultaneously (i, ii, iii, avr, avl, avf, v1, v2, v3, v4,

v5, v6, vx, vy, vz). In the dataset, there are 69 samples

with healthy labels, 378 have patient labels and the

rest have no labels.

4.3 Preprocessing Steps in AI Module

There are 15 different ECG signals that are generally

sensitive to noise. We used the v4 signal, as it is recorded from the location closest to the heart and best represents the status of the heart. Moreover, we extracted the five fiducial points P, Q, R, S, and T from the signal data and used them to obtain the distance, amplitude, angle, slope and height between the points. The stages performed on the signal data are as

follows (Figure 2):

4.3.1 Cleaning the Signal Data, Denoising and

Smoothing

ECG signals deﬁne how the heart beats electrically.

The ECG signals are produced when the heart’s atrial

and ventricular muscles contract and relax. How-

ever, ECG signals have four primary types of artifacts,

i.e., baseline wander, powerline interference, EMG

noise, and electrode motion (Kher et al., 2019). Ar-

tifacts are unwanted signals and sometimes prevent

doctors from making a correct diagnosis. Therefore,

to remove artifacts from ECGs, we used appropriate

signal-processing ﬁlters that are generally utilized to

remove or reduce noise and improve data quality.

Figure 2: AI Module Flow Diagram.

High-pass Filter. Baseline wander is an effect that causes a signal to zigzag rather than follow a straight baseline. Those zigzags may shift the signal away from its regular base and can be eliminated by using a high-pass filter. The maximum ripple of the filter is set to 12 dB and

the Kaiser window technique (Wang et al., 2022) is

used to determine the ﬁlter window parameters.

Band Stop Filter. Powerline interference repre-

sents a common noise source caused by electromag-

netic ﬁelds and muscle contractions. The noise is

identiﬁed by 50 or 60 Hz sinusoidal interference and

affects low-frequency ECG waves. We determined

the cut-off frequencies, used for the band-stop ﬁlter,

as 59.5 and 60.5 Hz to remove noise.

Low-pass Filter. We used a low-pass filter to eliminate high-order harmonics.

Smoothing Filter. We applied a smoothing filter,

namely Savitzky-Golay ﬁlter (Luo et al., 2005), on

the signal after removing the noise. As a result of this

ﬁlter, we can capture important patterns on the signal

and detect peaks with high accuracy.
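As an illustration of the band-stop step above, the sketch below removes a 60 Hz component with a second-order IIR notch written in plain Python. Several things here are assumptions rather than details from the study: the notch design itself (the paper does not state how its band-stop filter was built), the sampling rate of 1000 Hz, and the mapping of the 59.5 to 60.5 Hz stop band to a quality factor Q = f0 / bandwidth.

```python
import math
from typing import List, Tuple


def notch_coefficients(f0: float, bandwidth: float,
                       fs: float) -> Tuple[List[float], List[float]]:
    """Second-order IIR notch centred at f0 (Hz), normalised so that a[0] = 1."""
    w0 = 2.0 * math.pi * f0 / fs          # centre frequency in radians per sample
    q = f0 / bandwidth                    # assumed mapping from stop band to Q
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_filter(b: List[float], a: List[float], x: List[float]) -> List[float]:
    """Direct-form I difference equation for a second-order section."""
    y: List[float] = []
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y.append(acc)
    return y


fs = 1000.0                               # assumed sampling rate in Hz
b, a = notch_coefficients(f0=60.0, bandwidth=1.0, fs=fs)

# Synthetic test signals: pure powerline hum and a slow 5 Hz component.
hum = [math.sin(2 * math.pi * 60 * i / fs) for i in range(3000)]
slow = [math.sin(2 * math.pi * 5 * i / fs) for i in range(3000)]

hum_tail = apply_filter(b, a, hum)[2000:]     # skip the start-up transient
slow_tail = apply_filter(b, a, slow)[2000:]
print(max(abs(v) for v in hum_tail), max(abs(v) for v in slow_tail))
```

After the transient has died out, the 60 Hz tone is strongly attenuated while the 5 Hz tone passes almost unchanged. A narrow stop band implies a long transient, so the first fraction of a second of a filtered record should be treated with care.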

4.3.2 Peak Extraction

We extracted the peak points over the ECG signal. To

this end, we ﬁrstly identiﬁed the R peaks over the sig-

nal then we identiﬁed the T, Q, P and S peaks using

the R peaks.
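The idea of locating R peaks first and then searching around them can be sketched as follows. This is a simplified illustration under stated assumptions, not the detector used in the study: the relative height threshold, the refractory period and the search window are hypothetical parameters, and only the Q and S points are derived here.

```python
from typing import List, Optional, Tuple


def find_r_peaks(signal: List[float], fs: float,
                 rel_height: float = 0.6, refractory_s: float = 0.25) -> List[int]:
    """Return indices of local maxima above rel_height * max(signal).

    Candidates closer than the refractory period are merged, keeping the taller.
    """
    threshold = rel_height * max(signal)
    min_gap = round(refractory_s * fs)
    peaks: List[int] = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if not (is_local_max and signal[i] >= threshold):
            continue
        if peaks and i - peaks[-1] < min_gap:
            if signal[i] > signal[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


def find_q_and_s(signal: List[float], r_index: int, fs: float,
                 window_s: float = 0.06) -> Tuple[Optional[int], Optional[int]]:
    """Q and S are taken as the lowest samples just before and after the R peak."""
    w = round(window_s * fs)
    left = range(max(0, r_index - w), r_index)
    right = range(r_index + 1, min(len(signal), r_index + w + 1))
    q = min(left, key=lambda i: signal[i]) if len(left) else None
    s = min(right, key=lambda i: signal[i]) if len(right) else None
    return q, s


# Synthetic three-beat signal sampled at 100 Hz (toy data, not PTB records).
fs = 100.0
ecg = [0.0] * 300
for r in (50, 150, 250):
    ecg[r - 3], ecg[r - 1], ecg[r], ecg[r + 1], ecg[r + 3] = -0.2, 0.3, 1.0, 0.3, -0.3

r_peaks = find_r_peaks(ecg, fs)
print(r_peaks, [find_q_and_s(ecg, r, fs) for r in r_peaks])
```

When no valid point is found in a window the function returns `None`, which corresponds to the missing (NaN) entries handled in the imputation step.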

4.3.3 Eliminate and Impute the NaN Values

We applied the K Nearest Neighbour method to each list of peaks to deal with the NaN values. For each missing value in the training set, we calculated the mean of its k nearest neighbours and used that mean to impute the NaN value.
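A minimal sketch of nearest-neighbour mean imputation is given below. It assumes a row-per-sample layout and a Euclidean distance computed over the features that both rows have observed; the function name `knn_impute` and the value of k are illustrative and are not taken from the study.

```python
import math
from typing import List


def knn_impute(rows: List[List[float]], k: int = 3) -> List[List[float]]:
    """Replace each NaN with the mean of that feature over the k nearest rows."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            candidates = []
            for other_index, other in enumerate(rows):
                # A neighbour must be a different row with feature j observed.
                if other_index == i or math.isnan(other[j]):
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if not math.isnan(a) and not math.isnan(b)]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the NaN in place if no neighbour is usable
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy data: the last row misses its second feature.
data = [[1.0, 10.0], [1.1, 12.0], [5.0, 50.0], [1.05, float("nan")]]
print(knn_impute(data, k=2))
```

To avoid leaking information, the neighbours should come from the training set only, as stated above; the sketch does not enforce this and leaves the split to the caller.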

4.4 Methodology in AI Module

Besides the aforementioned preprocessing steps, we also applied feature engineering techniques, i.e., feature binning, feature selection and sampling, respectively, to improve the ML model results

(Figure 2).

4.4.1 Feature Binning

The process of converting continuous or numerical

values into categorical features is called binning or

discretization. In this study, the features extracted

from the signal data are continuous. Hence, to inves-

tigate the effects of discrete features on the prediction

process, we binned the features. For this purpose, we

employed a quantile-based discretization function.

Quantile-Based Discretization. It is the process of creating equal-sized bins by discretizing the variable based on rank or sample quantiles. We applied quantile-based discretization to each continuous feature column (the number of quantiles is 4).
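Quantile-based binning with four bins can be sketched with the standard library alone. The helper name `quantile_bins` is hypothetical, and the choice of linearly interpolated cut points with right-closed intervals is an assumption about how the bin edges are defined.

```python
import bisect
import statistics
from typing import List


def quantile_bins(values: List[float], n_bins: int = 4) -> List[int]:
    """Map each value to a bin index in 0..n_bins-1 using sample quantiles."""
    # n_bins - 1 interior cut points, linearly interpolated between data points.
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # bisect_left counts the cut points strictly below v, so a value equal to a
    # cut point falls into the lower bin (right-closed intervals).
    return [bisect.bisect_left(cuts, v) for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(quantile_bins(feature))
```

Because the edges come from the data itself, each bin receives roughly the same number of samples, which is what distinguishes this approach from fixed-width binning.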

4.4.2 Feature Selection

The success of the prediction models is directly re-

lated to the use of relevant feature selection meth-

ods. In order to select the features that best represent

the model, we applied two feature selection methods,

i.e., Chi-square and mRMR, which are the most pre-

ferred in the literature (Rachburee and Punlumjeak,

2015) and we also compared their performance. We

extracted 58 different features over the ECG signals

by implementing the above methods.

Minimum Redundancy and Maximum Relevance

(mRMR). It is a filtering process that selects the features with the highest relevance to the target classes, using the relationship between each feature and the target class (Rachburee and Punlumjeak, 2015); as the name indicates, it also keeps the redundancy among the selected features low. In the study, in order to observe the effect of the number of selected features on the model performance, datasets obtained by varying the number of features selected with mRMR were used in model training (the numbers of selected features are 4, 10, 15, 18 and 20).

Chi-Square. Chi-square performs feature selection by

testing whether the class label is independent of

a feature (Rachburee and Punlumjeak, 2015). In the

study, different results were interpreted by choosing

10 and 15 features with this method.
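The chi-square idea can be illustrated on binned features with the classical test-of-independence statistic computed from a contingency table. This is a sketch under that assumption; library implementations may compute the score differently, and `chi_square_score` and `select_top_k` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, List


def chi_square_score(feature_bins: List[int], labels: List[int]) -> float:
    """Sum of (observed - expected)^2 / expected over the bin-by-label table."""
    n = len(labels)
    observed = Counter(zip(feature_bins, labels))
    bin_totals = Counter(feature_bins)
    label_totals = Counter(labels)
    score = 0.0
    for b, bin_count in bin_totals.items():
        for lab, label_count in label_totals.items():
            expected = bin_count * label_count / n   # count under independence
            score += (observed[(b, lab)] - expected) ** 2 / expected
    return score


def select_top_k(features: Dict[str, List[int]], labels: List[int],
                 k: int) -> List[str]:
    """Keep the k feature names whose scores indicate the strongest dependence."""
    ranked = sorted(features,
                    key=lambda name: chi_square_score(features[name], labels),
                    reverse=True)
    return ranked[:k]


labels = [0, 0, 1, 1]
features = {
    "follows_label": [0, 0, 1, 1],   # perfectly dependent on the label
    "ignores_label": [0, 1, 0, 1],   # independent of the label
}
print(select_top_k(features, labels, k=1))
```

A score of zero means the observed counts match what independence would predict, so the feature carries no information about the class under this test.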

4.4.3 Sampling

In our dataset, each instance has a corresponding

label of “0” or “1”, where “1” means cardiac and

“0” means healthy persons. The samples labelled as

healthy persons only account for a small portion of

the whole dataset (15.4%). Meanwhile, the imbalanced distribution of classes in the dataset directly affects the prediction performance, as ML algorithms usually assume a balanced class distribution (Murphy, 2018). Furthermore, training classification models directly with imbalanced data may cause bias in

the prediction performance and result in a low pre-

diction score in terms of some evaluation metrics.

Thus, we implemented sampling methods to address

the problem of a serious imbalance between cardiac

and healthy classes. We performed several sampling

techniques including random under-sampling, ran-

dom over-sampling and SMOTE with several rates.

Random Under Sampling (RUS). To deal with is-

sues caused by imbalanced dataset and obtain a bal-

anced dataset, we applied the random under-sampling

technique. Under-sampling creates a balanced dataset in which the classes have the same number of samples by randomly selecting as many samples from the majority class as there are in the minority class of the imbalanced dataset.

Random Over Sampling (ROS). There are a number of methods available to obtain a balanced dataset.

One of them is oversampling that creates multiple

copies of the minority class in the training data, up

to the number of members of the major class.

SMOTE. Synthetic Minority Oversampling Technique (SMOTE) is one of the most frequently used methods in the literature to balance the number of samples in classes (Turlapati and Prusty, 2020). Minority

instances are increased by using linear interpolation

for training to balance the number of samples between

the two classes (Turlapati and Prusty, 2020).
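The three sampling strategies above can be sketched in a few lines of plain Python. The function names, the fixed seeds and the neighbour count are assumptions for illustration, and `smote_like` is a simplified interpolation in the spirit of SMOTE rather than a reference implementation. It needs at least two minority samples.

```python
import random
from typing import Dict, List, Tuple

Rows = List[List[float]]


def split_by_class(X: Rows, y: List[int]) -> Dict[int, Rows]:
    groups: Dict[int, Rows] = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def random_under_sample(X: Rows, y: List[int], seed: int = 0) -> Tuple[Rows, List[int]]:
    """Keep a random subset of every class, sized like the smallest class."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X: Rows, y: List[int], seed: int = 0) -> Tuple[Rows, List[int]]:
    """Duplicate random samples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like(X: Rows, y: List[int], minority_label: int, n_new: int,
               k: int = 3, seed: int = 0) -> Rows:
    """Create n_new points by linear interpolation between minority neighbours."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy data: 2 healthy samples (label 0) against 6 cardiac samples (label 1).
X = [[0.0, 0.0], [1.0, 1.0]] + [[5.0 + i, 5.0] for i in range(6)]
y = [0, 0] + [1] * 6
print(random_under_sample(X, y)[1], random_over_sample(X, y)[1])
print(smote_like(X, y, minority_label=0, n_new=2, k=1))
```

Resampling should be applied to the training portion only, so that the test folds keep the original class distribution.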

4.4.4 Stratiﬁed K-fold Cross-validation

K-fold cross-validation involves splitting the dataset into k folds. With this method, k-1 folds are used for training and the remaining fold is used for testing in each iteration, so that every fold serves as test data exactly once. In our study, we used the stratified k-fold cross-validation method, which is better suited to our imbalanced dataset because it preserves the class distribution in each fold (k = 5 in the study).
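Stratification can be sketched by dealing the indices of each class round-robin into the k folds, so that every fold receives about the same share of each class. The generator name `stratified_k_fold` and the seed are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def stratified_k_fold(labels: List[int], k: int = 5,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs that preserve class ratios."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)   # deal each class round-robin

    for test_fold in range(k):
        test = sorted(folds[test_fold])
        train = sorted(i for f, fold in enumerate(folds) if f != test_fold
                       for i in fold)
        yield train, test


# Toy labels: 10 healthy (0) and 40 cardiac (1) samples.
labels = [0] * 10 + [1] * 40
for train, test in stratified_k_fold(labels, k=5):
    healthy_in_test = sum(1 for i in test if labels[i] == 0)
    print(len(train), len(test), healthy_in_test)
```

With k = 5 and the 69 healthy and 378 patient samples of the dataset, each test fold would hold 13 or 14 healthy samples instead of a count left to chance.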

4.4.5 Machine Learning Model

Our objective in the study is to develop a DT that

predicts whether the person is sick using ECG signal

data and sends a notiﬁcation to the specialist in case

of illness. In this way, streaming ECG signal data will be processed and emergency intervention will be provided in case the person is sick. For this purpose, we first performed preprocessing, feature binning, feature selection and sampling operations on the data, then compared the performance of different ML methods and used the best-performing model in the DT framework. In this study, we built the cardiac patient prediction model using two ML algorithms, namely Gradient Boosted Tree and Logistic Regression, as they are commonly used techniques for binary classification problems.

Logistic Regression (LR). LR is often used in binary classification problems to model the probability of a discrete outcome given an input variable. We also performed hyperparameter tuning on this model, and the best results were obtained with the default parameters (penalty: l2, solver: lbfgs and maximum iteration: 100) (McCullagh and Nelder, 2019).

Gradient Boosting Classifier (GB). Boosting is an ensemble method that transforms weak learners into strong learners by adding new models to fix the errors made by existing models. Models are added incrementally with iterations until no improvement is

detected. Gradient Boosting creates new trees after

creating the ﬁrst leaf, taking into account the predic-

tion errors and utilizes the gradient descent algorithm

to minimise the loss. We also performed hyperparam-

eter tuning on GB using the grid search and the best

results were obtained when the learning rate was 1,

max depth 9 and the number of estimators 50 (Fried-

man, 2001).
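The boosting loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses depth-1 trees (stumps) on a single hypothetical feature instead of the depth-9 trees reported, and the names `fit_stump` and `gradient_boost` are made up for this example. The learning rate of 1 and the 50 estimators mirror the reported settings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Find the threshold split of a 1-D feature that minimises squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the maximum so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_estimators=50, learning_rate=1.0):
    """Boost stumps on the log-loss; returns a function giving P(y = 1 | x)."""
    pos = sum(ys) / len(ys)
    f0 = math.log(pos / (1 - pos))  # the "first leaf": log-odds of the positive class
    stumps = []

    def score(x):
        return f0 + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_estimators):
        # Negative gradient of the log-loss w.r.t. the score: label minus predicted probability.
        residuals = [y - sigmoid(score(x)) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return lambda x: sigmoid(score(x))


# Hypothetical toy data with some label noise around the class boundary.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
model = gradient_boost(xs, ys)
print(model(1) < 0.5, model(8) > 0.5)
```

Each new stump is fitted to the current prediction errors, so later stumps concentrate on the samples the ensemble still gets wrong.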

4.5 Evaluation Metrics

In this study, we analyze accuracy, recall, precision and F-score, all computed from confusion matrices, to assess the predictive performance of the classifiers. In the equations below, TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. Accuracy is the most common evaluation metric and gives the rate of correct predictions. However, accuracy can be misleading on imbalanced datasets in which predictions for the minority class are important, so precision, recall and F-score should be used together with it.

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \]

Precision is important when FPs are costly, as it gives the percentage of actual cardiac patients among the predicted cardiac patients (Eq. 1).

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{1} \]

On the other hand, recall is important when FNs are critical, and it is calculated as shown in Eq. 2. This metric shows the proportion of actual cardiac patients that a classifier correctly identifies.

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{2} \]

We would also like to measure the trade-off between recall and precision. Therefore, the F-score is used as the harmonic mean of these metrics (Eq. 3).

\[ \mathrm{F\text{-}score} = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \tag{3} \]
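The four metrics follow directly from the confusion-matrix counts. The sketch below is illustrative: the function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f_score


# Hypothetical counts, chosen only to illustrate the formulas.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, tn=30, fn=5)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

As a consistency check on Eq. 3, a precision of 0.87 and a recall of 0.98 give an F-score of 2 · 0.98 · 0.87 / (0.98 + 0.87) ≈ 0.92.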

5 RESULTS AND DISCUSSION

In this section, we present the experimental results obtained with several ML models and compare the cloud-based DT against the edge-based DT. The DT solution presented in this article targets real-time monitoring of heart disease, one of the diseases that most affects human life, so that intervention can be fast.

Figure 3: Comparison of AI Module Best Results and CardioTwin.

5.1 Machine Learning Results

The proposed DT frameworks monitor the users in the system in real time to determine whether they are cardiac patients, and they send notifications to the experts when a cardiac condition is detected. In this study, we used two ML classifiers, namely GB and LR, with different preprocessing, feature selection and sampling methods. We then selected the model with the best prediction results for deployment in the DT. To validate the models, we performed 5-fold cross-validation and analyzed the results through different experiments. Table 1 summarizes the results obtained from the experiments for all cardiac patient prediction models.

When the results were examined, we obtained the best results with LR when feature binning (4 bins) and chi2 feature selection (15 features) were applied to the data. This combination achieves 86% accuracy, 98% recall and a 92% F-score. Among our models, the highest precision (92%) was achieved by the GB + Binning + mRMR + ROS combination. For comparison, our reference study, CardioTwin, reports an accuracy of 85% and an F-score of 90%.
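The binning and chi2 selection steps can be sketched as below. Several points are assumptions made for illustration only: the text does not state whether equal-width or quantile bins were used, the score here is the classic Pearson chi-square statistic on the bin-by-label contingency table (library implementations may compute it differently), and the feature names and values are hypothetical.

```python
from collections import Counter


def equal_width_bins(values, n_bins=4):
    """Map continuous values to bin indices 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def chi2_score(binned, labels):
    """Pearson chi-square statistic of the bin-by-label contingency table."""
    n = len(labels)
    bin_totals = Counter(binned)
    label_totals = Counter(labels)
    observed = Counter(zip(binned, labels))
    score = 0.0
    for b, b_count in bin_totals.items():
        for c, c_count in label_totals.items():
            expected = b_count * c_count / n
            score += (observed[(b, c)] - expected) ** 2 / expected
    return score


def select_top_k(features, labels, k, n_bins=4):
    """Rank feature columns (name -> values) by chi-square after binning; keep the best k."""
    scores = {name: chi2_score(equal_width_bins(col, n_bins), labels)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical features: one that separates the classes and one that does not.
features = {
    "rr_interval": [0.6, 0.62, 0.65, 0.7, 0.9, 0.95, 1.0, 1.05],
    "noise": [0.1, 0.9, 0.2, 0.8, 0.15, 0.85, 0.25, 0.75],
}
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(select_top_k(features, labels, k=1))
```

In the study the same idea is applied with 4 bins and the 15 highest-scoring features.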

ICT4AWE 2023 - 9th International Conference on Information and Communication Technologies for Ageing Well and e-Health

Table 1: AI Module Benchmark Results.

Model                          ACC    Precision  Recall  F-score
CardioTwin                     0.85   0.95       0.86    0.90
Logistic Regression (LR)       0.82   0.84       0.97    0.90
LR + Binning                   0.83   0.89       0.91    0.90
LR + Binning + mRMR            0.86   0.88       0.97    0.92
LR + Binning + chi2            0.86   0.87       0.98    0.92
LR + Binning + chi2 + RUS      0.79   0.91       0.84    0.87
LR + Binning + chi2 + ROS      0.81   0.91       0.85    0.88
LR + Binning + chi2 + SMOTE    0.80   0.90       0.86    0.88
LR + Binning + mRMR + RUS      0.80   0.90       0.85    0.88
LR + Binning + mRMR + ROS      0.77   0.91       0.80    0.85
LR + Binning + mRMR + SMOTE    0.80   0.90       0.86    0.88
Gradient Boosted Tree (GB)     0.84   0.89       0.92    0.91
GB + Binning                   0.82   0.88       0.91    0.90
GB + Binning + mRMR            0.85   0.91       0.91    0.91
GB + Binning + chi2            0.82   0.88       0.91    0.90
GB + Binning + chi2 + RUS      0.68   0.89       0.71    0.79
GB + Binning + chi2 + ROS      0.79   0.88       0.87    0.88
GB + Binning + chi2 + SMOTE    0.76   0.88       0.83    0.85
GB + Binning + mRMR + RUS      0.73   0.90       0.78    0.83
GB + Binning + mRMR + ROS      0.85   0.92       0.92    0.91
GB + Binning + mRMR + SMOTE    0.81   0.91       0.87    0.89

Figure 4: Latency Comparison of Cloud-based and Edge-based Digital Twin Framework. (a) Latency (ms) of Cloud-based Digital Twin Framework. (b) Latency (ms) of Edge-based Digital Twin Framework.

When we analyzed the results in Table 1, we observed that the best results were obtained with the LR + Binning + chi2 combination. Feature binning may improve the results because the features in our dataset are continuous and benefit from being expressed categorically. Feature selection may improve the results further because, when it is applied after feature binning, both the features and the label consist of discrete values. In addition, examining the results shows that the scores obtained with oversampling are generally better than those obtained with undersampling.
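The two random sampling strategies compared in Table 1 (ROS and RUS) can be sketched as follows. This is a minimal illustration with hypothetical function names and toy data. SMOTE, which synthesises new samples instead of copying existing ones, is not shown.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class whose size matches the rarest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_samples, out_labels = [], []
    for cls in counts:
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for s in rng.sample(pool, target):
            out_samples.append(s)
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy data: 8 samples of class 1 and 2 samples of class 0.
samples = list(range(10))
labels = [1] * 8 + [0] * 2
xs_os, ys_os = random_oversample(samples, labels)
xs_us, ys_us = random_undersample(samples, labels)
print(Counter(ys_os), Counter(ys_us))
```

Undersampling discards majority-class samples, which is one plausible reason, though not one verified here, for its lower scores.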

When the results we obtained in the study are compared with the results of CardioTwin (Figure 3), our models are more successful in terms of accuracy, recall and F-score. On the other hand, CardioTwin reaches a higher precision. However, recall is vital in healthcare studies. Therefore, we tuned the models according to the recall metric.

5.2 Digital Twin Frameworks

In this study, edge-based and cloud-based DT frameworks are presented and compared in terms of scalability, flexibility, latency and security.

The rapid growth in the number of IoT devices and in the data they generate has made scalability an important criterion. Edge-based DT scales considerably better than cloud-based DT, because adding new nodes has little effect on system latency (Khan et al., 2022).

To keep pace with digitalization and the diversity of IoT devices in every field, systems must be highly flexible. DTs implemented at the edge meet this flexibility requirement better than cloud-based DTs (Khan et al., 2022).

Latency is a major issue in scenarios where results are required in real time. To produce results with a cloud-based DT, the data generated at the edge must first be transferred to the cloud environment. In an edge-based DT, the ML model produces results where the data is generated, which minimizes this delay. Our results support this (Figure 4): the average latency of the edge-based DT (10.6 ms, Figure 4b) is lower than that of the cloud-based DT (18.8 ms, Figure 4a).

Security is essential for providing secure and reliable services to users, so it is a criterion we should consider in the systems we develop (Asim et al., 2020). Edge-based systems are more secure than cloud-based systems because of their decentralized architecture, whereas cloud-based systems are more vulnerable to attacks because data travels long distances between users and the cloud (Asim et al., 2020).
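As a quick worked reading of the two reported latency averages (using only the means of 18.8 ms and 10.6 ms, not the underlying measurements), the relative reduction achieved by the edge-based DT can be computed as follows. The symbols are introduced here for illustration only.

```latex
\[
\frac{L_{\text{cloud}} - L_{\text{edge}}}{L_{\text{cloud}}}
= \frac{18.8 - 10.6}{18.8}
= \frac{8.2}{18.8}
\approx 0.436
\]
```

In this experiment, the average latency of the edge-based framework is therefore roughly 44% lower than that of the cloud-based framework.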

6 CONCLUSION

The concept of Industry 4.0, which combines the do-

mains of Informatics and Industry, has spread from

the industrial sector to all other sectors. IoT, 5G and

6G networks, cloud and edge computing, big data, AI

and DT technologies are at the center of these devel-

opments.

In this paper, we compared cloud-based DT and edge-based DT via a case study. We built ML models on the PTBD healthcare dataset to predict human heart diseases in real time and thus enable quick treatment. In this context, we performed preprocessing stages such as cleaning the signal data, denoising, smoothing, peak extraction, eliminating NaN values and imputing missing values, and feature engineering stages such as feature binning, feature selection and sampling. Even though the Cardio Twin paper (Martinez-Velazquez et al., 2019) obtained the highest precision, we tuned our models based on the recall metric, because correctly detecting patients (TPs) is more vital when detecting diseases in the health field. As a result, our models performed better in terms of recall, F-score and accuracy.

A major future step of this study is to address the data security and privacy concerns that are frequently encountered in health studies by combining cloud computing, edge computing, federated learning and DT technologies. In addition, our future work will include trying different ML models with a new feature dataset containing the clinical findings of the patients and validating the models with different health datasets. Finally, a case study that includes data collected from sensors can be added as a real-world usage experiment.

REFERENCES

Asim, M., Wang, Y., Wang, K., and Huang, P.-Q. (2020).

A review on computational intelligence techniques

in cloud and edge computing. IEEE Transactions

on Emerging Topics in Computational Intelligence,

4(6):742–763.

Bellavista, P., Giannelli, C., Mamei, M., Mendula, M., and

Picone, M. (2021). Application-driven network-aware

digital twin management in industrial edge environ-

ments. IEEE Transactions on Industrial Informatics,

17(11):7791–7801.

Björnsson, B., Borrebaeck, C., Elander, N., Gasslander, T., Gawel, D. R., Gustafsson, M., Jörnsten, R., Lee, E. J., Li, X., Lilja, S., et al. (2020). Digital twins to personalize medicine. Genome medicine, 12(1):1–4.

Bousseljot, R., Kreiseler, D., and Schnabel, A. (1995). Nutzung der ekg-signaldatenbank cardiodat der ptb über das internet.

Dritsas, E., Alexiou, S., and Moustakas, K. (2022). Cardiovascular disease risk prediction with supervised machine learning techniques. In Proceedings of the 8th International Conference on Information and Communication Technologies for Ageing Well and e-Health - ICT4AWE, pages 315–321. INSTICC, SciTePress.

Friedman, J. H. (2001). Greedy function approximation: a

gradient boosting machine. Annals of statistics, pages

1189–1232.

Fuller, A., Fan, Z., Day, C., and Barlow, C. (2020). Digi-

tal twin: Enabling technologies, challenges and open

research. IEEE access, 8:108952–108971.

Ge, X., Zhou, R., and Li, Q. (2019). 5g nfv-based tactile

internet for mission-critical iot services. IEEE Internet

of Things Journal, 7(7):6150–6163.

Githens, G. (2007). Product lifecycle management: driv-

ing the next generation of lean thinking by michael

grieves.

Glatt, M., Kölsch, P., Siedler, C., Langlotz, P., Ehmsen, S., and Aurich, J. C. (2021). Edge-based digital twin to trace and ensure sustainability in cross-company production networks. Procedia CIRP, 98:276–281.

Goldberger, A. L., Amaral, L. A., Glass, L., Hausdorff,

J. M., Ivanov, P. C., Mark, R. G., Mietus, J. E., Moody,

G. B., Peng, C.-K., and Stanley, H. E. (2000). Phys-

iobank, physiotoolkit, and physionet: components of

a new research resource for complex physiologic sig-

nals. circulation, 101(23):e215–e220.

Grieves, M. (2002). Plm initiatives [powerpoint slides]. In

Product Lifecycle Management Special Meeting.

Grieves, M. W. (2005). Product lifecycle management: the

new paradigm for enterprises. International Journal

of Product Development, 2(1-2):71–84.

Grieves, M. W. (2019). Virtually intelligent product sys-

tems: digital and physical twins.

Halepmollası, R., Zeybel, M., Eyvaz, E., Arkan, R., Genc, A., Bilgen, I., and Haklidir, M. (2021). Towards federated learning in identification of medical images: A case study. Artificial Intelligence Theory and Application, 1:30–39.

Hassani, H., Huang, X., and MacFeely, S. (2022). Impactful

digital twin in the healthcare revolution. Big Data and

Cognitive Computing, 6(3):83.

Huang, J., Chang, L., and Lin, H. (2021). Implementation of iot, wearable devices, google assistant and google cloud platform for elderly home care system. In Proceedings of the 7th International Conference on Information and Communication Technologies for Ageing Well and e-Health - ICT4AWE, pages 203–212. INSTICC, SciTePress.

Jimenez, J. I., Jahankhani, H., and Kendzierskyj, S. (2020).

Health care in the cyberspace: Medical cyber-physical

system and digital twin challenges. In Digital twin

technologies and smart cities, pages 79–92. Springer.

Kalis, B., McHugh, J., Safavi, K., and Truscott, A. (2022).

Accenture digital health technology vision 2022.

Kamel Boulos, M. N. and Zhang, P. (2021). Digital twins:

from personalised medicine to precision public health.

Journal of Personalized Medicine, 11(8):745.

Karakra, A., Fontanili, F., Lamine, E., and Lamothe, J.

(2019). Hospit’win: a predictive simulation-based

digital twin for patients pathways in hospital. In 2019

IEEE EMBS international conference on biomedical

& health informatics (BHI), pages 1–4. IEEE.

Khan, L. U., Saad, W., Niyato, D., Han, Z., and Hong, C. S.

(2022). Digital-twin-enabled 6g: Vision, architectural

trends, and future directions. IEEE Communications

Magazine, 60(1):74–80.

Kher, R. et al. (2019). Signal processing techniques for re-

moving noise from ecg signals. J. Biomed. Eng. Res,

3(101):1–9.

Liu, Y., Zhang, L., Yang, Y., Zhou, L., Ren, L., Wang, F.,

Liu, R., Pang, Z., and Deen, M. J. (2019). A novel

cloud-based framework for the elderly healthcare ser-

vices using digital twin. IEEE access, 7:49088–

49101.

Luo, J., Ying, K., and Bai, J. (2005). Savitzky–golay

smoothing and differentiation ﬁlter for even number

data. Signal processing, 85(7):1429–1434.

Martinez-Velazquez, R., Gamez, R., and El Saddik, A.

(2019). Cardio twin: A digital twin of the human heart

running on the edge. In 2019 IEEE International Sym-

posium on Medical Measurements and Applications

(MeMeA), pages 1–6. IEEE.

McCullagh, P. and Nelder, J. A. (2019). Generalized linear

models. Routledge.

Murphy, K. P. (2018). Machine learning: A probabilistic

perspective (adaptive computation and machine learn-

ing series).

Piascik, B., Vickers, J., Lowry, D., Scotti, S., Stewart, J.,

and Calomino, A. (2012). Materials, structures, me-

chanical systems, and manufacturing roadmap. NASA

TA, pages 12–2.

Rachburee, N. and Punlumjeak, W. (2015). A comparison

of feature selection approach between greedy, ig-ratio,

chi-square, and mrmr in educational mining. In 2015

7th international conference on information technol-

ogy and electrical engineering (ICITEE), pages 420–

424. IEEE.

Scharff, S. (2018). From digital twin to improved patient

experience.

Shengli, W. (2021). Is human digital twin possible? Com-

puter Methods and Programs in Biomedicine Update,

1:100014.

Singh, M., Srivastava, R., Fuenmayor, E., Kuts, V., Qiao,

Y., Murray, N., and Devine, D. (2022). Applications

of digital twin across industries: A review. Applied

Sciences, 12(11):5727.

Tao, F., Xiao, B., Qi, Q., Cheng, J., and Ji, P. (2022). Digital

twin modeling. Journal of Manufacturing Systems,

64:372–389.

Turlapati, V. P. K. and Prusty, M. R. (2020). Outlier-smote:

A reﬁned oversampling technique for improved de-

tection of covid-19. Intelligence-based medicine,

3:100023.

Tyagi, S., Agarwal, A., and Maheshwari, P. (2016). A

conceptual framework for iot-based healthcare sys-

tem using cloud computing. In 2016 6th International

Conference-Cloud System and Big Data Engineering

(Conﬂuence), pages 503–507. IEEE.

Wang, Y., Xu, R., Zhou, C., Kang, X., and Chen, Z. (2022).

Digital twin and cloud-side-end collaboration for in-

telligent battery management system. Journal of Man-

ufacturing Systems, 62:124–134.

Wang, Z., Liao, X., Zhao, X., Han, K., Tiwari, P., Barth,

M. J., and Wu, G. (2020). A digital twin paradigm:

Vehicle-to-cloud based advanced driver assistance

systems. In 2020 IEEE 91st Vehicular Technology

Conference (VTC2020-Spring), pages 1–6. IEEE.

WHO (2020). The top 10 causes of death.
