A Comparative Study on Cloud-based and Edge-Based Digital Twin
Frameworks for Prediction of Cardiovascular Disease
Havvanur Dervişoğlu^a, Burak Ülver^b, Rabia Arkan Yurtoğlu^c, Ruşen Halepmollası^d and Mehmet Haklıdır^e
TÜBİTAK Informatics and Information Security Research Center, Kocaeli, Turkey
Keywords:
Cloud Computing, Edge Computing, Digital Twin, Healthcare.
Abstract:
Digital Twins are virtual representations of physical objects that reflect their real-time status
through streaming data, and they can integrate with related technologies such as artificial
intelligence, optimization, mobile communication systems, edge computing, fog computing and cloud
computing. In this study, we provide two Digital Twin frameworks, one cloud-based and one
edge-based, and compare them in terms of scalability, flexibility, latency and security. We
demonstrate the frameworks through a case study that predicts cardiac patients, continuously
monitors the risks related to heart disease, and reports those risks to both healthcare
professionals and users in real time. We extracted features from electrocardiogram signals and
applied popular machine learning algorithms. We employed feature binning and feature selection
methods to increase the robustness of the prediction model and, in total, built 20 models. We
present an empirical analysis on a publicly available dataset based on the PTB Diagnostic ECG
Database and evaluate the results in terms of accuracy, precision, recall and F-score. When
predicting cardiac patients, Logistic Regression outperformed the other classifiers, with accuracy
and F-score rates of 86% and 92%, respectively. This model also has the highest recall (98%),
which is vital in predicting diseases. Meanwhile, the Gradient Boosted Tree model with binning,
mRMR feature selection and random oversampling achieved high precision (91%).
1 INTRODUCTION
Industry 4.0 is a key initiative that draws on a wide range of advanced technologies, such as
cloud computing, big data, Digital Twin (DT), Machine Learning (ML), Deep Learning (DL) and
virtual reality. The Internet of Things (IoT), another aspect of Industry 4.0, plays a significant
role in the digitalization of data and contributes value to many sectors, including health,
energy, education and industry. IoT and digitalization also represent a novel paradigm, enabling
the creation of DTs through the capability to collect and communicate data.
DT was introduced as a concept underlying product lifecycle management by Grieves (2002) and was
presented under different names, such as the mirrored spaces model (Grieves, 2005), the
information mirroring model, and the virtual twin (Githens, 2007). NASA applied the underlying
idea in the Apollo project, in which two identical space vehicles were created to simulate space
status during flight training. John Vickers later coined the name “DT” for the model in the 2010
NASA Roadmap Report (Piascik et al., 2012). In this context, the DT concept is a model that can
separate information about a physical system from the system itself and subsequently mirror or
twin that system (Grieves, 2019).
^a https://orcid.org/0000-0002-2122-6944
^b https://orcid.org/0000-0002-5790-0590
^c https://orcid.org/0000-0003-3837-8052
^d https://orcid.org/0000-0002-9941-2712
^e https://orcid.org/0000-0003-4985-1116
Industry 4.0 also elevates healthcare to novel and advanced levels on the basis of digitization,
IoT, AI, cloud/fog/edge computing and 5G networks. These technologies make it possible to collect
and analyse data from anyone, anywhere and at any time, and they dramatically impact healthcare
systems by connecting them to patients’ personal devices to capture data and to notify patients,
doctors, or patients’ relatives in real time (Ge et al., 2019). Moreover, the emergence of smart
healthcare services through digitalization has rapidly grown the body of research, and several
methods have been implemented that focus on improving the quality of healthcare services and
human welfare while reducing healthcare costs and the death rate (Huang et al., 2021; Dritsas
et al., 2022). In this context, DT promises a new era for healthcare by changing the smart health
concept and taking medicine to an unprecedented level.
Dervişoğlu, H., Ülver, B., Yurtoğlu, R., Halepmollası, R. and Haklıdır, M.
A Comparative Study on Cloud-based and Edge-Based Digital Twin Frameworks for Prediction of Cardiovascular Disease.
DOI: 10.5220/0011859400003476
In Proceedings of the 9th International Conference on Information and Communication Technologies for Ageing Well and e-Health (ICT4AWE 2023), pages 159-169
ISBN: 978-989-758-645-3; ISSN: 2184-4984
Copyright © 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
In recent years, researchers from both academia
and industry have expressed considerable interest in
the improvement of DT technologies. Therefore, re-
search studies on DTs and their applications have
been prominent in a wide range of domains includ-
ing healthcare. DTs, which can combine other core
technologies including AI, big data, 5G/6G, cloud
computing and edge computing, are virtual represen-
tations of physical assets and can express the real
time situation through streaming data (Fuller et al.,
2020). Moreover, edge computing and cloud computing offer supplementary capabilities when DTs are
implemented at different levels, each with its own requirements for processing, analysing and
transmitting data, such as scalability, latency, reliability and centralization. Many studies
investigate edge-based DTs and cloud-based DTs separately. However, there is a gap in studies
that implement both a cloud-based DT and an edge-based DT and compare their performance. Although
Khan et al. (2022) compared cloud-based and edge-based DTs, they did not provide any empirical
evidence. To the best of our knowledge, no study has implemented and compared both edge-based and
cloud-based DT frameworks.
In this study, we aim to offer both an edge-based DT and a cloud-based DT and to compare their
performance. Hence, we present a cloud-based DT framework and an edge-based DT framework and
contrast them in terms of latency, scalability, mobility, and centralization. To demonstrate the
frameworks, we implemented a case study that predicts cardiac patients and monitors the risks of
heart disease, reporting those risks to users and healthcare providers in real time. To this end,
we trained common ML classifiers on features extracted from electrocardiogram (ECG) signals. To
improve the prediction results, we applied data preprocessing and used various feature binning
and feature selection techniques. We present an empirical study on a dataset based on an
open-source database, namely the PTB Diagnostic ECG Database (Bousseljot et al., 1995; Goldberger
et al., 2000). The dataset consists of 549 samples collected from 290 persons (209 men and 81
women). Of these samples, 69 are labelled as healthy and 378 are labelled as patients. Because
the dataset is imbalanced, we applied various sampling methods with different parameters. We
compared the ML model results in terms of accuracy, precision, recall and F-score. According to
our results, we outperformed a benchmark study, Cardio Twin (Martinez-Velazquez et al., 2019), on
the same dataset.
Structure of the Paper. Section 2 summarizes pre-
vious related works on cloud-based DTs, edge-based
DTs and DT applications in healthcare. In Section 3,
we describe DT frameworks for cloud and edge, sep-
arately. Section 4 provides a case study in which we
explain the dataset and present the data preprocessing, classification methods and evaluation
metrics. Section 5 reports and discusses the results. Finally, Section 6 concludes the paper and
outlines future work.
2 RELATED WORK
Researchers have widely studied DT, which uses
other popular technologies including cloud comput-
ing, edge computing, AI and IoT, in various fields
such as manufacturing, energy, aerospace, construc-
tion and healthcare. In this section, we discuss the
studies in the literature which utilize DTs on cloud,
DTs on edge and DTs in healthcare, respectively.
2.1 Digital Twins on Cloud
Combining DT with cloud computing, which offers capabilities such as unlimited storage capacity,
dynamic scaling, and high availability, has enabled considerable advances in several fields (Liu
et al., 2019; Wang et al., 2022, 2020). Liu et al. (2019) presented a cloud-based DT Health
(CloudDTH) reference framework that brings together key technologies, i.e., cloud computing,
health IoT and DT, to create effective solutions for elderly health services. CloudDTH has an
eight-layer architecture and is based on the DT Healthcare (DTH) conceptual model. They also
conducted a case study that made drug recommendations to patients or healthcare professionals
using online ECG data (Liu et al., 2019).
Wang et al. (2022) provided a Battery Management System (BMS) that has a four-layer architecture
and utilizes the power of cloud and DT technologies. The architecture processes big data using
the capabilities of cloud technology, such as high storage and processing capacity, while using
DT technology to digitize the behavior and real processes of the battery. The authors argue that,
with the BMS, real-time optimization can be carried out throughout the whole life cycle of
batteries for more sophisticated and intelligent battery management, and that analyses and
predictions can be made by drawing insights from historical data and AI models. They also state
that sensitive management processes of systems with more complex structures, such as
large-capacity lithium-ion battery packs, can be handled effectively with DT and cloud
technologies.
Wang et al. (2020) proposed a DT framework
for connected vehicles. The proposed DT frame-
work uses the V2C (vehicle-to-cloud) communication
based Advanced Driver Assistance System to twin the
connected vehicles in the cloud. According to their
results, the proposed system can benefit transporta-
tion with acceptable communication delays.
2.2 Digital Twins on Edge
In the literature, there are also studies that present DTs on the edge in various fields to
exploit the advantages of edge computing, such as low latency and mobility (Martinez-Velazquez
et al., 2019; Bellavista et al., 2021).
Martinez-Velazquez et al. (2019) presented an edge-based DT architecture, namely Cardio Twin,
which works at the edge to monitor the heart conditions of patients. Cardio Twin consists of
three layers, i.e., Data Source, AI-Inference Engine, and Multimodal Interaction. As a
proof-of-concept (PoC) study, they implemented the AI-Inference Engine based on those layers and
built a CNN model using PhysioNet’s “PTB Diagnostic ECG Database” dataset. During the real-time
test phases, they obtained data on mobile phones used as edge devices. The model obtained at the
end of the PoC study achieved an accuracy of 85.7%, a precision of 95.5% and a recall of 86.3%.
The authors emphasized that an edge-based DT architecture takes advantage of edge computing and
avoids the latency caused by the cloud (Martinez-Velazquez et al., 2019).
Bellavista et al. (2021) argue that manual configuration of networks in industrial environments
is time-consuming and error-prone. To address this, the authors presented the Application-Driven
DT Networking (ADTN) middleware. The ADTN middleware consists of semantically enriching simple
DTs (SDTs), which are deployed to edge nodes, and composed DTs (CDTs), which perform flexible
arrangements. According to their results, the ADTN middleware is feasible and efficient; however,
there are issues to be investigated in the future to promote its use in real industrial
environments.
In (Glatt et al., 2021), the edge-based DT concept
was introduced to ensure sustainability through the
assessment of ecological conditions in cross-company
production networks. The presented concept consists
of two levels, namely the Network level and the Com-
pany level. While the Network level includes the pro-
cesses and activities of several companies, the Com-
pany level focuses on individual processes in detail.
The authors mentioned that the difficulties that may
be encountered with the application of this concept in
an industrial environment can be determined and the
effect of the presented approach on performance can
be examined (Glatt et al., 2021).
2.3 Digital Twins in Healthcare
Problems such as the inability to access patients’ historical data, corrupt or missing health
data, and undiagnosed conditions or delayed diagnoses cause thousands of deaths each year (Tyagi
et al., 2016). On the other hand, technologies that facilitate the digitization of medicine
generally allow ML methods to be trained on sufficiently large datasets and to achieve the
clinical accuracy that is vital in medicine (Halepmollası et al., 2021). Thus, those technologies
lead to a new paradigm that reduces costs while improving the quality of health services.
According to a recent research report (Kalis et al., 2022), almost 80% of healthcare executives
stated that their organizations’ usage of IoT/Edge devices had increased enormously during the
previous three years. Also, nearly half of them believe that DT technology will make a
breakthrough in the future by building a bridge between the digital and physical worlds and will
have a positive impact on healthcare (Kalis et al., 2022). Moreover, many studies present DT
applications in healthcare that are oriented towards the Human/Patient (Kamel Boulos and Zhang,
2021; Shengli, 2021), Hospital/Health Institutions (Hassani et al., 2022; Singh et al., 2022) and
Medical (Björnsson et al., 2020) domains.
Personalized and holistic healthcare can be provided for the whole life cycle of humans through
Human/Patient-oriented DT research. Those health services include monitoring a person’s state of
health, detecting and diagnosing diseases early, applying personalized treatment methods
according to the genetic, physiological, and other characteristics of the person, and examining
the effect of the therapies used (Kamel Boulos and Zhang, 2021; Singh et al., 2022). After death,
a DT can benefit organ transplant procedures such as recipient-donor matching and organ
transplantation (Hassani et al., 2022). Shengli (2021) presented the Human DT (HDT), which is
based on the Augmented DT conceptual model, to provide lifecycle management of a human. To
represent a human in cyberspace, the proposed model overcomes challenges such as the complexity
of humans, social and ethical issues, and safety. The author states that DT is an important
technology that provides the interaction between physical space and cyberspace (Shengli, 2021).
Although human DTs have the aforementioned
benefits, they also have several challenges that must
be overcome to create them holistically and compre-
hensively. For instance, collecting health data such
as blood analysis and X-ray required to create hu-
man DTs can be time consuming and costly (Shengli,
2021). Also, the collected data can be person-based
and diverse and has security and privacy concerns
(Shengli, 2021; Jimenez et al., 2020).
The services provided by the various organiza-
tional structure components of the health institutes
can be utilized effectively through Hospital-oriented
DT research. For example, to improve patient care
and treatment, health services can be improved by
scheduling further studies in all processes (Singh
et al., 2022). Besides, monitoring electronic devices
in terms of maintenance and repairs can save money
and time while also ensuring that patients receive non-
stop service and preventing the breakdown of devices.
Hospitals that increase staff productivity, treatment
success rate, patient satisfaction, and effective execu-
tion of operations can be designed and constructed via
DT technology (Hassani et al., 2022; Karakra et al.,
2019). For instance, Siemens Healthineers uses DT techniques to streamline hospital operations in
the radiology department of a hospital in Dublin, Ireland (Scharff, 2018). Moreover, Hassani
et al. (2022) and Tao et al. (2022) state that medical-oriented DTs offer several benefits: the
effects of a surgical intervention can be observed and the best intervention method determined
by first trying it on a DT, experiments can be performed on DTs during the development of new
medical equipment and drugs, and healthcare education can be carried out in a practical way on
DTs.
3 DIGITAL TWIN
In this study, we aim to compare the performance of
two different DT frameworks in predicting cardiac pa-
tients. While the first framework is based on cloud
computing, the second framework is based on edge
computing (Figure 1).
We offered a DT framework on cloud that pro-
vides scalability, resource sharing, and service-on-
demand. As illustrated in Figure 1a, the framework
consists of three layers:
Edge Layer. In this layer, there exist devices (e.g.
cell phone or smart watch) used to create digital
copies of the physical assets. The ECG signals of the
patients are sent from this layer to the cloud layer.
Cloud Layer. The digital copy of the physical entity is created on this layer, which contains the
storage and AI modules. The storage module contains two databases, one holding the historical ECG
signals of the patients in the system and the other holding prediction results. As shown in
Figure 2, the AI module has two main tasks: (i) preprocessing the historical data and building ML
models; and (ii) predicting in real time, through the deployed ML model, whether a person is a
cardiac patient and sending the prediction results to the database in the storage module.
Application Layer. The status of the physical asset
can be continuously monitored in this layer. Thus, the
layer includes screens on which prediction results can
be visualized and also rules that send notifications to
the expert based on prediction results.
3.1 Digital Twin Framework on Edge
We also provided a DT framework on edge that deals with latency issues and allows mobility. As
shown in Figure 1b, the edge-based DT framework also includes three layers:
Edge Layer. In this layer, there are devices used to
create digital copies of assets. Also, in the edge-based
DT framework, ML models that were trained on the
cloud are deployed to those devices on edge.
Cloud Layer. As in the cloud-based DT framework, the cloud layer includes the Storage and AI
modules. Here, however, ML models are trained on the cloud layer and then used on the edge layer
to predict cardiac patients.
Application Layer. In edge-based DT framework,
this layer is not only available on the cloud, but also
on the edge. Thus, users can monitor their own status
in real time and notifications can be sent to experts
according to the prediction results.
4 CASE STUDY
In this section, we define the problem statement,
present the details of the dataset obtained from PTB
Diagnostic ECG Database (PTBD) (Bousseljot et al.,
1995; Goldberger et al., 2000) and explain the details
of AI module in which ML models were constructed.
Also, we describe the evaluation metrics used to com-
pare the prediction results of ML models.
(a) Cloud-based Digital Twin Framework (b) Edge-based Digital Twin Framework
Figure 1: Cloud-based and Edge-based Digital Twin Framework Structures and Components.
4.1 Problem Statement
It is crucial to continuously monitor cardiac patients to reduce the rate of sudden death, which
heart diseases can often cause (WHO, 2020). Personal health monitoring tools, such as mobile apps
or built-in sensors, can continuously monitor key health indicators of a user (e.g. ECG, blood
pressure and heart rate) and reduce the risk of incorrect data entry. Meanwhile, anonymous data
can be captured by those devices, transferred to the cloud, and compared with historical data to
detect any disease or notify the appropriate health personnel. In this context, monitoring heart
health indicators enables quick intervention in emergency situations and supports early diagnosis
of diseases by predicting possible risks. Thus, healthcare services and patients’ quality of life
can improve.
Creating a virtual representation of each patient could be one of the best ways for healthcare
systems to monitor key health indicators, increase control over health, and enhance healthcare
services. To this end, we offer two different DT frameworks that allow continuous monitoring of
patients’ heart health indicators and compare them in terms of latency, scalability and related
criteria. We also build an ML model to predict in real time whether the people whose heart health
data are monitored are healthy or have heart disease. For this purpose, we combine DT with data
analytics, ML, cloud computing, edge computing and IoT technologies in both the cloud-based and
the edge-based DT frameworks.
4.2 Dataset
We used a publicly available dataset obtained from PhysioNet’s “PTB Diagnostic ECG Database”
(Bousseljot et al., 1995; Goldberger et al., 2000). In this dataset, the ECGs were recorded using
an experimental, non-commercial PTB recorder that satisfied various requirements, including 16
input channels, input voltage, input resistance, resolution, bandwidth, noise voltage, online
recording of skin resistance, and noise level recording during signal collection. The details of
the requirements are explained in (Goldberger et al., 2000).
The dataset contains 549 records from 290 people, 209 men and 81 women. The registered
individuals range in age from 17 to 87; the mean age of the men is 55.5 and the mean age of the
women is 61.6. The number of records per person varies: some people have 5 records, while others
have only one. Each record contains 15 signals measured simultaneously (i, ii, iii, avr, avl,
avf, v1, v2, v3, v4, v5, v6, vx, vy, vz). In the dataset, 69 samples have healthy labels, 378
have patient labels and the rest have no labels.
4.3 Preprocessing Steps in AI Module
The dataset provides 15 different ECG signals, which are generally sensitive to noise. We used
the v4 signal, as it is recorded from the location closest to the heart and best represents the
status of the heart. We extracted five fiducial points, P, Q, R, S, and T, from the signal data
and used them to obtain the distance, amplitude, angle, slope and height between the points. The
stages performed on the signal data are as follows (Figure 2):
4.3.1 Cleaning the Signal Data, Denoising and
Smoothing
ECG signals describe the electrical activity of the heart. They are produced when the heart’s
atrial and ventricular muscles contract and relax. However, ECG signals are subject to four
primary types of artifacts, i.e., baseline wander, powerline interference, EMG noise, and
electrode motion (Kher et al., 2019). Artifacts are unwanted signals and sometimes prevent
doctors from making a correct diagnosis. Therefore, to remove artifacts from the ECGs, we used
signal-processing filters that are commonly utilized to remove or reduce noise and improve data
quality.
Figure 2: AI Module Flow Diagram.
High-pass Filter. Baseline wander is an effect that causes a signal to drift up and down rather
than follow a straight baseline. This drift moves the signal away from its regular base and can
be eliminated by using a high-pass filter. The maximum ripple of the filter is set to 12 dB and
the Kaiser window technique (Wang et al., 2022) is used to determine the filter window
parameters.
Band Stop Filter. Powerline interference is a common noise source caused by electromagnetic
fields and muscle contractions. The noise appears as 50 or 60 Hz sinusoidal interference and
affects low-frequency ECG waves. We set the cut-off frequencies of the band-stop filter to 59.5
and 60.5 Hz to remove this noise.
Low-pass Filter. We used a low-pass filter to eliminate high-order harmonics.
Smoothing Filter. After removing the noise, we applied a smoothing filter, namely the
Savitzky-Golay filter (Luo et al., 2005), to the signal. With this filter, we can capture
important patterns in the signal and detect peaks with high accuracy.
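As a rough illustration of the band-stop step, the sketch below suppresses a 60 Hz powerline component with a second-order IIR notch filter written in plain Python. This is a deliberately simpler, swapped-in design and not the filter used in the study; the 1000 Hz sampling rate, the helper name `notch_filter` and the synthetic test signals are assumptions made for the example. The 59.5 to 60.5 Hz stop band from the text is mapped to a centre frequency of 60 Hz and a bandwidth of 1 Hz.

```python
import math

def notch_filter(signal, fs, f_low=59.5, f_high=60.5):
    """Suppress the narrow band [f_low, f_high] Hz with a biquad notch filter."""
    f0 = (f_low + f_high) / 2.0          # centre of the stop band
    q = f0 / (f_high - f_low)            # quality factor derived from the bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    # Biquad notch coefficients, normalised so that the leading denominator term is 1.
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]

    out = []
    x1 = x2 = y1 = y2 = 0.0              # previous two inputs and outputs
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

fs = 1000  # assumed sampling rate in Hz
t = [n / fs for n in range(5 * fs)]
slow_wave = [math.sin(2 * math.pi * 5 * ti) for ti in t]   # stands in for ECG content
hum = [math.sin(2 * math.pi * 60 * ti) for ti in t]        # powerline interference

filtered_hum = notch_filter(hum, fs)
filtered_wave = notch_filter(slow_wave, fs)
```

Once the start-up transient has died out, the 60 Hz component is almost entirely removed while the 5 Hz component passes essentially unchanged. A narrow notch like this trades a longer settling time for less distortion of neighbouring frequencies.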
4.3.2 Peak Extraction
We extracted the peak points over the ECG signal. To
this end, we firstly identified the R peaks over the sig-
nal then we identified the T, Q, P and S peaks using
the R peaks.
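The sketch below shows one simple way to realise this two-stage idea in plain Python: R peaks are taken as local maxima above a threshold with a minimum spacing, and Q and S are then taken as the minima in short windows before and after each R peak. The threshold, the window lengths, the function names and the toy signal are illustrative assumptions, not the detector used in the study; P and T could be located analogously as maxima in wider windows.

```python
def find_r_peaks(signal, threshold, min_gap):
    """Return indices of local maxima above threshold, at least min_gap samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_local_max and signal[i] > threshold:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
            elif signal[i] > signal[peaks[-1]]:
                peaks[-1] = i  # keep the taller of two candidates that are too close
    return peaks

def find_q_and_s(signal, r_peaks, window):
    """For each R peak, take Q as the minimum just before it and S as the minimum just after."""
    q_points, s_points = [], []
    for r in r_peaks:
        before = range(max(0, r - window), r)
        after = range(r + 1, min(len(signal), r + window + 1))
        # None marks a point that could not be located (a missing value for later imputation).
        q_points.append(min(before, key=lambda i: signal[i]) if before else None)
        s_points.append(min(after, key=lambda i: signal[i]) if after else None)
    return q_points, s_points

# Toy beat: flat baseline, small Q dip, tall R spike, S dip (values are made up).
beat = [0.0, 0.0, -0.1, 1.0, -0.2, 0.0, 0.0, 0.0]
ecg = beat * 3
r_peaks = find_r_peaks(ecg, threshold=0.5, min_gap=4)
q_points, s_points = find_q_and_s(ecg, r_peaks, window=2)
```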
4.3.3 Eliminate and Impute the NaN Values
We applied the K Nearest Neighbour method to each list of peaks to deal with the NaN values. For
each missing value, we calculated the mean of its k nearest neighbours in the training set and
used that mean to impute the NaN value.
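One way to read this step, sketched below in plain Python, is to replace each missing entry in a list of peak values with the mean of the k observed entries closest to it in position. Using position in the list as the distance measure, the value k = 2 and the function name `knn_impute` are assumptions for illustration; the text does not state which distance or which k was used.

```python
import math

def knn_impute(values, k):
    """Fill NaN entries with the mean of the k nearest observed entries (by index distance)."""
    observed = [(i, v) for i, v in enumerate(values) if not math.isnan(v)]
    if not observed:
        return list(values)  # nothing to impute from
    filled = []
    for i, v in enumerate(values):
        if math.isnan(v):
            nearest = sorted(observed, key=lambda item: abs(item[0] - i))[:k]
            v = sum(val for _, val in nearest) / len(nearest)
        filled.append(v)
    return filled

# Made-up R-peak amplitudes with two missing entries.
r_amplitudes = [1.0, float("nan"), 1.2, 1.1, float("nan")]
imputed = knn_impute(r_amplitudes, k=2)
```

In this toy list, the first gap is filled from its two direct neighbours and the last gap from the two observed values that precede it.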
4.4 Methodology in AI Module
In addition to the aforementioned preprocessing steps, we applied feature engineering techniques,
i.e., feature binning, feature selection and sampling, in that order, to improve the ML model
results (Figure 2).
4.4.1 Feature Binning
The process of converting continuous or numerical
values into categorical features is called binning or
discretization. In this study, the features extracted
from the signal data are continuous. Hence, to inves-
tigate the effects of discrete features on the prediction
process, we binned the features. For this purpose, we
employed a quantile-based discretization function.
Quantile-Based Discretization. It is the process of creating equal-sized bins by discretizing the
variable based on rank or sample quantiles. We applied quantile-based discretization to each
continuous feature column, using 4 quantiles.
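A minimal sketch of quantile-based discretization with four bins, using only the Python standard library, is given below. The function name `quantile_bin` and the sample values are hypothetical, and the quantile interpolation rule is the standard library default, which may differ from the tool used in the study.

```python
import bisect
import statistics

def quantile_bin(values, n_bins=4):
    """Map each value to a bin index 0..n_bins-1 using sample quantiles as cut points."""
    cuts = statistics.quantiles(values, n=n_bins)  # returns n_bins - 1 cut points
    return [bisect.bisect_right(cuts, v) for v in values]

feature_values = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up continuous feature
bins = quantile_bin(feature_values)
```

With these eight values, each of the four bins receives two samples, which is the equal-sized property described in the text.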
4.4.2 Feature Selection
The success of prediction models is directly related to the use of suitable feature selection
methods. In order to select the features that best represent the model, we applied two feature
selection methods, i.e., Chi-square and mRMR, which are among the most preferred in the
literature (Rachburee and Punlumjeak, 2015), and compared their performance. We extracted 58
different features from the ECG signals by implementing the above methods.
Minimum Redundancy and Maximum Relevance (mRMR). It is a filter method that selects the features
with the highest correlation with the target classes, using the relationship between each feature
and the target class, while keeping the redundancy among the selected features low (Rachburee and
Punlumjeak, 2015). To observe the effect of the number of selected features on model performance,
we trained models on datasets obtained by varying the number of features selected with mRMR (4,
10, 15, 18 and 20).
Chi-Square. Chi-square performs feature selection by testing whether the class label is
independent of a feature (Rachburee and Punlumjeak, 2015). In the study, we compared the results
obtained by choosing 10 and 15 features with this method.
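To make the chi-square idea concrete, the sketch below scores binned features with the classical contingency-table statistic and keeps the k highest-scoring ones. It is an illustration under assumptions: the function names and toy data are hypothetical, the scoring may differ from the library routine used in the study, and mRMR (which additionally penalises redundancy between selected features) is not covered here.

```python
from collections import Counter

def chi_square_score(feature, labels):
    """Chi-square statistic of independence between a categorical feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected if feature and label were independent
            score += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return score

def select_top_k(features, labels, k):
    """Rank named feature columns by chi-square score and keep the k highest."""
    scores = {name: chi_square_score(column, labels) for name, column in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

labels = [0, 0, 1, 1]
features = {
    "informative": [0, 0, 1, 1],    # tracks the label exactly
    "uninformative": [0, 1, 0, 1],  # independent of the label
}
selected = select_top_k(features, labels, k=1)
```

A score of zero means the observed counts match what independence would predict, so such a feature carries no information about the class under this test.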
4.4.3 Sampling
In our dataset, each instance has a corresponding label of “0” or “1”, where “1” means cardiac
and “0” means healthy persons. The samples labelled as healthy account for only a small portion
of the whole dataset (15.4%). The imbalanced distribution of classes in the dataset directly
affects the prediction performance, as ML algorithms usually assume a balanced class distribution
(Murphy, 2018). Furthermore, training classification models directly on imbalanced data may bias
the predictions and result in low scores on some evaluation metrics. Thus, we implemented
sampling methods to address the serious imbalance between the cardiac and healthy classes. We
applied several sampling techniques, including random under-sampling, random over-sampling and
SMOTE, with several rates.
Random Under Sampling (RUS). To deal with the issues caused by an imbalanced dataset and obtain a
balanced one, we applied the random under-sampling technique. Under-sampling creates a balanced
dataset, in which the classes have the same number of samples, by randomly selecting as many
samples from the majority class as there are in the minority class.
Random Over Sampling (ROS). Oversampling is another method for obtaining a balanced dataset. It
creates multiple copies of minority-class samples in the training data until their number reaches
that of the majority class.
SMOTE. The Synthetic Minority Oversampling Technique (SMOTE) is one of the most frequently used
methods in the literature for balancing the number of samples in classes (Turlapati and Prusty,
2020). Minority instances in the training data are increased by linear interpolation to balance
the number of samples between the two classes (Turlapati and Prusty, 2020).
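The three sampling strategies can be sketched in a few lines of plain Python. The function names, the toy two-dimensional features and the fixed random seed are assumptions for illustration. The SMOTE-like function is a simplification: it interpolates towards the single nearest minority neighbour, whereas SMOTE proper picks one of k nearest neighbours at random.

```python
import random

def random_under_sample(majority, minority, rng):
    """Keep all minority samples and an equally sized random subset of the majority."""
    return rng.sample(majority, len(minority)), list(minority)

def random_over_sample(majority, minority, rng):
    """Duplicate random minority samples until both classes have the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra

def smote_like_sample(minority, n_new, rng):
    """Create synthetic points on the segment between a sample and its nearest neighbour."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbour = min(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

rng = random.Random(0)
cardiac = [(float(i), 1.0) for i in range(8)]  # majority class, made-up features
healthy = [(0.0, 0.0), (1.0, 0.0)]             # minority class, made-up features

rus_major, rus_minor = random_under_sample(cardiac, healthy, rng)
ros_major, ros_minor = random_over_sample(cardiac, healthy, rng)
new_points = smote_like_sample(healthy, n_new=6, rng=rng)
```

Resampling of this kind is normally applied to the training portion only, so that the test data keep the original class distribution.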
4.4.4 Stratified K-fold Cross-validation
K-fold cross-validation involves splitting the dataset into k folds. Iteratively, k-1 folds are
used for training and the remaining fold is used for testing, so that each fold is used once as
test data. In our study, we used the stratified k-fold cross-validation method, which is better
suited to imbalanced datasets such as ours because it preserves the class distribution in each
fold (we used k = 5).
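The stratification idea can be sketched as follows: the indices of each class are dealt out to the k folds in turn, so every test fold keeps roughly the class proportions of the full dataset. The function name and the toy labels are hypothetical, and the sketch omits the shuffling that a practical implementation would usually add.

```python
from collections import defaultdict

def stratified_k_fold(labels, k):
    """Yield (train_indices, test_indices) pairs whose test folds keep the class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal each class out round-robin
    for test_fold in folds:
        test = sorted(test_fold)
        train = sorted(i for fold in folds if fold is not test_fold for i in fold)
        yield train, test

labels = [1] * 20 + [0] * 5  # imbalanced toy labels: 20 cardiac, 5 healthy
splits = list(stratified_k_fold(labels, k=5))
```

With k = 5 and these toy labels, every test fold holds four cardiac samples and one healthy sample, mirroring the 4:1 ratio of the whole set.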
4.4.5 Machine Learning Model
Our objective in the study is to develop a DT that predicts whether a person is sick using ECG
signal data and sends a notification to the specialist in case of illness. In this way, streaming
ECG signal data are processed and emergency intervention can be provided if the person is sick.
For this purpose, we first performed preprocessing, feature binning, feature selection and
sampling on the data. We then compared the performance of different ML methods and used the model
that gave the best results in the DT framework. We built the cardiac patient prediction model
using two ML algorithms, namely Gradient Boosted Tree and Logistic Regression, as they are
commonly used techniques for binary classification problems.
Logistic Regression (LR). LR is often used in binary classification problems to model the
probability of a discrete outcome given an input variable. We performed hyperparameter tuning on
this model, and the best results were obtained with the default parameters (penalty: l2, solver:
lbfgs and maximum iteration: 100) (McCullagh and Nelder, 2019).
Gradient Boosting Classifier (GB). Boosting is an ensemble method that transforms weak learners
into strong learners by adding new models to fix the errors made by existing models. Models are
added incrementally over iterations until no improvement is detected. Gradient Boosting starts
from an initial leaf, builds new trees that take the prediction errors into account, and utilizes
the gradient descent algorithm to minimise the loss. We also performed hyperparameter tuning on
GB using grid search, and the best results were obtained with a learning rate of 1, a max depth
of 9 and 50 estimators (Friedman, 2001).
4.5 Evaluation Metrics
In this study, we analyzed accuracy, recall, precision and F-score, computed from confusion matrices, to assess the predictive performance of the classifiers. Accuracy is the most common evaluation metric and gives the rate of correct predictions. However, accuracy can be misleading on imbalanced datasets in which predictions for the minority class are important, so precision, recall and F-score should be used together with it. Here TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
Precision is important when FPs are costly, as it gives the percentage of actual cardiac patients among the predicted cardiac patients (Eq. 1).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
Recall, on the other hand, is important when FNs are critical, and is calculated as shown in Eq. 2. This metric shows the proportion of actual cardiac patients that a classifier correctly identifies.
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
We would also like to measure the trade-off between recall and precision. Therefore, the F-score is used as the harmonic mean of these metrics (Eq. 3).
$$F\text{-score} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}} \tag{3}$$
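The four metrics above can be computed directly from confusion-matrix counts. The sketch below is illustrative only: the function names are hypothetical, and the zero-division guards are an added convention rather than something specified in the paper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = harmonic_f_score(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


def harmonic_f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Sanity check against reported values: precision 0.87 and recall 0.98
# (the LR + Binning + chi2 row of the results table) give an F-score of about 0.92.
reported_f = round(harmonic_f_score(0.87, 0.98), 2)
```

The same check with CardioTwin's reported precision (0.95) and recall (0.86) gives about 0.90, which agrees with the F-score listed for it.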
5 RESULTS AND DISCUSSION
In this section, we explain the details of the experimental results obtained with several ML models. We also compare the cloud-based DT against the edge-based DT. In this article, we presented a DT solution that enables real-time monitoring of, and fast intervention in, heart disease, which is one of the diseases that affect human life most.
Figure 3: Comparison of AI Module Best Results and CardioTwin.
5.1 Machine Learning Results
The proposed DT frameworks allow the users in the system to be monitored in real time to determine whether or not they are cardiac patients, and they send notifications to the experts when a cardiac condition is detected. In this study, we used two different ML classifiers, namely GB and LR, with different preprocessing, feature selection and sampling methods. We then selected the model with the best prediction results for deployment in the DT. To validate the models, we performed 5-fold cross validation and analyzed the results through different experiments. Table 1 summarizes the results obtained from the experiments for all cardiac patient prediction models.
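As an illustration of the 5-fold cross validation mentioned above, the sketch below generates train/test index splits so that every sample is used for testing exactly once. It is a plain (non-stratified) shuffle split with a hypothetical function name and seed; the paper does not state whether the folds were stratified by class.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx
```

Each model would be trained on `train_idx` and scored on `test_idx`, with the metric values then averaged over the five folds.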
When the results were examined, we obtained the best results with LR when feature binning (with 4 bins) and chi2 feature selection (with 15 features) were applied to the data. This combination achieves an accuracy of 86%, a recall of 98% and an F-score of 92%. We also achieved the highest precision among our models (92%) with the GB+Binning+mRMR+ROS combination. By comparison, our reference study, CardioTwin, reports an accuracy of 85% and an F-score of 90%.
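The binning and chi2 steps can be sketched as follows. Two assumptions are made here that the paper does not confirm: the 4 bins are taken to be equal-width, and the chi2 score is computed as the Pearson chi-square statistic of independence between a binned feature and the label. Function names are hypothetical.

```python
from collections import Counter


def equal_width_bins(values, n_bins=4):
    """Replace each continuous value by the index (0..n_bins-1) of its equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def chi2_statistic(bins, labels):
    """Pearson chi-square statistic for the bin-by-label contingency table."""
    n = len(bins)
    observed = Counter(zip(bins, labels))
    bin_totals = Counter(bins)
    label_totals = Counter(labels)
    stat = 0.0
    for b, b_total in bin_totals.items():
        for y, y_total in label_totals.items():
            expected = b_total * y_total / n
            stat += (observed[(b, y)] - expected) ** 2 / expected
    return stat
```

Under these assumptions, feature selection would rank the binned features by this statistic and keep the 15 highest-scoring ones.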
When we analyzed the results in Table 1, we observed that the best results were obtained with the LR + Binning + chi2 combination. Feature binning may improve the results because the features in the dataset are continuous and benefit from being expressed categorically. Feature selection may improve the results because, once it is applied after feature binning, both the features and the label consist of discrete values. In addition, when examining the results, we see that the scores obtained with oversampling are generally better than those obtained with undersampling.

ICT4AWE 2023 - 9th International Conference on Information and Communication Technologies for Ageing Well and e-Health

Table 1: AI Module Benchmark Results.

| Model | Accuracy | Precision | Recall | F-score |
|---|---|---|---|---|
| CardioTwin | 0.85 | 0.95 | 0.86 | 0.90 |
| Logistic Regression (LR) | 0.82 | 0.84 | 0.97 | 0.9 |
| LR + Binning | 0.83 | 0.89 | 0.91 | 0.9 |
| LR + Binning + mRMR | 0.86 | 0.88 | 0.97 | 0.92 |
| LR + Binning + chi2 | 0.86 | 0.87 | 0.98 | 0.92 |
| LR + Binning + chi2 + RUS | 0.79 | 0.91 | 0.84 | 0.87 |
| LR + Binning + chi2 + ROS | 0.81 | 0.91 | 0.85 | 0.88 |
| LR + Binning + chi2 + SMOTE | 0.8 | 0.9 | 0.86 | 0.88 |
| LR + Binning + mRMR + RUS | 0.8 | 0.9 | 0.85 | 0.88 |
| LR + Binning + mRMR + ROS | 0.77 | 0.91 | 0.8 | 0.85 |
| LR + Binning + mRMR + SMOTE | 0.8 | 0.9 | 0.86 | 0.88 |
| Gradient Boosted Tree (GB) | 0.84 | 0.89 | 0.92 | 0.91 |
| GB + Binning | 0.82 | 0.88 | 0.91 | 0.9 |
| GB + Binning + mRMR | 0.85 | 0.91 | 0.91 | 0.91 |
| GB + Binning + chi2 | 0.82 | 0.88 | 0.91 | 0.9 |
| GB + Binning + chi2 + RUS | 0.68 | 0.89 | 0.71 | 0.79 |
| GB + Binning + chi2 + ROS | 0.79 | 0.88 | 0.87 | 0.88 |
| GB + Binning + chi2 + SMOTE | 0.76 | 0.88 | 0.83 | 0.85 |
| GB + Binning + mRMR + RUS | 0.73 | 0.9 | 0.78 | 0.83 |
| GB + Binning + mRMR + ROS | 0.85 | 0.92 | 0.92 | 0.91 |
| GB + Binning + mRMR + SMOTE | 0.81 | 0.91 | 0.87 | 0.89 |

Figure 4: Latency Comparison of Cloud-based and Edge-based Digital Twin Framework. (a) Latency (ms) of Cloud-based Digital Twin Framework. (b) Latency (ms) of Edge-based Digital Twin Framework.
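The difference between the two resampling strategies compared in Table 1, random oversampling (ROS) and random undersampling (RUS), can be sketched as below. The function names and the seed are illustrative assumptions; the study's own implementation is not described at this level of detail.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the rarest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for label in counts:
        pool = [r for r, y in zip(rows, labels) if y == label]
        for row in rng.sample(pool, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

Oversampling keeps every original sample, whereas undersampling discards majority-class samples, so the undersampled training set is smaller.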
When the results obtained in this study are compared with those of CardioTwin (Figure 3), our best models are more successful in terms of accuracy, recall and F-score, whereas CardioTwin reaches a higher precision. However, recall is vital in healthcare studies, so we tuned the models according to the recall metric.
5.2 Digital Twin Frameworks
In this study, edge-based and cloud-based DT frameworks are presented and compared in terms of scalability, flexibility, latency and security. The rapid growth in the number of IoT devices and in the data they generate has made scalability an important criterion. Edge-based DT scales considerably better than cloud-based DT, because adding new nodes has little effect on system latency (Khan et al., 2022). To keep up with the diversity of digitalization and IoT devices in every field, systems must also meet very high flexibility requirements. DTs implemented at the edge meet this flexibility requirement considerably better than cloud-based DTs (Khan et al., 2022). Latency is a major issue in scenarios where results are required in real time. To produce results on a cloud-based DT, the data generated at the edge must first be transferred to the cloud environment. On an edge-based DT, the latency problem is minimized because the ML model produces results where the data is generated. The results of our study support this: according to Figure 4, the latency of the edge-based DT (average 10.6 ms, Figure 4b) is lower than that of the cloud-based DT (average 18.8 ms, Figure 4a). Security is important for providing secure and reliable services to users, so it is a criterion that should be considered in the systems we develop (Asim et al., 2020). Edge-based systems are more secure than cloud-based systems because of their decentralized architecture, whereas cloud-based systems are more vulnerable to attacks because data are transmitted over long distances between users and the cloud (Asim et al., 2020).
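The latency gap reported above can be expressed as an absolute and a relative saving. The sketch below uses only the two reported averages (18.8 ms and 10.6 ms); the timing helper `measure_latency_ms` is a hypothetical illustration of how per-request latency might be sampled and is not the measurement procedure used in the study.

```python
import time
from statistics import mean


def measure_latency_ms(handler, repeats=100):
    """Time repeated calls of a zero-argument handler; returns per-call latencies in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        handler()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def relative_reduction(baseline_ms: float, improved_ms: float) -> float:
    """Fraction by which the improved latency is lower than the baseline."""
    return (baseline_ms - improved_ms) / baseline_ms


CLOUD_MEAN_MS = 18.8  # reported average for the cloud-based DT
EDGE_MEAN_MS = 10.6   # reported average for the edge-based DT

saving_ms = CLOUD_MEAN_MS - EDGE_MEAN_MS
saving_ratio = relative_reduction(CLOUD_MEAN_MS, EDGE_MEAN_MS)
print(f"edge saves {saving_ms:.1f} ms per request ({saving_ratio:.1%} lower)")
```

With the reported averages, the edge-based DT is about 8.2 ms faster per request, a reduction of roughly 44% relative to the cloud-based DT.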
6 CONCLUSION
The concept of Industry 4.0, which combines the do-
mains of Informatics and Industry, has spread from
the industrial sector to all other sectors. IoT, 5G and
6G networks, cloud and edge computing, big data, AI
and DT technologies are at the center of these devel-
opments.
In this paper, we compared cloud-based DT and edge-based DT via a case study. We built ML models on the PTBD healthcare dataset to predict human heart diseases in real time and thus enable quick treatment. In this context, we performed preprocessing stages, such as cleaning the signal data, denoising, smoothing, peak extraction, eliminating NaN values and imputing missing values, and feature engineering stages, such as feature binning, feature selection and sampling. Even though the Cardio Twin paper (Martinez-Velazquez et al., 2019) obtained the highest precision, we tuned our models based on the recall metric, because correctly detecting positive cases (TPs) is more vital when detecting diseases in the health field. Thus, our models performed better in terms of recall, F-score and accuracy.
A major future step of this study is to address the data security and privacy concerns that are frequently encountered in health studies by combining cloud computing, edge computing, federated learning and DT technologies. In addition, our future work will include trying different ML models on a new feature dataset containing the clinical findings of the patients and validating the models with different health datasets. Additionally, a case study that includes data collected from sensors can be added as a real-world usage experiment.
REFERENCES
Asim, M., Wang, Y., Wang, K., and Huang, P.-Q. (2020).
A review on computational intelligence techniques
in cloud and edge computing. IEEE Transactions
on Emerging Topics in Computational Intelligence,
4(6):742–763.
Bellavista, P., Giannelli, C., Mamei, M., Mendula, M., and
Picone, M. (2021). Application-driven network-aware
digital twin management in industrial edge environ-
ments. IEEE Transactions on Industrial Informatics,
17(11):7791–7801.
Björnsson, B., Borrebaeck, C., Elander, N., Gasslander, T., Gawel, D. R., Gustafsson, M., Jörnsten, R., Lee, E. J., Li, X., Lilja, S., et al. (2020). Digital twins to personalize medicine. Genome medicine, 12(1):1–4.
Bousseljot, R., Kreiseler, D., and Schnabel, A. (1995). Nutzung der EKG-Signaldatenbank CARDIODAT der PTB über das Internet.
Dritsas, E., Alexiou, S., and Moustakas, K. (2022). Cardiovascular disease risk prediction with supervised machine learning techniques. In Proceedings of the 8th International Conference on Information and Communication Technologies for Ageing Well and e-Health - ICT4AWE, pages 315–321. INSTICC, SciTePress.
Friedman, J. H. (2001). Greedy function approximation: a
gradient boosting machine. Annals of statistics, pages
1189–1232.
Fuller, A., Fan, Z., Day, C., and Barlow, C. (2020). Digi-
tal twin: Enabling technologies, challenges and open
research. IEEE access, 8:108952–108971.
Ge, X., Zhou, R., and Li, Q. (2019). 5g nfv-based tactile
internet for mission-critical iot services. IEEE Internet
of Things Journal, 7(7):6150–6163.
Githens, G. (2007). Product lifecycle management: driv-
ing the next generation of lean thinking by michael
grieves.
Glatt, M., Kölsch, P., Siedler, C., Langlotz, P., Ehmsen, S., and Aurich, J. C. (2021). Edge-based digital twin to trace and ensure sustainability in cross-company production networks. Procedia CIRP, 98:276–281.
Goldberger, A. L., Amaral, L. A., Glass, L., Hausdorff,
J. M., Ivanov, P. C., Mark, R. G., Mietus, J. E., Moody,
G. B., Peng, C.-K., and Stanley, H. E. (2000). Phys-
iobank, physiotoolkit, and physionet: components of
a new research resource for complex physiologic sig-
nals. circulation, 101(23):e215–e220.
Grieves, M. (2002). Plm initiatives [powerpoint slides]. In
Product Lifecycle Management Special Meeting.
Grieves, M. W. (2005). Product lifecycle management: the
new paradigm for enterprises. International Journal
of Product Development, 2(1-2):71–84.
Grieves, M. W. (2019). Virtually intelligent product sys-
tems: digital and physical twins.
Halepmollası, R., Zeybel, M., Eyvaz, E., Arkan, R., Genc,
A., Bilgen, I., and Haklidir, M. (2021). Towards fed-
erated learning in identification of medical images: A
case study. Artificial Intelligence Theory and Appli-
cation, Volume: 1:30–39.
Hassani, H., Huang, X., and MacFeely, S. (2022). Impactful
digital twin in the healthcare revolution. Big Data and
Cognitive Computing, 6(3):83.
Huang, J., Chang, L., and Lin, H. (2021). Implementation of IoT, wearable devices, Google Assistant and Google Cloud Platform for elderly home care system. In Proceedings of the 7th International Conference on Information and Communication Technologies for Ageing Well and e-Health - ICT4AWE, pages 203–212. INSTICC, SciTePress.
Jimenez, J. I., Jahankhani, H., and Kendzierskyj, S. (2020).
Health care in the cyberspace: Medical cyber-physical
system and digital twin challenges. In Digital twin
technologies and smart cities, pages 79–92. Springer.
Kalis, B., McHugh, J., Safavi, K., and Truscott, A. (2022).
Accenture digital health technology vision 2022.
Kamel Boulos, M. N. and Zhang, P. (2021). Digital twins:
from personalised medicine to precision public health.
Journal of Personalized Medicine, 11(8):745.
Karakra, A., Fontanili, F., Lamine, E., and Lamothe, J.
(2019). Hospit’win: a predictive simulation-based
digital twin for patients pathways in hospital. In 2019
IEEE EMBS international conference on biomedical
& health informatics (BHI), pages 1–4. IEEE.
Khan, L. U., Saad, W., Niyato, D., Han, Z., and Hong, C. S.
(2022). Digital-twin-enabled 6g: Vision, architectural
trends, and future directions. IEEE Communications
Magazine, 60(1):74–80.
Kher, R. et al. (2019). Signal processing techniques for re-
moving noise from ecg signals. J. Biomed. Eng. Res,
3(101):1–9.
Liu, Y., Zhang, L., Yang, Y., Zhou, L., Ren, L., Wang, F.,
Liu, R., Pang, Z., and Deen, M. J. (2019). A novel
cloud-based framework for the elderly healthcare ser-
vices using digital twin. IEEE access, 7:49088–
49101.
Luo, J., Ying, K., and Bai, J. (2005). Savitzky–golay
smoothing and differentiation filter for even number
data. Signal processing, 85(7):1429–1434.
Martinez-Velazquez, R., Gamez, R., and El Saddik, A.
(2019). Cardio twin: A digital twin of the human heart
running on the edge. In 2019 IEEE International Sym-
posium on Medical Measurements and Applications
(MeMeA), pages 1–6. IEEE.
McCullagh, P. and Nelder, J. A. (2019). Generalized linear
models. Routledge.
Murphy, K. P. (2018). Machine learning: A probabilistic
perspective (adaptive computation and machine learn-
ing series).
Piascik, B., Vickers, J., Lowry, D., Scotti, S., Stewart, J.,
and Calomino, A. (2012). Materials, structures, me-
chanical systems, and manufacturing roadmap. NASA
TA, pages 12–2.
Rachburee, N. and Punlumjeak, W. (2015). A comparison
of feature selection approach between greedy, ig-ratio,
chi-square, and mrmr in educational mining. In 2015
7th international conference on information technol-
ogy and electrical engineering (ICITEE), pages 420–
424. IEEE.
Scharff, S. (2018). From digital twin to improved patient
experience.
Shengli, W. (2021). Is human digital twin possible? Com-
puter Methods and Programs in Biomedicine Update,
1:100014.
Singh, M., Srivastava, R., Fuenmayor, E., Kuts, V., Qiao,
Y., Murray, N., and Devine, D. (2022). Applications
of digital twin across industries: A review. Applied
Sciences, 12(11):5727.
Tao, F., Xiao, B., Qi, Q., Cheng, J., and Ji, P. (2022). Digital
twin modeling. Journal of Manufacturing Systems,
64:372–389.
Turlapati, V. P. K. and Prusty, M. R. (2020). Outlier-smote:
A refined oversampling technique for improved de-
tection of covid-19. Intelligence-based medicine,
3:100023.
Tyagi, S., Agarwal, A., and Maheshwari, P. (2016). A
conceptual framework for iot-based healthcare sys-
tem using cloud computing. In 2016 6th International
Conference-Cloud System and Big Data Engineering
(Confluence), pages 503–507. IEEE.
Wang, Y., Xu, R., Zhou, C., Kang, X., and Chen, Z. (2022).
Digital twin and cloud-side-end collaboration for in-
telligent battery management system. Journal of Man-
ufacturing Systems, 62:124–134.
Wang, Z., Liao, X., Zhao, X., Han, K., Tiwari, P., Barth,
M. J., and Wu, G. (2020). A digital twin paradigm:
Vehicle-to-cloud based advanced driver assistance
systems. In 2020 IEEE 91st Vehicular Technology
Conference (VTC2020-Spring), pages 1–6. IEEE.
WHO (2020). The top 10 causes of death.