Design of an Iterative Method for Deep Multimodal Feature Fusion in
Heart Disease Diagnostics Utilizing Explainable AI
Sony K. Ahuja (https://orcid.org/0009-0000-9762-8686), Deepti D. Shrimankar (https://orcid.org/0000-0002-6212-0986) and Aditi R. Durge (https://orcid.org/0000-0002-9733-9706)
Visvesvaraya National Institute of Technology, Nagpur, India
Keywords: Multimodal Integration, Heart Disease Diagnostics, Explainable AI, Federated Learning,
Continual Learning Process.
Abstract: This research addresses the critical need for advanced diagnostic methodologies in heart disease, a leading
cause of mortality worldwide. Traditional diagnostic models, which often analyze genomic, clinical, and
medical imaging data in isolation, fall short in providing a holistic understanding of the disease due to their
fragmented approach. Such methods also grapple with significant challenges including data privacy concerns,
lack of interpretability, and an inability to adapt to the continuously evolving landscape of medical data. In response, this study introduces an innovative approach known as Deep Multimodal Feature Fusion,
designed to integrate genomic data, clinical history, and medical imaging into a cohesive analysis framework.
This method leverages the unique strengths of each data modality, offering a more comprehensive patient
profile than traditional, one-dimensional analyses. The integration of Explainable Artificial Intelligence with
Clinical Data Interpretation enhances model transparency and interpretability, crucial for healthcare
applications. The use of Transfer Learning with Pre-trained Models on medical imaging data and Continual
Learning for Adaptive Genomics ensures diagnostic accuracy and model adaptability over time. Federated Learning for Privacy-Preserving Analysis is employed to address data privacy, allowing for
collaborative model training without compromising patient confidentiality. Testing across diverse datasets
demonstrated substantial improvements in diagnostic Precision, Accuracy, Recall, and other metrics,
indicating a major advancement over existing methods. Practically, it exemplifies the application of advanced
AI techniques in clinical settings, narrowing the gap between theoretical research and practical healthcare
solutions.
1 INTRODUCTION
The domain of cardiovascular diagnostics stands at a
pivotal juncture, challenged by the complexities
inherent in heart disease—the leading cause of
mortality globally (Ullah et al., 2023). Traditional
diagnostic paradigms have relied on siloed analyses
of genomic data, clinical records, and medical
imaging. This fragmented approach, while
contributing valuable insights individually, often fails
to capture the intricate, multifaceted nature of heart
disease. Recognizing this gap, the advent of
integrative multimodal genomics heralds a
transformative shift, aiming to synthesize diverse data
modalities for a comprehensive understanding of
heart disease (Arneson et al., 2017; A. Durge &
Shrimankar, 2023).
The necessity for an integrative approach stems
from the nuanced interaction between genetic
predispositions and environmental or lifestyle factors
in the manifestation of heart disease. Genomic data,
for instance, provides insights into hereditary risks,
whereas clinical histories and medical imaging (such
as MRI and CT scans) offer context on the disease's
progression and anatomical impact. However, the
integration of these data streams presents significant
challenges, including but not limited to data privacy
concerns, the interpretability of complex models, and
the adaptability of diagnostic tools to evolving
datasets (Said et al., 2019; Usova et al., 2021).
This research introduces the Design of an Iterative
Method for Deep Multimodal Feature Fusion
(DMFF), an innovative framework that leverages the
strengths of genomic data, clinical histories, and
medical imaging to forge a comprehensive patient
analysis. This fusion goes beyond mere aggregation,
employing sophisticated algorithms to extract and
harmonize features from each modality, thereby
providing a holistic patient profile that significantly
enhances diagnostic accuracy.
Central to enhancing the DMFF model's utility is
the incorporation of Explainable Artificial
Intelligence (XAI) (Amann et al., 2022) with Clinical
Data Interpretation (CDI), which ensures that the
diagnostic process is transparent and interpretable.
This integration is crucial in healthcare, where the
rationale behind diagnostic decisions must be
understandable to clinicians and patients alike.
Moreover, the application of Transfer Learning with
Pre-trained Models (TLP) specifically to medical
imaging data like echocardiograms exemplifies the
method's innovative use of existing Artificial
Intelligence (AI) resources to improve diagnostic
precision.
Addressing the dynamic nature of genomic and
clinical data, the research introduces Continual
Learning for Adaptive Genomics (CLAG), a method
ensuring that the diagnostic model remains accurate
and relevant over time by adapting to new data
samples. In parallel, Federated Learning for Privacy-
Preserving Analysis (FLPPA) offers a solution to data
privacy concerns, enabling collaborative model
training across institutions without compromising
patient confidentiality (A. R. Durge & Shrimankar,
2024; Loftus et al., 2022). The iterative design of
DMFF not only addresses the limitations of
traditional diagnostic models but also paves the way
for precision medicine, where personalized treatment
strategies are informed by a deep, multidimensional
understanding of heart disease.
2 REVIEW OF EXISTING
MODELS FOR GENOMIC
ANALYSIS
The exponentially expanding field of heart disease
diagnostics and treatment has witnessed an
unprecedented integration of genomic data, machine
learning algorithms, and imaging techniques. The
exploration of genetic predispositions, alongside
environmental and lifestyle factors, has become
central to understanding and combating this leading
cause of mortality worldwide (Ahuja et al., 2023; A.
R. Durge et al., 2022). Despite remarkable progress,
existing methodologies often grapple with challenges
such as data integration, interpretability, privacy, and
adaptability to new data samples. This landscape
presents fertile ground for innovative approaches that
leverage multimodal data to provide a holistic
understanding of heart disease, thus guiding the
motivation behind the current research.
Recent studies in biomedical and machine
learning fields have provided significant insights into
disease classification, genetic analysis, and novel
modeling techniques. For instance, (Manduchi et al.,
2022) utilized a tree-based automated machine
learning approach with biology-based feature
selection to investigate the genetic factors
contributing to coronary artery disease. This study
advanced the understanding of the genetic basis of the
disease, although its focus on coronary artery disease
may have overlooked broader cardiovascular
conditions. In a different study, (Zheng et al., 2022)
applied a graph-transformer method to classify
whole-slide images, particularly in lung cancer
pathology. (Xu et al., 2022) introduced an innovative
tissue engineering technique by generating heart
microtissues in a Möbius strip configuration. This
novel approach showed promise for advanced disease
modeling.
Further studies have explored genetic and
phenotypic aspects of heart diseases. (Soibam, 2022)
identified super-enhancers and long noncoding RNAs
(lncRNAs) during mouse heart development,
contributing to our understanding of heart
development and disease. However, this research is
based on mouse models, and its implications for
human health need further validation. (Yu et al.,
2022) used machine learning to analyze electronic
health records (EHR) and genetic data, predicting
heart failure risk in cancer patients with high
accuracy. While effective, this study is limited to the
cancer patient population, leaving its broader
applicability unexamined. (Wang et al., 2022)
explored the genetic correlation between coronary
heart disease and electrocardiogram (ECG) traits,
suggesting a genetic causality.
Advanced machine learning techniques have also
been applied in classification tasks related to heart
disease. (Bao et al., 2023) utilized diffusion-based
synthetic image augmentation to improve the
classification of rare heart transplant rejection events,
demonstrating enhanced sensitivity in identifying
such rejections. Despite its success, the study focuses
solely on heart transplant rejection, with no
discussion of broader applications. Similarly, (Jose
Triny et al., 2023) optimized biomarkers for cancer
prognosis using microarray-based genomic analysis.
While this research enhanced the accuracy of disease
prediction and severity analysis, its relevance to heart
disease remains uncertain due to its cancer-specific
focus.
The literature reviewed provides a foundational
understanding of current methodologies and their
limitations, offering a backdrop against which the
contributions of this research are highlighted. This
research not only addresses the critical challenges
identified in the literature but also pioneers a path
towards personalized, precise, and privacy-
preserving diagnostics in heart disease.
3 DESIGN OF AN ITERATIVE
METHOD FOR DMFF IN
HEART DISEASE
DIAGNOSTICS UTILIZING
EXPLAINABLE AI
To overcome issues of low efficiency and high complexity, the proposed model integrates a pre-trained U-Net for segmentation with a VGG19 network for classification. This delineates
novel approach towards diagnosing heart disease
types using MRI and CT scans. The U-Net
architecture, initially devised for biomedical image
segmentation, operates on the principle of a
convolutional network that is symmetric, facilitating
precise localization and the use of context in the
segmentation process. This is augmented by the
incorporation of a VGG19 model, renowned for its
depth and simplicity, primarily comprising 3x3
convolutional layers stacked in increasing depth,
culminating in three fully connected layers for
classification.
As per figure 1, the segmentation process
begins with the U-Net model, which employs a series
of convolutional operations to extract features from
input images. Let $I$ represent the input image and $F_l$ the feature map at layer $l$; the operation within each convolutional layer is mathematically represented via equation 1,

$F_l = \sigma(W_l * F_{l-1} + b_l)$ (1)
where $W_l$ and $b_l$ are the weights and biases at layer $l$, $\sigma$ is the ReLU activation function, and $*$ represents the convolution operation. U-Net's architecture
allows for the capture of context and fine-grained
details through its contracting and expansive paths,
respectively. The contracting path follows the typical
architecture of a convolutional network, involving
successive convolution and pooling operations,
thereby compressing the input image into a feature-
rich representation for this process.
Figure 1: Model Architecture of the Proposed Privacy
Inspired Classification Process.
The expansive path then employs transposed
convolutions to project these features back onto the
pixel space, aiming to reconstruct the segmentation
map corresponding to the input image samples. This
process is encapsulated via equation 2,
$F_l' = \sigma(W_l' \circledast F_{l+1}')$ (2)

where $F_l'$ and $W_l'$ are the feature maps and weights in the expansive path, respectively, and $\circledast$ signifies the transposed convolution operation. Transitioning to
the classification phase, the segmented images are
then processed through the VGG19 model, which
comprises multiple convolutional layers followed by
fully connected layers. The initial layers of VGG19
are designed to capture image features such as edges
and textures, which are then progressively combined
into more complex patterns in subsequent layers. The
final classification is achieved through the dense
layers of the network, where the feature
representation FL obtained from the last
convolutional layer is transformed into class
probabilities via equation 3,
$P(C \mid F_L') = \dfrac{\exp(W_C \cdot F_L' + b_C)}{\sum_{C'} \exp(W_{C'} \cdot F_L' + b_{C'})}$ (3)

where $P(C \mid F_L')$ represents the probability of class $C$ given the features $F_L'$, $W_C$ and $b_C$ are the weights and biases corresponding to class $C$, and the summation in the denominator extends over all possible classes.
The synergy between U-Net's segmentation
prowess and VGG19's classification capabilities
enables the precise delineation and categorization of
heart disease types from MRI and CT scans. Through
the sequential application of these models, the
methodology not only ensures the accurate
segmentation of heart structures but also leverages
deep learning's feature extraction capabilities to
classify the segmented images into specific heart
disease types. This dual-stage process, encapsulated
by the seamless integration of segmentation and
classification operations, represents a comprehensive
approach to diagnosing heart diseases, embodying a
significant advancement in the application of deep
learning techniques within the realm of medical imaging.
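To make the two-stage design concrete, the following PyTorch sketch wires a miniature U-Net in front of a pre-trained VGG19, masking each scan with the predicted segmentation before classification. The single-level U-Net, the channel widths, and the four-class output head are our illustrative assumptions; the paper does not publish its exact architecture or code.

```python
# Hedged sketch of the segmentation-then-classification pipeline (Eqs. 1-3).
# MiniUNet is a toy, single-level stand-in for the full U-Net; channel widths
# and num_classes are assumptions, not the authors' settings.
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

def conv_block(c_in, c_out):
    # Two 3x3 convolutions with ReLU: the basic U-Net operation of Eq. 1.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = conv_block(3, 32)                      # contracting path
        self.pool = nn.MaxPool2d(2)
        self.bottom = conv_block(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)  # expansive path (Eq. 2)
        self.head = nn.Sequential(conv_block(64, 32), nn.Conv2d(32, 1, 1))

    def forward(self, x):
        d = self.down(x)
        u = self.up(self.bottom(self.pool(d)))
        skip = torch.cat([d, u], dim=1)                    # U-Net skip connection
        return torch.sigmoid(self.head(skip))             # soft segmentation mask

num_classes = 4  # assumption: number of heart disease categories
classifier = vgg19(weights=VGG19_Weights.IMAGENET1K_V1)   # pre-trained VGG19
classifier.classifier[6] = nn.Linear(4096, num_classes)   # re-sized dense head

unet = MiniUNet()
scan = torch.randn(1, 3, 224, 224)        # stand-in MRI/CT slice
mask = unet(scan)                         # stage 1: segmentation
logits = classifier(scan * mask)          # stage 2: classify the segmented scan
probs = torch.softmax(logits, dim=1)      # class probabilities (Eq. 3)
```

In practice the U-Net would first be trained on annotated cardiac segmentations and then frozen in front of the classifier.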
Next, to classify genomic scans, specifically mRNA sequences, into disease types, the model employs a process combining Single Nucleotide Variant (SNV) analysis with a 1D Convolutional Neural Network (CNN) comprising 20 layers. This methodological framework is pivotal for deciphering the complex genomic underpinnings of heart diseases, leveraging the granular specificity of SNVs, mutations that occur at a single nucleotide position in the genome, which is instrumental in disease classification.
The classification process initiates with the extraction of SNVs from the genomic samples. Let $S$ represent a sequence of nucleotides, where $S = \{s_1, s_2, \ldots, s_n\}$ and each $s_i$ represents a nucleotide (A, C, G, or T). The detection of SNVs within these sequences is formalized as identifying positions $i$ where $s_i \neq s_i'$, with $s_i'$ representing the corresponding nucleotide in a reference sequence. This comparison yields a binary sequence $B = \{b_1, b_2, \ldots, b_n\}$, where $b_i = 1$ if an SNV is detected at position $i$ and $b_i = 0$ otherwise.
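As a minimal illustration of this encoding, the snippet below compares a toy sample sequence against a toy reference and emits the binary mask $B$; both sequences are invented for the example.

```python
# SNV extraction as described above: b_i = 1 wherever the sample nucleotide
# differs from the reference (s_i != s_i'). Toy sequences for illustration.
reference = "ACGTACGTAC"
sample    = "ACGAACGTTC"

B = [1 if s != r else 0 for s, r in zip(sample, reference)]
print(B)  # [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
```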
Following SNV extraction, the binary sequence B
is input into the 1D CNN, which is designed to
capture and learn patterns associated with specific
heart disease types. The architecture of the 1D CNN
is composed of convolutional layers that perform
feature extraction, followed by pooling layers that
reduce dimensionality, and fully connected layers that
accomplish the classification task. The convolutional operation in the $k$-th layer is mathematically represented via equation 4,

$F_k = \sigma(W_k * B + b_k)$ (4)
where $F_k$ represents the feature map produced by layer $k$, $W_k$ and $b_k$ are the layer's weights and biases, respectively, $\sigma$ is the ReLU activation function, and $*$ symbolizes the convolution operation.
The depth of the network, with its 20 layers,
allows for the extraction of increasingly abstract
features from the input sequence. In this context, the
depth encompasses multiple convolutional layers,
each followed by an activation function, defined via
equation 5,
$\sigma(x) = \max(0, x)$ (5)

which introduces non-linearity into the model. Pooling layers interspersed among the
convolutional layers serve to reduce the spatial size
of the representation, thereby decreasing the number
of parameters and computation in the network. The
max pooling is represented via equation 6,
$P_k = \max(F_{k,\, i:i+p})$ (6)

where $P_k$ is the pooled feature map, $F_{k,\, i:i+p}$ represents a segment of the feature map $F_k$, and $p$ is the pooling size.
The culmination of the convolutional and pooling
layers is followed by one or more fully connected
layers, which integrate the high-level features
extracted by the preceding layers for the purpose of
classification. The operation in a fully connected
layer is expressed via equation 7,
$C_j = \sigma(W_j \cdot F_L + b_j)$ (7)

where $C_j$ represents the output of the $j$-th fully connected layer, $F_L$ is the flattened feature map from the last convolutional or pooling layer, and $W_j$ and $b_j$ are the weights and biases of the fully connected layer.
Finally, the classification output is generated
through a softmax layer, which converts the logits
from the fully connected layer into probabilities for
each heart disease type. The softmax function is
defined via equation 8,
$P(c_i \mid C) = \dfrac{\exp(C_i)}{\sum_j \exp(C_j)}$ (8)

where $P(c_i \mid C)$ represents the probability of class $c_i$ given the output vector $C$, and $C_i$ is the logit corresponding to class $c_i$. Through this intricate
process of SNV extraction and subsequent pattern
learning via a deep 1D CNN, the proposed
methodology adeptly classifies genomic samples into
specific heart disease types. The combination of
precise genetic mutation identification with advanced
feature extraction and classification techniques
represents a significant leap forward in the domain of
genomic-based heart disease diagnostics, offering a
nuanced and highly effective tool for understanding
and combating these conditions.
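A hedged PyTorch sketch of this genomic classifier is given below. It stacks Conv1d + ReLU blocks with interleaved max pooling and closes with dense layers and a softmax, mirroring equations 4 through 8; the channel widths, kernel size, pooling schedule, and reduced depth (a shallower stand-in for the 20-layer network) are our assumptions.

```python
# Sketch of the 1D CNN over binary SNV sequences (Eqs. 4-8). Shallower than
# the paper's 20-layer network; all hyperparameters here are assumptions.
import torch
import torch.nn as nn

def snv_cnn(num_classes=4):
    layers, c_in = [], 1
    for i in range(8):                                    # conv + ReLU blocks
        c_out = min(16 * 2 ** (i // 2), 128)
        layers += [nn.Conv1d(c_in, c_out, kernel_size=5, padding=2),  # Eq. 4
                   nn.ReLU()]                                          # Eq. 5
        if i % 2 == 1:
            layers.append(nn.MaxPool1d(2))                             # Eq. 6
        c_in = c_out
    layers += [nn.Flatten(),
               nn.LazyLinear(256), nn.ReLU(),                          # Eq. 7
               nn.Linear(256, num_classes)]               # logits for Eq. 8
    return nn.Sequential(*layers)

model = snv_cnn()
B = torch.randint(0, 2, (2, 1, 1024)).float()  # batch of binary SNV sequences
probs = torch.softmax(model(B), dim=1)         # softmax over disease types (Eq. 8)
```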
Next, the integration of XAI with CDI represents
a paradigm shift, enhancing the transparency and
interpretability of complex models used for analyzing
classified scans and genomic samples. A cornerstone
of this integration is the application of Gradient-
weighted Class Activation Mapping (GradCAM), a
technique that provides visual explanations for
decisions made by CNNs, thereby elucidating the
model's focus on specific features within the input
data that influenced its predictions (Selvaraju et al.,
2020). The GradCAM process commences by
identifying the feature maps generated by the final
convolutional layer of the CNN, which are
instrumental in the classification decision. Let Ak
represent the k-th feature map in the final
convolutional layer, where k ranges from 1 to K, with
K being the total number of feature maps in that layer.
The importance of each feature map Ak towards a
specific class c is determined by calculating the
gradient of the score for class c (represented as yc)
with respect to the feature map activations. This
gradient, averaged across the width and height
dimensions (indexed by i and j, respectively), yields
the neuron importance weights $\alpha_k^c$, which are formally expressed via equation 9,

$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$ (9)

where $Z$ represents the total number of units in the feature map, and $\frac{\partial y^c}{\partial A_{ij}^k}$ signifies the gradient of the class score with respect to each unit of the feature maps. The next step involves computing the weighted
combination of forward activation maps, followed by
a ReLU function to obtain the GradCAM heatmap,
$L^c_{\mathrm{GradCAM}}$. This heatmap highlights the regions of the input image most influential for the model's prediction of class $c$. Mathematically, $L^c_{\mathrm{GradCAM}}$ is derived via equation 10,

$L^c_{\mathrm{GradCAM}} = \mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right)$ (10)
The application of the ReLU function ensures that
only features with a positive influence on the class of
interest are visualized, thereby focusing on the
regions of the input that contribute most significantly
to the model's predictions. For genomic samples, the
interpretation via GradCAM adapts to the 1D nature
of the data samples. Although originally designed for
2D images, the core principle of highlighting
influential regions is applied to genomic sequences by
visualizing the segments of the sequence that led to
specific classifications. This requires adjusting the
GradCAM process to handle 1D convolutional
outputs, yet the foundational equations remain
applicable, demonstrating the method's versatility for
clinical use cases. The outcome of the GradCAM
process is a set of visual heatmaps that is
superimposed on the original medical scans or
genomic sequences, providing clinicians and
researchers with intuitive visual cues about the
regions or segments most critical to the model’s
diagnostic decisions. Furthermore, by revealing the
model's focus areas, GradCAM facilitates the
identification of potential biases or errors in the
model's reasoning, enabling continuous refinement
and improvement of the diagnostic tool.
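Equations 9 and 10 translate directly into a few lines of PyTorch, sketched below for the imaging branch using forward and backward hooks on VGG19's last convolutional layer; the variable names are ours and the input is a stand-in tensor.

```python
# Minimal Grad-CAM (Eqs. 9-10), following Selvaraju et al. (2020). Hooks
# capture the last conv layer's activations A^k and gradients dy^c/dA^k.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

model = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).eval()
acts, grads = {}, {}
last_conv = model.features[34]  # final Conv2d in VGG19's feature extractor

last_conv.register_forward_hook(lambda m, i, o: acts.update(a=o))
last_conv.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)       # stand-in for a classified scan
scores = model(x)
c = scores.argmax().item()            # explain the predicted class
scores[0, c].backward()               # propagates dy^c/dA^k to the hook

alpha = grads["g"].mean(dim=(2, 3), keepdim=True)   # Eq. 9: average over i, j
cam = F.relu((alpha * acts["a"]).sum(dim=1))        # Eq. 10: weighted sum + ReLU
heatmap = F.interpolate(cam.unsqueeze(1), size=x.shape[2:],
                        mode="bilinear", align_corners=False)
```

For the genomic branch, the same recipe applies with Conv1d activations and a 1D upsampling of the resulting saliency vector.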
The model initiates an FLPPA Mechanism, which
stands at the forefront of innovative methodologies
designed to safeguard patient confidentiality while
facilitating collaborative model training across
disparate healthcare entities. The essence of FLPPA
is to decentralize the learning process, thereby
ensuring that sensitive patient data remains within the
confines of its origin, such as a hospital or a clinical
laboratory, while still contributing to the collective
intelligence of a global model. At the core of the
FLPPA process, the interaction between the local and
global models is governed by a series of mathematical
operations designed to optimize model performance
while preserving privacy. Let Mg represent the global
model, and Mli represent the local model associated
with the i-th participant. The global model is
initialized and disseminated to all participants, who
then adapt this model based on their local datasets,
Di, through a series of training epochs. The update
from each local model, ΔMli, is represented by the
difference between the parameters of the locally
updated model and the initial global model
parameters, formalized via equation 11,
$\Delta M_{l_i} = M_{l_i}^{new} - M_g$ (11)
Each local model's update is then securely
transmitted to a central server, where an aggregation
algorithm, typically Federated Averaging (FedAvg),
is employed to update the global model. The
aggregation process is mathematically expressed via
equation 12,
$M_g^{new} = M_g + \eta \sum_{i=1}^{N} \frac{n_i}{\sum_{j=1}^{N} n_j}\, \Delta M_{l_i}$ (12)

where $N$ is the total number of participants, $n_i$ is the size of the $i$-th local dataset, and $\eta$ is a learning rate parameter that influences the extent to which local updates affect the global model.
To further enhance privacy, Differential Privacy
(DP) techniques are integrated into FLPPA,
introducing stochastic noise to the model updates
before aggregation. This is represented via equation
13,
$\Delta M_{l_i}^{DP} = \Delta M_{l_i} + \mathcal{N}(0, \sigma^2 I)$ (13)

where $\mathcal{N}(0, \sigma^2 I)$ represents the addition of noise drawn from a Gaussian distribution with mean 0 and variance $\sigma^2$, and $I$ is the identity matrix corresponding to the dimensions of the model parameters.
The iterative nature of FLPPA allows for
continuous refinement of the global model, with each
cycle of local training and aggregation bringing the
model closer to optimal performance. The
convergence of the global model, Mg, is evaluated
through a loss function, L(Mg, D), where D
represents the aggregated dataset from all
participants. The objective is to minimize this loss
function, which is represented via equation 14,
$\min_{M_g} L(M_g, D) = \sum_{i=1}^{N} \frac{n_i}{\sum_{j=1}^{N} n_j}\, L(M_g, D_i)$ (14)
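The FLPPA update cycle of equations 11 through 14 can be sketched as follows; the client data, noise scale, and single-tensor model are illustrative stand-ins, and a real deployment would add secure transmission and a formal differential-privacy accountant.

```python
# Hedged sketch of one FLPPA round: local deltas (Eq. 11), Gaussian noise
# for differential privacy (Eq. 13), and size-weighted FedAvg (Eq. 12),
# iterated to drive down the global loss (Eq. 14).
import torch

def client_update(global_params, local_params, sigma=0.01):
    # Delta between locally trained and global parameters, plus DP noise.
    return {k: (local_params[k] - global_params[k])
               + sigma * torch.randn_like(global_params[k])
            for k in global_params}

def server_aggregate(global_params, deltas, sizes, eta=1.0):
    # FedAvg-style aggregation: deltas weighted by n_i / sum_j n_j.
    n = float(sum(sizes))
    return {k: global_params[k]
               + eta * sum((sz / n) * d[k] for d, sz in zip(deltas, sizes))
            for k in global_params}

# Toy round with two clients sharing a single weight tensor.
Mg = {"w": torch.zeros(3)}
locals_ = [{"w": torch.ones(3)}, {"w": 2 * torch.ones(3)}]
deltas = [client_update(Mg, Ml) for Ml in locals_]
Mg = server_aggregate(Mg, deltas, sizes=[100, 300])
```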
The security and privacy of the FLPPA process
are bolstered by encryption protocols during the
transmission of model updates, ensuring that data
remains confidential and secure against potential
breaches. Encryption is modeled as a function
$E(\Delta M_{l_i})$, where the encrypted update is decrypted by
the central server before aggregation. Upon receiving
the aggregated and enhanced global model,
participants apply the XAI results to this model to
generate privacy-preserved insights for clinical
inference operations. This application involves
mapping the GradCAM interpretability layer onto the
global model to elucidate decision-making processes
without exposing sensitive data samples. The final
output, privacy-preserved results, embodies the
culmination of collaborative learning and
interpretability, ensuring that stakeholders can glean
actionable insights with the assurance of patient data
privacy.
4 RESULT ANALYSIS
The experimental setup for validating the proposed
method was meticulously designed to assess its
performance comprehensively across multiple
datasets and samples. This section outlines the key
components of the experimental setup, including the
datasets utilized and training parameters.
4.1 Datasets
The experimental evaluation was conducted on three
diverse datasets, each representing a distinct aspect of
heart disease:
Genomic Dataset (https://cardiodb.org/):
This dataset comprises mRNA expression profiles
obtained from a cohort of patients diagnosed with
various forms of heart disease. It includes genomic
features such as gene expression levels, SNVs, and
gene-disease associations.
Clinical Dataset (https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset):
The clinical dataset consists of patient demographic
information, medical history, and diagnostic records
obtained from electronic health records (EHRs) and
clinical databases. This dataset provides crucial
context and patient-specific information for
enhancing diagnostic accuracy.
Imaging Dataset (https://www.kaggle.com/datasets/rahimanshu/cardiomegaly-disease-prediction-using-cnn):
The imaging dataset contains a collection of MRI and
CT scans of patients' hearts, capturing detailed
structural and anatomical information. These imaging
modalities offer valuable insights into cardiac
morphology and pathology.
4.2 Training Parameters
The model was trained using the following parameters (expressed as a code sketch after the list):
Batch Size: 32
Learning Rate: 0.001
Optimizer: Adam optimizer with momentum (β1 =
0.9, β2 = 0.999)
Loss Function: Binary cross-entropy loss for
classification tasks
Regularization: L2 regularization (weight decay =
0.001) to prevent overfitting.
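Expressed as code, the configuration above maps onto a standard PyTorch setup; note that the L2 regularization term corresponds to the optimizer's weight_decay argument, and the placeholder model is ours.

```python
# The stated training configuration as a hedged PyTorch sketch.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)  # placeholder for any of the networks above
optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             betas=(0.9, 0.999),      # momentum terms
                             weight_decay=0.001)      # L2 regularization
criterion = nn.BCEWithLogitsLoss()                    # binary cross-entropy
batch_size = 32
```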
4.3 Used Values
Genomic Dataset: 10,000 samples with 20,000 gene
expression features.
Clinical Dataset: 5,000 patient records with
demographic information and medical history.
Imaging Dataset: 2,000 MRI and CT scans with
256x256 pixel resolution.
The paper addresses the problem of disjoint datasets about different patients through an innovative data fusion process. In this regard, the DMFF-XAI model presented here provides a multi-modal integration framework that can handle and align heterogeneous datasets even when they originate from multiple sources.
In this results section of the study, the performance of the proposed model DMFF-XAI is meticulously compared against three established methods: the Tree-based Pipeline Optimization Tool (TPOT) (Manduchi et al., 2022), Empirical Fuzzy Multiobjective Multifactor Dimensionality Reduction (EFMOMDR), and the Graph Transformer (GT) (Zheng et al., 2022). The comparative analysis is encapsulated in three figures and a table, elucidating different facets of performance: accuracy, precision, recall, specificity, AUC, and computational efficiency.
Figure 2 showcases the superior Accuracy and
Precision of DMFF-XAI over the referenced
methods. The enhanced Accuracy (94.5%) and
Precision (93.8%) of DMFF-XAI underscore its
efficacy in correctly identifying and categorizing
heart disease types, significantly outperforming the
comparative models. This improvement is attributed
to the model's ability to synergistically integrate
multi-modal data, leveraging the strengths of
genomic, clinical, and imaging data to provide a more
nuanced and comprehensive analysis.
Figure 2: Diagnostic Accuracy and Precision.
In figure 3, DMFF-XAI demonstrates a
remarkable Recall (95.2%) and Specificity (94.1%),
indicating not only its proficiency in identifying true
positive cases but also in minimizing false positives.
This is particularly crucial in medical diagnostics,
where the cost of false negatives is high. The DMFF-
XAI's performance in these metrics reflects its
robustness and reliability in clinical settings.
Figure 3: Recall and Specificity.
Figure 4 presents the AUC values, with DMFF-
XAI achieving an AUC of 0.961, surpassing the
comparative methods. This high AUC value implies
that DMFF-XAI possesses a superior ability to
discriminate between the disease classes across all
possible thresholds, highlighting its effectiveness in
varying clinical scenarios.
Figure 4: Area under the curve (AUC).
Table 1 below evaluates the computational efficiency of DMFF-XAI against the referenced
methods. Despite its sophisticated integration of
multi-modal data and the added complexity of
explainable AI components, DMFF-XAI exhibits
competitive training and inference times. The
relatively short training time (6.5 hours) and swift
inference time (0.45 seconds) are indicative of the
model's optimized architecture and algorithmic
efficiencies, making it viable for real-world
applications where time is of the essence.
Table 1: Computational Efficiency.

Model     | Training Time (hrs) | Inference Time (sec)
DMFF-XAI  | 6.5                 | 0.45
TPOT      | 8.2                 | 0.67
EFMOMDR   | 7.9                 | 0.62
GT        | 7.4                 | 0.59
The results obtained collectively illustrate the
significant performance enhancements achieved by
DMFF-XAI. Its ability to deliver higher accuracy,
precision, recall, and specificity, alongside
impressive AUC values and computational
efficiency, underscores the model's potential to
revolutionize heart disease diagnostics. These
advancements highlight the critical role of integrating
multimodal data and explainable AI to improve
diagnostic outcomes and patient trust in automated
systems.
5 CONCLUSION AND FUTURE
SCOPE
In conclusion, the research presented herein
introduces a groundbreaking method, DMFF-XAI,
which significantly advances the domain of heart
disease diagnostics. Through a sophisticated
integration of multimodal data—encompassing
genomic information, clinical histories, and medical
imaging—coupled with the transparency afforded by
explainable AI, DMFF-XAI has demonstrated
superior performance across a range of critical
metrics when compared to existing methodologies.
The empirical results underscore the efficacy of
DMFF-XAI, highlighting its enhanced diagnostic
accuracy, precision, recall, specificity, and
computational efficiency. Notably, the model
achieved a diagnostic accuracy of 94.5% and a
precision of 93.8%, outperforming referenced
methods by a substantial margin. Such improvements
are pivotal, particularly in the realm of heart disease,
where early and accurate diagnosis can significantly
influence patient outcomes. The integration of
Explainable AI not only augments the model's
interpretability but also fosters a greater degree of
trust among clinicians and patients alike, ensuring
that the diagnostic process is both transparent and
accountable.
Looking to the future, the scope for extending and
refining DMFF-XAI is vast. One immediate avenue
of exploration is the application of this model to other
complex diseases, where the integration of
multimodal data could unlock new insights and
diagnostic capabilities. Additionally, the potential for
incorporating real-time data, such as from wearable
health devices, into the DMFF-XAI framework could
further enhance its predictive accuracy and utility in
ongoing health monitoring and preventive medicine.
DMFF-XAI sets a new benchmark for future research
and development, promising to revolutionize the
landscape of medical diagnostics and patient care.
REFERENCES
Ahuja, S. K., Shrimankar, D. D., & Durge, A. R. (2023). A
Study and Analysis of Disease Identification using
Genomic Sequence Processing Models: An Empirical
Review. Current Genomics, 24(4), 207–235.
https://doi.org/10.2174/0113892029269523231101051
455
Amann, J., Vetter, D., Blomberg, S. N., Christensen, H. C.,
Coffee, M., Gerke, S., Gilbert, T. K., Hagendorff, T.,
Holm, S., Livne, M., Spezzatti, A., Strümke, I., Zicari,
R. V., & Madai, V. I. (2022). To explain or not to
explain?—Artificial intelligence explainability in
clinical decision support systems. PLOS Digital Health,
1(2), e0000016. https://doi.org/10.1371/journal.
pdig.0000016
Arneson, D., Shu, L., Tsai, B., Barrere-Cain, R., Sun, C., &
Yang, X. (2017). Multidimensional Integrative
Genomics Approaches to Dissecting Cardiovascular
Disease. Frontiers in Cardiovascular Medicine,
4(February). https://doi.org/10.3389/fcvm.2017.00008
Bao, H., Deng, J., Xing, S., Zhong, Y., Shi, W., Marteau,
B., Das, B., Shehata, B., Deshpande, S., & Wang, M.
D. (2023). Rare Heart Transplant Rejection
Classification Using Diffusion-Based Synthetic Image
Augmentation. BHI 2023 - IEEE-EMBS International
Conference on Biomedical and Health Informatics,
Proceedings. https://doi.org/10.1109/BHI58575.2023.
10313377
Durge, A. R., & Shrimankar, D. D. (2024). DHFS-ECM:
Design of a Dual Heuristic Feature Selection-based
Ensemble Classification Model for the Identification of
Bamboo Species from Genomic Sequences. Current
Genomics, 25. https://doi.org/10.2174/0113892029
268176240125055419
Durge, A. R., Shrimankar, D. D., & Sawarkar, A. D. (2022).
Heuristic Analysis of Genomic Sequence Processing
Models for High Efficiency Prediction: A Statistical
Perspective. Current Genomics, 23(5), 299–317.
https://doi.org/10.2174/1389202923666220927105311
Durge, A., & Shrimankar, D. (2023). MRQPMS: Design of
a Map Reduce Bioinspired Model for Solving Quorum
Planted Motif Search for High-Speed Deployments.
3(Biostec), 123–130. https://doi.org/10.5220/0011616
500003414
Jose Triny, K., Kesavan, S., Navadeepan, H., & Ranjith
Kumar, P. (2023). Microarray based Geonomic
Biomarker Optimization for Cancer Prognosis.
Proceedings of the 2023 2nd International Conference
on Electronics and Renewable Systems, ICEARS 2023,
478–483. https://doi.org/10.1109/ICEARS56392.
2023.10084989
Loftus, T. J., Ruppert, M. M., Shickel, B., Ozrazgat-
Baslanti, T., Balch, J. A., Efron, P. A., Upchurch, G. R.,
Rashidi, P., Tignanelli, C., Bian, J., & Bihorac, A.
(2022). Federated learning for preserving data privacy
in collaborative healthcare research. Digital Health, 8,
1–5. https://doi.org/10.1177/20552076221134455
Manduchi, E., Le, T. T., Fu, W., & Moore, J. H. (2022).
Genetic Analysis of Coronary Artery Disease Using
Tree-Based Automated Machine Learning Informed By
Biology-Based Feature Selection. IEEE/ACM
Transactions on Computational Biology and
Bioinformatics, 19(3), 1379–1386. https://doi.org/10.
1109/TCBB.2021.3099068
Said, M. A., van de Vegte, Y. J., Zafar, M. M., van der
Ende, M. Y., Raja, G. K., Verweij, N., & van der Harst,
P. (2019). Contributions of Interactions Between
Lifestyle and Genetics on Coronary Artery Disease
Risk. Current Cardiology Reports, 21(9), 1–8.
https://doi.org/10.1007/s11886-019-1177-x
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., & Batra, D. (2020). Grad-CAM: Visual
Explanations from Deep Networks via Gradient-Based
Localization. International Journal of Computer
Vision, 128(2), 336–359. https://doi.org/10.1007/s
11263-019-01228-7
Soibam, B. (2022). Genome-wide compendium of super-
long noncoding RNAs during mouse heart
development. Proceedings - 2022 IEEE International
Conference on Bioinformatics and Biomedicine, BIBM
2022, 3315–3319. https://doi.org/10.1109/BIBM
55620.2022.9995496
Ullah, A., Kumar, M., Sayyar, M., Sapna, F., John, C.,
Memon, S., Qureshi, K., Agbo, E. C., Ariri, H. I.,
Chukwu, E. J., Varrassi, G., Khatri, M., Kumar, S.,
Elder, N. M., & Mohamad, T. (2023). Revolutionizing
Cardiac Care: A Comprehensive Narrative Review of
Cardiac Rehabilitation and the Evolution of
Cardiovascular Medicine. Cureus, 15(10).
https://doi.org/10.7759/cureus.46469
Usova, E. I., Alieva, A. S., Yakovlev, A. N., Alieva, M. S.,
Prokhorikhin, A. A., Konradi, A. O., Shlyakhto, E. V.,
Magni, P., Catapano, A. L., & Baragetti, A. (2021).
Integrative analysis of multi-omics and genetic
approaches— A new level in atherosclerotic
cardiovascular risk prediction. Biomolecules, 11(11),
1–16. https://doi.org/10.3390/biom11111597
Wang, X., Zhang, H., Xiu, X., Qi, M., Yang, Y., & Zhao,
H. (2022). Genetic and phenotypic relationships
between coronary atherosclerotic heart disease and
electrocardiographic traits. Proceedings - 2022 IEEE
International Conference on Bioinformatics and
Biomedicine, BIBM 2022, 241–246. https://doi.
org/10.1109/BIBM55620.2022.9995557
Xu, Y., Qi, J., Zhou, W., Liu, X., Zhang, L., Yao, X., & Wu,
H. (2022). Generation of ring-shaped human iPSC-
derived functional heart microtissues in a Möbius strip
configuration. Bio-Design and Manufacturing, 5(4),
687–699. https://doi.org/10.1007/s42242-022-00204-4
Yu, Z., Yang, X., Chen, Y., Fang, R., Hogan, W. R., Gong,
Y., & Wu, Y. (2022). Identify Cancer Patients at Risk
for Heart Failure using Electronic Health Record and
Genetic Data. Proceedings - 2022 IEEE 10th
International Conference on Healthcare Informatics,
ICHI 2022, 138–142. https://doi.org/10.1109/
ICHI54592.2022.00032
Zheng, Y., Gindra, R. H., Green, E. J., Burks, E. J., Betke,
M., Beane, J. E., & Kolachalama, V. B. (2022). A
Graph-Transformer for Whole Slide Image
Classification. IEEE Transactions on Medical Imaging,
41(11), 3003–3015. https://doi.org/10.1109/TMI.
2022.3176598.