Design of an Iterative Method for Deep Multimodal Feature Fusion in
Heart Disease Diagnostics Utilizing Explainable AI
Sony K. Ahuja (https://orcid.org/0009-0000-9762-8686), Deepti D. Shrimankar (https://orcid.org/0000-0002-6212-0986) and Aditi R. Durge (https://orcid.org/0000-0002-9733-9706)
Visvesvaraya National Institute of Technology, Nagpur, India
Keywords: Multimodal Integration, Heart Disease Diagnostics, Explainable AI, Federated Learning,
Continual Learning Process.
Abstract: This research addresses the critical need for advanced diagnostic methodologies in heart disease, a leading
cause of mortality worldwide. Traditional diagnostic models, which often analyze genomic, clinical, and
medical imaging data in isolation, fall short in providing a holistic understanding of the disease due to their
fragmented approach. Such methods also grapple with significant challenges including data privacy concerns,
lack of interpretability, and an inability to adapt to the continuously evolving landscape of medical data. In response, this study introduces an innovative approach known as Deep Multimodal Feature Fusion,
designed to integrate genomic data, clinical history, and medical imaging into a cohesive analysis framework.
This method leverages the unique strengths of each data modality, offering a more comprehensive patient
profile than traditional, one-dimensional analyses. The integration of Explainable Artificial Intelligence with
Clinical Data Interpretation enhances model transparency and interpretability, crucial for healthcare
applications. The use of Transfer Learning with Pre-trained Models on medical imaging data and Continual
Learning for Adaptive Genomics ensures diagnostic accuracy and model adaptability over time. Federated Learning for Privacy-Preserving Analysis is employed to address data privacy, allowing for
collaborative model training without compromising patient confidentiality. Testing across diverse datasets
demonstrated substantial improvements in diagnostic Precision, Accuracy, Recall, and other metrics,
indicating a major advancement over existing methods. Practically, it exemplifies the application of advanced
AI techniques in clinical settings, narrowing the gap between theoretical research and practical healthcare
solutions.
1 INTRODUCTION
The domain of cardiovascular diagnostics stands at a
pivotal juncture, challenged by the complexities
inherent in heart disease—the leading cause of
mortality globally (Ullah et al., 2023). Traditional
diagnostic paradigms have relied on siloed analyses
of genomic data, clinical records, and medical
imaging. This fragmented approach, while
contributing valuable insights individually, often fails
to capture the intricate, multifaceted nature of heart
disease. Recognizing this gap, the advent of
integrative multimodal genomics heralds a
transformative shift, aiming to synthesize diverse data
modalities for a comprehensive understanding of
heart disease (Arneson et al., 2017; A. Durge &
Shrimankar, 2023).
The necessity for an integrative approach stems
from the nuanced interaction between genetic
predispositions and environmental or lifestyle factors
in the manifestation of heart disease. Genomic data,
for instance, provides insights into hereditary risks,
whereas clinical histories and medical imaging (such
as MRI and CT scans) offer context on the disease's
progression and anatomical impact. However, the
integration of these data streams presents significant
challenges, including but not limited to data privacy
concerns, the interpretability of complex models, and
the adaptability of diagnostic tools to evolving
datasets (Said et al., 2019; Usova et al., 2021).
This research introduces the Design of an Iterative
Method for Deep Multimodal Feature Fusion
(DMFF), an innovative framework that leverages the
strengths of genomic data, clinical histories, and
medical imaging to forge a comprehensive patient
analysis. This fusion goes beyond mere aggregation,
employing sophisticated algorithms to extract and
harmonize features from each modality, thereby
providing a holistic patient profile that significantly
enhances diagnostic accuracy.
Central to enhancing the DMFF model's utility is
the incorporation of Explainable Artificial
Intelligence (XAI) (Amann et al., 2022) with Clinical
Data Interpretation (CDI), which ensures that the
diagnostic process is transparent and interpretable.
This integration is crucial in healthcare, where the
rationale behind diagnostic decisions must be
understandable to clinicians and patients alike.
Moreover, the application of Transfer Learning with
Pre-trained Models (TLP) specifically to medical
imaging data like echocardiograms exemplifies the
method's innovative use of existing Artificial
Intelligence (AI) resources to improve diagnostic
precision.
Addressing the dynamic nature of genomic and
clinical data, the research introduces Continual
Learning for Adaptive Genomics (CLAG), a method
ensuring that the diagnostic model remains accurate
and relevant over time by adapting to new data
samples. In parallel, Federated Learning for Privacy-
Preserving Analysis (FLPPA) offers a solution to data
privacy concerns, enabling collaborative model
training across institutions without compromising
patient confidentiality (A. R. Durge & Shrimankar,
2024; Loftus et al., 2022). The iterative design of
DMFF not only addresses the limitations of
traditional diagnostic models but also paves the way
for precision medicine, where personalized treatment
strategies are informed by a deep, multidimensional
understanding of heart disease.
2 REVIEW OF EXISTING
MODELS FOR GENOMIC
ANALYSIS
The exponentially expanding field of heart disease
diagnostics and treatment has witnessed an
unprecedented integration of genomic data, machine
learning algorithms, and imaging techniques. The
exploration of genetic predispositions, alongside
environmental and lifestyle factors, has become
central to understanding and combating this leading
cause of mortality worldwide (Ahuja et al., 2023; A.
R. Durge et al., 2022). Despite remarkable progress,
existing methodologies often grapple with challenges
such as data integration, interpretability, privacy, and
adaptability to new data samples. This landscape
presents fertile ground for innovative approaches that
leverage multimodal data to provide a holistic
understanding of heart disease, thus guiding the
motivation behind the current research.
Recent studies in biomedical and machine
learning fields have provided significant insights into
disease classification, genetic analysis, and novel
modeling techniques. For instance, (Manduchi et al.,
2022) utilized a tree-based automated machine
learning approach with biology-based feature
selection to investigate the genetic factors
contributing to coronary artery disease. This study
advanced the understanding of the genetic basis of the
disease, although its focus on coronary artery disease
may have overlooked broader cardiovascular
conditions. In a different study, (Zheng et al., 2022)
applied a graph-transformer method to classify
whole-slide images, particularly in lung cancer
pathology. (Xu et al., 2022) introduced an innovative
tissue engineering technique by generating heart
microtissues in a Möbius strip configuration. This
novel approach showed promise for advanced disease
modeling.
Further studies have explored genetic and
phenotypic aspects of heart diseases. (Soibam, 2022)
identified super-enhancers and long noncoding RNAs
(lncRNAs) during mouse heart development,
contributing to our understanding of heart
development and disease. However, this research is
based on mouse models, and its implications for
human health need further validation. (Yu et al.,
2022) used machine learning to analyze electronic
health records (EHR) and genetic data, predicting
heart failure risk in cancer patients with high
accuracy. While effective, this study is limited to the
cancer patient population, leaving its broader
applicability unexamined. (Wang et al., 2022)
explored the genetic correlation between coronary
heart disease and electrocardiogram (ECG) traits,
suggesting a genetic causality.
Advanced machine learning techniques have also
been applied in classification tasks related to heart
disease. (Bao et al., 2023) utilized diffusion-based
synthetic image augmentation to improve the
classification of rare heart transplant rejection events,
demonstrating enhanced sensitivity in identifying
such rejections. Despite its success, the study focuses
solely on heart transplant rejection, with no
discussion of broader applications. Similarly, (Jose
Triny et al., 2023) optimized biomarkers for cancer
prognosis using microarray-based genomic analysis.
While this research enhanced the accuracy of disease
prediction and severity analysis, its relevance to heart
disease remains uncertain due to its cancer-specific
focus.
The literature reviewed provides a foundational
understanding of current methodologies and their
limitations, offering a backdrop against which the
contributions of this research are highlighted. This
research not only addresses the critical challenges
identified in the literature but also pioneers a path
towards personalized, precise, and privacy-
preserving diagnostics in heart disease.
3 DESIGN OF AN ITERATIVE
METHOD FOR DMFF IN
HEART DISEASE
DIAGNOSTICS UTILIZING
EXPLAINABLE AI
To overcome issues of low efficiency and high complexity, the proposed model integrates a pre-trained U-Net for segmentation with a VGG19 network for classification. This delineates
novel approach towards diagnosing heart disease
types using MRI and CT scans. The U-Net
architecture, initially devised for biomedical image
segmentation, operates on the principle of a
convolutional network that is symmetric, facilitating
precise localization and the use of context in the
segmentation process. This is augmented by the
incorporation of a VGG19 model, renowned for its
depth and simplicity, primarily comprising 3x3
convolutional layers stacked in increasing depth,
culminating in three fully connected layers for
classification.
As per figure 1, the segmentation process
begins with the U-Net model, which employs a series
of convolutional operations to extract features from
input images. Let $I$ represent the input image and $F_l$ the feature map at layer $l$; the operation within each convolutional layer is mathematically represented via equation 1,

$F_l = \sigma(W_l * F_{l-1} + b_l)$ (1)
where $W_l$ and $b_l$ are the weights and biases at layer $l$, $\sigma$ is the ReLU activation function, and $*$ represents the convolution operation. U-Net's architecture
allows for the capture of context and fine-grained
details through its contracting and expansive paths,
respectively. The contracting path follows the typical
architecture of a convolutional network, involving
successive convolution and pooling operations,
thereby compressing the input image into a feature-
rich representation for this process.
Figure 1: Model Architecture of the Proposed Privacy
Inspired Classification Process.
The expansive path then employs transposed
convolutions to project these features back onto the
pixel space, aiming to reconstruct the segmentation
map corresponding to the input image samples. This
process is encapsulated via equation 2,
$F_l' = \sigma(W_l' \circledast F_{l+1}')$ (2)

where $F_l'$ and $W_l'$ are the feature maps and weights in the expansive path, respectively, and $\circledast$ signifies the transposed convolution operation. Transitioning to
the classification phase, the segmented images are
then processed through the VGG19 model, which
comprises multiple convolutional layers followed by
fully connected layers. The initial layers of VGG19
are designed to capture image features such as edges
and textures, which are then progressively combined
into more complex patterns in subsequent layers. The
final classification is achieved through the dense
layers of the network, where the feature
representation FL obtained from the last
convolutional layer is transformed into class
probabilities via equation 3,
$P(C \mid F_L') = \dfrac{\exp(W_C \cdot F_L' + b_C)}{\sum_{C'} \exp(W_{C'} \cdot F_L' + b_{C'})}$ (3)

where $P(C \mid F_L')$ represents the probability of class $C$ given the features $F_L'$, $W_C$ and $b_C$ are the weights and biases corresponding to class $C$, and the summation in the denominator extends over all possible classes.
The synergy between U-Net's segmentation
prowess and VGG19's classification capabilities
enables the precise delineation and categorization of
heart disease types from MRI and CT scans. Through
the sequential application of these models, the
methodology not only ensures the accurate
segmentation of heart structures but also leverages
deep learning's feature extraction capabilities to
classify the segmented images into specific heart
disease types. This dual-stage process, encapsulated
by the seamless integration of segmentation and
classification operations, represents a comprehensive
approach to diagnosing heart diseases, embodying a
significant advancement in the application of deep
learning techniques within the realm of medical imaging.
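To make the two-stage design concrete, the following PyTorch sketch wires a miniature U-Net in front of a pre-trained VGG19, masking each scan with the predicted segmentation before classification. The single-level U-Net, the channel widths, and the four-class output head are our illustrative assumptions; the paper does not publish its exact architecture or code.

```python
# Hedged sketch of the segmentation-then-classification pipeline (Eqs. 1-3).
# MiniUNet is a toy, single-level stand-in for the full U-Net; channel widths
# and num_classes are assumptions, not the authors' settings.
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

def conv_block(c_in, c_out):
    # Two 3x3 convolutions with ReLU: the basic U-Net operation of Eq. 1.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = conv_block(3, 32)                      # contracting path
        self.pool = nn.MaxPool2d(2)
        self.bottom = conv_block(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)  # expansive path (Eq. 2)
        self.head = nn.Sequential(conv_block(64, 32), nn.Conv2d(32, 1, 1))

    def forward(self, x):
        d = self.down(x)
        u = self.up(self.bottom(self.pool(d)))
        skip = torch.cat([d, u], dim=1)                    # U-Net skip connection
        return torch.sigmoid(self.head(skip))             # soft segmentation mask

num_classes = 4  # assumption: number of heart disease categories
classifier = vgg19(weights=VGG19_Weights.IMAGENET1K_V1)   # pre-trained VGG19
classifier.classifier[6] = nn.Linear(4096, num_classes)   # re-sized dense head

unet = MiniUNet()
scan = torch.randn(1, 3, 224, 224)        # stand-in MRI/CT slice
mask = unet(scan)                         # stage 1: segmentation
logits = classifier(scan * mask)          # stage 2: classify the segmented scan
probs = torch.softmax(logits, dim=1)      # class probabilities (Eq. 3)
```

In practice the U-Net would first be trained on annotated cardiac segmentations and then frozen in front of the classifier.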
Next, to classify genomic scans, specifically mRNA sequences, into disease types, the model employs a process combining Single Nucleotide Variant (SNV) analysis with a 1D Convolutional Neural Network (CNN) comprising 20 layers. This methodological framework is pivotal for deciphering the complex genomic underpinnings of heart diseases, leveraging the granular specificity of SNVs, mutations that occur at a single nucleotide position in the genome, which is instrumental in disease classification.
The classification process initiates with the extraction of SNVs from the genomic samples. Let $S$ represent a sequence of nucleotides, where $S = \{s_1, s_2, \ldots, s_n\}$ and each $s_i$ represents a nucleotide (A, C, G, or T). The detection of SNVs within these sequences is formalized as identifying positions $i$ where $s_i \neq s_i'$, with $s_i'$ representing the corresponding nucleotide in a reference sequence. This comparison yields a binary sequence $B = \{b_1, b_2, \ldots, b_n\}$, where $b_i = 1$ if an SNV is detected at position $i$ and $b_i = 0$ otherwise.
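As a minimal illustration of this encoding, the snippet below compares a toy sample sequence against a toy reference and emits the binary mask $B$; both sequences are invented for the example.

```python
# SNV extraction as described above: b_i = 1 wherever the sample nucleotide
# differs from the reference (s_i != s_i'). Toy sequences for illustration.
reference = "ACGTACGTAC"
sample    = "ACGAACGTTC"

B = [1 if s != r else 0 for s, r in zip(sample, reference)]
print(B)  # [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
```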
Following SNV extraction, the binary sequence B
is input into the 1D CNN, which is designed to
capture and learn patterns associated with specific
heart disease types. The architecture of the 1D CNN
is composed of convolutional layers that perform
feature extraction, followed by pooling layers that
reduce dimensionality, and fully connected layers that
accomplish the classification task. The convolutional operation in the $k$-th layer is mathematically represented via equation 4,

$F_k = \sigma(W_k * B + b_k)$ (4)
where $F_k$ represents the feature map produced by layer $k$, $W_k$ and $b_k$ are the layer's weights and biases, respectively, $\sigma$ is the ReLU activation function, and $*$ symbolizes the convolution operation.
The depth of the network, with its 20 layers,
allows for the extraction of increasingly abstract
features from the input sequence. In this context, the
depth encompasses multiple convolutional layers,
each followed by an activation function, defined via
equation 5,
$\sigma(x) = \max(0, x)$ (5)

which introduces non-linearity into the model. Pooling layers interspersed among the
convolutional layers serve to reduce the spatial size
of the representation, thereby decreasing the number
of parameters and computation in the network. The
max pooling is represented via equation 6,
$P_k = \max(F_{k,\, i:i+p})$ (6)

where $P_k$ is the pooled feature map, $F_{k,\, i:i+p}$ represents a segment of the feature map $F_k$, and $p$ is the pooling size.
The culmination of the convolutional and pooling
layers is followed by one or more fully connected
layers, which integrate the high-level features
extracted by the preceding layers for the purpose of
classification. The operation in a fully connected
layer is expressed via equation 7,
$C_j = \sigma(W_j \cdot F_L + b_j)$ (7)

where $C_j$ represents the output of the $j$-th fully connected layer, $F_L$ is the flattened feature map from the last convolutional or pooling layer, and $W_j$ and $b_j$ are the weights and biases of the fully connected layer.
Finally, the classification output is generated
through a softmax layer, which converts the logits
from the fully connected layer into probabilities for
each heart disease type. The softmax function is
defined via equation 8,
$P(c_i \mid C) = \dfrac{\exp(C_i)}{\sum_j \exp(C_j)}$ (8)

where $P(c_i \mid C)$ represents the probability of class $c_i$ given the output vector $C$, and $C_i$ is the logit corresponding to class $c_i$. Through this intricate
process of SNV extraction and subsequent pattern
learning via a deep 1D CNN, the proposed
methodology adeptly classifies genomic samples into
specific heart disease types. The combination of
precise genetic mutation identification with advanced
feature extraction and classification techniques
represents a significant leap forward in the domain of
genomic-based heart disease diagnostics, offering a
nuanced and highly effective tool for understanding
and combating these conditions.
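A hedged PyTorch sketch of this genomic classifier is given below. It stacks Conv1d + ReLU blocks with interleaved max pooling and closes with dense layers and a softmax, mirroring equations 4 through 8; the channel widths, kernel size, pooling schedule, and reduced depth (a shallower stand-in for the 20-layer network) are our assumptions.

```python
# Sketch of the 1D CNN over binary SNV sequences (Eqs. 4-8). Shallower than
# the paper's 20-layer network; all hyperparameters here are assumptions.
import torch
import torch.nn as nn

def snv_cnn(num_classes=4):
    layers, c_in = [], 1
    for i in range(8):                                    # conv + ReLU blocks
        c_out = min(16 * 2 ** (i // 2), 128)
        layers += [nn.Conv1d(c_in, c_out, kernel_size=5, padding=2),  # Eq. 4
                   nn.ReLU()]                                          # Eq. 5
        if i % 2 == 1:
            layers.append(nn.MaxPool1d(2))                             # Eq. 6
        c_in = c_out
    layers += [nn.Flatten(),
               nn.LazyLinear(256), nn.ReLU(),                          # Eq. 7
               nn.Linear(256, num_classes)]               # logits for Eq. 8
    return nn.Sequential(*layers)

model = snv_cnn()
B = torch.randint(0, 2, (2, 1, 1024)).float()  # batch of binary SNV sequences
probs = torch.softmax(model(B), dim=1)         # softmax over disease types (Eq. 8)
```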
Next, the integration of XAI with CDI represents
a paradigm shift, enhancing the transparency and
interpretability of complex models used for analyzing
classified scans and genomic samples. A cornerstone
of this integration is the application of Gradient-
weighted Class Activation Mapping (GradCAM), a
technique that provides visual explanations for
decisions made by CNNs, thereby elucidating the
model's focus on specific features within the input
data that influenced its predictions (Selvaraju et al.,
2020). The GradCAM process commences by
identifying the feature maps generated by the final
convolutional layer of the CNN, which are
instrumental in the classification decision. Let Ak
represent the k-th feature map in the final
convolutional layer, where k ranges from 1 to K, with
K being the total number of feature maps in that layer.
The importance of each feature map Ak towards a
specific class c is determined by calculating the
gradient of the score for class c (represented as yc)
with respect to the feature map activations. This
gradient, averaged across the width and height
dimensions (indexed by i and j, respectively), yields
the neuron importance weights $\alpha_k^c$, which are formally expressed via equation 9,

$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$ (9)

where $Z$ represents the total number of units in the feature map, and $\frac{\partial y^c}{\partial A_{ij}^k}$ signifies the gradient of the class score with respect to each unit of the feature maps. The next step involves computing the weighted
combination of forward activation maps, followed by
a ReLU function to obtain the GradCAM heatmap,
$L^c_{\mathrm{GradCAM}}$. This heatmap highlights the regions of the input image most influential for the model's prediction of class $c$. Mathematically, $L^c_{\mathrm{GradCAM}}$ is derived via equation 10,

$L^c_{\mathrm{GradCAM}} = \mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right)$ (10)
The application of the ReLU function ensures that
only features with a positive influence on the class of
interest are visualized, thereby focusing on the
regions of the input that contribute most significantly
to the model's predictions. For genomic samples, the
interpretation via GradCAM adapts to the 1D nature
of the data samples. Although originally designed for
2D images, the core principle of highlighting
influential regions is applied to genomic sequences by
visualizing the segments of the sequence that led to
specific classifications. This requires adjusting the
GradCAM process to handle 1D convolutional
outputs, yet the foundational equations remain
applicable, demonstrating the method's versatility for
clinical use cases. The outcome of the GradCAM
process is a set of visual heatmaps that is
superimposed on the original medical scans or
genomic sequences, providing clinicians and
researchers with intuitive visual cues about the
regions or segments most critical to the model’s
diagnostic decisions. Furthermore, by revealing the
model's focus areas, GradCAM facilitates the
identification of potential biases or errors in the
model's reasoning, enabling continuous refinement
and improvement of the diagnostic tool.
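Equations 9 and 10 translate directly into a few lines of PyTorch, sketched below for the imaging branch using forward and backward hooks on VGG19's last convolutional layer; the variable names are ours and the input is a stand-in tensor.

```python
# Minimal Grad-CAM (Eqs. 9-10), following Selvaraju et al. (2020). Hooks
# capture the last conv layer's activations A^k and gradients dy^c/dA^k.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

model = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).eval()
acts, grads = {}, {}
last_conv = model.features[34]  # final Conv2d in VGG19's feature extractor

last_conv.register_forward_hook(lambda m, i, o: acts.update(a=o))
last_conv.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)       # stand-in for a classified scan
scores = model(x)
c = scores.argmax().item()            # explain the predicted class
scores[0, c].backward()               # propagates dy^c/dA^k to the hook

alpha = grads["g"].mean(dim=(2, 3), keepdim=True)   # Eq. 9: average over i, j
cam = F.relu((alpha * acts["a"]).sum(dim=1))        # Eq. 10: weighted sum + ReLU
heatmap = F.interpolate(cam.unsqueeze(1), size=x.shape[2:],
                        mode="bilinear", align_corners=False)
```

For the genomic branch, the same recipe applies with Conv1d activations and a 1D upsampling of the resulting saliency vector.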
The model initiates an FLPPA Mechanism, which
stands at the forefront of innovative methodologies
designed to safeguard patient confidentiality while
facilitating collaborative model training across
disparate healthcare entities. The essence of FLPPA
is to decentralize the learning process, thereby
ensuring that sensitive patient data remains within the
confines of its origin, such as a hospital or a clinical
laboratory, while still contributing to the collective
intelligence of a global model. At the core of the
FLPPA process, the interaction between the local and
global models is governed by a series of mathematical
operations designed to optimize model performance
while preserving privacy. Let Mg represent the global
model, and Mli represent the local model associated
with the i-th participant. The global model is
initialized and disseminated to all participants, who
then adapt this model based on their local datasets,
Di, through a series of training epochs. The update
from each local model, ΔMli, is represented by the
difference between the parameters of the locally
updated model and the initial global model
parameters, formalized via equation 11,
$\Delta M_{l_i} = M_{l_i}^{new} - M_g$ (11)
Each local model's update is then securely
transmitted to a central server, where an aggregation
algorithm, typically Federated Averaging (FedAvg),
is employed to update the global model. The
aggregation process is mathematically expressed via
equation 12,
$M_g^{new} = M_g + \eta \sum_{i=1}^{N} \frac{n_i}{\sum_{j=1}^{N} n_j}\, \Delta M_{l_i}$ (12)

where $N$ is the total number of participants, $n_i$ is the size of the $i$-th local dataset, and $\eta$ is a learning rate parameter that influences the extent to which local updates affect the global model.
To further enhance privacy, Differential Privacy
(DP) techniques are integrated into FLPPA,
introducing stochastic noise to the model updates
before aggregation. This is represented via equation
13,
$\Delta M_{l_i}^{DP} = \Delta M_{l_i} + \mathcal{N}(0, \sigma^2 I)$ (13)

where $\mathcal{N}(0, \sigma^2 I)$ represents the addition of noise drawn from a Gaussian distribution with mean 0 and variance $\sigma^2$, and $I$ is the identity matrix corresponding to the dimensions of the model parameters.
The iterative nature of FLPPA allows for
continuous refinement of the global model, with each
cycle of local training and aggregation bringing the
model closer to optimal performance. The
convergence of the global model, Mg, is evaluated
through a loss function, L(Mg, D), where D
represents the aggregated dataset from all
participants. The objective is to minimize this loss
function, which is represented via equation 14,
$\min_{M_g} L(M_g, D) = \sum_{i=1}^{N} \frac{n_i}{\sum_{j=1}^{N} n_j}\, L(M_g, D_i)$ (14)
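The FLPPA update cycle of equations 11 through 14 can be sketched as follows; the client data, noise scale, and single-tensor model are illustrative stand-ins, and a real deployment would add secure transmission and a formal differential-privacy accountant.

```python
# Hedged sketch of one FLPPA round: local deltas (Eq. 11), Gaussian noise
# for differential privacy (Eq. 13), and size-weighted FedAvg (Eq. 12),
# iterated to drive down the global loss (Eq. 14).
import torch

def client_update(global_params, local_params, sigma=0.01):
    # Delta between locally trained and global parameters, plus DP noise.
    return {k: (local_params[k] - global_params[k])
               + sigma * torch.randn_like(global_params[k])
            for k in global_params}

def server_aggregate(global_params, deltas, sizes, eta=1.0):
    # FedAvg-style aggregation: deltas weighted by n_i / sum_j n_j.
    n = float(sum(sizes))
    return {k: global_params[k]
               + eta * sum((sz / n) * d[k] for d, sz in zip(deltas, sizes))
            for k in global_params}

# Toy round with two clients sharing a single weight tensor.
Mg = {"w": torch.zeros(3)}
locals_ = [{"w": torch.ones(3)}, {"w": 2 * torch.ones(3)}]
deltas = [client_update(Mg, Ml) for Ml in locals_]
Mg = server_aggregate(Mg, deltas, sizes=[100, 300])
```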
The security and privacy of the FLPPA process
are bolstered by encryption protocols during the
transmission of model updates, ensuring that data
remains confidential and secure against potential
breaches. Encryption is modeled as a function
$E(\Delta M_{l_i})$, where the encrypted update is decrypted by
the central server before aggregation. Upon receiving
the aggregated and enhanced global model,
participants apply the XAI results to this model to
generate privacy-preserved insights for clinical
inference operations. This application involves
mapping the GradCAM interpretability layer onto the
global model to elucidate decision-making processes
without exposing sensitive data samples. The final
output, privacy-preserved results, embodies the
culmination of collaborative learning and
interpretability, ensuring that stakeholders can glean
actionable insights with the assurance of patient data
privacy.
4 RESULT ANALYSIS
The experimental setup for validating the proposed
method was meticulously designed to assess its
performance comprehensively across multiple
datasets and samples. This section outlines the key
components of the experimental setup, including the
datasets utilized and training parameters.
4.1 Datasets
The experimental evaluation was conducted on three
diverse datasets, each representing a distinct aspect of
heart disease:
Genomic Dataset (https://cardiodb.org/):
This dataset comprises mRNA expression profiles
obtained from a cohort of patients diagnosed with
various forms of heart disease. It includes genomic
features such as gene expression levels, SNVs, and
gene-disease associations.
Clinical Dataset (https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset):
The clinical dataset consists of patient demographic
information, medical history, and diagnostic records
obtained from electronic health records (EHRs) and
clinical databases. This dataset provides crucial
context and patient-specific information for
enhancing diagnostic accuracy.
Imaging Dataset (https://www.kaggle.com/datasets/rahimanshu/cardiomegaly-disease-prediction-using-cnn):
The imaging dataset contains a collection of MRI and
CT scans of patients' hearts, capturing detailed
structural and anatomical information. These imaging
modalities offer valuable insights into cardiac
morphology and pathology.
4.2 Training Parameters
The model was trained using the following parameters (expressed as a code sketch after the list):
Batch Size: 32
Learning Rate: 0.001
Optimizer: Adam optimizer with momentum (β1 =
0.9, β2 = 0.999)
Loss Function: Binary cross-entropy loss for
classification tasks
Regularization: L2 regularization (weight decay =
0.001) to prevent overfitting.
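Expressed as code, the configuration above maps onto a standard PyTorch setup; note that the L2 regularization term corresponds to the optimizer's weight_decay argument, and the placeholder model is ours.

```python
# The stated training configuration as a hedged PyTorch sketch.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)  # placeholder for any of the networks above
optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             betas=(0.9, 0.999),      # momentum terms
                             weight_decay=0.001)      # L2 regularization
criterion = nn.BCEWithLogitsLoss()                    # binary cross-entropy
batch_size = 32
```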
4.3 Used Values
Genomic Dataset: 10,000 samples with 20,000 gene
expression features.
Clinical Dataset: 5,000 patient records with
demographic information and medical history.
Imaging Dataset: 2,000 MRI and CT scans with
256x256 pixel resolution.
The paper addresses the problem of disjoint datasets about different patients through an innovative data fusion process. In this regard, the DMFF-XAI model presented here provides a multi-modal integration framework that can handle and align heterogeneous datasets even when they originate from multiple sources.
In this results section of the study, the performance of the proposed model DMFF-XAI is meticulously compared against three established methods: the Tree-based Pipeline Optimization Tool (TPOT) (Manduchi et al., 2022), Empirical Fuzzy Multiobjective Multifactor Dimensionality Reduction (EFMOMDR), and the Graph Transformer (GT) (Zheng et al., 2022). The comparative analysis is encapsulated in three figures and a table, elucidating different facets of performance: accuracy, precision, recall, specificity, AUC, and computational efficiency.
Figure 2 showcases the superior Accuracy and
Precision of DMFF-XAI over the referenced
methods. The enhanced Accuracy (94.5%) and
Precision (93.8%) of DMFF-XAI underscore its
efficacy in correctly identifying and categorizing
heart disease types, significantly outperforming the
comparative models. This improvement is attributed
to the model's ability to synergistically integrate
multi-modal data, leveraging the strengths of
genomic, clinical, and imaging data to provide a more
nuanced and comprehensive analysis.
Figure 2: Diagnostic Accuracy and Precision.
In figure 3, DMFF-XAI demonstrates a
remarkable Recall (95.2%) and Specificity (94.1%),
indicating not only its proficiency in identifying true
positive cases but also in minimizing false positives.
This is particularly crucial in medical diagnostics,
where the cost of false negatives is high. The DMFF-
XAI's performance in these metrics reflects its
robustness and reliability in clinical settings.
Figure 3: Recall and Specificity.
Figure 4 presents the AUC values, with DMFF-
XAI achieving an AUC of 0.961, surpassing the
comparative methods. This high AUC value implies
that DMFF-XAI possesses a superior ability to
discriminate between the disease classes across all
possible thresholds, highlighting its effectiveness in
varying clinical scenarios.
Figure 4: Area under the curve (AUC).
Table 1 below evaluates the computational efficiency of DMFF-XAI against the referenced
methods. Despite its sophisticated integration of
multi-modal data and the added complexity of
explainable AI components, DMFF-XAI exhibits
competitive training and inference times. The
relatively short training time (6.5 hours) and swift
inference time (0.45 seconds) are indicative of the
model's optimized architecture and algorithmic
efficiencies, making it viable for real-world
applications where time is of the essence.
Table 1: Computational Efficiency.

Model     | Training Time (hrs) | Inference Time (sec)
DMFF-XAI  | 6.5                 | 0.45
TPOT      | 8.2                 | 0.67
EFMOMDR   | 7.9                 | 0.62
GT        | 7.4                 | 0.59
The results obtained collectively illustrate the
significant performance enhancements achieved by
DMFF-XAI. Its ability to deliver higher accuracy,
precision, recall, and specificity, alongside
impressive AUC values and computational
efficiency, underscores the model's potential to
revolutionize heart disease diagnostics. These
advancements highlight the critical role of integrating
multimodal data and explainable AI to improve
diagnostic outcomes and patient trust in automated
systems.
5 CONCLUSION AND FUTURE
SCOPE
In conclusion, the research presented herein
introduces a groundbreaking method, DMFF-XAI,
which significantly advances the domain of heart
disease diagnostics. Through a sophisticated
integration of multimodal data—encompassing
genomic information, clinical histories, and medical
imaging—coupled with the transparency afforded by
explainable AI, DMFF-XAI has demonstrated
superior performance across a range of critical
metrics when compared to existing methodologies.
The empirical results underscore the efficacy of
DMFF-XAI, highlighting its enhanced diagnostic
accuracy, precision, recall, specificity, and
computational efficiency. Notably, the model
achieved a diagnostic accuracy of 94.5% and a
precision of 93.8%, outperforming referenced
methods by a substantial margin. Such improvements
are pivotal, particularly in the realm of heart disease,
where early and accurate diagnosis can significantly
influence patient outcomes. The integration of
Explainable AI not only augments the model's
interpretability but also fosters a greater degree of
trust among clinicians and patients alike, ensuring
that the diagnostic process is both transparent and
accountable.
Looking to the future, the scope for extending and
refining DMFF-XAI is vast. One immediate avenue
of exploration is the application of this model to other
complex diseases, where the integration of
multimodal data could unlock new insights and
diagnostic capabilities. Additionally, the potential for
incorporating real-time data, such as from wearable
health devices, into the DMFF-XAI framework could
further enhance its predictive accuracy and utility in
ongoing health monitoring and preventive medicine.
DMFF-XAI sets a new benchmark for future research
and development, promising to revolutionize the
landscape of medical diagnostics and patient care.
REFERENCES
Ahuja, S. K., Shrimankar, D. D., & Durge, A. R. (2023). A
Study and Analysis of Disease Identification using
Genomic Sequence Processing Models: An Empirical
Review. Current Genomics, 24(4), 207–235.
https://doi.org/10.2174/0113892029269523231101051
455
Amann, J., Vetter, D., Blomberg, S. N., Christensen, H. C.,
Coffee, M., Gerke, S., Gilbert, T. K., Hagendorff, T.,
Holm, S., Livne, M., Spezzatti, A., Strümke, I., Zicari,
R. V., & Madai, V. I. (2022). To explain or not to
explain?—Artificial intelligence explainability in
clinical decision support systems. PLOS Digital Health,
1(2), e0000016. https://doi.org/10.1371/journal.
pdig.0000016
Arneson, D., Shu, L., Tsai, B., Barrere-Cain, R., Sun, C., &
Yang, X. (2017). Multidimensional Integrative
Genomics Approaches to Dissecting Cardiovascular
Disease. Frontiers in Cardiovascular Medicine,
4(February). https://doi.org/10.3389/fcvm.2017.00008
Bao, H., Deng, J., Xing, S., Zhong, Y., Shi, W., Marteau,
B., Das, B., Shehata, B., Deshpande, S., & Wang, M.
D. (2023). Rare Heart Transplant Rejection
Classification Using Diffusion-Based Synthetic Image
Augmentation. BHI 2023 - IEEE-EMBS International
Conference on Biomedical and Health Informatics,
Proceedings. https://doi.org/10.1109/BHI58575.2023.
10313377
Durge, A. R., & Shrimankar, D. D. (2024). DHFS-ECM:
Design of a Dual Heuristic Feature Selection-based
Ensemble Classification Model for the Identification of
Bamboo Species from Genomic Sequences. Current
Genomics, 25. https://doi.org/10.2174/0113892029
268176240125055419
Durge, A. R., Shrimankar, D. D., & Sawarkar, A. D. (2022).
Heuristic Analysis of Genomic Sequence Processing
Models for High Efficiency Prediction: A Statistical
Perspective. Current Genomics, 23(5), 299–317.
https://doi.org/10.2174/1389202923666220927105311
Durge, A., & Shrimankar, D. (2023). MRQPMS: Design of
a Map Reduce Bioinspired Model for Solving Quorum
Planted Motif Search for High-Speed Deployments.
3(Biostec), 123–130. https://doi.org/10.5220/0011616
500003414
Jose Triny, K., Kesavan, S., Navadeepan, H., & Ranjith
Kumar, P. (2023). Microarray based Geonomic
Biomarker Optimization for Cancer Prognosis.
Proceedings of the 2023 2nd International Conference
on Electronics and Renewable Systems, ICEARS 2023,
478–483. https://doi.org/10.1109/ICEARS56392.
2023.10084989
Loftus, T. J., Ruppert, M. M., Shickel, B., Ozrazgat-
Baslanti, T., Balch, J. A., Efron, P. A., Upchurch, G. R.,
Rashidi, P., Tignanelli, C., Bian, J., & Bihorac, A.
(2022). Federated learning for preserving data privacy
in collaborative healthcare research. Digital Health, 8,
1–5. https://doi.org/10.1177/20552076221134455
Manduchi, E., Le, T. T., Fu, W., & Moore, J. H. (2022).
Genetic Analysis of Coronary Artery Disease Using
Tree-Based Automated Machine Learning Informed By
Biology-Based Feature Selection. IEEE/ACM
Transactions on Computational Biology and
Bioinformatics, 19(3), 1379–1386. https://doi.org/10.
1109/TCBB.2021.3099068
Said, M. A., van de Vegte, Y. J., Zafar, M. M., van der
Ende, M. Y., Raja, G. K., Verweij, N., & van der Harst,
P. (2019). Contributions of Interactions Between
Lifestyle and Genetics on Coronary Artery Disease
Risk. Current Cardiology Reports, 21(9), 1–8.
https://doi.org/10.1007/s11886-019-1177-x
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., & Batra, D. (2020). Grad-CAM: Visual
Explanations from Deep Networks via Gradient-Based
Localization. International Journal of Computer
Vision, 128(2), 336–359. https://doi.org/10.1007/s
11263-019-01228-7
Soibam, B. (2022). Genome-wide compendium of super-
long noncoding RNAs during mouse heart
development. Proceedings - 2022 IEEE International
Conference on Bioinformatics and Biomedicine, BIBM
2022, 3315–3319. https://doi.org/10.1109/BIBM
55620.2022.9995496
Ullah, A., Kumar, M., Sayyar, M., Sapna, F., John, C.,
Memon, S., Qureshi, K., Agbo, E. C., Ariri, H. I.,
Chukwu, E. J., Varrassi, G., Khatri, M., Kumar, S.,
Elder, N. M., & Mohamad, T. (2023). Revolutionizing
Cardiac Care: A Comprehensive Narrative Review of
Cardiac Rehabilitation and the Evolution of
Cardiovascular Medicine. Cureus, 15(10).
https://doi.org/10.7759/cureus.46469
Usova, E. I., Alieva, A. S., Yakovlev, A. N., Alieva, M. S.,
Prokhorikhin, A. A., Konradi, A. O., Shlyakhto, E. V.,
Magni, P., Catapano, A. L., & Baragetti, A. (2021).
Integrative analysis of multi-omics and genetic
approaches— A new level in atherosclerotic
cardiovascular risk prediction. Biomolecules, 11(11),
1–16. https://doi.org/10.3390/biom11111597
Wang, X., Zhang, H., Xiu, X., Qi, M., Yang, Y., & Zhao,
H. (2022). Genetic and phenotypic relationships
between coronary atherosclerotic heart disease and
electrocardiographic traits. Proceedings - 2022 IEEE
International Conference on Bioinformatics and
Biomedicine, BIBM 2022, 241–246. https://doi.
org/10.1109/BIBM55620.2022.9995557
Xu, Y., Qi, J., Zhou, W., Liu, X., Zhang, L., Yao, X., & Wu,
H. (2022). Generation of ring-shaped human iPSC-
derived functional heart microtissues in a Möbius strip
configuration. Bio-Design and Manufacturing, 5(4),
687–699. https://doi.org/10.1007/s42242-022-00204-4
Yu, Z., Yang, X., Chen, Y., Fang, R., Hogan, W. R., Gong,
Y., & Wu, Y. (2022). Identify Cancer Patients at Risk
for Heart Failure using Electronic Health Record and
Genetic Data. Proceedings - 2022 IEEE 10th
International Conference on Healthcare Informatics,
ICHI 2022, 138–142. https://doi.org/10.1109/
ICHI54592.2022.00032
Zheng, Y., Gindra, R. H., Green, E. J., Burks, E. J., Betke,
M., Beane, J. E., & Kolachalama, V. B. (2022). A
Graph-Transformer for Whole Slide Image
Classification. IEEE Transactions on Medical Imaging,
41(11), 3003–3015. https://doi.org/10.1109/TMI.
2022.3176598.