Coronary Artery Stenosis Assessment in X-Ray Angiography Through Spatio-Temporal Attention for Non-Invasive FFR and iFR Estimation

Raffaele Mineo*, Federica Proietto Salanitri*, Giovanni Bellitto, Ovidio De Filippo, Fabrizio D’Ascenzo, Simone Palazzo and Concetto Spampinato

PeRCeiVe Lab, University of Catania, Catania, Italy

Department of Medical Sciences, University of Turin, Turin, Italy

Keywords:

Attention Methods, Coronary Angiography, Medical Imaging Analysis.

Abstract:

Determining the degree of stenosis in coronary arteries from X-ray angiography is challenging because of the variability in vessel appearance, the overlapping of vessels, and their small size. Traditional automated approaches rely on 2D deep models that process multiple angiography views as well as key frames. In this work, we propose a deep learning model that non-invasively estimates the fractional flow reserve (FFR) and the instantaneous wave-free ratio (iFR) of moderate coronary stenoses directly from angiographic videos, exploiting spatial and temporal correlations without manual preprocessing. Our strategy uses 3D Convolutional Neural Networks (CNNs) to learn local spatio-temporal features and integrates self-attention layers to capture long-range correlations within the feature set. At training time, both FFR and iFR values are used for supervision, with missing targets handled through multi-branch outputs. The resulting model can predict the presence of a clinically significant coronary artery stenosis and directly estimate the FFR and iFR values. We also include an explainability strategy that shows which parts of a video the model focuses on when assessing FFR and iFR. The proposed model outperforms competing methods on a dataset of 778 angiography exams from 389 patients. Importantly, it does not require key frames, thus reducing the effort required of clinicians.

1 INTRODUCTION

Invasive evaluation of coronary conditions using Fractional Flow Reserve (FFR) and/or instantaneous wave-free ratio (iFR) is an essential guide for percutaneous coronary intervention (PCI) of intermediate-grade coronary lesions (Neumann et al., 2018; Knuuti and Revenco, 2020). Despite its proven reduction of subsequent revascularization procedures and the associated prognostic benefits, its real-world application remains modest. This can be attributed to factors such as the extensive setup and measurement time, the considerable cost of the diagnostic probe, and the invasive nature of the procedure, which carries a low, but not negligible, risk of complications. Furthermore, these evaluations can be subject to significant inter-observer variation. In practice, clinicians are less interested in the absolute FFR/iFR values than in whether they fall below or above a threshold, which is set to 0.80 for FFR and 0.89 for iFR (Neumann et al., 2018; Tonino et al., 2009; De Bruyne et al., 2012). In light of these challenges, Artificial Intelligence (AI) and Machine Learning (ML), with the aid of convolutional neural networks (CNNs) and, more recently, vision transformers (Dosovitskiy et al., 2020), have shown great potential (Proietto Salanitri et al., 2021; Tomar et al., 2022; Salanitri et al., 2022; Valanarasu et al., 2021). They can relax these constraints by enhancing risk assessment and cardiovascular imaging analysis and by automating artery stenosis quantification from coronary angiography. Despite these advancements, existing strategies require key frame selection alongside the incorporation of multiple angiography views (Zhang et al., 2020; Zhang et al., 2019) (see Fig. 1), increasing the burden on both patients and cardiologists.

* Equal contribution by R. Mineo and F. Proietto Salanitri.

To address the above limitations, in this paper we propose a deep network that employs two views for each exam but does not require any key frame selection, thus balancing the need for comprehensive information against the annotation effort required of clinicians. Our approach evaluates stenosis severity through both direct and indirect estimation of FFR/iFR values from angiography videos. It combines Convolutional Neural Networks (CNNs), which extract meaningful spatio-temporal features from the input video, with an attention mechanism, which captures long-range dependencies within the video. This combination allows our model to focus on the most relevant features for the task at hand, enhancing its predictive capabilities. Our approach assesses stenosis severity from two perspectives: classification and regression. The classification perspective allows us to categorize the severity of stenosis, while the regression perspective enables us to predict the FFR and iFR values. This dual-faceted approach provides a more nuanced understanding of the patient’s condition, offering significant support to clinicians in their decision-making process.

Mineo, R., Proietto Salanitri, F., Bellitto, G., De Filippo, O., D’Ascenzo, F., Palazzo, S. and Spampinato, C.

Coronary Artery Stenosis Assessment in X-Ray Angiography Through Spatio-Temporal Attention for Non-Invasive FFR and iFR Estimation.

DOI: 10.5220/0012449200003657

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2024) - Volume 1, pages 305-312

ISBN: 978-989-758-688-0; ISSN: 2184-4305

Figure 1: Two sample angiography views for the same patient. Red bounding boxes show the major stenosis.

We validated the feasibility and accuracy of our

approach using a dataset collected from multiple Ital-

ian hospitals, consisting of 778 angiographic exams

from 389 patients. Our approach demonstrated su-

perior performance compared to conventional meth-

ods, underscoring its potential as a robust, adaptable,

and effective solution for stenosis assessment based

on coronary angiography. Moreover, we delved into

the interpretability of our model to provide a more

comprehensive understanding of its functionality.

Thus, the contributions of our paper can be sum-

marised as follows:

• We put forward a novel convolutional model specifically designed to process and analyze X-Ray angiography videos, thereby addressing a significant gap in the current literature.

• We introduce a multi-branch architecture that supports diverse assessment modalities, including both classification and regression. This design not only promotes robust feature learning but also facilitates training on heterogeneous datasets in which only some targets are available, thereby enhancing the model’s versatility and applicability.

• We conduct an extensive experimental analysis to validate the efficacy of our proposed method. The results show that our model outperforms existing solutions in terms of balanced accuracy and sensitivity.

2 RELATED WORK

Coronary stenosis is a leading cause of heart failure due to impaired blood flow resulting from vessel narrowing. The severity of the condition guides its treatment, either pharmacological or surgical (Neumann et al., 2018). Over the past decade, deep learning has been extensively used for diagnosing the severity of stenosis, detecting it, and quantifying FFR from imaging data. In particular, two main categories of methods exist: 2D approaches that analyze individual frames from angiography videos, and 3D models that directly extract spatio-temporal features from the entire video. Most 2D methods classify stenosis by severity level or identify hemodynamically significant stenosis by thresholding FFR/iFR values. Key frames are typically identified through CNN architectures (Moon et al., 2021; Rodrigues et al., 2021) or through a combination of convolutional and recurrent networks (Cong et al., 2019; Ma et al., 2017; Ovalle-Magallanes et al., 2022). A subset of these techniques limits the analysis to blood vessels by incorporating a pre-processing segmentation step (Wu et al., 2020; Au et al., 2018). Stenosis detection on individual frames is also prevalent and generally involves key frame identification and object detection models for stenosis localization. A comprehensive benchmark of state-of-the-art object detection models for coronary stenoses is presented in (Danilov et al., 2020). Another set of 2D methods analyzes the shape and visual appearance of blood vessels on the key frame to locate stenoses (Zhao et al., 2021b; Zhao et al., 2021a). Additionally, interpretability approaches on frame-based stenosis classification models generate activation maps to assist in stenosis detection (Moon et al., 2021; Cong et al., 2019). Recently, a few 3D models have emerged that operate on entire angiography videos for quantitative coronary analysis and stenosis detection (Zhang et al., 2019; Zhang et al., 2020; Xue et al., 2018; Han et al., 2023). The methods in (Zhang et al., 2019; Zhang et al., 2020) are particularly relevant to our work as they conduct a quantitative coronary analysis (QCA) of stenoses. In detail, these methods regress several clinical indices, such as minimum lumen diameter and proximal and distal reference vessel diameters, among others, using a primary angiography view alongside an additional side view and a manually chosen key frame. These methods are based on a 3D convolutional backbone, shared between the two angiography views, whose features are further processed by an attention layer in (Zhang et al., 2020). They also employ 2D dilated residual convolutions to extract features from the key frame. These two feature sets are then processed by a hierarchical self-attention mechanism for the final QCA regression.

BIOIMAGING 2024 - 11th International Conference on Bioimaging

Our proposed approach contrasts with existing

ones in that it does not require a manually-selected

key frame, thereby reducing the load on physicians.

We employ a 3D CNN model combined with a global

attention mechanism in conjunction with a multi-task

formulation of stenosis severity assessment, which

encourages the learning of more generic features and

supports supervision via both discrete class labels and

continuous FFR/iFR scores.

3 METHOD

The proposed model, depicted in Fig. 2, is a deep learning architecture that combines a sequence of 3D convolutional layers, inspired by the 3D ResNet-18 (Tran et al., 2018), with attention modules. The model takes two views of an angiography exam as input. Both views are processed by a shared 3D convolutional network, which is trained to extract spatio-temporal features from the angiography data. More specifically, the feature extractor is a ResNet3D model (Tran et al., 2018), pre-trained on the Kinetics-400 dataset (Carreira and Zisserman, 2017) for video action recognition. To adapt this model from RGB to single-channel X-ray inputs, the first-layer convolutional kernels are averaged over the input channel dimension. For each view, the shared feature extractor produces a spatio-temporal tensor of shape [W, H, T, C], where W, H, T and C are, respectively, width, height, time and channels (feature maps). This tensor is then serialized into a [W × H × T, C] tensor and processed by a multi-head spatio-temporal self-attention module that performs the following operation:

$$\mathrm{MHA}(Q, K, V) = [\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h] \times W^{O} \tag{1}$$

where:

- $Q$, $K$, and $V$ are the input queries, keys, and values, respectively.

- $\mathrm{head}_i = \mathrm{SelfAttention}(Q \times W_i^{Q}, K \times W_i^{K}, V \times W_i^{V})$ represents the self-attention mechanism applied to each head.

- $h$ is the number of attention heads.

- $[\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h]$ denotes the concatenation of the output of each head.

- $W^{O}$ is the output transformation weight matrix.

The SelfAttention function is defined as:

$$\mathrm{SelfAttention}(Q, K, V) = \mathrm{Softmax}\left(\frac{Q \times K^{\top}}{\sqrt{d_k}}\right) \times V \tag{2}$$

where $d_k$ is the dimension of the key vectors.
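As an illustration of Eqs. (1) and (2), the following NumPy sketch serializes a toy [W, H, T, C] feature tensor and applies multi-head self-attention to it. The tensor sizes, the random projection matrices and all function names are assumptions chosen for readability. They are not taken from our implementation, which relies on PyTorch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    # Eq. (2): Softmax(Q K^T / sqrt(d_k)) V
    d_k = k.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k), axis=-1)
    return weights @ v

def multi_head_attention(x, w_q, w_k, w_v, w_o):
    # x: [N, C] tokens; w_q, w_k, w_v: [h, C, d_k]; w_o: [h * d_k, C]
    heads = [self_attention(x @ wq, x @ wk, x @ wv)
             for wq, wk, wv in zip(w_q, w_k, w_v)]
    # Eq. (1): concatenate the heads, then apply the output projection.
    return np.concatenate(heads, axis=-1) @ w_o

rng = np.random.default_rng(0)
W, H, T, C, h = 4, 4, 3, 8, 2            # toy sizes (assumed)
d_k = C // h
features = rng.standard_normal((W, H, T, C))
tokens = features.reshape(W * H * T, C)  # serialize to [W*H*T, C]
w_q, w_k, w_v = (rng.standard_normal((h, C, d_k)) for _ in range(3))
w_o = rng.standard_normal((h * d_k, C))
out = multi_head_attention(tokens, w_q, w_k, w_v, w_o)
print(out.shape)
```

Because every token attends to every other token, the attention matrix has (W × H × T)² entries, which is why the attention is applied to the downsampled feature volume rather than to the raw video.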

This self-attention mechanism allows the model

to focus on the most relevant features for the task

at hand, enhancing its predictive capabilities. The

self-attended features are then simultaneously fed to

a three-branch layer. This layer is responsible for

performing binary classiﬁcation and quantiﬁcation,

through regression, of either FFR (for the data for

which FFR is provided) or iFR (for the data for which

iFR is provided). This multi-task approach allows the

model to provide a comprehensive analysis of the an-

giography data. The classiﬁer predicts a binary class

on the hemodynamical signiﬁcance of a stenosis, us-

ing established iFR and FFR thresholds of 0.89 and

0.80, respectively, as reported in (Neumann et al.,

2018; Tonino et al., 2009; De Bruyne et al., 2012;

Baumann et al., 2018).
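The adaptation of the first convolutional layer from RGB to single-channel X-ray inputs, described earlier in this section, can be sketched as follows. The kernel shape and the random weights are stand-ins (assumptions) for a pre-trained stem; only the averaging over the input-channel axis reflects the text.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for a pre-trained RGB stem kernel: [out, in=3, T, H, W].
w_rgb = rng.standard_normal((64, 3, 3, 7, 7))
# Average over the input-channel axis to obtain a single-channel kernel.
w_gray = w_rgb.mean(axis=1, keepdims=True)   # [64, 1, 3, 7, 7]

# One grayscale patch matching the kernel's receptive field.
patch = rng.standard_normal((3, 7, 7))
resp_gray = (w_gray[:, 0] * patch).sum(axis=(1, 2, 3))
# The same patch replicated over three channels, filtered by the RGB kernel.
resp_rgb = (w_rgb * patch[None, None]).sum(axis=(1, 2, 3, 4))
print(np.allclose(resp_rgb, 3 * resp_gray))
```

Up to a constant factor of 3, the averaged kernel responds to a grayscale frame exactly as the original kernel responds to the same frame replicated over three channels, so the pre-trained filters remain meaningful.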


Figure 2: Architecture overview. A 3D CNN, common to all views, extracts spatio-temporal features which are subsequently

reﬁned through a multi-head spatio-temporal self-attention. The resulting features are then channeled to a three-output layer

which carries out binary classiﬁcation (over the threshold), and quantiﬁcation of iFR and FFR.

The model is trained using a combination of binary cross-entropy loss (3) for the classification task and L1 loss (4) for the regression task on the continuous values of iFR and FFR:

$$\mathcal{L}_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right] \tag{3}$$

$$\mathcal{L}_{L1} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \tag{4}$$

where $N$ is the number of samples, $y_i$ is the true label (or value) of sample $i$ and $\hat{y}_i$ is the corresponding prediction. The total loss function, combining both the binary cross-entropy loss and the L1 loss, can be represented as:

$$\mathcal{L}_{total} = \alpha \mathcal{L}_{BCE} + \beta \mathcal{L}_{L1} \tag{5}$$

where $\alpha$ and $\beta$ are hyperparameters that control the importance of the two loss terms. This dual loss function (5) allows the model to learn and predict both binary and continuous outcomes, enhancing its versatility and predictive power.
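A minimal sketch of how the loss in (5) might be evaluated when some regression targets are missing is given below. Representing a missing FFR or iFR value as `None`, pooling the absolute errors of the two regression branches into a single L1 term, and the default weights `alpha = beta = 1.0` are all assumptions made for illustration.

```python
import math

def multi_branch_loss(p_cls, y_cls, ffr_pred, ffr_true, ifr_pred, ifr_true,
                      alpha=1.0, beta=1.0, eps=1e-7):
    """Eq. (5), skipping regression targets that are missing (None)."""
    # Eq. (3): binary cross-entropy averaged over all samples.
    bce = 0.0
    for p, y in zip(p_cls, y_cls):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce -= y * math.log(p) + (1 - y) * math.log(1 - p)
    bce /= len(y_cls)
    # Eq. (4): L1 error over the targets that are actually available.
    abs_errors = [abs(t - p)
                  for preds, targets in ((ffr_pred, ffr_true),
                                         (ifr_pred, ifr_true))
                  for p, t in zip(preds, targets) if t is not None]
    l1 = sum(abs_errors) / len(abs_errors) if abs_errors else 0.0
    return alpha * bce + beta * l1

# Two samples: the first has only FFR, the second only iFR.
loss = multi_branch_loss(
    p_cls=[0.5, 0.5], y_cls=[1, 0],
    ffr_pred=[0.8, 0.7], ffr_true=[0.9, None],
    ifr_pred=[0.9, 0.95], ifr_true=[None, 0.85],
)
print(round(loss, 4))
```

A sample without a given target contributes nothing to that branch, so no gradient would flow through it for that sample, while the classification term is always computed.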

4 EXPERIMENTAL RESULTS

4.1 Dataset

The proposed method was trained and evaluated using

a private dataset of 778 coronary angiographies from

389 patients, gathered between January 2020 and January 2022. The patient group consisted of 303 males and 86 females, with an average age of 67.9 ± 9.61 years. The IRB protocol number is 0092163.

The study encompassed patients diagnosed with

either chronic coronary syndrome (CCS) or acute

coronary syndrome (ACS). For each patient, two X-

ray angiographies, each from a different view, were

available. These angiographies were assessed by two

expert cardiologists who conducted invasive physio-

logical evaluations of intermediate coronary stenosis

using iFR, FFR, or both.

Speciﬁcally, FFR values were available for 251

patients (64.5%), iFR values for 228 patients (58.6%),

and both values were provided for a subset of 90 pa-

tients (23.1%). For each exam, the major stenosis

was identiﬁed by radiologists and labeled as hemody-

namically signiﬁcant if the FFR value was less than

0.80 (Neumann et al., 2018; Johnson et al., 2015;

Tonino et al., 2009; De Bruyne et al., 2012) or if the

iFR value was less than 0.89 (Neumann et al., 2018;

Baumann et al., 2018; Davies et al., 2017). Conse-

quently, 93 patients (23.9%) were labeled as positive,

resulting in a signiﬁcant class imbalance.
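The labeling rule above can be written as a short function. The function name and the use of `None` for an unavailable measurement are assumptions; the thresholds and the strict "less than" comparison follow the text.

```python
FFR_THRESHOLD = 0.80
IFR_THRESHOLD = 0.89

def is_hemodynamically_significant(ffr=None, ifr=None):
    """Positive label if any available index falls below its threshold."""
    if ffr is None and ifr is None:
        raise ValueError("at least one of FFR or iFR is required")
    ffr_positive = ffr is not None and ffr < FFR_THRESHOLD
    ifr_positive = ifr is not None and ifr < IFR_THRESHOLD
    return ffr_positive or ifr_positive

print(is_hemodynamically_significant(ffr=0.75))            # True
print(is_hemodynamically_significant(ffr=0.85, ifr=0.92))  # False
```

The cohort counts are also internally consistent: 251 patients with FFR plus 228 with iFR, minus the 90 with both, gives the 389 patients in the dataset.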

The coronary angiography and physiological mea-

surements were conducted following standardized

clinical practice, and key frames were annotated by

two expert cardiologists. The angiographies were

collected using different machines and practices, re-

sulting in variations in spatial sizes (ranging from

512×512 to 1024×1024) and frame rates (ranging

from 15 fps to 30 fps). To standardize the data, all

samples were resized to 256×256 pixels and adjusted

to 15 fps, with all collected videos cut to a length of

60 frames, equivalent to 4 seconds.
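The temporal standardization can be illustrated with a small index computation. Selecting the nearest source frame, starting the clip at the first frame, and returning fewer indices for videos that are too short are assumptions; the text only specifies the target rate of 15 fps and the clip length of 60 frames.

```python
def resample_indices(num_frames, src_fps, dst_fps=15, clip_len=60):
    """Source-frame indices for a dst_fps clip of at most clip_len frames."""
    step = src_fps / dst_fps
    indices = []
    for j in range(clip_len):
        i = int(round(j * step))
        if i >= num_frames:
            break  # the source video ended before clip_len frames
        indices.append(i)
    return indices

# A 30 fps acquisition with 200 frames: every second frame is kept.
idx = resample_indices(num_frames=200, src_fps=30)
print(len(idx), idx[:5], len(idx) / 15)
```

With these settings a 30 fps video contributes frames 0, 2, 4, and so on, while a 15 fps video is simply truncated to its first 60 frames.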

4.2 Training and Evaluation

We conducted a 5-fold nested cross-validation to estimate the accuracy of the proposed approach and of the comparative methods. In each split, we allocated 60% of the data for training, 20% for validation, and the remaining 20% for testing, maintaining the original label proportions. The input X-ray angiographies were normalized to the range [0, 1] and standardized to have a mean of 0 and a variance of 1. Data augmentation was implemented through random horizontal and vertical flipping and random 90-degree rotation, applied identically to all frames of a video. The training process minimized the combination of the binary cross-entropy loss for the classification branch and the L1 losses for the two iFR/FFR regression branches. We used the AdamW optimizer with a learning rate of 1e-5 and a batch size of 8, for a total of 300 epochs. Since not all samples have both FFR and iFR values, the two regression branches are not always activated: a branch is activated and trained only when the corresponding label is available for a given sample. The experiments were conducted on two NVIDIA Tesla T4 GPUs using automatic mixed precision (AMP) training. The proposed approach was implemented using the PyTorch and MONAI frameworks.

Table 1: Comparison of our method with state-of-the-art general deep learning and clinical AI methods.

| Methods | Accuracy | AUC | Sensitivity | Specificity |
|---|---|---|---|---|
| S3D (Xie et al., 2018) | 79.5±5.54 | 0.93±0.03 | 65.4±14.17 | 93.6±4.81 |
| MVCNN (Su et al., 2015) | 84.5±6.11 | 0.85±0.07 | 76.8±10.20 | 92.2±2.18 |
| GVCNN (Feng et al., 2018) | 78.4±4.52 | 0.87±0.05 | 71.0±7.29 | 85.8±4.36 |
| DMQCA (Zhang et al., 2019) | 81.7±3.69 | 0.82±0.04 | 70.2±8.00 | 93.2±1.87 |
| HEAL (Zhang et al., 2020) | 79.5±5.18 | 0.84±0.05 | 67.9±9.32 | 91.4±3.26 |
| DMTRL (Xue et al., 2018) | 79.1±3.79 | 0.85±0.05 | 67.0±6.56 | 91.2±3.48 |
| Ours | 87.3±6.14 | 0.93±0.03 | 82.4±12.24 | 92.2±1.79 |
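The normalization and augmentation steps described in this subsection can be sketched with NumPy as follows. Computing the statistics per video, interpreting the rotation as a random multiple of 90 degrees, and the function names are assumptions; the actual pipeline was built with PyTorch and MONAI.

```python
import numpy as np

def normalize(video):
    """Scale a [T, H, W] video to [0, 1], then to zero mean and unit variance."""
    v = video.astype(np.float64)
    v = (v - v.min()) / (v.max() - v.min())  # assumes a non-constant video
    return (v - v.mean()) / v.std()

def augment_video(video, rng):
    """Apply the same random flips and rotation to every frame of a video."""
    if rng.random() < 0.5:
        video = video[:, :, ::-1]            # horizontal flip
    if rng.random() < 0.5:
        video = video[:, ::-1, :]            # vertical flip
    k = int(rng.integers(0, 4))              # number of 90-degree rotations
    # Rotating in the (H, W) plane leaves the time axis untouched.
    return np.rot90(video, k=k, axes=(1, 2)).copy()

rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(60, 16, 16))  # toy 60-frame clip
augmented = augment_video(normalize(clip), rng)
print(augmented.shape)
```

Drawing the random parameters once per video, and not once per frame, keeps the vessel geometry temporally consistent across the clip.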

We assess the performance of our method by comparing it with both general state-of-the-art deep architectures (S3D (Xie et al., 2018), MVCNN (Su et al., 2015), GVCNN (Feng et al., 2018)) and clinical AI techniques designed for angiography videos (DMQCA (Zhang et al., 2019), HEAL (Zhang et al., 2020), DMTRL (Xue et al., 2018)), which have been adapted to perform classification. We employed balanced accuracy to account for class imbalance, the area under the Receiver Operating Characteristic (ROC) curve (AUC), as well as sensitivity and specificity. We also assessed the model’s performance in terms of regression, i.e., its ability to quantify iFR and FFR as continuous values. The metrics used for the regression task were Mean Squared Error (MSE) (6) and Mean Absolute Error (MAE) (7):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \tag{6}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \tag{7}$$

where $N$ is the number of samples, $y_i$ is the actual value and $\hat{y}_i$ is the predicted value.
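For reference, the evaluation metrics can be computed as in the following sketch. The function names and the toy inputs are illustrative assumptions; balanced accuracy is taken as the mean of sensitivity and specificity, which is its standard definition for binary problems.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and balanced accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

def mse(y_true, y_pred):
    # Eq. (6): mean of the squared errors.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Eq. (7): mean of the absolute errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

metrics = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(metrics)
print(mse([0.80, 0.90], [0.70, 0.95]), mae([0.80, 0.90], [0.70, 0.95]))
```

Unlike plain accuracy, balanced accuracy weights the positive and negative classes equally, which matters here because only 23.9% of the patients are labeled as positive.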

4.3 Results

As reported in Table 1, our model shows satisfactory accuracy in determining the hemodynamic significance of coronary stenoses, outperforming both general deep learning models and clinical AI techniques with an average balanced accuracy of 87.3. The closest competitor overall, MVCNN (Su et al., 2015), reaches 84.5, while the best clinical AI technique, DMQCA (Zhang et al., 2019), reaches 81.7.

In terms of Area Under the Curve (AUC), as shown in Fig. 3, our method yields 0.93, matching S3D (Xie et al., 2018) and exceeding all other methods. A higher AUC indicates, on average, a higher true positive rate for the same false positive rate, which is a desirable characteristic in medical applications. Our method also achieves the highest sensitivity, with a score of 82.4, meaning that it is better at correctly identifying positive cases. The specificity of our method is 92.2, which is slightly lower than that of S3D (93.6) and DMQCA (93.2), equal to that of MVCNN, and higher than that of the remaining methods. This indicates that our method is also good at correctly identifying negative cases.

Additionally, as reported in Table 2 the perfor-

mance of our proposed model was also satisfactory

when FFR and iFR values were treated as continuous

variables rather than dichotomous ones.

Table 2: Performance (in terms of MSE and MAE) when regressing FFR and iFR values.

| Measure | MSE | MAE |
|---|---|---|
| FFR | 0.060±0.005 | 0.037±0.002 |
| iFR | 0.045±0.005 | 0.026±0.003 |

Fig. 3 reports the comparison, in terms of ROC and precision-recall curves, between our approach and the state-of-the-art methods specifically designed for stenosis quantification.

Figure 3: ROC (left) and Precision-Recall (right) curves comparison between our approach and state-of-the-art clinical AI methods.

Figure 4: Impact of attention strategies. Interpretability maps computed through M3D-cam (Chattopadhay et al., 2018), with and without attention. For each image, we also report, with an arrow, the major stenosis as identified by clinicians. In each map, the yellow regions are the most activated ones, while the purple regions are the least activated ones. In our model, attention is focused on stenoses or arteries, while the model without attention also targets the morphology of the bones and tissues visible in the X-ray images.

In addition, in Table 3 we investigate whether using a keyframe for each view, added to our model through an input branch with a ResNet-50 and a late fusion strategy, leads to performance improvements. Our findings reveal essentially equivalent performance, showing that our approach is able to identify key information in the video autonomously, without the need for manual human interaction.

Finally, we investigate the impact of the employed spatio-temporal attention mechanism. In particular, we compare our model when using a) global attention, i.e., each location in the feature volume attends to all other locations in space and time; b) no attention applied to the CNN-extracted features. The two variants are compared through interpretability maps computed with M3D-cam (Chattopadhay et al., 2018). Fig. 4 shows that our spatial and temporal attention is an effective strategy to make the model focus on the stenosis for FFR quantification, demonstrating the importance of both spatial and temporal information in coronary angiographies. When no attention is used, the model fails to focus on the major stenosis, thus leading to incorrect and highly uncertain predictions.

Overall, these results suggest that our method provides more accurate and reliable performance than the other state-of-the-art methods.

Table 3: Effect of keyframe integration.

| Methods | Accuracy | AUC | Sensitivity | Specificity |
|---|---|---|---|---|
| Ours | 87.3±6.14 | 0.93±0.03 | 82.4±12.24 | 92.2±1.79 |
| Ours + keyframe | 87.1±7.08 | 0.94±0.05 | 82.3±10.94 | 91.9±1.91 |

5 CONCLUSIONS

In this work, we presented an approach for the non-

invasive evaluation of Fractional Flow Reserve (FFR)

and instantaneous wave-free ratio (iFR) from standard

coronary angiography, based on a combination of

3D Convolutional Neural Networks and self-attention

layers, without the requirement of manual interven-

tion in the identiﬁcation of a keyframe or vessel re-

gion in the video. Our approach provides a reliable

evaluation of coronary stenoses without the need for

hyperemic ﬂow induction, eliminating risks associ-

ated with intracoronary wire passage and reducing ad-

ditional equipment, training, and procedural costs.

Our model achieved a balanced accuracy of 87.3 and a specificity of 92.2 on our dataset, indicating robust performance in hemodynamic evaluation and a potential to enhance both operator and patient access to physiologically guided decision-making, which could have a consequential impact on clinical outcomes and costs.

Future research directions include the employ-

ment of speciﬁc synthetic data generation techniques,

as exempliﬁed by (Pennisi et al., 2023), both for data

augmentation purposes and to facilitate secure data

sharing while preserving privacy. Moreover, given

that a substantial portion of patients in the dataset un-

derwent invasive FFR/iFR for clinical reasons, a po-

tential selection bias towards a relatively high burden

of angiographic and functional coronary disease can-

not be entirely dismissed. To address this concern, on-

going efforts involve expanding the dataset and reﬁn-

ing the model to ensure its robustness across a broader

spectrum of patient proﬁles and clinical scenarios.

ACKNOWLEDGEMENTS

This research was supported by MUR, PRIN 2020, project: “LEGO.AI: LEarning the Geometry of knOwledge in AI systems”, n. 2020TA3K9N, CUP: E63C20011250001, and by Piano della Ricerca di Ateneo 2020/2022, Linea 2D, Università di Catania. Raffaele Mineo is a PhD student enrolled in the National PhD in Artificial Intelligence, XXXVII cycle, course on Health and life sciences, organized by Università Campus Bio-Medico di Roma.

REFERENCES

Au, B., Shaham, U., Dhruva, S., Bouras, G., Cristea,

E., Coppi, A., Warner, F., Li, S.-X., and Krumholz,

H. (2018). Automated characterization of steno-

sis in invasive coronary angiography images with

convolutional neural networks. arXiv preprint

arXiv:1807.10597.

Baumann, S., Chandra, L., Skarga, E., Renker, M.,

Borggrefe, M., Akin, I., and Lossnitzer, D. (2018). In-

stantaneous wave-free ratio (ifr®) to determine hemo-

dynamically signiﬁcant coronary stenosis: a compre-

hensive review. World Journal of Cardiology.

Carreira, J. and Zisserman, A. (2017). Quo vadis, action

recognition? a new model and the kinetics dataset.

In proceedings of the IEEE Conference on Computer

Vision and Pattern Recognition, pages 6299–6308.

Chattopadhay, A., Sarkar, A., Howlader, P., and Balasub-

ramanian, V. N. (2018). Grad-cam++: Generalized

gradient-based visual explanations for deep convolu-

tional networks. In 2018 IEEE Winter Conference on

Applications of Computer Vision (WACV). IEEE.

Cong, C., Kato, Y., Vasconcellos, H. D., Lima, J., and

Venkatesh, B. (2019). Automated stenosis detection

and classiﬁcation in x-ray angiography using deep

neural network. In IEEE international conference on

bioinformatics and biomedicine (BIBM). IEEE.

Danilov, V., Gerget, O., Klyshnikov, K., Ovcharenko, E.,

and Frangi, A. (2020). Comparative study of deep

learning models for automatic coronary stenosis de-

tection in x-ray angiography. In Proceedings of the

30th International Conference on Computer Graphics

and Machine Vision.

Davies, J. E., Sen, S., Dehbi, H.-M., Al-Lamee, R., Petraco,

R., Nijjer, S. S., Bhindi, R., Lehman, S. J., Walters,

D., Sapontis, J., et al. (2017). Use of the instantaneous

wave-free ratio or fractional ﬂow reserve in pci. New

England Journal of Medicine, 376(19):1824–1834.

De Bruyne, B., Pijls, N. H., Kalesan, B., Barbato, E., Tonino, P. A., Piroth, Z., Jagic, N., Möbius-Winkler, S., Rioufol, G., Witt, N., et al. (2012). Fractional flow reserve–guided pci versus medical therapy in stable coronary disease. New England Journal of Medicine.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

Feng, Y., Zhang, Z., Zhao, X., Ji, R., and Gao, Y. (2018).

Gvcnn: Group-view convolutional neural networks

for 3d shape recognition. In CVPR.

Han, T., Ai, D., Li, X., Fan, J., Song, H., Wang, Y., and

Yang, J. (2023). Coronary artery stenosis detection via

proposal-shifted spatial-temporal transformer in x-ray

angiography. Computers in Biology and Medicine.

Johnson, N. P., Johnson, D. T., Kirkeeide, R. L., Berry, C.,

De Bruyne, B., Fearon, W. F., Oldroyd, K. G., Pijls,

N. H., and Gould, K. L. (2015). Repeatability of frac-

tional ﬂow reserve despite variations in systemic and

coronary hemodynamics. JACC: Cardiovascular In-

terventions, 8(8):1018–1027.

Knuuti, J. and Revenco, V. (2020). 2019 esc guidelines

for the diagnosis and management of chronic coronary

syndromes. European heart journal, 41(5):407–477.

Ma, H., Ambrosini, P., and van Walsum, T. (2017). Fast

prospective detection of contrast inﬂow in x-ray an-

giograms with convolutional neural network and re-

current neural network. In MICCAI.

Moon, J. H., Cha, W. C., Chung, M. J., Lee, K.-S., Cho,

B. H., Choi, J. H., et al. (2021). Automatic stenosis

recognition from coronary angiography using convo-

lutional neural networks. Computer methods and pro-

grams in biomedicine, 198:105819.

Neumann, F.-J., Sousa-Uva, M., Ahlsson, A., Alfonso, F., Banning, A. P., Benedetto, U., Byrne, R. A., Collet, J.-P., Falk, V., Head, S. J., Jüni, P., Kastrati, A., Koller, A., Kristensen, S. D., Niebauer, J., Richter, D. J., Seferović, P. M., Sibbing, D., Stefanini, G. G., Windecker, S., Yadav, R., Zembala, M. O., and Group, E. S. D. (2018). 2018 ESC/EACTS Guidelines on myocardial revascularization. European Heart Journal.

Ovalle-Magallanes, E., Avina-Cervantes, J. G., Cruz-

Aceves, I., and Ruiz-Pinales, J. (2022). Hybrid

classical–quantum convolutional neural network for

stenosis detection in x-ray coronary angiography. Ex-

pert Systems with Applications, 189:116112.

Pennisi, M., Salanitri, F. P., Bellitto, G., Palazzo, S., Bagci,

U., and Spampinato, C. (2023). A privacy-preserving

walk in the latent space of generative models for med-

ical applications. In MICCAI.

Proietto Salanitri, F., Bellitto, G., Irmakci, I., Palazzo, S., Bagci, U., and Spampinato, C. (2021). Hierarchical 3d feature learning for pancreas segmentation. In MLMI (MICCAI workshop).

Rodrigues, D. L., Menezes, M. N., Pinto, F. J., and Oliveira,

A. L. (2021). Automated detection of coronary artery

stenosis in x-ray angiography using deep neural net-

works. arXiv preprint arXiv:2103.02969.

Salanitri, F. P., Bellitto, G., Palazzo, S., Irmakci, I., Wal-

lace, M., Bolan, C., Engels, M., Hoogenboom, S.,

Aldinucci, M., Bagci, U., et al. (2022). Neural trans-

formers for intraductal papillary mucosal neoplasms

(ipmn) classiﬁcation in mri images. In EMBC.

Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E.

(2015). Multi-view convolutional neural networks for

3d shape recognition. In ICCV.

Tomar, N. K., Jha, D., Bagci, U., and Ali, S. (2022). Tganet:

Text-guided attention for improved polyp segmenta-

tion. In International Conference on Medical Im-

age Computing and Computer-Assisted Intervention,

pages 151–160. Springer.

Tonino, P. A., De Bruyne, B., Pijls, N. H., Siebert, U., Ikeno, F., van ’t Veer, M., Klauss, V., Manoharan, G., Engstrøm, T., Oldroyd, K. G., et al. (2009). Fractional flow reserve versus angiography for guiding percutaneous coronary intervention. New England Journal of Medicine, 360(3):213–224.

Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., and

Paluri, M. (2018). A closer look at spatiotemporal

convolutions for action recognition. In Proceedings of

the IEEE conference on Computer Vision and Pattern

Recognition, pages 6450–6459.

Valanarasu, J. M. J., Oza, P., Hacihaliloglu, I., and Pa-

tel, V. M. (2021). Medical transformer: Gated

axial-attention for medical image segmentation. In

Medical Image Computing and Computer Assisted

Intervention–MICCAI 2021: 24th International Con-

ference, Strasbourg, France, September 27–October

1, 2021, Proceedings, Part I 24, pages 36–46.

Springer.

Wu, W., Zhang, J., Xie, H., Zhao, Y., Zhang, S., and Gu,

L. (2020). Automatic detection of coronary artery

stenosis by convolutional neural network with tempo-

ral constraint. Computers in biology and medicine,

118:103657.

Xie, S., Sun, C., Huang, J., Tu, Z., and Murphy, K. (2018).

Rethinking spatiotemporal feature learning: Speed-

accuracy trade-offs in video classiﬁcation. In Pro-

ceedings of the European conference on computer vi-

sion (ECCV), pages 305–321.

Xue, W., Brahm, G., Pandey, S., Leung, S., and Li, S.

(2018). Full left ventricle quantiﬁcation via deep mul-

titask relationships learning. Medical image analysis.

Zhang, D., Yang, G., Zhao, S., Zhang, Y., Ghista, D.,

Zhang, H., and Li, S. (2020). Direct quantiﬁcation of

coronary artery stenosis through hierarchical attentive

multi-view learning. IEEE Transactions on Medical

Imaging.

Zhang, D., Yang, G., Zhao, S., Zhang, Y., Zhang, H.,

and Li, S. (2019). Direct quantiﬁcation for coro-

nary artery stenosis using multiview learning. In In-

ternational Conference on Medical Image Computing

and Computer-Assisted Intervention, pages 449–457.

Springer.

Zhao, C., Tang, H., McGonigle, D., He, Z., Zhang, C.,

Wang, Y.-P., Deng, H.-W., Bober, R., and Zhou, W.

(2021a). A new approach to extracting coronary ar-

teries and detecting stenosis in invasive coronary an-

giograms. arXiv preprint arXiv:2101.09848.

Zhao, C., Vij, A., Malhotra, S., Tang, J., Tang, H., Pienta,

D., Xu, Z., and Zhou, W. (2021b). Automatic extrac-

tion and stenosis evaluation of coronary arteries in in-

vasive coronary angiograms. Computers in Biology

and Medicine, 136:104667.
