A Proposal for Explainable Breast Cancer Detection from Histological Images

Lucia Lombardi, Myriam Giusy Tibaldi, Rachele Catalano, Mario Cesarelli, Antonella Santone and Francesco Mercaldo

Department of Medicine and Health Sciences “Vincenzo Tiberio”, University of Molise, Campobasso, Italy

Department of Engineering, University of Sannio, Benevento, Italy

{francesco.mercaldo, antonella.santone}@unimol.it, mcesarelli@unisannio.it

Keywords: Artificial Intelligence, Deep Learning, Digital Pathology, Breast Cancer.

Abstract:

Breast cancer is the most prevalent cancer among women globally, making early and accurate detection essential for proper treatment planning and improved survival rates. This paper presents a method designed to detect and localize breast cancer using deep learning, specifically convolutional neural networks (CNNs). The approach classifies histological images of breast tissue as benign, atypical, or malignant. We utilize several deep learning models, including a custom-built CNN, EfficientNet, ResNet50, VGG-16, VGG-19, and MobileNet. Fine-tuning was also applied to VGG-16, VGG-19, and MobileNet with the aim of enhancing performance. Additionally, we introduce a novel deep learning model called MR Net, aimed at providing a more accurate network for breast cancer detection and localization that could support clinicians in making informed decisions, speed up the diagnostic process, and enable detection of the disease at an early stage. Furthermore, we propose a method for explainable predictions by generating heatmaps that highlight the regions within tissue images that the model focuses on when predicting a label, revealing the detection of benign, atypical, and malignant tumors. We evaluate both the quantitative and qualitative performance of MR Net and the other models, also presenting explainable results that allow visualization of the tissue areas identified by the model as relevant to the presence of breast cancer.

1 INTRODUCTION

Breast cancer (BC) is one of the most common cancers among women and their second leading cause of cancer death, after lung cancer. Currently, over 280,000 women are diagnosed with breast cancer each year in the United States, and 44,000 die of the disease. Despite improvements in early detection and in the understanding of the molecular foundations of BC biology, nearly 30% of patients with "early-stage" BC experience disease recurrence. BC consists of the uncontrolled and irregular growth of breast tissue, forming a lump or tumor. Breast lesions are of two types: benign and malignant.

Diagnosis from histological images is the gold standard for a considerable number of cancer types. Histology makes it possible to distinguish between normal tissue, non-malignant (benign) lesions and malignant lesions, and to perform a prognostic evaluation. Breast tissue biopsies allow pathologists to histologically assess the microscopic structure and elements of the tissue. Due to the complexity and diversity of histology images, manual examination requires extensive knowledge and experience on the part of pathologists, and is time-consuming and error-prone. Deep learning therefore aims to enhance accuracy and minimize human error by working alongside pathologists without replacing their role, fostering a collaborative approach for improved diagnostic outcomes.

In this paper we describe convolutional neural networks (CNNs) capable of classifying H&E-stained breast histology images into three classes: benign tissue, atypical lesions and malignant tumours. In particular, we consider the following lesion types: Normal (N), Pathological Benign (PB), Usual Ductal Hyperplasia (UDH), Flat Epithelial Atypia (FEA), Atypical Ductal Hyperplasia (ADH), Ductal Carcinoma in Situ (DCIS) and Invasive Carcinoma (IC). The dataset used for the experimental analysis was developed through a collaboration between the National Cancer Institute IRCCS 'Fondazione G. Pascale' in Naples, the Institute for High Performance Computing and Networks (ICAR) and IBM Research in Zurich. It contains 4539 high-resolution histological images obtained with hematoxylin and eosin (H&E) staining at a magnification factor of 40x. The images have different sizes, which ensures greater heterogeneity of the samples. The results demonstrate the method's ability to accurately distinguish between the three classes considered (atypical, benign and malignant) and to outperform other state-of-the-art methods based on feature extraction. This approach has the potential to enhance the computer-assisted diagnosis (CAD) of BC and improve early diagnosis, contributing to the prevention of avoidable deaths.

Lombardi, L., Tibaldi, M. G., Catalano, R., Cesarelli, M., Santone, A. and Mercaldo, F.
A Proposal for Explainable Breast Cancer Detection from Histological Images.
DOI: 10.5220/0013236500003911
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 2: HEALTHINF, pages 680-685
ISBN: 978-989-758-731-3; ISSN: 2184-4305

2 THE METHOD

This section describes the method we propose for BC detection and localisation starting from tissue images. We aim to find a model capable of classifying histological images as benign, atypical or malignant.

In detail, this is a multi-class classification problem based on supervised learning, since one of three classes must be assigned to the tissue image under analysis; accordingly, all the images in the training set are already labeled. The methodology consists of the selection of the dataset, the selection of the deep learning models, the training and testing of these models, the generation of explanations through Gradient-weighted Class Activation Mapping (Grad-CAM) and the analysis of the results, as shown in Figure 1.

Figure 1: The main steps of the proposed method.

2.1 Dataset and Preprocessing

The choice of dataset is fundamental, because it influences the performance, generalization and reliability of the models. In this case study we adopted the BReAst Carcinoma Subtyping (BRACS) dataset (ICAR, Istituto di Calcolo e Reti ad Alte Prestazioni), which consists of histological images stained with hematoxylin and eosin. This dataset was chosen for its large number of images and because it includes not only normal and cancerous images, but also two atypical lesions, known as precancerous lesions (Sukhadia et al., 2023).

In particular, the types of lesions present in this dataset are: Normal (N), Pathological Benign (PB), Usual Ductal Hyperplasia (UDH), Flat Epithelial Atypia (FEA), Atypical Ductal Hyperplasia (ADH), Ductal Carcinoma in Situ (DCIS) and Invasive Carcinoma (IC). To optimize the dataset, a pre-processing phase was carried out in order to obtain not only a greater number of images, but also a more homogeneous distribution across the different classes. The dataset used in this study is organized into three main classes: atypical, malignant and benign. The atypical class includes images of Flat Epithelial Atypia (FEA) and Atypical Ductal Hyperplasia (ADH); the malignant class includes images of Ductal Carcinoma in Situ (DCIS) and Invasive Carcinoma (IC); and the benign class includes images labeled as Normal (N), Pathological Benign (PB) and Usual Ductal Hyperplasia (UDH). Subsequently, each image was resized to 500x500 pixels. To increase the number of examples provided to the deep learning models, data augmentation was applied, in particular the horizontal flip, brightness and zoom techniques (González-Castro et al., 2023).
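The three augmentation techniques named above can be illustrated with a minimal, dependency-free sketch. The paper does not state which library or parameter ranges were used, so the function names, the grayscale nested-list image representation and the factors below are assumptions made purely for illustration.

```python
from typing import List

# Assumed representation: a grayscale image as rows of intensities in [0, 1].
Image = List[List[float]]


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every intensity by `factor` and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]


def zoom_in(img: Image, factor: float) -> Image:
    """Crop the centre by `factor` (>= 1) and resample it back to the
    original size with nearest-neighbour interpolation."""
    h, w = len(img), len(img[0])
    crop_h, crop_w = max(1, round(h / factor)), max(1, round(w / factor))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return [
        [img[top + (y * crop_h) // h][left + (x * crop_w) // w] for x in range(w)]
        for y in range(h)
    ]


# Tiny 4x4 example image (hypothetical values).
sample = [
    [0.0, 0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6, 0.7],
    [0.8, 0.9, 1.0, 0.0],
    [0.1, 0.2, 0.3, 0.4],
]
augmented = [horizontal_flip(sample), adjust_brightness(sample, 1.2), zoom_in(sample, 2.0)]
```

Each transformation preserves the image size, so augmented copies can be added to the dataset next to the originals without further resizing.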

Following this pre-processing phase, the final dataset contains 5628 images, of which approximately 80% were allocated to training, 10% to validation and 10% to testing, obtaining the following subdivision:

• training set: 4500 images, of which 1500 classified as benign, 1500 as atypical and 1500 as malignant;

• validation set: 564 images, of which 188 classified as benign, 188 as atypical and 188 as malignant;

• test set: 564 images, of which 188 classified as benign, 188 as atypical and 188 as malignant.
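The grouping of the seven lesion types into three classes and the resulting split sizes can be written down and checked with a short sketch. The class grouping and the counts come from the text; the dictionary and function names are illustrative choices.

```python
from collections import Counter

# Grouping of the seven BRACS lesion types into the three classes used here.
LESION_TO_CLASS = {
    "N": "benign",
    "PB": "benign",
    "UDH": "benign",
    "FEA": "atypical",
    "ADH": "atypical",
    "DCIS": "malignant",
    "IC": "malignant",
}

# Images per class in each split, as reported in the text.
SPLIT_COUNTS = {
    "training": {"benign": 1500, "atypical": 1500, "malignant": 1500},
    "validation": {"benign": 188, "atypical": 188, "malignant": 188},
    "test": {"benign": 188, "atypical": 188, "malignant": 188},
}


def split_sizes(counts):
    """Total number of images in each split."""
    return {name: sum(per_class.values()) for name, per_class in counts.items()}


def split_fractions(counts):
    """Share of the whole dataset that falls into each split."""
    sizes = split_sizes(counts)
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


sizes = split_sizes(SPLIT_COUNTS)
fractions = split_fractions(SPLIT_COUNTS)
lesions_per_class = Counter(LESION_TO_CLASS.values())
```

The totals add up to the 5628 images reported, and every split is perfectly balanced across the three classes, which is what makes plain accuracy a reasonable headline metric for this task.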

2.2 The CNN Model

In this article we exploit the Standard CNN network, designed by the authors, together with the following CNNs already present in the literature (He et al., 2024; Huang et al., 2024; Pan and Xin, 2024): EfficientNet, ResNet50, VGG-16, VGG-19 and MobileNet. The Standard CNN is a network of 13 layers. The convolutional block has three Conv2D layers applying 32, 64 and 128 filters of size 3x3, respectively, each with ReLU activation, alternating with three MaxPooling2D layers. The classification block has three Dense layers: the first two, with 512 and 256 neurons respectively, use ReLU activation, and the last has three neurons with SoftMax activation. They alternate with Dropout layers with a rate of 0.5, used to regularize the network. The network uses the categorical cross-entropy loss function, since the task is a multi-class classification.
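A minimal Keras sketch of a network matching this description is given here. Several details are not stated in the text and are assumptions: the 110x110x3 input (one reading of the "110×3" entry in Table 1), the 2x2 pooling windows, the default "valid" padding, the Flatten layer between the two blocks, and the Adam optimizer. The function name `build_standard_cnn` is hypothetical, and this should not be read as the authors' exact implementation.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_standard_cnn(input_shape=(110, 110, 3), num_classes=3):
    """Three Conv2D/MaxPooling2D pairs followed by a Dense classifier."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),           # assumed input size
        # Convolutional block: 32, 64 and 128 filters of size 3x3 with ReLU.
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),              # assumed 2x2 pooling
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),                         # assumed bridge to Dense layers
        # Classification block: Dense layers alternating with 0.5 Dropout.
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),  # optimizer assumed
        loss="categorical_crossentropy",
        metrics=["accuracy", keras.metrics.Precision(), keras.metrics.Recall()],
    )
    return model


model = build_standard_cnn()
```

The categorical cross-entropy loss expects one-hot encoded labels, so the three class labels would need to be encoded accordingly before training.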

2.3 Training

Once the CNN models were developed, they were trained on the considered dataset with specific hyperparameters, namely the number of epochs, the batch size and the learning rate. The values that led to the best results during the training phase are summarized in Table 1.

2.4 Fine-Tuning

Lastly, we also implemented three additional models using fine-tuning. Fine-tuning is a transfer learning technique in which a model pre-trained on a typically large dataset is adapted to a specific problem by continuing the training only for certain layers. Fine-tuning requires two main steps: feature extraction, a phase involving the implementation and training of a new classifier, and the actual fine-tuning, in which some of the layers closest to the classifier are unfrozen and re-trained. According to the layer names adopted in Keras models, we unfroze:

• for MobileNet, the weights of the layers in the last two convolutional blocks, starting with "conv_dw_12";

• for VGG-16, the weights of all three layers in the last convolutional block, starting with "block5_conv1";

• for VGG-19, the weights of all four layers in the last convolutional block, starting with "block5_conv1".
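The unfreezing step listed above can be sketched as a small helper that freezes every layer before a named layer and unfreezes that layer and all later ones. The helper `unfreeze_from` and the lookup table are hypothetical names introduced for illustration. The function only relies on a `layers` list whose items expose `name` and `trainable`, which is the interface Keras models offer, so it is demonstrated on a lightweight stand-in rather than on a real pre-trained backbone.

```python
from types import SimpleNamespace

# First unfrozen layer per backbone, using the Keras layer names quoted above.
FIRST_UNFROZEN_LAYER = {
    "MobileNet": "conv_dw_12",
    "VGG-16": "block5_conv1",
    "VGG-19": "block5_conv1",
}


def unfreeze_from(model, first_layer_name):
    """Freeze all layers before `first_layer_name`; unfreeze it and every
    layer after it. Returns the names of the layers left trainable."""
    names = [layer.name for layer in model.layers]
    if first_layer_name not in names:
        raise ValueError(f"layer {first_layer_name!r} not found in model")
    start = names.index(first_layer_name)
    for index, layer in enumerate(model.layers):
        layer.trainable = index >= start
    return names[start:]


# Stand-in for the tail of a VGG-16 backbone, so the sketch runs without Keras.
toy_vgg16 = SimpleNamespace(layers=[
    SimpleNamespace(name=name, trainable=False)
    for name in ("block4_conv3", "block4_pool",
                 "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool")
])
trainable_names = unfreeze_from(toy_vgg16, FIRST_UNFROZEN_LAYER["VGG-16"])
```

With a real Keras backbone the same call would be applied to the pre-trained base model, and the full model would then need to be compiled again, usually with a low learning rate, before training continues.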

2.5 Grad-CAM

Gradient-weighted Class Activation Mapping (Grad-CAM) is a technique used in deep learning to analyze the decisions made by CNNs in image classification tasks (Zhou et al., 2023; Brunese et al., 2022a; Brunese et al., 2022b; Martinelli et al., 2022; Di Giammarco et al., 2023; Mercaldo et al., 2024; Di Giammarco et al., 2024). Essentially, it reveals which regions of an image capture the network's attention during prediction, thereby improving the understanding of the model's decisions. Grad-CAM is typically exploited for interpretability: deep neural networks are often treated as black boxes due to their complex architectures, and Grad-CAM provides insight into their decision-making process by highlighting which parts of an image are important for a particular prediction. It is also useful for model debugging: by visualizing the regions of an image that contribute most to a particular prediction, researchers can identify potential biases or the causes of misclassifications. Finally, Grad-CAM supports trust and transparency: in critical applications such as healthcare or autonomous driving, it is crucial to understand why a model makes a certain decision, and Grad-CAM provides interpretable explanations for a model's outputs.
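For reference, the standard formulation of Grad-CAM can be stated compactly. The notation is an assumption introduced here, since the text does not fix one: A^k is the k-th feature map of the chosen convolutional layer, y^c is the score of class c before the SoftMax, and Z is the number of spatial positions in a feature map. Which convolutional layer was used for the heatmaps in this work is not stated.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^k \right)
```

The weights \alpha_k^c average the gradients of the class score over each feature map, and the ReLU keeps only the regions with a positive influence on class c. The resulting map is then upsampled to the input size and overlaid on the image as a heatmap.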

3 EXPERIMENTAL ANALYSIS

In this section, we present the results of our experimental analysis aimed at proposing a reliable method for the detection and localization of BC. Specifically, we analyze the metrics and confusion matrices obtained during the classification phase to conduct a quantitative analysis. Subsequently, we perform a qualitative analysis by presenting images generated via Grad-CAM to assess the features on which the classification decisions were based. The results pertain to the classification performed using the images from the test set described in Section 2.1.

3.1 Quantitative Analysis

Table 2 shows the results of the experimental analysis obtained with the hyper-parameters given in Table 1.

Based on these metrics, MobileNet and VGG-19 achieved the best results among the evaluated networks, with an accuracy of 73%. This is a very satisfactory result, considering that other studies carried out on the same dataset achieved an accuracy of 56% (Brancati et al., 2022) and 66% (Ahmed et al., 2023).

Table 3 shows the results of the experimental analysis carried out using fine-tuning. We chose to apply this technique only to the three models with the highest accuracy in Table 2, namely MobileNet, VGG-16 and VGG-19.

Surprisingly, the accuracy achieved with fine-tuning was slightly lower than the one achieved without it. The VGG-19 model exhibits the worst results, reaching an accuracy of only 71% against the 73% of the previous evaluation phase.

Along with the metrics, we also considered the confusion matrices in order to evaluate the classification quality of the networks.

Figure 2: Confusion matrix obtained with the Standard CNN model.

HEALTHINF 2025 - 18th International Conference on Health Informatics

Table 1: Hyper-parameters selected during experimentation.

Model          Image size   Batch   Epochs   Learning rate   Ex. time
Standard CNN   110×3        32      20       0.0001          0:08:57
EfficientNet   224×3        32      20       0.00001         1:21:48
ResNet50       110×3        32      50       0.0001          2:24:58
VGG-16         224×3        32      50       0.00001         15:44:49
VGG-19         224×3        32      50       0.00001         18:40:30
MobileNet      110×3        32      20       0.001           0:13:08

Table 2: Results of the experimental analysis.

Model          Accuracy   Loss     Precision   Recall
EfficientNet   0.6738     0.9003   0.6875      0.6631
ResNet50       0.7163     1.5485   0.7163      0.7163
VGG-16         0.7269     1.3160   0.7337      0.7181
VGG-19         0.7305     1.4156   0.7338      0.7234
MobileNet      0.7305     1.5697   0.7351      0.7234
Standard CNN   0.6737     0.9034   0.6824      0.6401
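Accuracy, precision and recall can all be derived from a three-class confusion matrix, as the following sketch illustrates. The matrix values are invented for illustration only (they are not the paper's results), rows are taken to be true classes and columns predicted classes, and macro-averaging is an assumption, since the text does not state how precision and recall were aggregated.

```python
CLASSES = ["benign", "atypical", "malignant"]

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
# Each row sums to 188, the per-class size of the test set.
confusion = [
    [150, 25, 13],
    [30, 120, 38],
    [10, 28, 150],
]


def accuracy(cm):
    """Fraction of all samples lying on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


def per_class_precision(cm):
    """Diagonal entry divided by its column total (all predictions of that class)."""
    result = []
    for j in range(len(cm)):
        predicted = sum(cm[i][j] for i in range(len(cm)))
        result.append(cm[j][j] / predicted if predicted else 0.0)
    return result


def per_class_recall(cm):
    """Diagonal entry divided by its row total (all true samples of that class)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


report = {
    "accuracy": accuracy(confusion),
    "precision": macro_average(per_class_precision(confusion)),
    "recall": macro_average(per_class_recall(confusion)),
}
```

Because the test set is balanced, macro-averaged recall coincides with accuracy in this example, while the per-class values still show which classes are confused with each other.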

3.2 Qualitative Analysis

Drawing conclusions from metrics alone would lead us to treat the network as a black box, whereas we want to propose an explainable method that can boost the adoption of deep learning in real-world medical practice. For this purpose, we also examine the images obtained through Grad-CAM, a technique that proves extremely valuable for Explainable Artificial Intelligence (XAI). The generated images present a heat-map that visually highlights the areas the model relied on to make its decisions (Adebiyi et al., 2024). In this way we can understand more thoroughly the reasons behind the classification carried out by a machine learning model. A heat-map conveys information through a color scale; specifically, in the images shown below, significant regions are represented in yellow, while less important areas exhibit a blue/violet color. Below are the Grad-CAMs obtained from the Standard CNN model:

Figure 3: Heatmap related to an atypical lesion, correctly classified with a confidence of 71.0%.

In the case of the atypical category (a precancerous condition of the breast), the classifier relies on the area where the breast duct walls are darker purple in color. Those walls are slightly too thick and contain an excessive number of cells, since epithelial atypia can grow to a thickness of 5 or 6 cuboidal epithelial cells, as opposed to the normal thickness of the breast duct lining of about 2 cells. In fact, epithelial atypia is a proliferation of epithelial cells in the terminal duct-lobular units (TDLU) of the breast. The cells are clustered in acini with rigid contours and have round nuclei and even chromatin; the cell borders are readily appreciated, creating the impression of a mosaic pattern. Secretions and calcifications are present in the acinar lumens.


Table 3: Results of the experimental analysis using fine-tuning.

Model       Accuracy   Loss     Precision   Recall
MobileNet   0.7270     0.7132   0.7589      0.7198
VGG-16      0.7234     0.8434   0.7431      0.7181
VGG-19      0.7145     1.1417   0.7171      0.7057

Figure 4: Heatmap related to a benign lesion, correctly classified with a confidence of 97.8%.

In the case of the benign category, the classifier detects the area of normal tissue, consisting of glandular tissue and adipose tissue. Ducts, lobules and acini of the mammary gland are lined with epithelial cells and immersed in adipose tissue. The model also focuses on the areas of the image containing the fibroadenoma, a benign pathological nodule that results from the proliferation of the glandular epithelium and fibrous stroma of the breast. It is characterised by a fibroblastic stroma with glandular structures and cystic spaces, surrounded by connective tissue forming an enveloping capsule.

Figure 5: Heatmap related to a malignant tumour, correctly classified with a confidence of 86.5%.

In the case of the malignant category, the classifier relies on large areas of the image characterised by undifferentiated malignant tissue, in which the tumour cells have lost all their specific, normal histological features and are therefore difficult to classify. The image in fact shows an invasive carcinoma.

4 CONCLUSION AND FUTURE WORK

Accurate histopathological diagnosis is crucial for BC as patient numbers surge and pathologist resources dwindle. We believe that our study can contribute to the early diagnosis and identification of breast tumours and their subtypes, especially atypical and malignant ones, and thus to improving patient outcomes and reducing mortality rates. Although the proposed model does not outperform state-of-the-art models in terms of BC detection, it does in terms of explainability, as the heat-maps generated using Grad-CAM reveal a proper detection of the presence of benign, atypical and malignant tumours. Both our networks (Standard CNN and MR Net) base their decisions on the geometry of the structures and on the number and shape of the cells. However, MR Net obtains more defined contours for the area of interest, despite presenting a slightly lower level of confidence. Neither the existing models nor the fine-tuned ones seem to reach these results for the Grad-CAMs: these models do not appear to evaluate the correct areas of the images, which partially invalidates their results.

Integrating AI into routine pathology practice stands to improve diagnostic accuracy, thereby contributing to reducing avoidable errors. Despite the existing hurdles, AI's multifaceted contributions to BC pathology hold great promise, providing enhanced accuracy, efficiency and standardization. Continued research and innovation are crucial for overcoming obstacles and fully harnessing AI's transformative capabilities in breast cancer diagnosis and assessment. As future work, we will explore other models, for instance those related to object detection, to understand whether it is possible to improve the performance obtained in terms of BC localisation.

ACKNOWLEDGEMENTS

This work has been partially supported by EU DUCA, EU CyberSecPro, SYNAPSE, PTR 22-24 P2.01 (Cybersecurity) and SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the EU - NextGenerationEU projects, by MUR - REASONING: foRmal mEthods for computAtional analySis for diagnOsis and progNosis in imagING - PRIN, e-DAI (Digital ecosystem for integrated analysis of heterogeneous health data related to high-impact diseases: innovative model of care and research), Health Operational Plan, FSC 2014-2020, PRIN-MUR-Ministry of Health, the National Plan for NRRP Complementary Investments D^4 Health: Digital Driven Diagnostics, prognostics and therapeutics for sustainable Health care, Progetto MolisCTe, Ministero delle Imprese e del Made in Italy, Italy, CUP: D33B22000060001, FORESEEN: FORmal mEthodS for attack dEtEction in autonomous driviNg systems CUP N.P2022WYAEW, ALOHA: a framework for monitoring the physical and psychological health status of the Worker through Object detection and federated machine learning, Call for Collaborative Research BRiC -2024, INAIL, and by Fondazione Intesa SanPaolo Onlus in the "Doctorates in Humanities Disciplines" for the "Artificial Intelligence for the Analysis of Archaeological Finds" topic.

REFERENCES

Adebiyi, M. O., Olaniyan, D., Adebiyi, A. A., Olaniyan, J., Amrevuawho, O. F., et al. (2024). Random forest-based approach for integrating blood profile in metastatic breast cancer classification. In 2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG), pages 1–6. IEEE.

Ahmed, F., Abdel-Salam, R., Hamnett, L., Adewunmi, M., and Ayano, T. (2023). Improved breast cancer diagnosis through transfer learning on hematoxylin and eosin stained histology images. arXiv preprint arXiv:2309.08745.

Brancati, N., Anniciello, A. M., Pati, P., Riccio, D., Scognamiglio, G., Jaume, G., De Pietro, G., Di Bonito, M., Foncubierta, A., Botti, G., et al. (2022). Bracs: A dataset for breast carcinoma subtyping in h&e histology images. Database, 2022:baac093.

Brunese, L., Brunese, M. C., Carbone, M., Ciccone, V., Mercaldo, F., and Santone, A. (2022a). Automatic pi-rads assignment by means of formal methods. La radiologia medica, pages 1–7.

Brunese, L., Mercaldo, F., Reginelli, A., and Santone, A. (2022b). A neural network-based method for respiratory sound analysis and lung disease detection. Applied Sciences, 12(8):3877.

Di Giammarco, M., Dukic, B., Martinelli, F., Cesarelli, M., Ravelli, F., Santone, A., and Mercaldo, F. (2024). Reliable leukemia diagnosis and localization through explainable deep learning. In 2024 Fifth International Conference on Intelligent Data Science Technologies and Applications (IDSTA), pages 68–75. IEEE.

Di Giammarco, M., Mercaldo, F., Zhou, X., Huang, P., Santone, A., Cesarelli, M., and Martinelli, F. (2023). A robust and explainable deep learning method for cervical cancer screening. In International Conference on Applied Intelligence and Informatics, pages 111–125. Springer.

González-Castro, L., Chávez, M., Duflot, P., Bleret, V., Martin, A. G., Zobel, M., Nateqi, J., Lin, S., Pazos-Arias, J. J., Del Fiol, G., et al. (2023). Machine learning algorithms to predict breast cancer recurrence using structured and unstructured sources from electronic health records. Cancers, 15(10):2741.

He, H., Yang, H., Mercaldo, F., Santone, A., and Huang, P. (2024). Isolation forest-voting fusion-multioutput: A stroke risk classification method based on the multidimensional output of abnormal sample detection. Computer Methods and Programs in Biomedicine, page 108255.

Huang, P., Li, C., He, P., Xiao, H., Ping, Y., Feng, P., Tian, S., Chen, H., Mercaldo, F., Santone, A., et al. (2024). Mamlformer: Priori-experience guiding transformer network via manifold adversarial multi-modal learning for laryngeal histopathological grading. Information Fusion, 108:102333.

ICAR, Istituto di Calcolo e Reti ad Alte Prestazioni. Bracs: Breast carcinoma subtyping. https://www.bracs.icar.cnr.it/.

Martinelli, F., Mercaldo, F., and Santone, A. (2022). Water meter reading for smart grid monitoring. Sensors, 23(1):75.

Mercaldo, F., Di Giammarco, M., Ravelli, F., Martinelli, F., Santone, A., and Cesarelli, M. (2024). Alzheimer's disease evaluation through visual explainability by means of convolutional neural networks. International Journal of Neural Systems, 34(2):2450007–2450007.

Pan, H. and Xin, L. (2024). Fdts: A feature disentangled transformer for interpretable squamous cell carcinoma grading. IEEE/CAA Journal of Automatica Sinica, 12(JAS-2024-1027).

Sukhadia, S. S., Muller, K. E., Workman, A. A., and Nagaraj, S. H. (2023). Machine learning-based prediction of distant recurrence in invasive breast carcinoma using clinicopathological data: a cross-institutional study. Cancers, 15(15):3960.

Zhou, X., Tang, C., Huang, P., Tian, S., Mercaldo, F., and Santone, A. (2023). Asi-dbnet: an adaptive sparse interactive resnet-vision transformer dual-branch network for the grading of brain cancer histopathological images. Interdisciplinary Sciences: Computational Life Sciences, 15(1):15–31.
