Breast Cancer Automatic Diagnosis System using Faster Regional Convolutional Neural Networks
Lourdes Duran-Lopez, Juan Pedro Dominguez-Morales (a), Isabel Amaya-Rodriguez,
Francisco Luna-Perejon, Javier Civit-Masot, Saturnino Vicente-Diaz (b)
and Alejandro Linares-Barranco (c)
Robotics and Technology of Computers Lab., University of Seville, Seville 41012, Spain
Keywords: Breast Cancer, Mammography, Deep Learning, Convolutional Neural Network, Faster Regional Convolutional Neural Network, Medical Image Analysis.
Abstract:
Breast cancer is one of the most frequent causes of mortality in women. For the early detection of breast cancer, mammography is the most efficient technique to identify abnormalities such as tumors. Automatic detection of tumors in mammograms has become a major challenge and can play a crucial role in assisting doctors to achieve an accurate diagnosis. State-of-the-art Deep Learning algorithms such as Faster Regional Convolutional Neural Networks are able to determine both the presence of an object and its position inside the image in a reduced computation time. In this work, we evaluate these algorithms to detect tumors in mammogram images and propose a detection system that contains: (1) a preprocessing step performed on mammograms taken from the Digital Database for Screening Mammography (DDSM) and (2) the Neural Network model, which performs feature extraction over the mammograms in order to locate tumors within each image and classify them as malignant or benign. The proposed system achieves an accuracy of 97.375%. These results show that the system could be very useful for aiding physicians when detecting tumors in mammogram images.
1 INTRODUCTION
Breast cancer is the most prevalent cancer among women and the leading cause of cancer death in women. According to the Global Cancer Observatory (GLOBOCAN), around 2.1 million new female breast cancer cases were diagnosed in 2018, representing almost one in every four cancer cases among women (Bray et al., 2018). There are several types of abnormalities in breast cancer, such as tumors and microcalcifications, which are the main indicators of malignancy. The term tumor refers to any lesion or protuberance in the breast, which may be benign or malignant, while microcalcifications are small deposits of calcium. The early detection of these abnormalities is crucial to improve women's quality of life and also the survival rate in critical cases.
Mammography is the most effective screening tool for the diagnosis of breast cancer. When using this technique, the traditional diagnosis consists of an evaluation of the mammograms by a physician. However, this diagnosis is prone to error for several reasons. On the one hand, the physician's experience and expertise, as well as the fatigue caused by examining many consecutive mammograms, are common sources of error. On the other hand, there are sources of error unrelated to the specialist: tumors can be confusing and hard to classify as benign or malignant because their appearance in mammograms can be very similar. To this end, a computer-aided diagnosis (CAD) system could provide a second opinion to assist physicians in making the decision, which could reduce false negative diagnoses and, therefore, minimize the possibility of error.
(a) https://orcid.org/0000-0002-5474-107X
(b) https://orcid.org/0000-0001-9466-485X
(c) https://orcid.org/0000-0002-6056-740X
In this work, we present a CAD system for detecting and classifying tumors in mammograms using a type of Deep Learning-based algorithm called Regional Convolutional Neural Networks (R-CNN). The purpose of R-CNNs is to locate an object inside an image by proposing possible regions of interest and then classifying them. Faster Regional Convolutional Neural Network (Faster R-CNN) (Ren et al., 2015) is one of the most recently developed algorithms based on the R-CNN architecture. Faster R-CNN was developed in 2015 with the objective of reducing the computation time of its predecessors. This algorithm has already been used in different contexts, proving its ability to detect objects successfully. Face detection (Jiang and Learned-Miller, 2017), polyp detection in gastrointestinal images, driver's cell-phone usage and hands-on-steering-wheel detection (Hoang Ngan Le et al., 2016), and wildland forest fire smoke detection (Zhang et al., 2018) are some application examples.
Duran-Lopez, L., Dominguez-Morales, J., Amaya-Rodriguez, I., Luna-Perejon, F., Civit-Masot, J., Vicente-Diaz, S. and Linares-Barranco, A.
Breast Cancer Automatic Diagnosis System using Faster Regional Convolutional Neural Networks.
DOI: 10.5220/0008494304440448
In Proceedings of the 11th International Joint Conference on Computational Intelligence (IJCCI 2019), pages 444-448
ISBN: 978-989-758-384-1
Copyright © 2019 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
In this paper, we report our study of an automated
computerized system for locating and classifying tu-
mors as malignant or benign in mammograms using a
Faster R-CNN algorithm.
The rest of the paper is structured as follows: Section 2 describes the methodology and is organized into two main subsections: Dataset (2.1), which consists of Image Acquisition (2.1.1) and Data Preprocessing (2.1.2), and Neural Network Model (2.2). Then, the results achieved with this approach are presented and discussed in Section 3. Finally, the conclusions of this work are presented in Section 4.
2 METHODOLOGY
In this section, we present the methods used in this work. First, the dataset is described, from its acquisition to its preprocessing. Then, Faster R-CNN is presented, along with the architecture, training and testing of the neural network used in this work, and the performance evaluation metrics.
2.1 Dataset
2.1.1 Image Acquisition
In this work, the Digital Database for Screening Mammography (DDSM)¹ was used. The DDSM (Heath et al., 2000; Heath et al., 1998) was created by the University of South Florida and has become a very useful tool in the development of decision support systems for breast cancer diagnosis. DDSM contains more than 2620 scanned grayscale mammogram images, which include normal, benign and malignant cases with verified pathology information (see Fig. 1). For each case, four mammograms are provided, corresponding to two different views of both breasts: craniocaudal (CC) and mediolateral oblique (MLO). In benign and malignant cases, ground truth information is given with the location of the abnormality.
¹ http://www.eng.usf.edu/cvprg/Mammography/Database.html
Figure 1: Mammograms taken from the DDSM. From left to right: breast without evidence of abnormality, breast with a malignant tumor, and breast with a benign tumor (both tumors shown in red).
2.1.2 Data Preprocessing
A preprocessing step was applied to the images in or-
der to enhance the performance of the system and,
therefore, improve the results of the detection (see
Fig. 2).
Grayscale images tend to have a compressed histogram distribution, meaning that details are not easy to observe. By modifying the histogram, the grayscale interval can be extended, increasing the contrast in order to better distinguish the details contained in the image. Therefore, a contrast enhancement method called contrast-limited adaptive histogram equalization (CLAHE) was used to enhance image details. CLAHE equalizes the histogram locally, over small regions (tiles) of the image, and clips each local histogram at a predefined limit so that noise is not over-amplified. In this way, CLAHE is able to shape the histogram to produce the best-quality result (Maitra et al., 2012).
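The paper does not report the CLAHE settings. As an illustration only, the sketch below implements CLAHE in plain NumPy for an 8-bit grayscale image: the image is split into a grid of tiles, each tile's histogram is clipped and turned into an equalization mapping, and every pixel is mapped by bilinearly blending the mappings of its four nearest tiles. The tile grid, the clip factor and the function names (`clahe`, `_tile_lut`) are hypothetical choices, not the authors' configuration.

```python
import numpy as np


def _tile_lut(tile, clip_factor, n_bins=256):
    """Contrast-limited equalization mapping (lookup table) for one uint8 tile."""
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    # Clip the histogram and spread the clipped-off counts evenly over all bins.
    clip_limit = max(1.0, clip_factor * tile.size / n_bins)
    excess = np.maximum(hist - clip_limit, 0.0).sum()
    hist = np.minimum(hist, clip_limit) + excess / n_bins
    # The clipped histogram still sums to tile.size, so the scaled CDF ends at 255.
    return np.cumsum(hist) * (n_bins - 1) / tile.size


def clahe(img, tiles=(8, 8), clip_factor=2.0):
    """CLAHE for a 2-D uint8 image (tile grid and clip factor are assumptions)."""
    if img.ndim != 2 or img.dtype != np.uint8:
        raise ValueError("expected a 2-D uint8 image")
    h, w = img.shape
    ty, tx = tiles
    # Pad by reflection so the image splits into an exact ty x tx grid of tiles.
    padded = np.pad(img, ((0, (-h) % ty), (0, (-w) % tx)), mode="reflect")
    th, tw = padded.shape[0] // ty, padded.shape[1] // tx
    luts = np.empty((ty, tx, 256))
    for i in range(ty):
        for j in range(tx):
            tile = padded[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            luts[i, j] = _tile_lut(tile, clip_factor)

    # Position of every pixel relative to the tile centres, in tile units.
    fy = (np.arange(h) + 0.5) / th - 0.5
    fx = (np.arange(w) + 0.5) / tw - 0.5
    y0, x0 = np.floor(fy).astype(int), np.floor(fx).astype(int)
    wy, wx = (fy - y0)[:, None], (fx - x0)[None, :]
    # Indices of the four neighbouring tiles, clamped at the image borders.
    ya, yb = np.clip(y0, 0, ty - 1)[:, None], np.clip(y0 + 1, 0, ty - 1)[:, None]
    xa, xb = np.clip(x0, 0, tx - 1)[None, :], np.clip(x0 + 1, 0, tx - 1)[None, :]
    v = img.astype(np.intp)
    # Bilinear blend of the four tile mappings evaluated at each pixel's value.
    top = (1 - wx) * luts[ya, xa, v] + wx * luts[ya, xb, v]
    bottom = (1 - wx) * luts[yb, xa, v] + wx * luts[yb, xb, v]
    out = (1 - wy) * top + wy * bottom
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

In practice, a library implementation such as OpenCV's `cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))` followed by its `apply` method would normally be used instead; its `clipLimit` plays a similar role to `clip_factor` here, although its exact clipping and interpolation details may differ from this sketch.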
Another problem that has to be considered is noise, which is inevitably introduced during mammogram acquisition. Therefore, a median filter was applied as a spatial transformation in order to reduce noise (Ponraj et al., 2011). This technique generates a new image where each pixel takes the median intensity of its neighboring pixels.
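The paper does not specify the filter window. A minimal sketch of a median filter, assuming a 3x3 neighbourhood and edge replication at the borders (both assumptions), could look like this; `median_filter` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(img, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    r = size // 2
    # Replicate edge pixels so that border pixels also get a full neighbourhood.
    padded = np.pad(img, r, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    # With an odd window the median is one of the original values, so the cast is exact.
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)
```

An isolated bright "salt" pixel is removed completely, because it never becomes the median of its neighbourhood. Library equivalents include `scipy.ndimage.median_filter(img, size=3)` and OpenCV's `cv2.medianBlur(img, 3)`, whose border handling may differ from this sketch.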
Mammograms in the DDSM have different sizes, so the preprocessed images were resized to 600 pixels wide and 900 pixels high. In addition, a normalization process was applied to scale pixel values to the range [0, 1] in order to achieve better performance.
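The interpolation method and the exact normalization formula are not given in the text. The sketch below assumes bilinear interpolation and per-image min-max scaling to [0, 1]; both, and the helper names, are illustrative assumptions. Note that NumPy stores images as (rows, columns), so an image 600 pixels wide and 900 pixels high has shape (900, 600).

```python
import numpy as np


def resize_bilinear(img, out_h, out_w):
    """Bilinear resize of a 2-D image, with corner pixels aligned."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    f = img.astype(np.float64)
    top = f[y0][:, x0] * (1 - wx) + f[y0][:, x1] * wx
    bottom = f[y1][:, x0] * (1 - wx) + f[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def resize_and_normalize(img, width=600, height=900):
    """Resize to width x height pixels, then min-max scale the values to [0, 1]."""
    out = resize_bilinear(img, height, width)  # NumPy shape is (height, width)
    lo, hi = out.min(), out.max()
    return (out - lo) / (hi - lo) if hi > lo else np.zeros_like(out)
```

Dividing 8-bit images by 255 is a common alternative to min-max scaling; it also maps values into [0, 1] but keeps absolute intensities comparable across images.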
Figure 2: Preprocessing step applied to the original mam-
mograms. (A) shows an original mammogram sample,
while (B) shows the same mammogram after being prepro-
cessed.
2.2 Neural Network Model for Breast Cancer Detection
As mentioned before, Faster R-CNN is an algorithm whose purpose is to detect regions of interest, locate them in the image and classify them. This algorithm consists of two parts (see Fig. 3): the region proposal network (RPN), which generates proposals of regions of interest, and the detector network, which classifies each proposed region (Ren et al., 2015).
The RPN receives the feature maps that a convolutional backbone extracts from an input image and analyzes them to propose the regions that are most likely to contain a tumor. The novelty this architecture introduces is the way the regions of interest are determined: instead of relying on a separate proposal algorithm, the RPN is itself a Convolutional Neural Network that reuses the feature maps computed by the convolutional layers, so proposals are obtained at little additional cost. In this study, ResNet50 (He et al., 2016) was used as the CNN model.
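To make the region proposal step more concrete: in the original Faster R-CNN (Ren et al., 2015), the RPN evaluates, at every feature-map position, a fixed set of reference boxes (anchors) with 3 scales and 3 aspect ratios, and during training it labels an anchor as object if its intersection-over-union (IoU) with a ground-truth box is at least 0.7 and as background if it is below 0.3. The sketch below reproduces that scheme with NumPy for a single ground-truth tumor box. The stride of 16 pixels, the scales, the ratios and the thresholds are defaults from that paper and are assumptions here, since the configuration used in this work is not reported; the function names are illustrative.

```python
import numpy as np


def make_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * len(scales) * len(ratios), 4) anchors as (x1, y1, x2, y2)."""
    base = []
    for s in scales:
        for r in ratios:  # r = height / width; the anchor area stays s * s
            w, h = s / np.sqrt(r), s * np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)
    # Centre of every feature-map cell, expressed in input-image pixels.
    cx, cy = np.meshgrid((np.arange(feat_w) + 0.5) * stride,
                         (np.arange(feat_h) + 0.5) * stride)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (shifts + base[None]).reshape(-1, 4)


def box_area(b):
    """Area of boxes given as (..., 4) arrays in (x1, y1, x2, y2) format."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou(boxes, gt):
    """IoU between each row of `boxes` (N, 4) and one ground-truth box `gt` (4,)."""
    x1 = np.maximum(boxes[:, 0], gt[0])
    y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2])
    y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (box_area(boxes) + box_area(gt) - inter)


def label_anchors(anchors, gt, pos_thr=0.7, neg_thr=0.3):
    """1 = object (tumor), 0 = background, -1 = ignored during RPN training."""
    overlaps = iou(anchors, gt)
    labels = np.full(len(anchors), -1)
    labels[overlaps < neg_thr] = 0
    labels[overlaps >= pos_thr] = 1
    labels[np.argmax(overlaps)] = 1  # the best-matching anchor is always positive
    return labels
```

Only the anchors labeled 1 or 0 contribute to the RPN's objectness loss; the remaining anchors are ignored.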
The regions of interest proposed by the RPN are the input of the detector network, called the Faster R-CNN detector, which performs two main tasks: classification and regression. The output of the regression is a predicted bounding box where the object could be located, while the output of the classification sub-network is the probability (confidence value) that the box contains an object of a given class.
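The two detector outputs can be illustrated with the standard Faster R-CNN box parameterization, in which the regression head predicts offsets (t_x, t_y, t_w, t_h) relative to each proposal, and the classification head produces scores that a softmax turns into confidence values. The sketch below is class-agnostic (one set of offsets per proposal, whereas the original detector predicts offsets per class), and the class list `CLASSES` with a background entry is a hypothetical labeling for this benign/malignant task.

```python
import numpy as np

CLASSES = ("background", "benign", "malignant")  # hypothetical class order


def decode_boxes(proposals, deltas):
    """Apply regression offsets (tx, ty, tw, th) to proposals given as (x1, y1, x2, y2)."""
    w = proposals[:, 2] - proposals[:, 0]
    h = proposals[:, 3] - proposals[:, 1]
    cx = proposals[:, 0] + 0.5 * w
    cy = proposals[:, 1] + 0.5 * h
    tx, ty, tw, th = deltas.T
    # Centre shifts are relative to the box size; width and height scale in log space.
    new_cx, new_cy = cx + tx * w, cy + ty * h
    new_w, new_h = w * np.exp(tw), h * np.exp(th)
    return np.stack([new_cx - new_w / 2, new_cy - new_h / 2,
                     new_cx + new_w / 2, new_cy + new_h / 2], axis=1)


def classify(logits):
    """Softmax over classes; returns (label, confidence) for each proposal."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    return [(CLASSES[k], float(probs[i, k])) for i, k in enumerate(best)]
```

With all offsets equal to zero, `decode_boxes` returns the proposals unchanged, which is why the regression head only needs to learn small corrections.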
For the training step, all the images obtained after the preprocessing step were randomly shuffled in order to avoid any classification bias.
To evaluate the accuracy and robustness of Faster R-CNN in the detection task, 85% of the preprocessed dataset was used to train and validate the network, while the remaining 15% was used to test its performance. These two subsets did not contain mammograms from the same patient, meaning that the system was never tested on images from patients that had been used in the training step.
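A patient-level split like the one described can be sketched as follows: patients, not images, are shuffled and assigned to the training or test subset, so no patient contributes images to both. The patient identifiers, the seed and the helper name `patient_split` are hypothetical; because the split is done over patients, the image proportions are only approximately 85/15 when patients contribute different numbers of images.

```python
import numpy as np


def patient_split(patient_ids, train_frac=0.85, seed=0):
    """Split image indices so that no patient appears in both subsets."""
    patient_ids = np.asarray(patient_ids)
    patients = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(patients)  # random patient order avoids any ordering bias
    n_train = int(round(train_frac * len(patients)))
    is_train = np.isin(patient_ids, patients[:n_train])
    train_idx = np.flatnonzero(is_train)
    test_idx = np.flatnonzero(~is_train)
    rng.shuffle(train_idx)  # shuffle the training images themselves as well
    return train_idx, test_idx
```

For example, with 20 hypothetical cases of four views each, 17 patients (68 images) end up in the training subset and 3 patients (12 images) in the test subset.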
In this study, TensorFlow² together with Keras³ have been used to design, train and test the network.
3 RESULTS AND DISCUSSION
In this section, the results obtained after testing the trained network with the test images from the dataset are presented.
In order to evaluate the results, the accuracy was defined using the following equation:
$$
\text{Accuracy} = 100 \times \frac{TP + TN}{TP + TN + FP + FN} \tag{1}
$$
where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively.
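For the two-class problem studied here, Eq. (1) requires choosing a positive class; the text does not state which one, so the sketch below assumes that malignant is positive and benign is negative. The function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive="malignant"):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class (an assumption)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # malignant predicted as malignant
            else:
                fp += 1  # benign predicted as malignant
        elif t == positive:
            fn += 1  # malignant predicted as benign
        else:
            tn += 1  # benign predicted as benign
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy in percent, as in Eq. (1)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return 100.0 * (tp + tn) / total
```

For example, with hypothetical counts TP = 45, TN = 50, FP = 3 and FN = 2, `accuracy(45, 50, 3, 2)` returns 95.0.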
After each pass over the whole set of images selected for training (i.e., after each epoch), the system was evaluated on a validation set. After training the network until the loss value reached its minimum, the classifier achieved a mean accuracy of 97.375% over the two classes studied in this work: benign and malignant tumor.
After obtaining these results, the output images from the network were analyzed in order to inspect the bounding boxes that the system proposed over the original images. Fig. 4 shows how our recognition system localizes tumors in the samples from the dataset that were used for testing. The images on the left of Fig. 4 correspond to the output of our system, where the predicted bounding boxes are marked in blue (benign) or green (malignant), and the images on the right show the corresponding ground truth, indicating where the tumors are located.
Most of the tumors in the test images were not located in exactly the same area as in the ground truth, which could be caused by several factors. First, the number of images used to train the network was very low, which may have prevented the network from learning to detect tumors properly. Second, training the network takes several days, which did not allow us to experiment enough to optimize the hyperparameters for this task. Finally, only the ResNet50 model was used, leaving the door open to many other architectures.
² https://www.tensorflow.org/?hl=es
³ https://keras.io/
NCTA 2019 - 11th International Conference on Neural Computation Theory and Applications
Figure 3: Faster R-CNN architecture.
Figure 4: Results of the network performance. On the left, mammograms with the predicted tumors and their corresponding confidence values (benign tumors shown in blue, malignant tumors shown in green). On the right, ground truth images indicating where the tumors are located. A and B were correctly predicted, whereas C was not.
Among other approaches, Akselrod-Ballin et al. (Akselrod-Ballin et al., 2016) developed a modified algorithm based on Faster R-CNN to detect and classify the major clinical classes in breast cancer, malignant and benign tumors. Unlike our implementation, their architecture uses VGG16 as the CNN for feature extraction and region proposal. Their system uses a multi-center clinical dataset with a total of 4750 images, and after training their modified network they obtained a 77% accuracy.
The preliminary results presented in this work suggest that, despite the drawbacks mentioned above, the system is already able to classify whether a mammogram contains a benign or a malignant tumor with an accuracy higher than 97%.
4 CONCLUSIONS
In this study, a computer-aided diagnosis method based on Faster R-CNN for detecting and classifying tumors in mammograms is presented. First, the images used to train the network were obtained from the public DDSM database and preprocessed in order to improve the results. This preprocessing step consisted of enhancing the images by increasing their contrast with the CLAHE technique, reducing noise with a median filter, and resizing and normalizing them. Then, the network, which extracts features from the mammogram images, proposes regions of interest where tumors could be located and finally classifies each of them based on the probability that it contains a tumor, was trained and validated with 85% of the preprocessed dataset. Using the detection metrics, the performance of the network was measured on the remaining 15% of the dataset in order to evaluate its robustness. After training, the proposed computer-aided diagnosis method achieved a mean accuracy of 97.375%, suggesting that the system could help specialized doctors recognize cancerous signs when analyzing mammograms, improving patients' quality of life.
In future work, the authors will study different CNN models and also other network architectures, such as Mask R-CNN instead of Faster R-CNN, in order not only to locate the tumor inside the mammogram but also to create a mask with its shape. This way, the size of the tumor could be estimated more precisely and taken into account in the decision-making task.
ACKNOWLEDGEMENTS
This work was supported by the Spanish government excellence project COFNET (TEC2016-77785-P), with support from the European Regional Development Fund.
REFERENCES
Akselrod-Ballin, A., Karlinsky, L., Alpert, S., Hasoul, S., Ben-Ari, R., and Barkan, E. (2016). A region based convolutional network for tumor detection and classification in breast mammography. In Deep Learning and Data Labeling for Medical Applications, pages 197–205. Springer.
Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R. L., Torre, L. A., and Jemal, A. (2018). Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 68(6):394–424.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
Heath, M., Bowyer, K., Kopans, D., Kegelmeyer, P., Moore, R., Chang, K., and Munishkumaran, S. (1998). Current status of the digital database for screening mammography. In Digital Mammography, pages 457–460. Springer.
Heath, M., Bowyer, K., Kopans, D., Moore, R., and Kegelmeyer, W. P. (2000). The digital database for screening mammography. In Proceedings of the 5th International Workshop on Digital Mammography, pages 212–218. Medical Physics Publishing.
Hoang Ngan Le, T., Zheng, Y., Zhu, C., Luu, K., and Savvides, M. (2016). Multiple scale Faster-RCNN approach to driver's cell-phone usage and hands on steering wheel detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 46–53.
Jiang, H. and Learned-Miller, E. (2017). Face detection with the Faster R-CNN. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pages 650–657. IEEE.
Maitra, I. K., Nag, S., and Bandyopadhyay, S. K. (2012). Technique for preprocessing of digital mammogram. Computer Methods and Programs in Biomedicine, 107(2):175–188.
Ponraj, D. N., Jenifer, M. E., Poongodi, P., and Manoharan, J. S. (2011). A survey on the preprocessing techniques of mammogram for the detection of breast cancer. Journal of Emerging Trends in Computing and Information Sciences, 2(12):656–664.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99.
Zhang, Q.-x., Lin, G.-h., Zhang, Y.-m., Xu, G., and Wang, J.-j. (2018). Wildland forest fire smoke detection based on Faster R-CNN using synthetic smoke images. Procedia Engineering, 211:441–446.