Polyp Detection in Gastrointestinal Images using Faster Regional

Convolutional Neural Network

Lourdes Duran-Lopez, Francisco Luna-Perejon, Isabel Amaya-Rodriguez, Javier Civit-Masot,

Anton Civit-Balcells, Saturnino Vicente-Diaz and Alejandro Linares-Barranco

Robotics and Computer Technology Lab., University of Seville, Av. Reina Mercedes s/n, Seville, Spain

Keywords:

Polyp, Colonoscopy, Deep Learning, Image Analysis, Faster Regional Convolutional Neural Network.

Abstract:

Colorectal cancer is the third most frequently diagnosed malignancy in the world. To prevent this disease,

polyps, the principal precursor, are removed during a colonoscopy. Automatic detection of polyps in this

technique could play an important role to assist doctors for achieving an accurate diagnosis. In this work,

we apply a state-of-the-art Deep Learning algorithm called Faster Regional Convolutional Neural Network to

each colonoscopy frame in order to detect the presence of polyps. The proposed detection system contains

two main stages: (1) processing of the colonoscopy frames for training and testing datasets generation, where

artifacts are extracted and the number of images in the dataset is augmented; and (2) the Neural Network

model, which performs feature extraction over the frames in order to detect polyps within the frames. After

training the algorithm under different conditions, our result shows that the proposed system detection has a

precision of 80.31%, a recall of 75.37%, an accuracy of 71.99% and a speciﬁcity of 65.70%.

1 INTRODUCTION

Colorectal cancer (CRC) is the third most frequently

diagnosed malignancy and the fourth leading cause

of cancer-related deaths in the world. Its burden is

expected to increase by 60% to more than 2.2 million

new cases and 1.1 million deaths from cancer by 2030

(Arnold et al., 2017). Certain types of polyps that

grow in the colon are the precursors of more than 95%

of cancer cases in this organ. Thus, the removal of

polyps is usually practiced as a safety measure.

Polyps are detected and extracted during a colo-

noscopy, which is a medical analysis in which the

colonoscopist examines the meter and a half length

tube of the large intestine with the use of an colo-

noscope. This technique is a commonly used primary

method to detect polyps (Lieberman et al., 2012). Ho-

wever, by using this technique, the polyp detection is

not always effective (Corley et al., 2014), as there are

some factors that make the correct evaluation difﬁ-

cult. These factors may be related to the experience of

the physician, the conditions of the patient, the qua-

lity of the instruments, among others. Highly expe-

rienced colonoscopists can achieve an accuracy in the

detection that is greater than 90%, while colonosco-

pists with less experience cannot perform better than

an 80% in the recognition rate (Hewett et al., 2012)

(Ignjatovic et al., 2009). On the other hand, there may

be other external factors, such as the ambiguity in the

characteristics of the polyp. Polyps could be difﬁcult

and confusing to diagnose depending on the characte-

ristics shown when performing the colonoscopy (He-

wett et al., 2012). An automated computerized diag-

nosis could provide a cross validation mechanism in

order to assist colonoscopists to make a decision in

this task and, thus, reducing the possibility of com-

mitting mistakes in the diagnosis and improving the

quality of life of the affected patients. This technique

could also be used for training purpose of new colo-

noscopists at the university or medical centers.

In the past few years, several researches investiga-

ted the problem of detecting polyps in colonoscopies

(Bernal et al., 2017), (Zhang et al., 2017), (Ribeiro

et al., 2016), (Axyonov et al., 2016), (Park and Sar-

gent, 2016). Many of them used conventional Con-

volutional Neural Networks (CNNs) to classify fra-

mes obtained from videos, indicating the presence of

a polyp. However, none of these solutions are able to

locate the polyp inside the image, which could be very

useful when making a colonoscopy to aid the experts

in their task.

In this paper, we report our study of an automa-

ted computerized system for detecting polyps in co-

lonoscopy videos using a Deep Learning based al-

626

Duran-Lopez, L., Luna-Perejon, F., Amaya-Rodriguez, I., Civit-Masot, J., Civit-Balcells, A., Vicente-Diaz, S. and Linares-Barranco, A.

Polyp Detection in Gastrointestinal Images using Faster Regional Convolutional Neural Network.

DOI: 10.5220/0007698406260631

In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019), pages 626-631

ISBN: 978-989-758-354-4

gorithm called Regional Convolutional Neural Net-

works (R-CNN). These methods try to determine the

presence of an object and also locate its position in-

side the image. One of the most recent algorithms

based on the R-CNN approach are called Faster Regi-

onal Convolutional Neural Network (Faster R-CNN),

which were developed in 2015 (Ren et al., 2015). Fas-

ter R-CNNs were designed with the purpose of redu-

cing its predecessors computation time. In ILSVRC

and COCO 2015 competitions, Faster R-CNN was the

foundations of the 1st-place winning entries in several

tracks.

Faster R-CNNs have been used for different pur-

poses: face detection (Jiang and Learned-Miller,

2017), driver’s cell-phone usage and hands on steer-

ing wheel detection (Hoang Ngan Le et al., 2016) are

some application examples of this algorithm, which

has proven to show good results.

In this paper, the authors present an application

of Faster R-CNNs in order to build an automated re-

cognition system to detect the presence of polyps in

colonoscopy images.

The rest of the paper is structured as follows:

section 2 presents the methods that are used in the

approach that has been carried out in this paper. First,

the dataset used is explained from its acquisition to

its preprocessing and augmentation. Then, the Fas-

ter R-CNN algorithm is presented, along with the ar-

chitecture, design, training and testing of the neural

network that has been used in this work, which is ba-

sed on those algorithms. Also, to evaluate the system,

some tests to prove its performance and robustness are

performed. Then, Section 3 presents the classiﬁcation

results, and Section 4 a discussion over the obtained

values. Finally, the conclusions of this work are pre-

sented in Section 5.

2 MATERIALS AND METHODS

In this section the main materials and methods used

in this work are presented, distinguishing between the

dataset and the neural network. For the dataset, the

applied stages are: image acquisition, pre-processing

and data augmentation. For the neural network sub-

section, the architecture of the Faster R-CNN, the trai-

ning and testing steps for polyp detection in colo-

noscopy images and the performance evaluation are

presented.

A general view of our study can be seen in the

block diagram that is shown in (Fig. 1).

Figure 1: Block diagram of the implemented approach.

2.1 Dataset

2.1.1 Image Acquisition

We used the database provided by Endoscopic Vi-

sion’s sub-challenge (MICCAI 2018

) called Gas-

trointestinal Image ANAlysis (GIANA

) (Angermann

et al., 2017) (Bernal et al., 2018), which is composed

of 18 videos collected from colonoscopies from diffe-

rent patients. These videos are already provided as a

segmented set of frames. The dataset contains a total

of 11954 images with their corresponding masks in-

dicating where the polyp is located (only in the case

that there was one in that speciﬁc image) (Fig. 2). All

images have a resolution of 384x288 pixels.

2.1.2 Artifact Extraction

In order to enhance the performance of the system, we

applied a pre-processing step to the images to make

the network be able to extract descriptors more ap-

propriately and, therefore, improve the results of the

detection (Fig. 3). For this, we extracted the black ed-

ges of the endoscopy frames to remove the unwanted

areas, reducing the images to a size of 284x265.

https://www.miccai2018.org/en/

https://giana.grand-challenge.org/

Polyp Detection in Gastrointestinal Images using Faster Regional Convolutional Neural Network

627

Figure 2: Samples of the database. Left: frames captu-

red from the colonoscopy video recordings. Right: binary

masks indicating if there is a polyp (marked as a white area)

and its location inside the frame.

2.1.3 Data Augmentation

Deep learning models perform better when training

with large datasets that include more variability in the

samples. The most popular method to make data-

sets bigger is called data augmentation, which helps

prevent overﬁtting when training on very little data

(Wong et al., 2016). The simplest way to perform

data augmentation is to add noise or apply geometric

transformations to data to simulate other conditions

of camera recordings with the same objects (polyps in

our case). Hence, all the transformations would boost

the models to learn better.

The dataset was augmented using a series of

transformations so that the model would never train

twice the exact same image. For each original pre-

processed image, an horizontal ﬂip, a vertical ﬂip and

a blur ﬁlter have been applied. Thus, we obtain three

new images from each original sample (Fig. 3). After

this data augmentation step, we obtain a dataset that

consists of 47.816 images in total.

2.2 Faster R-CNN

2.2.1 Arquitecture

As it was mentioned in the introduction, a Faster

RCNN was used in this work for the polyp detection

task. This algorithm is divided in two modules (Fig.

4):

– First of all, a deep Fully Convolutional Network

(FCN) (Ren et al., 2015) receives the images from

the dataset that was presented in this section as in-

put. Then, it extracts feature maps or descriptive cha-

racteristics and analyzes them to propose regions of

Figure 3: Processing applied to the original images. First,

black edges are removed in a pre-processing step. Then

data augmentation is applied, generating three different new

images.

interest. The novel step that this architecture intro-

duces is the way to determine the regions of interest,

by using a neural network that takes advantage of the

mathematical operations made in the convolution lay-

ers. In our architecture we have used the ResNet50

model (He et al., 2016) as FCN. ResNet models try to

solve the saturation of the accuracy caused by increa-

sing the network depth.

– Secondly, the proposed regions are the input

of the second module, called Fast RCNN detector,

composed by two fully-connected layers, a regression

layer and a classiﬁcation layer (Ren et al., 2015).

2.2.2 Training and Testing

In order to reduce the number of false positives, a

technique called hard-negative mining was used (Fel-

zenszwalb et al., 2010). It consists on adding negative

samples, which means, including examples of images

that do not contain polyps in the training step, labe-

ling them as background.

85% of the augmented dataset (15 videos) was

used to train the network, while the remaining 15%

(3 videos) for evaluating its accuracy and robustness

in the detection. These training and testing dataset

are obtained in such way that different patients are in-

volve in each dataset.

In this work, both TensorFlow

and Keras

, which

https://www.tensorﬂow.org/?hl=es

https://keras.io/

GIANA 2019 - Special Session on GastroIntestinal Image Analysis

628

Figure 4: Faster R-CNN architecture. Image taken from

(Ren et al., 2015).

are two widely known Deep Learning frameworks,

are used to design, train and test the network.

2.2.3 Performance Evaluation Measures

To quantitatively present the capabilities of the pro-

posed computerized-aided diagnosis method, perfor-

mance metrics were used to show how the implemen-

ted technique was able to detect and classify the po-

lyps. These are precision (Equation 1), recall (Equa-

tion 2), accuracy (Equation 3), speciﬁcity (Equation

4), F1-score (Equation 5) and F2-score (Equation 6).

Precision = 100 ×

T P

T P + FP

(1)

Recall = 100 ×

T P

T P + FN

(2)

Accuracy = 100 ×

T P + T N

T P + T N + FP + FN

(3)

Speci f icity = 100 ×

T N

T N + FP

(4)

F1 − score = 2 ×

Precision × Recall

Precision + Recall

(5)

F2 − score = 5 ×

Precision × Recall

4 × Precision + Recall

(6)

where, TP and FP denote true positive cases (the sy-

stem detects a polyp in a frame that contains a polyp)

and false positive cases (the system detects a polyp in

a frame that does not have a polyp), respectively. TN

and FN denote true negative cases (the system does

not detect a polyp in a frame that does not have a

polyp) and false negative cases (the system does not

detect a polyp in a frame that contains a polyp), re-

spectively.

These metrics was applied to the classiﬁcation

layer of this method. Since this layer returns as out-

put a conﬁdence value, a minimal threshold was es-

tablished to apply these metrics. On this way, if the

conﬁdence value exceeds the threshold then the polyp

is detected.

3 RESULTS

Different experiments were carried out to determine

if polyps were detected correctly or not. Tests were

performed every 50 epochs, selecting different conﬁ-

dence thresholds in order to obtain the best results.

Polyps detection performance is reported in Table

1. The results show the robustness of the propo-

sed Faster R-CNN architecture on detecting the polyp

position in colonoscopy images with a precision of

80.31%, a recall of 75.37%, an accuracy of 71.99%

and a speciﬁcity of 65.70%. The minimal threshold

was established at 0.80.

In Fig. 5 the results of our recognition system can

be seen by showing the precision when detecting po-

lyps inside samples from the dataset, and their corre-

sponding mask images (as a ground truth) indicating

where the polyps are located.

4 DISCUSSION

In this work, we have performed a polyp detection

system using Faster R-CNN. This kind of networks

have had a very high impact in recognition systems

in recent years thanks to the fact that they are able

to both classify objects or items inside images and

also provide the speciﬁc location in which that item

is. The application of this kind of networks in medi-

cal image analysis is becoming more and more usual.

In this case, this work presents, what we believe it is,

the ﬁrst application of Faster R-CNNs to polyp de-

tection in videos obtained from colonoscopies. As it

can be seen, we have focused on the polyp detection

task but, as we mentioned earlier, Faster R-CNNs can

also provide the location of polyps inside images. The

results obtained show that there is still room to im-

prove and we will continue to pursue the goal of im-

plementing an automatic polyp detection system that

Polyp Detection in Gastrointestinal Images using Faster Regional Convolutional Neural Network

629

Table 1: Polyps detection performance. TP: True Positive, FP: False Positive, TN: True Negative, FN: False Negative.

TP FP TN FN Precision Recall Accuracy Speciﬁcity F1 F2

3533 866 1659 1154 80.31 75.37 71.99 65.70 77.76 76.30

Figure 5: Polyp detection results. Left: polyps detected by

Faster R-CNN. Conﬁdence values are represented in blue.

Right: their corresponding ground truth. A and B show the

performance in case a polyp appears, while C shows the

performance in case there is no polyp.

could be used in hospitals as an aid for colonosco-

pists in their task. This future work will include the

polyp location task on top of the already performed

detection system by adding a new analysis layer and

a set of metrics that would improve the overall recog-

nition system.

For now, we have achieved ﬁrst results using this

aforementioned Deep Learning technique. Desig-

ning a new and more appropriate CNN model instead

of using the well-known ResNet50, along with ﬁne-

tuning the hyperparameters of it could improve the

classiﬁcation results not only in terms of accuracy but

also in terms of speciﬁcity and sensitivity. Also, ot-

her methods for pre-processing the data and increa-

sing the variability of the dataset will make the recog-

nition system more robust and plausible to use in a

real case scenario. These two approaches will be stu-

died in future works in which we will try to improve

state-of-the-art solutions and also the accuracy achie-

ved by colonoscopist in order to build a system that

could work together with them and aid them in their

recognition task.

5 CONCLUSIONS

In this paper, we present the use of Faster R-CNN as

a mechanism for a computarized-aided diagnosis sy-

stem that automatically detects polyps in colonosco-

pies in order to assist doctors in this task. The images

were obtained from GIANA Endoscopic Vision’s sub-

challenge to create the dataset. These samples from

the dataset were ﬁrst pre-processed, removing the un-

wanted areas and, after that, increased in number by

using data augmentation techniques. 85% of dataset

was used to train the network, which, on a ﬁrst step,

extracts features over the colonoscopy frames; then,

it proposes regions of interests where polyps could

be located; and ﬁnally, it classiﬁes the regions most

likely to contain a polyp. The remaining 15% of data-

set was used to test the network in order to evaluate its

performance. The results show that the proposed re-

cognition system has a precision of 80.31%, a recall

of 75.37%, an accuracy of 71.99% and a speciﬁcity

of 65.70%. The system is able to detect the presence

of polyps in the images studied in this paper and the

results prove that the system could be used as an aid

for doctors in this task. In future works, the authors

will study and evaluate different custom CNN models

in order to improve the performance, classiﬁcation re-

sults and also the polyp location task, along with new

pre-processing algorithms to improve the variability

of the dataset and the feature extraction step.

ACKNOWLEDGEMENTS

This work was supported by the Spanish grant (with

support from the European Regional Development

Fund) COFNET (TEC2016-77785-P).

REFERENCES

Angermann, Q., Bernal, J., S

anchez-Montes, C., Ham-

mami, M., Fern

andez-Esparrach, G., Dray, X., Ro-

main, O., S

anchez, F. J., and Histace, A. (2017). To-

wards real-time polyp detection in colonoscopy vi-

deos: Adapting still frame-based methodologies for

video sequences analysis. In Computer Assisted and

Robotic Endoscopy and Clinical Image-Based Proce-

dures, pages 29–41. Springer.

Arnold, M., Sierra, M. S., Laversanne, M., Soerjomataram,

I., Jemal, A., and Bray, F. (2017). Global patterns

GIANA 2019 - Special Session on GastroIntestinal Image Analysis

630

and trends in colorectal cancer incidence and morta-

lity. Gut, 66(4):683–691.

Axyonov, S., Liang, J., Kostin, K., Zamyatin, A. V., et al.

(2016). Advanced pattern recognition and deep lear-

ning for colon polyp detection.

Bernal, J., Histace, A., Masana, M., Angermann, Q.,

anchez-Montes, C., Rodriguez, C., Hammami, M.,

Garcia-Rodriguez, A., C

ordova, H., Romain, O., et al.

(2018). Polyp detection benchmark in colonoscopy vi-

deos using gtcreator: A novel fully conﬁgurable tool

for easy and fast annotation of image databases. In

Proceedings of 32nd CARS conference.

Bernal, J., Tajkbaksh, N., S

anchez, F. J., Matuszewski, B. J.,

Chen, H., Yu, L., Angermann, Q., Romain, O., Ru-

stad, B., Balasingham, I., et al. (2017). Comparative

validation of polyp detection methods in video colo-

noscopy: results from the miccai 2015 endoscopic vi-

sion challenge. IEEE transactions on medical ima-

ging, 36(6):1231–1249.

Corley, D. A., Jensen, C. D., Marks, A. R., Zhao, W. K.,

Lee, J. K., Doubeni, C. A., Zauber, A. G., de Boer,

J., Fireman, B. H., Schottinger, J. E., et al. (2014).

Adenoma detection rate and risk of colorectal can-

cer and death. New england journal of medicine,

370(14):1298–1306.

Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and

Ramanan, D. (2010). Object detection with discri-

minatively trained part-based models. IEEE tran-

sactions on pattern analysis and machine intelligence,

32(9):1627–1645.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resi-

dual learning for image recognition. In Proceedings of

the IEEE conference on computer vision and pattern

recognition, pages 770–778.

Hewett, D. G., Kaltenbach, T., Sano, Y., Tanaka, S., Saun-

ders, B. P., Ponchon, T., Soetikno, R., and Rex,

D. K. (2012). Validation of a simple classiﬁcation sy-

stem for endoscopic diagnosis of small colorectal po-

lyps using narrow-band imaging. Gastroenterology,

143(3):599–607.

Hoang Ngan Le, T., Zheng, Y., Zhu, C., Luu, K., and Savvi-

des, M. (2016). Multiple scale faster-rcnn approach to

driver’s cell-phone usage and hands on steering wheel

detection. In Proceedings of the IEEE Conference on

Computer Vision and Pattern Recognition Workshops,

pages 46–53.

Ignjatovic, A., East, J. E., Suzuki, N., Vance, M., Guent-

her, T., and Saunders, B. P. (2009). Optical diagno-

sis of small colorectal polyps at routine colonoscopy

(detect inspect characterise resect and discard; discard

trial): a prospective cohort study. The lancet oncology,

10(12):1171–1178.

Jiang, H. and Learned-Miller, E. (2017). Face detection

with the faster r-cnn. In Automatic Face & Gesture

Recognition (FG 2017), 2017 12th IEEE International

Conference on, pages 650–657. IEEE.

Lieberman, D. A., Rex, D. K., Winawer, S. J., Giardiello,

F. M., Johnson, D. A., and Levin, T. R. (2012). Gui-

delines for colonoscopy surveillance after screening

and polypectomy: a consensus update by the us multi-

society task force on colorectal cancer. Gastroentero-

logy, 143(3):844–857.

Park, S. Y. and Sargent, D. (2016). Colonoscopic polyp

detection using convolutional neural networks. In

Medical Imaging 2016: Computer-Aided Diagnosis,

volume 9785, page 978528. International Society for

Optics and Photonics.

Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster

r-cnn: Towards real-time object detection with region

proposal networks. In Advances in neural information

processing systems, pages 91–99.

Ribeiro, E., Uhl, A., Wimmer, G., and H

afner, M. (2016).

Exploring deep learning and transfer learning for co-

lonic polyp classiﬁcation. Computational and mathe-

matical methods in medicine, 2016.

Wong, S. C., Gatt, A., Stamatescu, V., and McDonnell,

M. D. (2016). Understanding data augmentation

for classiﬁcation: when to warp? arXiv preprint

arXiv:1609.08764.

Zhang, R., Zheng, Y., Mak, T. W. C., Yu, R., Wong, S. H.,

Lau, J. Y., and Poon, C. C. (2017). Automatic de-

tection and classiﬁcation of colorectal polyps by trans-

ferring low-level cnn features from nonmedical dom-

ain. IEEE J. Biomedical and Health Informatics,

21(1):41–47.

Polyp Detection in Gastrointestinal Images using Faster Regional Convolutional Neural Network

631