Towards Accurate Cervical Cancer Detection: Leveraging Two-Stage

CNNs for Pap Smear Analysis

Franklin Steven De la Cruz Paucar

1 a

, Carlos Julio Macancela Bojorque

1 b

, Iv

an Reyes-Chac

2 c

Paulina Vizcaino-Imaca

2 d

and Manuel Eugenio Morocho-Cayamcela

1,2 e

Yachay Tech University, School of Mathematical and Computational Sciences, DeepARC Research Group,

Hda. San Jos

e s/n y Proyecto Yachay, Urcuqu

ı, 100119, Ecuador

Universidad Internacional del Ecuador, Faculty of Technical Sciences, School of Computer Science,

Quito, 170411, Ecuador

Keywords:

Convolutional Neuronal Network, Cervical Cancer, Pap Smear Test, Dataset, Computer Vision.

Abstract:

Cervical cancer is a type of cancer that occurs in the cervix. It is caused by the abnormal growth of cells in the

cervix and is often caused by the human papillomavirus. Symptoms can include abnormal vaginal bleeding,

and pelvic pain, among others. It is usually diagnosed with a pelvic exam, biopsy, and Papanicolaou or pap test.

Generally, during the test, a small sample of cells is taken from the cervix and examined under a microscope to

look for any abnormal cells. The test is usually done during a pelvic exam and can be done in a doctor’s ofﬁce

or clinic, which can cause human errors to exist and lead to a deﬁcit of service or misdiagnosis for patients.

Especially, in Ecuador, cervix cancer is the second with the most prominent incidence and mortality. One of

the obstacles in Latin America to improving the number of cervix cancer screens is the amount of time needed

to give results. This paper proposes a pre-trained artiﬁcial neural network and a much larger database than

its paper base, this will allow us to obtain better results and a network with more accurate predictions when

throwing where malignant cells could be located that could lead to cervical cancer. The process to carry it

out is similar to its original process, where the analysis of the Papanicolaou tests is carried out in two stages.

The ﬁrst focused on ﬁnding the coordinates of the anomalous cells observed within each of the images of our

dataset and the second, specializing in being able to obtain an image with a much higher resolution for each of

these coordinates, thus obtaining an improvement and being able to give a much more reliable diagnosis for

each of the patients.

1 INTRODUCTION

Detecting cancer is the process of identifying the

presence of cancer cells in a person’s body. This can

be done through various methods including physical

exams, imaging tests, biopsy, and blood tests. Each

method will depend on the type of cancer that needs

to be recognized and the area in the body where it is

suspected to be (Hasenleithner and Speicher, 2022).

Since the different ways to detection of cancer, the in-

clusion of modern technology such as computer vi-

sion algorithms can quickly and accurately analyze

large amounts of medical images, reducing the need

https://orcid.org/0000-0002-0429-5219

https://orcid.org/0000-0002-8265-3261

https://orcid.org/0009-0002-2731-5531

https://orcid.org/0000-0001-9575-3539

https://orcid.org/0000-0002-4705-7923

for human interpretation and increasing the speed of

giving a diagnosis (Koh et al., 2022). Object detec-

tion and image classiﬁcation are two common tech-

niques used in computer vision for detecting cancer

cells in medical images such as microscopy images

and medical scans (Rom

an et al., 2023). Object de-

tection algorithms aim to locate instances of objects

within an image and draw bounding boxes around

them. In medical imaging, object detection can be

used to identify and localize cancer cells within a tis-

sue sample (Elyan et al., 2022). Image classiﬁcation,

on the other hand, aims to assign an input image to a

speciﬁc class or category, such as “normal” or “can-

cerous”. In medical imaging, image classiﬁcation can

be used to classify a patch of an image as cancerous

or non-cancerous (Elyan et al., 2022). In this case,

we will focus on cervical cancer by using two steps, a

low-resolution scanning for quick cell detection and

Paucar, F., Bojorque, C., Reyes-Chacón, I., Vizcaino-Imacaña, P. and Morocho-Cayamcela, M.

Towards Accurate Cervical Cancer Detection: Leveraging Two-Stage CNNs for Pap Smear Analysis.

DOI: 10.5220/0012735800003753

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 19th International Conference on Software Technologies (ICSOFT 2024), pages 219-227

ISBN: 978-989-758-706-1; ISSN: 2184-2833

219

location and a second high-resolution one, for de-

tailed classiﬁcation. Cervical cancer is challenging

to detect for several reasons. First, the human papil-

lomavirus (HPV) is the main cause of cervical cancer

and is highly contagious and spreads easily (Franco

et al., 2001). Also, many people infected with HPV

do not have symptoms, making early detection of can-

cer difﬁcult. In addition, HPV can cause different

types of cancer, which can present differently and re-

quire different approaches to detection and treatment.

Finally, current screening methods have limitations in

terms of accuracy and availability, making early and

effective detection of cervical cancer difﬁcult.

On the other hand, given that it is a very challeng-

ing and critical issue in the health ﬁeld, the World

Health Organization (WHO) in 2020 proposed a sig-

niﬁcant initiative to eradicate cervical cancer globally.

This is based on the three key pillars of HPV vac-

cination, cervical screening, and treatment, and also

has associated intervention targets for the year 2030

(Davies-Oliveira et al., 2021). This initiative is based

on a goal called “90-70-90”, which refers to the fact

that 90% of adolescent girls should receive the HPV

vaccine, 70% of adult women should have at least two

cervical HPV tests in their lifetime, and 90% of them

should receive appropriate treatment if they have an

HPV-related disease, either pre-invasive or invasive.

In 2009, a survey of 81 women in both urban

and rural areas of Ecuador revealed dissatisfaction

with the lengthy wait for test results. The delays

were caused by insufﬁcient equipment and the lim-

ited capacity of trained professionals, especially in

rural areas (Strasser-Weippl et al., 2015). Addition-

ally, nearly half of women in Ecuador over the age of

18 have never undergone a cervical cancer screening

(ins, 2018). Hence, the analysis of cervical screening

can only be performed by three specialists: anatomic

pathologists, cytologists, and clinical pathologists.

This limited number of specialties for evaluating pap

smear tests increases the difﬁculty of evaluating more

tests. According to a 2021 WHO report, in Ecuador

showed that the number of medical personnel per

10,000 cancer patients in Ecuador was as follows in

2019: 0 radiation oncologists, 3 medical physicists,

154 radiologists, and 4 nuclear medicine physicians

(wor, 2021). The lack of equipment, trained profes-

sionals, and the large amount of tests that are per-

formed and have to be analyzed are problems that do

not allow for achieving the goals proposed by WHO.

In this paper, we propose a pre-trained artiﬁcial

neural network and a much larger database than its

paper base, this will allow us to obtain better results

and a network with more accurate predictions when

throwing where malignant cells could be located that

could lead to cervical cancer. The process to carry it

out is similar to its original process, where the anal-

ysis of the Papanicolaou tests is carried out in two

stages. The ﬁrst focused on ﬁnding the coordinates

of the anomalous cells observed within each of the

images of our dataset and the second, specializing in

being able to obtain an image with a much higher res-

olution for each of these coordinates, thus getting an

improvement and being able to give a much more re-

liable diagnosis for each of the patients.

2 RELATED WORKS

Because of the improvement in the area for taking cy-

tological images. It is possible to develop computer-

aided cancer diagnosis methods. One of the ﬁrst ones

is from Yamal et al. started the work related to the de-

tection of cervical cancer by using hierarchical data.

The logistic regression algorithm is used to distin-

guish cervical cancer at the cellular level and use an

add Hoc approach to classify it at the patient level.

The dataset was about 1728 patients obtaining a 61%

and speciﬁcity of 89% on the independent dataset

(Yamal et al., 2015).

Having this ﬁrst algorithm used, there is also an-

other from Su et al. that proposes a two-level cascade

integration system to classify the cervical cells into

normal and abnormal, and logistic regression (LR)

was used individually as the work presented before.

However, by the integration of a two-level cascade,

the false positives were lower than in the traditional

pap smear review. Obtaining the recognition rate of

95.6 % (Su et al., 2016).

Other traditional machine learning methods were

applied. In this case, Kurniawati et al. propose differ-

ent methods for cervical cancer prediction using a pap

smear, such as Na

ıve Bayes, support vector machines,

and random forest. The data used were obtained from

the medical records of the pap smear test results. The

performance metric used is accuracy. The best accu-

racy of 80.18% was obtained by random forest (Kur-

niawati et al., 2016).

Alsalatie et al. focus on cervical cancer screening

and the challenges that traditional screening methods

approach face. Alsalatie mentions the importance of

computer-assisted diagnosis to improve the accuracy

of cervical cancer screening. According to the paper,

conventional screening methods take into account the

knowledge of a pathologist which can result in mis-

diagnosis and low diagnostic effectiveness. In addi-

tion, the article proposes a deep learning model de-

signed for the automatic diagnosis of whole-slide im-

ages (WSI) in cervical cancer samples. The proposed

ICSOFT 2024 - 19th International Conference on Software Technologies

220

network has a high accuracy rate of up to 99.6%, and

considers the entire staining slice image, not just a

single cell. The deep learning architecture considers

overlapping and non-overlapping cervical cells in the

WSI. Finally, they mention that the work is distinct

from existing research in terms of simplicity, accu-

racy, and speed (Alsalatie et al., 2022).

Hussain et al. discuss the importance of using

clinical data in AI systems for automated disease di-

agnosis, prediction, or classiﬁcation. The article re-

marks the importance of a publicly available bench-

mark dataset, and that the hospital data collected from

a clinical setup is also important. The data must

be frequently updated to guarantee that developed

AI systems are as accurate as possible. Thus, Hus-

sain provides a liquid-based cytology (LBC) reposi-

tory with images collected from 460 patients visiting

a public hospital’s O&G department. The repository

consists of 963 LBC images split into four classes,

high squamous intra-epithelial lesion, low squamous

intra-epithelial lesion, negative for intra-epithelial

malignancy, and squamous cell carcinoma represent-

ing pre-cancerous and cancerous lesions. The images

were taken with a Leica ICC50 HD microscope and

categorized by experts from the pathology department

(Hussain et al., 2020).

Rezende et al. treat the difﬁculties faced in ac-

curately detecting cervical cancer through the con-

ventional pap smear test. Furthermore, the authors

mentioned the importance of computational tools to

support screening efﬁciently, especially during the

current health crisis. The article said that machine

learning can reduce the test limitations, but the lack

of high-quality datasets has break the development

of strategies to improve cervical cancer screening.

The Center for Recognition and Inspection of Cells

(CRIC) platform has created the CRIC Cervix col-

lection, which currently contains 400 images of con-

ventional pap smears with manual classiﬁcation of

11,534 cells. The dataset is a good beginning for

improving machine learning algorithms to automate

tasks in cytopathological analysis (Rezende et al.,

2021).

Zhang et al. explore the use of deep convolutional

neural networks for cervical cell classiﬁcation. The

authors describe how they used a deep convolutional

neural network to analyze cervical cell images and

classify them into relevant categories for cervical can-

cer detection. They propose a method to directly clas-

sify cervical cells based on deep features using con-

volutional neural networks (ConvNets), without prior

segmentation. Unlike previous methods that relied on

hand-crafted features and cytoplasm/nucleus segmen-

tation, this method automatically extracts deep fea-

tures embedded in the cell patch for classiﬁcation. It

involves extracting image patches centered roughly

on the nucleus as input to the network, transferring

features from a pre-trained model to a new ConvNet

to ﬁne-tune the cell image patches, and aggregating

multiple predictions to form the ﬁnal network output.

The results show that the DeepPap neural network is

effective in the task of cervical cell classiﬁcation and

this proposed method was evaluated on Papanicolaou

and LBC smear datasets. The results show that their

method outperforms previous algorithms in classiﬁ-

cation accuracy (98.3%) (Zhang et al., 2017).

Sompawong et al. This study investigates the

use of the neural network known as Mask Region-

Based convolutional neural network (R-CNN) for de-

tecting cervical cancer from Papanicolaou tests. The

researchers claim that this is the ﬁrst time that this

technology is used to identify and examine the nu-

cleus of cervical cells. Tissue samples obtained from

a hospital were used to train the algorithm. The aver-

age accuracy of the model was 57.8%, with an accu-

racy of 91.7%, sensitivity of 91.7%, and speciﬁcity of

91.7% per image. Additionally, this model was com-

pared to another implemented in (Zhang et al., 2017),

using the proposed data set, achieving an accuracy of

89.8%, sensitivity of 72.5%, and speciﬁcity of 94.3%.

Also, the authors discovered that the accuracy in im-

age classiﬁcation is high (91.7%). This is because if

an abnormal cell was found, the entire image was con-

sidered to be abnormal. The idea behind this was to

reduce the workload of the histopathologist and de-

crease the time needed for cell analysis. However,

when evaluating each cell with its corresponding nu-

cleus, the experimental results showed low accuracy

due to the dataset containing both typical and atypi-

cal cells, different from other studies reviewed in the

literature (Sompawong et al., 2019).

Ghoneim et al. suggest a system for detecting

and categorizing cervical cancer cells using convolu-

tional neural networks (CNNs) and Extreme Learn-

ing Machines (ELMs). They ﬁrst input cell images

into the CNNs model to extract important features and

then use the ELM-based classiﬁer to categorize the

images. The authors also examine alternative clas-

siﬁers using Multi-layer Perceptron (MLP) and Au-

toencoder (AE). The study was conducted using the

Herlev database, and the proposed system showed a

99.5% accuracy rate in detecting cancer (2-class) and

91.2% accuracy in classifying it (7-class) (Ghoneim

et al., 2020).

Other authors have propose a cervical cancer cell

detection and classiﬁcation system based on deep

convolutional neural networks (CNNs) (Macancela

et al., 2023). The cell images are fed into a CNN

Towards Accurate Cervical Cancer Detection: Leveraging Two-Stage CNNs for Pap Smear Analysis

221

model to extract deeply learned features. Then, an ex-

treme learning machine (ELM) based classiﬁer clas-

siﬁes the input images. The CNNs model is used via

transfer learning and ﬁne-tuning. Alternatives to the

ELM, multi-layer perceptron (MLP), and autoencoder

(AE) based classiﬁers are also investigated. Experi-

ments were performed using the Herlev database. The

proposed CNN-ELM-based system achieved 99.5%

accuracy in the detection problem (2 classes) and

91.2% in the classiﬁcation problem (7 classes) (Xia

et al., 2020).

3 SYSTEM MODEL AND

METHODOLOGY

3.1 Cell Recognition Model

3.1.1 Data Preparation

It is a common practice to split data into three sepa-

rate folders (train with 80% of data, test, and valida-

tion with the rest) when training a convolutional neu-

ral network. This helps prevent overﬁtting, which is

when the model performs well on training data but not

on new data. Overﬁtting is undesirable as it results in

a less accurate model. The dataset contains 2000 im-

ages, which have a size of 150 × 150 pixels, and they

are split in cells and no cells images, see Figure 1 and

Figure 2.

Figure 1: Examples of cell’s images.

Figure 2: Examples of no cell’s images.

3.1.2 Architecture

The ﬁrst stage of the fast scanning process uses a sim-

ple Convolutional Neural Network (CNN) to iden-

tify cell shapes and reduce the hardware resources

required for predictions. After experimentation, the

chosen architecture has four layers. The ﬁrst layer is

a 10x10 convolutional layer with 10 ﬁlters, receiving

an input of 20x20 pixels. The second layer is a 2x2

max-pooling layer, followed by a dense layer of 16

neurons. The ﬁrst and third layers use a ReLu acti-

vation function, while the output layer uses a softmax

activation function. We will refer to this architecture

as CNN 1 see Figure3.

Figure 3: Cell recognition model architecture.

3.2 Cell Classiﬁcation

For the second step, the cell classiﬁcation stage, a pre-

trained CNN was used. A Residual Network (ResNet)

capable of categorizing cells into the following cate-

gories is implemented:

• Negative for intraepithelial lesion (NIL).

• Low-grade squamous intraepithelial lesion (LSIL).

• High-grade squamous intraepithelial lesion (HSIL).

• Squamous cell carcinoma (SCC).

In a total dataset of 4800 images. For the dataset of

this step, not all the images were centered. All images

were standardized to a dimension of 250 x 250 pixels.

Residual Network (ResNet) is a deep convolu-

tional neural network architecture introduced in 2015

by Kaiming He et al. in the paper “Deep Residual

Learning for Image Recognition”, mentions that the

main idea behind ResNet is to address the issue of

vanishing gradients in deep neural networks. Deep

networks with many layers can suffer from vanish-

ing gradients, where the magnitude of the gradients

used to update the weights during training becomes

very small, leading to slow convergence or even non-

convergence. ResNet solves this problem by adding

residual connections to the network, which allow the

ICSOFT 2024 - 19th International Conference on Software Technologies

222

gradients to bypass one or more layers and ﬂow more

easily through the network (Targ et al., 2016).

A residual block in ResNet consists of several

convolutional layers, with the output of each layer be-

ing added to the input of the next layer (the “resid-

ual connection”). This allows the network to learn

residual functions, or the difference between the de-

sired output and the input, instead of trying to learn

the whole mapping from scratch (He et al., 2020) see

Figure. 4.

Figure 4: Architecture of a ResNet residual block uses a

shortcut connection to bypass one or more layers, allowing

the model to capture and reuse the original image informa-

tion. Retrieved from (He et al., 2020).

ResNet also introduced the concept of deep super-

vision, where multiple loss functions are used for dif-

ferent parts of the network to help improve conver-

gence.

The ResNet architecture achieved state-of-the-art

results on several computer vision benchmarks and

has been widely used in many subsequent deep learn-

ing models and systems. It is also one of the most

popular models for transfer learning, where a pre-

trained ResNet model is ﬁne-tuned for a new task (He

et al., 2020).

There are many architectures of ResNet. In this

work, we will use ResNet-50. The main difference

between them is the depth of the network which is

determined because of the number of layers. Each

number represents the total number of layers. Never-

theless, it is necessary to emphasize that as the num-

ber of layers increases the computational cost too. All

these architectures can be trained by using pre-default

weights. This is useful for increasing the efﬁciency

and precision of the model. As a consequence, it

helps the network to converge faster, having an ac-

curate model in less time. ResNet-50 is trained with a

set of data from ImageNet that can be used to initialize

the network before being trained with different data.

This network already knows how to recognize and ex-

tract image features. To avoid the over-ﬁtting prob-

lem presented during the training a dropout will be

added in the last layer of ResNet-50. The dropout is a

hyper-parameter, and regularization technique where

the main idea is forcing some neurons to not activate

during the training (Chen et al., 2020). For this ar-

chitecture, the dropout will be added in the last layer

with a rate of 0.3. It will compare the difference with

ResNet-50 without dropout and with dropout. In this

context, we will refer to ResNet-50 as CNN 2.

3.3 Scanning Process

The prediction process needs 6 steps, explained in de-

tail below:

Firstly both previously trained CNN models are

charged at the beginning of the process. In CNN 1

we use the architecture presented before, and in CNN

2 we use the model ResNet-50. Then, in Figure 5, the

pap smear image is loaded and resized to N pixels to

standardize all images while preserving the width-to-

height ratio.

Figure 5: Pap smear image is loaded and resized.

In Figure 6, it can be seen that the image is made

up of a matrix, which is linked with 70-pixel steps ver-

tically and horizontally, extracting and storing ROI-

sized matrix sections. Once this is done, the extracted

images are stored in List A, and the quadruple coor-

dinates indicating where each of the extracted areas

begins and ends are stored in List B.

Figure 6: Extraction and storing ROI-sized matrix sections

from pap smear image.

Then, two versions of every section taken out are

saved as vectors to satisfy the demands of the previ-

ously loaded TensorFlow CNN. To keep track of them

using their indexes, each version is put into a sepa-

rate list; one is made smaller to 250 x 250 pixels and

placed in list C, while the other is reduced to 20 x 20

pixels and put into list D. This is seen in the Figure 7.

In Figure 8, the process of recognizing cells in a

list D involves feeding each element of the list to a

CNN for cell recognition. If the prediction for “Cell”

Towards Accurate Cervical Cancer Detection: Leveraging Two-Stage CNNs for Pap Smear Analysis

223

Figure 7: Two versions of every section saved as vectors.

is accurate (above 80%), the corresponding image,

coordinates, and vector are stored in new lists (E, F,

and G) using the current index of list D as a reference.

This enables the visualization of the recognition pro-

cess by drawing lines in the Pap test sample image us-

ing the coordinates from list B. This step is efﬁcient

due to the use of a low-resolution CNN with limited

layers.

Figure 8: Cell recognition using the CNN 1.

Finally, in Figure 9, list C is processed using the

classiﬁcation cell CNN, similarly to the previous step.

Whenever a prediction matches one of the following

categories with the speciﬁed accuracy, the coordinates

from list F are used to highlight their location with a

color code:

• NIL

• LSIL

• HSIL

• SCC

Figure 9: Cell classiﬁcation using the CNN 2.

As a consequence, cells that are determined to

have no Intraepithelial Lesion are designated as Nor-

mal Cells, while all other categories of cells fall under

the Abnormal Cell Category.

4 RESULTS

The system implemented for the cell classiﬁcation

process was through the use of the ResNet-50 model,

having better results in comparison to the benchmark

paper, following the dataset that we had previously

proposed.

The proposed model was trained, in the ﬁrst in-

stance, with 500 epochs, see Figure 10 and Figure

11. The ﬁgures help to identify problems during the

training. The ﬁrst indication of overﬁtting in our

ResNet-50 implementation was observed through the

disparities between the training and validation accu-

racy curves. The training accuracy continued to in-

crease while the validation accuracy stagnated or de-

creased, creating a signiﬁcant gap between the two. A

similar pattern emerged in the training and validation

loss curves, further conﬁrming the overﬁtting issue.

Figure 10: ResNet-50 Behavior in training and validation

accuracy without dropout layer.

To combat the overﬁtting problem, we turned to

data augmentation techniques. Data augmentation in-

volves introducing variability into the training dataset

by applying transformations to the images. We ap-

plied several augmentation techniques, including ro-

tation, brightness adjustments, contrast alterations,

and zoom operations which are differences that Pap

smear images can contain. These modiﬁcations aimed

to diversify the training data, making the model more

robust and adaptable to varying conditions.

In addition to data augmentation, we introduced a

dropout layer with a rate of 0.3 in the ﬂattened layer of

the ResNet-50 architecture. Dropout is a regulariza-

tion technique that temporarily deactivates a fraction

ICSOFT 2024 - 19th International Conference on Software Technologies

224

Figure 11: ResNet-50 Behavior in training and validation

loss without dropout layer.

of neurons during training, preventing over-reliance

on particular features. This encourages the network

to learn a more robust representation of the data.

The impact of these modiﬁcations was immedi-

ately evident. We just needed 100 epochs to see that

the disparity between training and validation accuracy

and loss substantially decreased, indicating that over-

ﬁtting had been successfully mitigated, see Figure 12

and Figure 13. The validation accuracy now exhibited

a more consistent growth pattern, and the validation

loss stabilized. These outcomes demonstrated the im-

proved generalization ability of the model, making it

more suitable for real-world applications.

Figure 12: ResNet-50 Behavior in training and validation

accuracy.

Table 1 shows a summary of the ﬁnal results ob-

tained by applying the adjustments to the CNN after

the training. It is important to notice that the metrics

were assessed with 200 images.

4.1 Comparison with Other Studies

In this comparative analysis, ResNet50V2* and

ResNet101V2* from (Wong et al., 2023) emerge

as primary contenders, achieving accuracy rates

Figure 13: ResNet-50 Behavior in training and validation

loss.

Table 1: Summary of metrics obtained with ResNet50 by

category.

Categories Precision Recall F1-score

NILM 1.00 0.97 0.98

LSIL 0.94 0.92 0.93

HSIL 0.78 0.92 0.84

SCC 0.91 0.86 0.85

of 0.97 and 0.95, respectively, demonstrating their

robust performance. However, our proposed

ResNet50 model maintains a noteworthy accuracy of

0.91, indicating competitive effectiveness. Notably,

ResNetXt29 264d and ResNetXt29 464d from (Zhao

et al., 2022) exhibit similar levels of accuracy. It’s

worth mentioning that (Wong et al., 2023) and (Zhao

et al., 2022), these ﬁndings together provide valuable

insights into optimizing convolutional neural network

models.

Table 2: Comparison with other studies ordered by accu-

racy.

AI Methods Accuracy Classes

ResNet50V2* 0.97 4

ResNet101V2* 0.95 4

ResNet50 0.91 4

ResNetXt29 2*64d 0.91 10

ResNetXt29 4*64d 0.91 10

5 CONCLUSIONS

This study addresses a critical challenge in cervical

cancer diagnosis by introducing a rapid and efﬁcient

two-stage CNN approach for the analysis of high-

resolution conventional pap smear images. Tradition-

ally, the process of pap smear analysis has been ham-

pered by the substantial computational resources and

time required, often leading to delays in the diagno-

Towards Accurate Cervical Cancer Detection: Leveraging Two-Stage CNNs for Pap Smear Analysis

225

sis of cervical cancer. Our innovative two-stage CNN

method represents a signiﬁcant contribution to this

domain. In the ﬁrst stage, we employ a swift and high-

precision model for cell detection, achieving preci-

sion rates exceeding 90%. This initial stage ensures

prompt identiﬁcation of cells, effectively reducing the

computational overhead.

In the second stage, we leverage the power of the

ResNet-50 architecture, renowned for its exceptional

top-1 accuracy and efﬁciency, to perform cell clas-

siﬁcation. By employing this pre-trained model, we

not only enhance accuracy but also optimize compu-

tational resources, streamlining the classiﬁcation pro-

cess, but our journey was not without its challenges.

During our work, we encountered the issue of overﬁt-

ting in the ResNet-50 model. However, our commit-

ment to excellence and the early recognition of this

challenge allowed us to swiftly address it by introduc-

ing a dropout layer with a rate of 0.3 in the ﬂattened

layer of ResNet-50 architecture. This correction en-

sured that our model not only excelled in accuracy

but also maintained its robustness, further enhancing

its reliability.

This approach aligns with the World Health Orga-

nization’s objectives for cervical cancer screening, as

it expedites the analysis while maintaining high ac-

curacy standards. Our work is not only a technologi-

cal advancement but a potential game-changer in the

ﬁeld of medical diagnostics, as it holds the promise of

accelerating the detection and, subsequently, the pre-

vention of cervical cancer.

Looking ahead, future research endeavors could

explore further improvements in the scanning pro-

cess, offering even greater efﬁciency and accuracy.

Additionally, expanding the dataset for training mod-

els may yield enhanced results, reinforcing the ro-

bustness of the method. Despite the limitation of a

small dataset, we can conﬁdently assert that our mod-

els have been successfully trained, marking a pivotal

step toward a future where the early and accurate de-

tection of cervical cancer is not only achievable but

a cornerstone in global healthcare. Our contribution

paves the way for a world where cervical cancer is

no longer an insurmountable threat, but a preventable

and treatable disease.

REFERENCES

(2018). Instituto nacional de estad

ıstica y censos.

(2021). Cervical cancer ecuador 2021 country proﬁle.

Alsalatie, M., Alquran, H., Mustafa, W. A., Mohd Yacob,

Y., and Ali Alayed, A. (2022). Analysis of cytology

pap smear images based on ensemble deep learning

approach. Diagnostics, 12(11):2756.

Chen, L., Gautier, P., and Aydore, S. (2020). Dropcluster: A

structured dropout for convolutional networks. arXiv

preprint arXiv:2002.02997.

Davies-Oliveira, J., Smith, M., Grover, S., Canfell, K.,

and Crosbie, E. (2021). Eliminating cervical cancer:

Progress and challenges for high-income countries.

Clinical Oncology, 33(9):550–559. Latest Advances

in Endometrial and Cervical Cancer - A Global View.

Elyan, E., Vuttipittayamongkol, P., Johnston, P., Martin, K.,

McPherson, K., Jayne, C., Sarker, M. K., et al. (2022).

Computer vision and machine learning for medical

image analysis: recent advances, challenges, and way

forward. Artiﬁcial Intelligence Surgery, 2.

Franco, E. L., Duarte-Franco, E., and Ferenczy, A. (2001).

Cervical cancer: epidemiology, prevention and the

role of human papillomavirus infection. Cmaj,

164(7):1017–1025.

Ghoneim, A., Muhammad, G., and Hossain, M. S. (2020).

Cervical cancer classiﬁcation using convolutional

neural networks and extreme learning machines. Fu-

ture Generation Computer Systems, 102:643–649.

Hasenleithner, S. O. and Speicher, M. R. (2022). How to

detect cancer early using cell-free dna. Cancer Cell,

40(12):1464–1466.

He, F., Liu, T., and Tao, D. (2020). Why resnet works?

residuals generalize. IEEE transactions on neural net-

works and learning systems, 31(12):5349–5362.

Hussain, E., Mahanta, L. B., Borah, H., and Das, C. R.

(2020). Liquid based-cytology pap smear dataset for

automated multi-class diagnosis of pre-cancerous and

cervical cancer lesions. Data in Brief, 30:105589.

Koh, D.-M., Papanikolaou, N., Bick, U., Illing, R., Kahn Jr,

C. E., Kalpathi-Cramer, J., Matos, C., Mart

ı-Bonmat

ı,

L., Miles, A., Mun, S. K., et al. (2022). Artiﬁcial

intelligence and machine learning in cancer imaging.

Communications Medicine, 2(1):133.

Kurniawati, Y. E., Permanasari, A. E., and Fauziati, S.

(2016). Comparative study on data mining classiﬁca-

tion methods for cervical cancer prediction using pap

smear results. In 2016 1st International Conference

on Biomedical Engineering (IBIOMED), pages 1–5.

IEEE.

Macancela, C., Morocho-Cayamcela, M. E., and Chang, O.

(2023). Deep reinforcement learning for efﬁcient dig-

ital pap smear analysis. Computation, 11(12).

Rezende, M. T., Silva, R., de O. Bernardo, F., Tobias, A.

H. G., Oliveira, P. H. C., Machado, T. M., Costa, C. S.,

Medeiros, F. N. S., Ushizima, D. M., Carneiro, C. M.,

and Bianchi, A. G. C. (2021). Cric searchable im-

age database as a public platform for conventional pap

smear cytology data. Scientiﬁc Data, 8(1).

Rom

an, K., Llumiquinga, J., Chancay, S., and Morocho-

Cayamcela, M. E. (2023). Hyperparameter tuning

in a dual channel u-net for medical image segmenta-

tion. In Maldonado-Mahauad, J., Herrera-Tapia, J.,

Zambrano-Mart

ınez, J. L., and Berrezueta, S., editors,

Information and Communication Technologies, pages

337–352, Cham. Springer Nature Switzerland.

Sompawong, N., Mopan, J., Pooprasert, P., Himakhun, W.,

Suwannarurk, K., Ngamvirojcharoen, J., Vachiramon,

ICSOFT 2024 - 19th International Conference on Software Technologies

226

T., and Tantibundhit, C. (2019). Automated pap smear

cervical cancer screening using deep learning. In 2019

41st Annual International Conference of the IEEE En-

gineering in Medicine and Biology Society (EMBC),

pages 7044–7048. IEEE.

Strasser-Weippl, K., Chavarri-Guerra, Y., Villarreal-Garza,

C., Bychkovsky, B. L., Debiasi, M., Liedke, P.

E. R., de Celis, E. S.-P., Dizon, D., Cazap, E.,

de Lima Lopes, G., Touya, D., Nunes, J. S., Louis,

J. S., Vail, C., Bukowski, A., Ramos-Elias, P., Unger-

Salda

na, K., Brandao, D. F., Ferreyra, M. E., Luciani,

S., Nogueira-Rodrigues, A., de Carvalho Calabrich,

A. F., Carmen, M. G. D., Rauh-Hain, J. A., Schmeler,

K., Sala, R., and Goss, P. E. (2015). Progress

and remaining challenges for cancer control in latin

america and the caribbean. The Lancet Oncology,

16(14):1405–1438.

Su, J., Xu, X., He, Y., and Song, J. (2016). Automatic de-

tection of cervical cancer cells by a two-level cascade

classiﬁcation system. Analytical Cellular Pathology,

2016.

Targ, S., Almeida, D., and Lyman, K. (2016). Resnet

in resnet: Generalizing residual architectures. arXiv

preprint arXiv:1603.08029.

Wong, L., Ccopa, A., Diaz, E., Valcarcel, S., Mauricio, D.,

and Villoslada, V. (2023). Deep learning and trans-

fer learning methods to effectively diagnose cervical

cancer from liquid-based cytology pap smear images.

International Journal of Online and Biomedical Engi-

neering (iJOE), 19(04):pp. 77–93.

Xia, M., Zhang, G., Mu, C., Guan, B., and Wang, M.

(2020). Cervical cancer cell detection based on deep

convolutional neural network. In 2020 39th Chinese

Control Conference (CCC), pages 6527–6532. IEEE.

Yamal, J.-M., Guillaud, M., Atkinson, E. N., Follen, M.,

MacAulay, C., Cantor, S. B., and Cox, D. D. (2015).

Prediction using hierarchical data: Applications for

automated detection of cervical cancer. Statistical

Analysis and Data Mining: The ASA Data Science

Journal, 8(2):65–74.

Zhang, L., Lu, L., Nogues, I., Summers, R. M., Liu, S.,

and Yao, J. (2017). Deeppap: deep convolutional net-

works for cervical cell classiﬁcation. IEEE journal of

biomedical and health informatics, 21(6):1633–1643.

Zhao, S., He, Y., Qin, J., and Wang, Z. (2022). A semi-

supervised deep learning method for cervical cell clas-

siﬁcation. Analytical Cellular Pathology, 2022:1–12.

Towards Accurate Cervical Cancer Detection: Leveraging Two-Stage CNNs for Pap Smear Analysis

227