aimed at predicting treatment outcomes for HPV
infection in recently diagnosed cervical cancer
patients (Vickram et al. 2021). The pursuit of
early-stage cervical cancer diagnosis has led to
proposals focusing on accuracy, precision, recall, the
F1-score, and the appropriate application of Logistic
Regression algorithms (Song et al. 2015).
A limitation of current studies is that they do not
detect cervical cancer accurately. This study
aims to improve the early detection of cervical
cancer in patients and to raise overall accuracy by
using the Logistic Regression algorithm. A
comparison with an Artificial Neural Network is
also presented.
2 MATERIALS AND METHODS
The study took place within the Department of
Electronics and Communication Engineering at
Saveetha School of Engineering. The samples were
divided into two subsets of 15 samples each. Subset 1
was assigned to Logistic Regression and Subset 2 to
Artificial Neural Network. The sample
sizes for each subset were determined through a
power calculation, employing 80.200% pretest
power, an alpha error of 0.94, a threshold of 0.05, and
a confidence level of 83.066%.
A cervical cancer dataset was employed to
determine the presence of the disease. The dataset
was obtained from the Cervical Cancer.cv website
and consists of 5 categories, 186 attributes, and
11,650 instances. Each subset was sampled
individually, resulting in a total of 30 unique samples
for the test dataset. This set of 30 samples was
subsequently divided into training and testing
datasets. Once the dataset was partitioned, each
algorithm was trained on the training set and
evaluated on the testing set to obtain accuracy values.
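The partitioning step can be illustrated with a minimal sketch. The split ratio is not stated in the text, so the 80/20 ratio, the fixed seed, and the helper name `train_test_split` below are assumptions made only for illustration, and the sample identifiers are placeholders rather than records from the dataset.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and return (train, test).

    `test_fraction` and `seed` are illustrative assumptions;
    the paper does not report the ratio it used.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder identifiers for the 30 samples (15 per subset).
sample_ids = list(range(30))
train_ids, test_ids = train_test_split(sample_ids)
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the partition reproducible, which matters when two algorithms are compared on the same split.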
2.1 Logistic Regression
A logistic regression model predicts a dependent
variable from its relationship with one or more
independent variables. It can predict binary
outcomes, such as the success of a political candidate
or the acceptance of a high school student into a
specific institution. Logistic regression has become
an important tool in machine learning because it
classifies new data on the basis of historical
information. As more training data become available,
the fitted model can classify data points more
accurately.
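The mapping from independent variables to a binary outcome can be sketched as follows. The model passes a weighted sum of the features through the logistic (sigmoid) function to obtain a probability, then applies a cut-off. The weights, bias, feature values, and the 0.5 cut-off below are hypothetical and are not taken from the study.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Binary decision: 1 if the probability reaches the threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical coefficients and one hypothetical sample.
weights = [0.8, -0.5]
bias = -0.2
sample = [1.0, 2.0]
probability = predict_proba(sample, weights, bias)
label = predict_label(sample, weights, bias)
print(probability, label)
```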
Algorithm for sample 1 preparation
1. Collect data from patients diagnosed with
cervical cancer, including factors like age,
family history, HPV status, etc. Clean the data
by addressing missing values, outliers, and
scaling features.
2. Divide the dataset into two subsets: training
and testing.
3. Develop a logistic regression model using
relevant libraries like scikit-learn or
TensorFlow. Train the model on the training
data to predict the likelihood of cervical cancer
based on independent factors.
4. Evaluate the model's performance on the
testing data using metrics such as accuracy,
precision, recall, F1 score, and area under the
ROC curve.
5. If needed, fine-tune the model by adjusting
hyperparameters, conducting feature
engineering, or exploring alternative
algorithms.
6. Deploy the trained model in a production
environment to predict cervical cancer
outcomes for new data.
7. Regularly monitor the model's performance
and update it as necessary to maintain
accuracy and relevance.
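Steps 3 and 4 above can be sketched in plain Python as a rough illustration. This is not the authors' implementation: the sketch fits the model by batch gradient descent instead of calling scikit-learn or TensorFlow, and the toy feature vectors, learning rate, and epoch count are assumptions chosen only so that the example runs on its own.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = p - label  # derivative of the log-loss w.r.t. z
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(X, weights, bias, threshold=0.5):
    """Predicted 0/1 label for each row of X."""
    return [
        1 if sigmoid(bias + sum(w * x for w, x in zip(weights, row))) >= threshold else 0
        for row in X
    ]

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy, already-scaled feature vectors (assumed, not from the dataset).
X_train = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -0.5],
           [2.0, 1.0], [1.0, 2.0], [1.5, 0.5]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.8, 1.2], [-1.2, -1.8], [0.7, 0.9], [-0.6, -1.1]]
y_test = [1, 0, 1, 0]

weights, bias = train_logistic_regression(X_train, y_train)
metrics = classification_metrics(y_test, predict(X_test, weights, bias))
print(metrics)
```

The area under the ROC curve mentioned in step 4 is omitted to keep the sketch short. In practice a library routine would normally be preferred for both fitting and scoring.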
2.2 Artificial Neural Networks
Artificial Neural Networks (ANNs) are constructed
from interconnected units called artificial neurons,
mimicking the structure of neurons in the human
brain. Similar to synapses, these connections enable
the transmission of signals between neurons. Upon
processing incoming signals, an artificial neuron can
transmit signals to its connected neurons. The
neuron's output is determined by a non-linear
function applied to the sum of its inputs, and the
"signal" transmitted across a connection is
represented by a real number. These connections are
referred to as edges. Neurons and edges carry weights
that are adjusted during the learning process. A
weight increases or decreases the strength of the
signal on its connection. Additionally, some neurons
may have a threshold that the incoming signal must
exceed before a signal is transmitted.
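The behaviour of a single artificial neuron as described above can be sketched as follows. The choice of `tanh` as the non-linear function, the optional threshold on the summed input, and all numeric values are assumptions for illustration. The text does not specify the activation function or network layout used in the study.

```python
import math

def neuron_output(inputs, weights, bias=0.0, threshold=None):
    """Output of one artificial neuron.

    The weighted sum of the inputs is passed through a non-linear
    function (tanh here, as an assumption). If `threshold` is given,
    the neuron transmits nothing (0.0) unless the sum exceeds it.
    """
    weighted_sum = bias + sum(w * x for w, x in zip(weights, inputs))
    if threshold is not None and weighted_sum <= threshold:
        return 0.0
    return math.tanh(weighted_sum)

def layer_output(inputs, weight_rows, biases):
    """Outputs of a layer: one neuron per row of weights."""
    return [neuron_output(inputs, row, b) for row, b in zip(weight_rows, biases)]

# Hypothetical signals and edge weights.
signals = [1.0, 1.0]
active = neuron_output(signals, [0.5, 0.5])
blocked = neuron_output(signals, [0.5, 0.5], threshold=2.0)
hidden = layer_output(signals, [[0.5, 0.5], [1.0, -1.0]], [0.0, 0.0])
print(active, blocked, hidden)
```

Learning consists of adjusting the entries of `weights` and `bias`. That step is not shown here.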
Algorithm for sample 2 preparation
1. Gather patient information, medical history,
clinical presentation, lab results, imaging
studies, and pathology reports from cervical
cancer patients.
AI4IoT 2023 - First International Conference on Artificial Intelligence for Internet of things (AI4IOT): Accelerating Innovation in Industry
and Consumer Electronics