Predicting the Malignant Breast Cancer using Tumor Tissue Features
Wenrui Zhao
College of Art and Science, the Ohio State University, Columbus, OH, 43210, U.S.A.
Keywords: Breast Cancer, Breast Cancer Dataset, Feature Selection, FNA, Cancer Diagnosis.
Abstract: Breast cancer is one of the most common cancers in women and is the second leading cause of cancer death after
lung cancer. In clinical diagnosis, fine-needle aspiration (FNA) cytology is often used for tumor diagnosis because of its
safety, accuracy, and ease of operation. Pathologists judge whether a patient's tumor tissue is malignant
by observing the cell population. However, the accuracy of fine-needle biopsy depends largely on the doctors who
perform the sampling and analysis. It is therefore crucial to study which characteristics of cells can provide
a solid basis for discrimination. This article constructs univariate and multivariate logistic regression models
to analyze the predictive value of 9 cell features for breast cancer. Evaluation of the ROC curves
shows that the constructed models accurately predict malignant tumor tissue. The 9 characteristics of
tumor tissue quantified by FNA are of great value in predicting malignant breast cancer.
1 INTRODUCTION
Breast cancer is one of the most common cancers in
women and is the second leading cause of cancer death after
lung cancer (Nguyen, 1970; Mangasarian, 1990). In
2020, over 2.3 million women were diagnosed with
breast cancer worldwide, and 685,000 died. Due
to population growth, aging, and the increasing
prevalence of known cancer risk factors (such as
smoking and unhealthy eating), the WHO projects that if
the global incidence rate remains the same as in 2020,
there will be around 28.4 million new cancer cases
worldwide in 2040. Women in every country are at
risk of developing breast cancer at any age after
puberty, but the incidence rate increases with
age (Piro, 2021). Existing diagnostic techniques,
including nuclear magnetic resonance imaging,
ultrasound, CT (computed tomography), and PET
(positron emission tomography), are very effective in
tumor detection (World Health Organization, 2021).
However, when doctors find suspicious tumor tissue,
they still seek to obtain tissue samples for analysis.
Biopsy is an essential technique for the diagnosis of
cancer in the clinic. Because fine-needle biopsy
requires no advance preparation and no
special dietary restrictions, fine-needle aspiration (FNA)
has become the preliminary diagnostic basis for
judging whether breast tissue is cancerous. Although
FNA has many advantages, a large body of data
shows that a few cases may still be misdiagnosed.
Therefore, it is vital to study which characteristics of
cells can provide a solid basis for discrimination.
From 1989 to 1991, Dr. Wolberg, Dr. Mangasarian,
and two graduate students constructed a classifier
using the multisurface method of pattern separation
(MSM) on nine cell features and successfully
diagnosed 97% of new cases (Nguyen, 1970;
Wolberg, 1989). This work led to the Wisconsin breast
cancer dataset. This article constructs univariate and
multivariate logistic regression models to analyze the
predictive value of 9 cell features for breast
cancer. It uses biometric methods for
exploratory data analysis, with a narrower focus on
checking how well the models fit (Chatfield,
2021). By studying the relative importance of the 9
cell features, the article aims to help establish a
more standardized method for judging whether tumor tissue
is malignant.
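As a rough illustration of the modelling approach described above, the following Python sketch fits one univariate logistic regression per feature and one multivariate model on all features, then compares them by the area under the ROC curve (AUC). It is not the article's code. The data are synthetic (three made-up features rather than the 9 FNA features or the Wisconsin dataset), and the function names, the gradient-descent fitting, the learning rate, and the epoch count are all assumptions chosen to keep the example free of external dependencies.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_score(w, x):
    # w[0] is the intercept; w[1:] are the feature coefficients.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent (hypothetical helper)."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, X):
    # Predicted probability of the positive (malignant) class.
    return [sigmoid(linear_score(w, xi)) for xi in X]

def roc_auc(y, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-in data (an assumption, NOT the Wisconsin dataset):
# label 0 = benign, label 1 = malignant; malignant samples are shifted upward.
random.seed(0)
shifts = [2.0, 1.5, 1.0]
X, y = [], []
for label in (0, 1):
    for _ in range(100):
        X.append([random.gauss(label * s, 1.0) for s in shifts])
        y.append(label)

# Univariate models: one feature at a time.
uni_auc = []
for j in range(len(shifts)):
    Xj = [[row[j]] for row in X]
    uni_auc.append(roc_auc(y, predict_proba(fit_logistic(Xj, y), Xj)))

# Multivariate model: all features together.
multi_auc = roc_auc(y, predict_proba(fit_logistic(X, y), X))

# AUCs here are computed on the training data, so they are optimistic.
for j, a in enumerate(uni_auc):
    print(f"feature {j}: univariate AUC = {a:.3f}")
print(f"all features: multivariate AUC = {multi_auc:.3f}")
```

In practice one would more likely use an established statistics package and evaluate on held-out data; the hand-written fit above is only meant to make the univariate versus multivariate comparison concrete.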
2 ANALYSES
FNA uses a fine needle of about 20-27G, similar
to or smaller than the needle used for routine blood
tests; generally, the larger the gauge (G) number, the
thinner the needle (CancerQuest, 2021). Because
only a small amount of tissue and cellular
material is collected, pathologists pay more
attention to the observation of cell populations. The
study used the Wisconsin Breast Cancer Dataset