Prediction of Breast Cancer Using Classification Algorithms

Harshitha R. and S. Manju Priya†

Department of Computer Science, Karpagam Academy of Higher Education, India

Keywords: Breast Cancer, SVM Algorithm, Data Science, R Program.

Abstract: Breast cancer is the most widespread ailment in women; on average, 178,000 new cases are diagnosed every year. The work in this paper was tested on breast cancer data. The study analyses the data with two classifiers, Support Vector Machine (SVM) and Naïve Bayes, and evaluates their effectiveness, timeliness and precision. Based on the findings, the better classification strategy for categorising breast cancer is chosen.

1 INTRODUCTION

Breast cancer occurs more frequently than other types of cancer, especially in women. It develops in the lining cells (epithelium) of the ducts (85%) or lobules (15%) in the glandular tissue of the breast.

To lower the death rate, breast cancer must be detected and treated early. Breast cancer is the most prevalent type of cancer worldwide: at the end of 2020, 7.8 million people were alive who had received a diagnosis in the previous five years. Breast cancer comes in two forms, invasive and non-invasive. The invasive form is malignant and spreads to other organs. The non-invasive (benign) form is precancerous and stays in the original organ, although it can ultimately progress to invasive breast cancer. Many breast cancer identification techniques have been developed in recent years, and their effectiveness has been confirmed. When their accuracy and running time were compared, SVM's robustness was shown to contribute to its high accuracy rate.

Classification approaches are the most commonly used data mining techniques in the healthcare industry. A classification model is trained on an existing data set and then applied to the specific task. With this idea, the Support Vector Machine and Naïve Bayes algorithms available in R packages are studied. The two algorithms are analysed on the basis of their accuracy, performance and timing.

Author footnotes: Harshitha R. is II MSc; † S. Manju Priya is Professor.

2 LITERATURE REVIEW

Researchers have proposed several machine learning techniques to detect breast cancer. This section discusses a few of the approaches that others have studied for diagnosing breast cancer. Dr. R. Vijaya Kumar Reddy et al. (Reddy 2020) showed the effectiveness of classification techniques and of extraction techniques. Lina Alkhathlan et al. (Namik Kemal 2018) carried out a comprehensive analysis of recent research on machine learning (ML) to help researchers. Md. Milon Islam et al. suggested that machine learning techniques could provide a clinical aid for the diagnosis of breast cancer (Senapati et al 2013). Kemal demonstrated a multilayer perceptron method with a high accuracy rate (Amin et al 2021). Amin Ul Haq et al. proposed a technique for accurately identifying breast cancer (BC) that is simple to integrate into e-healthcare systems. Juneja K et al. enhanced the weighted decision tree technique for predicting breast cancer (Jhajharia et al 2016). Muhammet Fatih Ak evaluated the identification and diagnosis of breast cancer using data visualization and machine learning applications. The key benefit of logistic regression (LR) is that it produces correct results with complicated algorithms and is very effective at training.


R, H. and Priya, S.

Prediction of Breast Cancer Using Classification Algorithms.

DOI: 10.5220/0012613600003739

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Artificial Intelligence for Internet of Things: Accelerating Innovation in Industry and Consumer Electronics (AI4IoT 2023), pages 276-280

ISBN: 978-989-758-661-3

3 METHODOLOGY

The main objective of this work is to assess the effectiveness of the two algorithms selected for the investigation and to identify which of them is more suitable. The second objective is to investigate breast cancer using a specific data set.

Figure 1: Flow chart of the proposed method.

The study was implemented on the Breast Cancer data set using the Support Vector Machine (SVM) algorithm. Among the classification approaches available in R, SVM was chosen because of its fast results and high accuracy.

The data set used for this classification is the Breast Cancer Wisconsin (Diagnostic) Data Set (WDBC) from the UCI Machine Learning Repository. It is a classification data set that records measurements for breast cancer cases.

4 RESULTS AND DISCUSSIONS

This study analysed two classification algorithms and compared them on the basis of accuracy. The effectiveness of each classifier is calculated from its confusion matrix.
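As an illustration of how a confusion matrix yields these measures, the following base R sketch computes accuracy, error rate and precision. The ten labels are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study.

```r
# Hypothetical actual and predicted labels for ten cases
# (B = benign, M = malignant); not data from the study.
actual    <- factor(c("B", "B", "B", "B", "B", "B", "M", "M", "M", "M"),
                    levels = c("B", "M"))
predicted <- factor(c("B", "B", "B", "B", "B", "M", "M", "M", "M", "B"),
                    levels = c("B", "M"))

# Confusion matrix: rows are actual classes, columns are predicted classes.
cm <- table(Actual = actual, Predicted = predicted)

# Accuracy is the share of cases on the diagonal; error rate is its complement.
accuracy   <- sum(diag(cm)) / sum(cm)
error_rate <- 1 - accuracy

# Precision for the malignant class: correctly predicted M over all predicted M.
precision_m <- cm["M", "M"] / sum(cm[, "M"])

print(cm)
cat("Accuracy:", accuracy, " Error:", error_rate,
    " Precision (M):", precision_m, "\n")
```

With these labels, 8 of the 10 cases fall on the diagonal, so accuracy is 0.8 and the error rate is 0.2; 3 of the 4 cases predicted as malignant are correct, so precision is 0.75.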

The implementation uses the R programming language, which is maintained by the R Core Team and is used for statistical computing and graphics. R is one of the most popular programming languages for data science. It comes with a large number of pre-installed packages, and each package contains a collection of functions for various analyses and graphical displays.

The Support Vector Machine (SVM) algorithm was applied because its accuracy and working process are better than those of the other algorithm used in the study. SVM is a supervised machine learning algorithm that evaluates data and divides it into categories. It can handle both classification and regression on linear and non-linear data, and it is used in applications such as recognition, detection and classification. The Naïve Bayes algorithm is a simple probabilistic, Bayesian-based classification technique. It is not a single algorithm but a family of machine learning algorithms that share an assumption of statistical independence between features. It is mainly used for text classification. Bayesian classifiers are one example of statistical classifiers.
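The independence assumption can be written out explicitly. The notation below (a class C and features x_1, ..., x_n) is introduced here for illustration and is not taken from the paper. Bayes' theorem gives the posterior probability of a class, and the naive assumption that the features are independent given the class turns the joint likelihood into a product:

```latex
P(C \mid x_1, \dots, x_n)
  = \frac{P(C)\, P(x_1, \dots, x_n \mid C)}{P(x_1, \dots, x_n)}
\qquad\Longrightarrow\qquad
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class is the one that maximises the right-hand side; the denominator can be dropped because it is the same for every class.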

Bayesian classifiers can predict the probability that a given observation belongs to a particular class. The cornerstone of Bayesian classification is Bayes' theorem.

The results obtained with the SVM algorithm are as follows. First, the overall frequency of the Benign and Malignant classes in the data set is analysed. Figure 2a shows the overall proportion of women in the Benign and Malignant classes. This means there is a 1 in 10 chance that a woman will have breast cancer and an 8 in 10 chance that she will not be affected by it. The incidence of Malignant cases increases from the 2nd decade to the 5th decade and reaches 100% in the 8th decade; incidence increases with age and is highest in the oldest age group.

Figure 2b shows the correlation matrix, a graphical display used to find potential relationships between variables and to understand the strength of those relationships. It is an effective tool for summarising a sizeable data set and for finding and displaying trends in the data.

Figure 2c shows the concavity of Benign versus Malignant cases. In medical terms, concavity here ranges from stage 1 to stage 4, where a higher stage means the cancer has spread more. The result shows that the range for Malignant is higher than the range for Benign. The concavity mean is the most important attribute in breast cancer because it indicates the shape and color used to identify the disease. Concavity represents the number of indentations present on the nuclear border.
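The class frequencies and correlation matrix discussed for Figures 2a and 2b can be computed in base R roughly as follows. This is a minimal sketch on a small hypothetical data frame: the values are invented for illustration, and the column names are assumed to follow the attribute names of the data set.

```r
# Stand-in data: a tiny hypothetical sample, NOT the real measurements.
cases <- data.frame(
  diagnosis      = factor(c("B", "B", "B", "M", "M", "B", "M", "B")),
  radius_mean    = c(12.1, 11.4, 13.0, 18.2, 20.1, 12.8, 17.5, 11.9),
  perimeter_mean = c(78.0, 73.5, 84.1, 120.3, 132.9, 82.6, 115.0, 76.8),
  concavity_mean = c(0.03, 0.02, 0.05, 0.20, 0.25, 0.04, 0.18, 0.03)
)

# Frequency and proportion of each diagnosis class.
class_counts <- table(cases$diagnosis)
class_share  <- prop.table(class_counts)

# Pearson correlation matrix of the numeric attributes only.
numeric_cols <- cases[, sapply(cases, is.numeric)]
corr <- cor(numeric_cols)

print(class_counts)
print(round(class_share, 2))
print(round(corr, 2))
```

A value of `corr` close to 1 or -1 indicates a strong linear relationship between two attributes, and a value near 0 indicates a weak one.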

Figure 2d shows the fractal dimension. In breast cancer, analysing the fractal dimension of breast tissue specimens gives a clearer view of tumor growth patterns. It is an objective measure of the complexity of the tissue in a specimen, and it helps to show how scaling changes a model or modelled object. It is measured using the formula D = log N / log S, which shows how completely the fractals embed themselves.
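A short worked example makes the formula concrete. The paper does not define N and S; the sketch below assumes the usual reading of the similarity dimension, in which N is the number of self-similar pieces and S is the scaling factor. The function name is hypothetical.

```r
# Similarity dimension D = log(N) / log(S).
# Assumed meaning of the symbols (not stated in the paper):
#   n_pieces     = N, the number of self-similar pieces
#   scale_factor = S, the factor by which each piece is scaled down
fractal_dimension <- function(n_pieces, scale_factor) {
  stopifnot(n_pieces > 0, scale_factor > 1)
  log(n_pieces) / log(scale_factor)
}

d_line  <- fractal_dimension(3, 3)  # line cut into 3 pieces at 1/3 scale
d_plane <- fractal_dimension(9, 3)  # square cut into 9 pieces at 1/3 scale
d_koch  <- fractal_dimension(4, 3)  # Koch curve: 4 copies at 1/3 scale

cat("Line:", d_line, " Square:", d_plane, " Koch curve:", round(d_koch, 4), "\n")
```

The line and the square recover the familiar dimensions 1 and 2, while the Koch curve gives a non-integer value of about 1.26, which is what marks a contour as more irregular than a smooth curve.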


Figure 2: (a) Overall average of women affected in Benign & Malignant; (b) Correlation Matrix; (c) Concavity of Benign vs Malignant; (d) Fractal Dimension in breast cancer.

Figure 3: (a) Symmetry means of breasts; (b) Accuracy levels.

Figure 3a shows that breasts are generally symmetric in their density and architecture. Some studies have shown that women with breast cancer have greater breast asymmetry, which appears as differences in the position, volume and form of the breasts.

With Naïve Bayes, the accuracy level was 93%. SVM therefore performs better than Naïve Bayes.

The classifier, accuracy and error level of the SVM algorithm are:

Classifier:

svm(formula = diagnosis ~ ., data = training_set, type = 'C-classification', kernel = 'linear')

Parameters:

SVM-Type: C-classification

SVM-Kernel: Linear

Cost: 1

Support Vectors: 24

Accuracy: 96.98%

Error: 0.30%
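The comparison reported above could be reproduced along the following lines. This is a sketch under stated assumptions, not the authors' script: it assumes the e1071 package, whose `svm()` accepts the arguments shown in the call above and which also provides `naiveBayes()`. To stay self-contained it uses two species from R's built-in `iris` data as a stand-in for the two diagnosis classes, and the 75/25 split and the helper `accuracy_of` are hypothetical.

```r
library(e1071)  # assumed package; provides svm() and naiveBayes()

set.seed(42)

# Stand-in data: two iris species play the role of the two diagnosis classes.
# This is NOT the breast cancer data; it only keeps the sketch runnable.
dataset <- droplevels(subset(iris, Species != "setosa"))
names(dataset)[names(dataset) == "Species"] <- "diagnosis"

# Hypothetical 75/25 split into training and test sets.
train_idx    <- sample(nrow(dataset), size = round(0.75 * nrow(dataset)))
training_set <- dataset[train_idx, ]
test_set     <- dataset[-train_idx, ]

# Same call shape as the classifier reported in the paper.
svm_model <- svm(formula = diagnosis ~ ., data = training_set,
                 type = "C-classification", kernel = "linear")
nb_model  <- naiveBayes(diagnosis ~ ., data = training_set)

# Accuracy on the test set, taken from the confusion matrix diagonal.
accuracy_of <- function(model) {
  cm <- table(Actual = test_set$diagnosis,
              Predicted = predict(model, newdata = test_set))
  sum(diag(cm)) / sum(cm)
}

svm_accuracy <- accuracy_of(svm_model)
nb_accuracy  <- accuracy_of(nb_model)
cat("SVM accuracy:", svm_accuracy, "\nNaive Bayes accuracy:", nb_accuracy, "\n")
```

To apply the sketch to the breast cancer data, `dataset` would be replaced by a data frame with a `diagnosis` factor column and the numeric attributes; the accuracies printed here relate only to the stand-in data.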

5 CONCLUSIONS

Classification techniques can support the reliable early detection of breast cancer. In this paper, the SVM C-classification method was compared with Naïve Bayes and is suggested as the better of the two. Judged on accuracy, the performance metric that stands out among the measures employed, SVM is the best of the two algorithms compared (96.98% against 93% for Naïve Bayes). The SVM method also has the highest precision (1), which leads us to conclude that it is the best choice for cancer analysis. To further improve the classification of breast cancer, we will apply other classification algorithms and additional data sets in the future.

Table 1.

| Attribute Name | Attribute Description | Values |
|---|---|---|
| ID | Identification number for each patient | (0-9) |
| Diagnosis | Breast tissue diagnosis (M = Malignant, B = Benign) | M & B (0 & 1) |
| Radius_Mean | Mean distance between the centre and points on the perimeter | (0-9) |
| Texture_Mean | Standard deviation of gray-scale values | (0-9) |
| Perimeter_Mean | Mean size of the core tumor | (0-9) |
| Smoothness_Mean | Mean of local variation in radius lengths | (0-9) |
| Compactness_Mean | Mean of Perimeter^2 / Area - 1.0 | (0-9) |
| Concavity_Mean | Mean severity of the concave portions of the contour | (0-9) |
| Concave_Points_Mean | Mean proportion of the contour that is concave | (0-9) |
| Fractal_Dimension_Mean | Mean of "coastline approximation" - 1 | (0-9) |
| Radius_Se | Standard error of the mean distance from the centre to points on the perimeter (3*3 grid) | (0-9) |
| Texture_Se | Standard error of the standard deviation of gray-scale values | (0-9) |
| Smoothness_Se | Standard error of local variation in radius lengths (3*3 grid) | (0-9) |
| Compactness_Se | Standard error of Perimeter^2 / Area - 1.0 | (0-9) |
| Concavity_Se | Standard error of the severity of the concave portions of the contour (3*3 grid) | (0-9) |
| Concave_Points_Se | Standard error of the number of concave portions of the contour | (0-9) |
| Fractal_Dimension_Se | Standard error of "coastline approximation" - 1 | (0-9) |
| Radius_Worst | "Worst" or largest value of the mean distance between the centre and points on the perimeter | (0-9) |
| Texture_Worst | "Worst" or largest mean value of the standard deviation of gray-scale values | (0-9) |
| Smoothness_Worst | "Worst" or largest mean value of local variation in radius lengths | (0-9) |
| Compactness_Worst | "Worst" or largest mean value of Perimeter^2 / Area - 1.0 | (0-9) |
| Concavity_Worst | "Worst" or largest mean value of the severity of the concave portions of the contour | (0-9) |
| Concave_Points_Worst | "Worst" or largest mean value of the number of concave portions of the contour | (0-9) |
| Fractal_Dimension_Worst | "Worst" or largest mean value of "coastline approximation" - 1 | (0-9) |

REFERENCES

Vijay Kumar Reddy, Shaiksubhani, G. Rajesh Chandra, B. Sriniva Rao, “Breast cancer classification methodologies”, International Journal of Emerging Trends in Engineering Research, Vol. 8, No. 9, 2020.

Lina Alkhathlan and Abdul Khader Jilani Saudagar, “Machine Learning Methods for Breast Cancer Analysis: A Systematic Literature Review”, IJCSNS International Journal of Computer Science and Network Security, Vol. 20, No. 6, 2020.

Md. Milon Islam, Md. Rezwanul Haque, Hasib Iqbal, Md. Munirul Hasan, Mahmudul Hasan, and Muhammad Nomani Kabir, “Breast cancer prediction: A comparative study using machine learning techniques”, SN Computer Science, 2020.

Namık Kemal University, Çorlu Engineering Faculty, “Classification and diagnostic prediction of breast cancers via different classifiers”, International Scientific and Vocational Journal (ISVOS), Vol. 2, 2018.

Ebrahim Edriss Ebrahim Ali, Wu Zhi Feng, “Breast Cancer Classification Using Support Vector Machine and Neural Network”, International Journal of Science and Research (IJSR), 2014.

Senapati, A. K. Mohanty, S. Dash, P. K. Dash, “Local linear wavelet neural network for breast cancer recognition”, Neural Comput. Appl., 2013.

Amin Ul Haq, Jian Ping Li, Abdus Saboor, Jalaluddin Khan, Samad Wali, Sultan Ahmad, Amjad Ali, Ghufran Ahmad Khan, and Wang Zhou, “Detection of Breast Cancer Through Clinical Data Using Supervised and Unsupervised Feature Selection Techniques”, IEEE Access, 2021.

S. Jhajharia, S. Verma, R. Kumar, “A cross-platform evaluation of various decision tree algorithms for prognostic analysis of breast cancer data”, in Proc. International Conference on Inventive Computation Technologies (ICICT), 2016.

Muhammet Fatih Ak, “A Comparative Analysis of Breast Cancer Detection and Diagnosis Using Data Visualization and Machine Learning Applications”, Healthcare, 2020.

M. M. Islam, H. Iqbal, M. R. Haque, and M. K. Hasan, “Prediction of Breast Cancer Using Support Vector Machine and K-Nearest Neighbours”, in Proc. IEEE Region Humanitarian Technol. Conf. (R-HTC), 2017.

Senapati, G. Panda, P. K. Dash, “Hybrid approach using KPSO and RLS for RBFNN design for breast cancer detection”, Neural Comput. Appl., 2014.


K. Juneja, C. Rana, “An improved weighted decision tree approach for breast cancer prediction”, International Journal of Information Technology, 2018.

V. Chaurasia and S. Pal, “A Novel Approach for Breast Cancer Detection Using Data Mining Techniques”, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2, 2014.

A. T. Azar and S. A. El-Said, “Probabilistic Neural Network for Breast Cancer Classification”, Neural Comput. Appl., Vol. 23.

F. Ahmad, N. A. M. Isa, Z. Hussain, and S. N. Sulaiman, “A Genetic Algorithm-Based Multi-Objective Optimization of an Artificial Neural Network Classifier for Breast Cancer Diagnosis”, Neural Comput. Appl., Vol. 23.

R. Sheikhpour, M. A. Sarram, and R. Sheikhpour, “Particle Swarm Optimization for Bandwidth Determination and Feature Selection of Kernel Density Estimation-Based Classifiers in Diagnosis of Breast Cancer”, Appl. Soft Comput., Vol. 40.

A. H. Osman and H. M. A. Aljahdali, “An Effective of Ensemble Boosting Learning Method for Breast Cancer Virtual Screening Using Neural Network Model”, IEEE Access, Vol. 8, 2020.

C. A. Peña-Reyes and M. Sipper, “A Fuzzy-Genetic Approach to Breast Cancer Diagnosis”, Artif. Intell. Med., Vol. 17, No. 2.
