feature selection method on all these three datasets.
For example, on the THCA dataset, the SVM-RFE method
scores about 0.7% higher than SVM, while on the GBMGG
dataset it scores about 0.2% to 2% higher. On the LUSC
dataset, the F1 score of the SVM-RFE method is slightly
lower than that of SVM only when using the LR
classifier; all other scores are higher than those of
SVM.
4 CONCLUSIONS
In this paper, two feature selection methods based on
SVM are compared. Both methods are applied to three
different TCGA cancer datasets to verify and compare
their performance with two classifiers. We conclude
that the overall performance of the SVM-RFE feature
selection method is better than that of the SVM
feature selection method.
In addition, we ran a further experiment on SVM-RFE,
eliminating a different number of features in each
iteration to explore how the elimination step size
affects model performance. We found that SVM-RFE
performs best when one feature is removed in each
iteration, but this takes a long time. Eliminating
multiple features in each iteration improves the time
efficiency of the model but reduces its performance.
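The trade-off between step size and cost can be illustrated with a minimal sketch. It is not the implementation used in this paper: the linear SVM below is a simplified Pegasos-style subgradient trainer without a bias term, standing in for a real SVM solver, and the toy data and all function names (`train_linear_svm`, `svm_rfe`, `make_toy_data`) are hypothetical. Features are ranked by squared weight, the usual SVM-RFE criterion, and `step` features are dropped per iteration.

```python
import random


def train_linear_svm(X, y, epochs=30, lam=0.01, seed=0):
    """Simplified linear SVM (Pegasos-style subgradient descent, no bias).

    Assumption: a stand-in for a real SVM solver. Labels must be +1 or -1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink the weights (regularisation step).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Correct the weights only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep, step=1):
    """Return (indices of kept features, number of SVM fits performed)."""
    remaining = list(range(len(X[0])))
    fits = 0
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        fits += 1
        # Drop the `step` features with the smallest squared weight,
        # without going below n_keep.
        n_drop = min(step, len(remaining) - n_keep)
        ranked = sorted(range(len(remaining)), key=lambda k: w[k] ** 2)
        to_drop = set(ranked[:n_drop])
        remaining = [f for k, f in enumerate(remaining) if k not in to_drop]
    return remaining, fits


def make_toy_data(n_samples=60, n_features=12, n_informative=3, seed=1):
    """Hypothetical toy data: the first n_informative features carry the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.choice([-1, 1])
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
kept_one, fits_one = svm_rfe(X, y, n_keep=4, step=1)
kept_four, fits_four = svm_rfe(X, y, n_keep=4, step=4)
print("step=1:", kept_one, "SVM fits:", fits_one)
print("step=4:", kept_four, "SVM fits:", fits_four)
```

Going from 12 to 4 features takes 8 SVM fits with `step=1` but only 2 with `step=4`. The larger step is cheaper, but each fit discards several features on the basis of a single ranking, which is one plausible reason for the drop in performance reported above.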
These experiments are relevant to the study of cancer:
they further support the feasibility of machine
learning in cancer data analysis, which can help
doctors and researchers reduce the burden of analyzing
cancer data and help predict a patient's condition.
Suggestions for further work: assess the severity of a
patient's condition by determining whether the cancer
is in a primary state or a metastatic state, and
divide tumors into types so that different treatment
options can be adopted to improve the patient's 5-year
survival rate.
ACKNOWLEDGEMENTS
Throughout the writing of this dissertation, I have
received a great deal of support and assistance. I
would like to thank my parents for their wise counsel
and sympathetic ear. You are always there for me. I
could not have completed this dissertation without the
support of my friends, who provided stimulating
discussions as well as happy distractions to rest my
mind outside of my research.