In this work, we have proposed a novel prognosis
prediction method based on SVM to create
personalized predictive models. Two datasets,
NSCLC and WPBC, were selected, both of which are
small in size and high-dimensional.
The novelty of this work is three-fold. Firstly, we
have modified the standard RBF kernel function in
the SVM model to fit the test data, which have hybrid
feature types. This modification makes the model
better suited to practical application. Secondly, we
adopt the SMOTE strategy to deal with
imbalanced training data. A series of
experiments has demonstrated the effectiveness of
the SMOTE strategy when faced with imbalanced data
sets. Thirdly, SVM-RFE is employed to extract
the set of features with the greatest impact on the outcome.
The results demonstrate that, with the help of SVM-RFE,
17 out of 34 attributes of WPBC and 28 out of 37
attributes of NSCLC have been selected, and that
these selected subsets outperform the full attribute
set.
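As an illustration of the oversampling idea summarized above, the following is a minimal sketch of SMOTE-style interpolation. It is not the implementation used in our experiments: the function name `smote`, its parameters, and the toy data are hypothetical, the base sample is drawn at random (a simplification), and only numeric features are handled, whereas hybrid feature types would need additional treatment.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class
    samples and their k nearest minority-class neighbours.

    Hypothetical helper for illustration; numeric features only.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # For every minority sample, store the indices of its k nearest
    # minority-class neighbours (excluding the sample itself).
    neighbours = []
    for i, p in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(p, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base sample
        j = rng.choice(neighbours[i])      # one of its neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Each synthetic point lies on the line segment between two existing minority samples, so the minority class grows without exact duplication of records.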
So far, only SVM models have been employed.
In the future, we plan to run an extensive set of
tests using other machine learning methods, such
as random forests and deep learning, in the same manner
as the SVM procedure.
ACKNOWLEDGMENT
This work has been partially supported by the
National Natural Science Foundation of China (No.
31371340) and the Natural Science Foundation of
Education Department of Anhui Province (No.
KJ2017A542).