where $p_a$ is the ratio of cohesions and $p_e$ is the
probability of cohesion by coincidence.
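As an illustration of how the two quantities above combine, the following is a minimal sketch that assumes the standard Cohen's Kappa form, (p_a - p_e) / (1 - p_e), and a confusion matrix with actual classes as rows and predicted classes as columns. The function name `cohen_kappa` and the example counts are hypothetical and are not taken from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    Assumed layout (not from the paper): rows are actual classes,
    columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)

    # p_a: observed agreement, the share of instances on the diagonal
    p_a = sum(confusion[i][i] for i in range(k)) / total

    # p_e: agreement expected by coincidence, from row and column marginals
    p_e = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(k)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_a - p_e) / (1.0 - p_e)


# Hypothetical two-class example: p_a = 0.7 and p_e = 0.5
example = [[20, 5], [10, 15]]
kappa = cohen_kappa(example)
```

A value near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than coincidence.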
where TP is the number of true positives, TN is the
number of true negatives, FP is the number of false
positives, and FN is the number of false negatives.
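The four counts defined above are enough to compute both accuracy and MCC. The sketch below assumes the standard textbook definitions of these measures; the function names and the example counts are hypothetical.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Share of all instances that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention (an assumption here): report 0 when any
        # marginal total is zero and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, not results from the study
acc_value = accuracy(tp=20, tn=15, fp=10, fn=5)
mcc_value = mcc(tp=20, tn=15, fp=10, fn=5)
```

MCC ranges from -1 to 1 and, unlike accuracy, uses all four counts symmetrically, so it is less easily inflated by a dominant class.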
8 LIMITATION
The first limitation of this paper is that the
pre-processing of the dataset is not described in
detail. In addition, a compatibility issue between the
training and test datasets occurred while running some
algorithms. WEKA developer version 3.7.13 addresses
this through its input mapped classifier
(InputMappedClassifier). Secondly, this study did not
include a statistical comparison of classification
performance; thus, the significance of the differences
in performance among the classifiers is unknown.
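One way such a statistical comparison could be carried out is McNemar's test on the paired predictions of two classifiers over the same test set. This test was not performed in this study; the following is only a sketch that assumes the continuity-corrected chi-square form with one degree of freedom, and the function name and inputs are hypothetical.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar's test for two classifiers.

    Returns (statistic, p_value). Only the instances on which exactly one
    classifier is correct contribute to the statistic.
    """
    only_a_correct = 0
    only_b_correct = 0
    for truth, a, b in zip(y_true, pred_a, pred_b):
        a_ok = a == truth
        b_ok = b == truth
        if a_ok and not b_ok:
            only_a_correct += 1
        elif b_ok and not a_ok:
            only_b_correct += 1

    discordant = only_a_correct + only_b_correct
    if discordant == 0:
        # The classifiers never disagree in correctness: no evidence of a difference
        return 0.0, 1.0

    # Continuity correction, clamped at zero so equal counts give a statistic of 0
    diff = max(abs(only_a_correct - only_b_correct) - 1, 0)
    statistic = diff ** 2 / discordant

    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

The chi-square approximation is usually considered unreliable when the number of discordant instances is small, in which case an exact binomial version of the test would be the more cautious choice.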
9 CONCLUSIONS
In this study, the thyroid dataset was classified
using various decision tree, neural network, Statistic
Learning, and k-NN algorithms. The J48 decision tree
model was found to be the best classifier based on
accuracy, Kappa, MCC, and ROC. This model also
outperformed the classifiers used in a previous study,
either as a single classifier or in combination with
the Adaptive Boosting algorithm. The best algorithm
for data mining of the thyroid dataset can be chosen
simply on the basis of classification accuracy, which
is the closeness of the measured value to the actual
value. The vast majority of studies report
classification accuracy as the most common predictor
of the optimum classification algorithm.
Improved accuracy can be achieved by combining
two classifiers. In this study, integrating different
algorithms into a combinatorial classifier
successfully overcame the shortcomings of some
classifiers (SMO and BayesNet). However, there are
considerations to weigh, particularly the type of
dataset. Moreover, classification of a large dataset
with a combinatorial algorithm would be a
time-consuming procedure. Therefore, further study on
the optimization of combinatorial classification for
numerical, nominal, and discrete datasets would be
very beneficial.
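To illustrate the general idea of combining two classifiers, the sketch below averages the class probabilities produced by two models and picks the class with the highest average. This section does not state which combination scheme was used, so probability averaging is an assumption here, and the function name, class labels, and probabilities are hypothetical.

```python
def combine_by_average(prob_a, prob_b):
    """Average two classifiers' class probabilities for one instance.

    prob_a and prob_b map class label -> predicted probability.
    Returns (predicted_label, averaged_probabilities).
    """
    # Sort labels so that ties are broken deterministically
    labels = sorted(set(prob_a) | set(prob_b))
    averaged = {
        label: (prob_a.get(label, 0.0) + prob_b.get(label, 0.0)) / 2
        for label in labels
    }
    predicted = max(labels, key=lambda label: averaged[label])
    return predicted, averaged


# Hypothetical outputs of two classifiers for a single instance
first = {"normal": 0.6, "hyper": 0.3, "hypo": 0.1}
second = {"normal": 0.2, "hyper": 0.7, "hypo": 0.1}
label, averaged = combine_by_average(first, second)
```

Because every instance must be scored by both models, the running time of such a combination grows with the cost of each member, which is consistent with the concern about large datasets raised above.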
Comparison of Decision Tree, Neural Network, Statistic Learning, and k-NN Algorithms in Data Mining of Thyroid Disease Datasets