The performance of the above variants is compared against
the following state-of-the-art classification algorithms (a
minimal, purely illustrative setup of these baselines is
sketched after the list):
Logistic regression
KNN
Decision trees
Gradient boosting
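As a purely illustrative sketch, these baselines could be instantiated as follows using scikit-learn; the synthetic dataset, hyperparameters and evaluation loop are assumptions made for illustration and do not describe the experimental setup used in this paper.

```python
# Illustrative baseline setup (assumes scikit-learn; the dataset and
# hyperparameters are placeholders, not this paper's configuration).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

baselines = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, clf in baselines.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```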
The rest of the paper is organized as follows. Section 2
reviews prior research on classification algorithms. The
section that follows discusses the proposed classification
methodology, after which the time complexities of the
various models are detailed. Experimental findings are
presented in the next section. Finally, the conclusions of
the study and recommendations for future research are
presented.
2 RELATED WORK
An empirical analysis of the effectiveness of supervised
learning on high-dimensional data is carried out by (Rich
Caruana et al., 2008). The authors implement several
machine learning algorithms, such as the support vector
machine (SVM) and the artificial neural network (ANN),
and evaluate them using three performance metrics:
accuracy, root mean square error, and area under the ROC
curve. Eleven binary datasets of very high dimensionality
are evaluated, and it is concluded that random forest (RF),
ANN, SVM and boosted trees outperform all other models,
while the worst-performing methods are naive Bayes and
the perceptron. The study also indicates that boosted trees
perform well on lower-dimensional datasets but tend to
overfit when applied above 400 dimensions.
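For concreteness, the three metrics reported in that study (accuracy, root mean square error and AUC) can be computed roughly as follows; this is a generic sketch assuming a probabilistic binary classifier and scikit-learn's metric functions, and computing the RMSE on predicted probabilities is an assumption rather than the cited study's exact protocol.

```python
# Hedged sketch: accuracy, RMSE of predicted probabilities, and ROC AUC
# for a binary classifier (illustrative; not the cited study's setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=50, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]        # probability of the positive class
pred = (proba >= 0.5).astype(int)

acc = accuracy_score(y_te, pred)
rmse = np.sqrt(np.mean((proba - y_te) ** 2))  # RMSE of probability estimates
auc = roc_auc_score(y_te, proba)
print(f"ACC={acc:.3f}  RMSE={rmse:.3f}  AUC={auc:.3f}")
```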
(Chongsheng Zhang et al., 2017) carry out an empirical
study of various emerging classifiers, such as the extreme
learning machine (ELM) and the sparse representation
classifier (SRC), and compare them with traditional
classifiers such as random forest and k-nearest neighbors
(KNN). Experiments on 71 datasets are used to validate the
effectiveness of the models, and the results indicate that
stochastic gradient boosted decision trees perform well in
supervised learning.

(Jingjun Bi et al., 2018) propose a new machine learning
method for multi-class imbalance, namely Diversified
Error Correcting Output Codes (DECOC). To validate the
effectiveness of their model, they perform experiments on
17 multi-class imbalanced datasets. The results indicate
that DECOC achieves the best results in terms of accuracy
(ACC), area under the ROC curve (AUC), geometric mean
(G-mean) and F-measure.
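DECOC itself is the authors' contribution; purely as a point of reference, a standard error-correcting output codes (ECOC) classifier together with three of the listed metrics can be sketched as below. The wrapper, base learner and class weights are assumptions made for illustration, and the snippet is not an implementation of DECOC.

```python
# Hedged sketch: a *standard* ECOC multi-class classifier (not the authors'
# DECOC) evaluated with accuracy, macro F-measure and G-mean.  AUC is
# omitted because this ECOC wrapper exposes no class-probability estimates.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Imbalanced 4-class toy problem (class weights are arbitrary placeholders).
X, y = make_classification(n_samples=3000, n_features=20, n_informative=10,
                           n_classes=4, weights=[0.6, 0.2, 0.15, 0.05],
                           random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

ecoc = OutputCodeClassifier(DecisionTreeClassifier(random_state=2),
                            code_size=2.0, random_state=2)
pred = ecoc.fit(X_tr, y_tr).predict(X_te)

acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred, average="macro")
recalls = recall_score(y_te, pred, average=None)   # per-class recall
gmean = np.prod(recalls) ** (1.0 / len(recalls))   # geometric mean of recalls
print(f"ACC={acc:.3f}  F-measure={f1:.3f}  G-mean={gmean:.3f}")
```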
(Amanpreet Singh et al., 2016) compare various supervised
machine learning algorithms on various datasets on the
basis of accuracy, speed, comprehensibility and speed of
learning. The authors employ Bayesian networks, naive
Bayes, KNN and other algorithms, and suggest that the
choice of an appropriate algorithm depends on the dataset
and the type of classification problem. From the
experimental results they conclude that tree-based
algorithms perform better than the rest.

According to (Rich Caruana et al., 2006), multiple
performance criteria are used to compare learning models
across domains: a model may perform well on one measure
but poorly on another, and multiple measures assess
different trade-offs in prediction. The authors therefore
evaluate algorithms on a relatively wide range of
performance indicators, comparing ten supervised
algorithms using eight distinct performance metrics. They
examine the performance indicators before and after
calibrating the model outputs with Platt scaling and
isotonic regression, and conclude that calibrated boosted
trees outperform the other methods on all eight measures,
with random forest in second place, while logistic
regression and naive Bayes fare the worst. They also find
that calibration with either Platt scaling or isotonic
regression improves the performance of SVMs, stumps and
naive Bayes.
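In current library terms, the two calibration techniques correspond to scikit-learn's CalibratedClassifierCV with the "sigmoid" (Platt scaling) and "isotonic" methods; the following is a minimal sketch of that usage on a synthetic problem and does not reproduce the cited study's experimental protocol.

```python
# Hedged sketch: calibrating an SVM's scores with Platt scaling ("sigmoid")
# and isotonic regression via scikit-learn (illustrative setup only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=5000, n_features=30, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

for method in ("sigmoid", "isotonic"):   # Platt scaling / isotonic regression
    calibrated = CalibratedClassifierCV(LinearSVC(), method=method, cv=5)
    calibrated.fit(X_tr, y_tr)
    proba = calibrated.predict_proba(X_te)[:, 1]
    print(f"{method}: Brier score = {brier_score_loss(y_te, proba):.4f}")
```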
(Henry Brighton et al., 2002) begin their study by detailing
some practical challenges in classification algorithms.
Their main argument is that reduction methods have
historically been seen as generic solutions to the problem
of instance selection. Their studies of how various schemes
function, and how well they perform in different contexts,
lead them to believe that the success of a scheme is
strongly dependent on the structure of the instance space.
They contend that a single selection criterion is insufficient
to ensure good overall performance, and conclude that for
the vast majority of classification problems, border
instances are crucial to class discrimination. Their
algorithm is competitive with the most effective current
methods across 30 domains.
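Instance selection of this kind can be illustrated with a classic condensation scheme; the sketch below implements Hart's condensed nearest-neighbour rule as a simple stand-in and is not the reduction algorithm proposed by Brighton et al.

```python
# Hedged sketch: Hart's condensed nearest-neighbour (CNN) instance selection,
# shown only as a classic baseline for reducing a training set (this is NOT
# the scheme proposed in the cited paper).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def condensed_nn(X, y, random_state=0):
    """Greedily keep the instances that a 1-NN on the store misclassifies."""
    rng = np.random.RandomState(random_state)
    order = rng.permutation(len(X))
    keep = [order[0]]                      # seed the store with one instance
    changed = True
    while changed:
        changed = False
        for i in order:
            knn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
            if i not in keep and knn.predict(X[i:i + 1])[0] != y[i]:
                keep.append(i)             # absorb the misclassified instance
                changed = True
    return np.array(keep)

X, y = make_classification(n_samples=500, n_features=10, random_state=4)
kept = condensed_nn(X, y)
print(f"retained {len(kept)} of {len(X)} instances")
```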
(Saksham Trivedi et al., 2021) survey the use of ML
algorithms across many fields of study. They conclude that
the structure of the task has the greatest impact on
algorithm selection in machine learning. They assert that
SVMs and neural networks are more valuable for
multidimensional data, whereas logic-based systems are
ordinarily better suited to handling discrete/categorical
attributes. Neural network models and SVMs require a
large sample size to achieve maximum accuracy, whereas
naive Bayes (NB) only requires a small amount of data.
Makdah et al.,