2.5 ROC Curve
Receiver Operating Characteristic (ROC) curves
visualize classifier accuracy and allow classifiers to
be compared visually. An ROC curve summarizes the
confusion matrix as a two-dimensional graph with the
false positive rate on the horizontal axis and the true
positive rate on the vertical axis (Vercellis, 2011).
One point on the ROC graph is better than another if
it lies closer to the top left corner, above the
diagonal that runs from the bottom left to the top
right of the graph, that is, if it has a higher true
positive rate, a lower false positive rate, or both.
The indicator of accuracy is the AUC (Area Under
Curve) value. The level of accuracy can be diagnosed
from the AUC value as follows (Powers, 2011):
a. Accuracy 0.90 - 1.00 = Excellent classification
b. Accuracy 0.80 - 0.90 = Good classification
c. Accuracy 0.70 - 0.80 = Fair classification
d. Accuracy 0.60 - 0.70 = Poor classification
e. Accuracy 0.50 - 0.60 = Failure
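The scale above can be illustrated with a minimal sketch that builds ROC points from classifier scores, integrates them with the trapezoidal rule to obtain the AUC, and maps the result to a category. The function names, the toy labels and scores, and the choice to treat each lower bound as inclusive (the listed ranges overlap at their edges) are assumptions made for illustration only; they are not taken from the cited sources.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for the positive class, 0 for the negative class.
    scores: higher score means "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sweep the decision threshold from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auc_category(value):
    """Map an AUC value to the scale above (lower bounds inclusive: an assumption)."""
    if value >= 0.90:
        return "Excellent classification"
    if value >= 0.80:
        return "Good classification"
    if value >= 0.70:
        return "Fair classification"
    if value >= 0.60:
        return "Poor classification"
    # Assumption: values below 0.50 are also reported as failure.
    return "Failure"


# Hypothetical toy data: three positive and three negative examples.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(roc_points(toy_labels, toy_scores))
print(round(toy_auc, 4), auc_category(toy_auc))
```

In practice a library routine would normally replace the hand-written sweep; the sketch is only meant to show how the curve, the area, and the category relate.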
2.6 Related Work
Studies on the use of data mining to predict whether
students graduate on time have been widely published.
The biggest challenge faced by universities is
reducing the number of students who drop out (Pal,
2012). The dropout count is an indicator of academic
performance and of how well new student admissions
are managed. This causes universities to focus more
on student numbers than on the quality of education.
In that study, data mining was applied to build
predictive models for dropout management, so that
students who need more support can be identified.
The results showed that the machine learning
algorithm could build an effective predictive model
from existing dropout data.
Performance in academic programs is one of the
most important factors affecting the quality of higher
education available to students (Al-Barrak &
Al-Razgan, 2015). In that study, data mining
techniques, especially classification, were used to
analyze student scores in different evaluative tasks for
structured data courses. For this purpose, three
different classifiers were compared to predict student
performance. The classification techniques were
applied to both numeric and categorical attributes.
The results show that the model based on the Naïve
Bayes algorithm gave the most accurate predictions,
with 91% accuracy in predicting student failures in
the course.
Other studies show that the most influential
factors in student graduation rates are the Semester
Achievement Index (IPS) and the Total Semester
Credit System (SKS), both overall and per semester
(Amelia, Lumenta & Jacobus, 2017). The study period
can be predicted from academic factors such as study
program, semester achievement index, and number of
credits at university. The Naïve Bayes algorithm
predicted the study period of students with an
average test accuracy of 85.17% across five
semesters.
Another study states that one of the biggest
challenges facing higher education today is predicting
student academic paths (Abu-Oda & El-Halees,
2015). Many higher education systems are unable to
detect student populations that tend to drop out,
because they lack intelligent methods for using the
available information and guidance from the
university system. To classify and predict dropout
students, the study proposed two classifiers, Decision
Tree (DT) and Naive Bayes (NB), and trained them on
the collected dataset. The results showed that the
accuracy of DT reached 98.14%, while NB reached
96.86%.
Research conducted by Sulistiono and Defiyanti
shows that the Naïve Bayes algorithm has the highest
level of accuracy (Sulistiono and Defiyanti, 2015).
The accuracy of the Naïve Bayes algorithm is 93.58%,
compared to 93.05% for the C4.5 algorithm and
89.56% for the Neural Network. For this reason, this
research uses the Naïve Bayes classification method,
a simple statistical probability method that
nevertheless produces accurate results.
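To make the idea of Naïve Bayes as a simple probability method concrete, the following is a minimal sketch of a categorical Naïve Bayes classifier with add-one (Laplace) smoothing. It is not the implementation used in this study or in the cited work; the function names, the two attributes, and the toy records are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and attribute-value frequencies per class."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for add-one smoothing.
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            values_seen[j].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest posterior score for one record."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class) + sum of log P(value | class); attributes are
        # treated as conditionally independent (the "naive" assumption).
        score = math.log(count / total)
        for j, value in enumerate(row):
            numerator = value_counts[(j, label)][value] + 1
            denominator = count + len(values_seen[j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical records: (achievement index level, credit load).
toy_rows = [
    ("high", "full"), ("high", "full"), ("low", "partial"),
    ("low", "full"), ("high", "partial"), ("low", "partial"),
]
toy_classes = ["on_time", "on_time", "late", "late", "on_time", "late"]

model = train_naive_bayes(toy_rows, toy_classes)
print(predict(model, ("high", "full")))
print(predict(model, ("low", "partial")))
```

Working in log space avoids numerical underflow when many attributes are multiplied, and the smoothing term keeps an unseen attribute value from forcing a class probability to zero.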
3 PROPOSED METHOD
The method used in this study is an experiment, which
investigates causal relationships using tests
controlled by the researcher (Dawson, 2009). This
study aims to obtain a prediction model for whether
prospective students will graduate on time. Because
accepted research must follow recognized rules
(Dawson, 2009), this study follows the six phases of
CRISP-DM (Cross Industry Standard Process for Data
Mining) (Chapman et al., 2000). The stages are as
follows:
1. Business Understanding
The first stage is to understand the goals and needs
from a business point of view and then translate this
knowledge into a data mining problem definition.
Plans and strategies are then determined to achieve
these goals.