the popular cluster optimization methods is the Elbow
method (López-Rubio et al., 2018), (Bholowalia and Kumar,
2014), (Kodinariya and Makwana, 2013), (Liu et al., 2018).
The Elbow method is a visual method for choosing the best
number of clusters: the sum of squared errors (SSE) is
computed for each candidate number of clusters, and the
point where the decrease in SSE changes most sharply,
forming the angle of an elbow, indicates the best number
of clusters. In some of these studies, the focus is still
on optimizing the choice of the number of clusters with
the Elbow method, while the initial centroids are still
selected at random. Random initialization can increase the
number of iterations needed to reassign objects to
clusters as the cluster centers are updated, so the
algorithm takes longer to converge to stable clusters.
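The Elbow procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the studies cited: the helper names (kmeans_sse, elbow_k) are hypothetical, the toy points are invented, and locating the elbow by the largest second difference of the SSE curve is one assumed way to formalize "the most extreme difference".

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, seed=0, max_iter=100):
    """Run plain K-Means with random initial centroids and return the SSE."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def elbow_k(sse_by_k):
    """Assumed elbow rule: the k with the largest second difference of SSE."""
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = sse_by_k[prev_k] - sse_by_k[k]
        drop_after = sse_by_k[k] - sse_by_k[next_k]
        if drop_before - drop_after > best_bend:
            best_k, best_bend = k, drop_before - drop_after
    return best_k


if __name__ == "__main__":
    # Invented toy points, for illustration only.
    toy = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (15.0, 1.0), (16.0, 2.0)]
    sse = {k: kmeans_sse(toy, k) for k in range(1, 6)}
    print(sse, elbow_k(sse))
```

Because the initial centroids here are drawn at random, the SSE curve can change with the seed, which is the sensitivity that motivates a deterministic initial centroid.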
Much research has been done on determining the centroid
values to improve the performance of the K-Means
algorithm. One idea is to weight each cluster according to
its variance, as in min-max K-Means (Tzortzis and Likas,
2014). Other studies use a simple formula that weights the
highest and lowest averages to obtain the centroid values
(Fabregas et al., 2017), with better computational
performance than the original K-Means. This study proposes
using the simple statistical mean and median to determine
the initial centroids, combined with the Elbow method to
determine the number of clusters, so that the K-Means
algorithm performs better than the original K-Means method
in terms of the number of iterations needed and the
consistency of the resulting cluster members.
In this research, the K-Means clustering algorithm is
applied to a case study of mapping teaching staff data for
public schools in the districts and cities of Central Java
Province. The grouping shows which schools have an excess
or a shortage of teaching staff, so it can serve as a
basis for distributing teaching staff and equalizing their
placement in public schools in Central Java. The aim is to
avoid problems caused by uneven distribution, such as an
excess or shortage of teaching staff, excessive
concentration of teaching staff in certain areas, and an
aging teaching staff population in major city centers
(Szelągowska-Rudzka, 2018). In this study, it is assumed
that three main groups are formed, representing deficient,
sufficient, and excess conditions. To improve the
performance of the K-Means algorithm, this study uses the
Elbow method to evaluate the best number of clusters and
combines it with an initial centroid determination that
compares the minimum, median, mean, and maximum values of
the objects. The results of this comparison are used to
set the initial centroids, which is expected to reduce the
number of iterations needed to reach stable clusters
compared with random initial centroid determination.
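The idea of deriving initial centroids from summary statistics can be illustrated with a short sketch. The text does not specify how the minimum, median, mean, and maximum values are turned into centroids, so the mapping below is an assumption: for three clusters, the per-attribute minimum, a middle value (mean or median), and the maximum are used as the three starting points. The function names and the toy points are hypothetical.

```python
from statistics import mean, median


def summary_centroids(points, middle=mean):
    """Assumed rule: three centroids from per-attribute min, middle, max."""
    columns = list(zip(*points))
    low = tuple(float(min(col)) for col in columns)
    mid = tuple(float(middle(col)) for col in columns)
    high = tuple(float(max(col)) for col in columns)
    return [low, mid, high]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Plain K-Means from given centroids.

    Returns (labels, centroids, passes), where passes counts assignment
    passes, including the final one that confirms nothing changed.
    """
    centroids = list(centroids)
    labels = []
    passes = 0
    for passes in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(mean(col) for col in zip(*members))
    return labels, centroids, passes


if __name__ == "__main__":
    # Invented toy points, for illustration only.
    toy = [(1, 1), (2, 1), (10, 10), (11, 10), (20, 20), (21, 20)]
    for middle in (mean, median):
        labels, _, passes = kmeans(toy, summary_centroids(toy, middle))
        print(middle.__name__, labels, passes)
```

Since these starting points are deterministic, repeated runs on the same data produce the same clusters, which makes the iteration count directly comparable with runs that start from random centroids.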
2 MATERIALS AND METHODS
2.1 Materials
The data used in this study is a sample recapitulation of
public senior high school data in Central Java Province,
covering 16 public senior high schools in Semarang City,
11 in Semarang Regency, and 3 in Salatiga City. The
attributes include the name of the school, number of
students (ns), number of teachers (nt), number of study
groups (nsg), and number of subjects (nsbc). The data were
obtained from http://sekolah.data.kemdikbud.go.id and the
Office of Education and Culture of the Central Java
Provincial Government, and were downloaded as a CSV file.
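A CSV file with these attributes could be read into numeric feature vectors as sketched below. The layout of the downloaded file is not given in the text, so the column headers (school, ns, nt, nsg, nsbc) are assumed, and the two rows are invented placeholders rather than values from the study's data.

```python
import csv
import io

# Placeholder rows, invented for illustration only; not the study's data.
SAMPLE_CSV = """school,ns,nt,nsg,nsbc
School A,900,55,27,16
School B,600,40,18,16
"""


def load_features(csv_file, columns=("ns", "nt", "nsg", "nsbc")):
    """Return (school names, numeric feature rows) from a CSV file object.

    The header names are assumptions; adjust them to the real file.
    """
    names, rows = [], []
    for record in csv.DictReader(csv_file):
        names.append(record["school"])
        rows.append(tuple(float(record[c]) for c in columns))
    return names, rows


if __name__ == "__main__":
    names, rows = load_features(io.StringIO(SAMPLE_CSV))
    for name, row in zip(names, rows):
        print(name, row)
```

For a file on disk, the same function can be given the object returned by open(path, newline="") in place of the io.StringIO wrapper.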
The tools used in this study are a computer with an AMD
Dual Core A9-9420 3.6 GHz CPU, 4 GB of RAM, and the
Windows 10 operating system, with Excel, Orange Data
Mining, the Python programming language, and the Visual
Studio Code editor as supporting software.
2.2 Methods
This study was conducted in the following stages. The
first stage is problem analysis: the problem of equal
distribution of teaching staff is analyzed, especially in
public schools in the districts and cities of Central
Java, which leads to formulating the problem as mapping
the teaching staff in public schools.
The second stage is the literature review: a literature
study is conducted to identify the method that will be
used to cluster the case study data. The result of this
stage is the choice of the K-Means clustering method for
grouping the data, with performance improved by
determining the best number of clusters and the initial
centroids.
The third stage is data collection. The data used are
secondary data obtained from the website of the Ministry
of Education and Culture of the Republic of Indonesia at
http://sekolah.data.kemdikbud.go.id and other data from
the Office of Education and Culture of Central Java