
6 CONCLUSIONS
This paper analyzed the performance of three unsupervised
clustering algorithms, combined with three data preprocessing
methods, on a data set composed entirely of nominal data. The
goal was to group the students who took the ERCE 2019 test
according to the characteristics associated with low academic
performance. The algorithms were k-means, BIRCH and
agglomerative clustering. The preprocessing methods were
Convert to Numeric Values and Apply Gower Distance, Apply
Gower Distance to Nominal Values, and Natural Language
Processing with vectorization. The analysis shows that k-means
is the algorithm with the best performance on the four metrics.
The agglomerative clustering algorithm was selected because it
uses a hierarchical method to generate the clusters and its
silhouette results are close to the best results obtained by
k-means, which is why it was chosen over BIRCH. As for the
preprocessing method, Convert to Numeric Values and Apply
Gower Distance was chosen because it had the best results on
the four metrics.
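As a minimal sketch of the kind of comparison summarized above, the following Python example scores the three algorithms with the silhouette coefficient on a Gower distance matrix. It is not the pipeline used in this study: the data are synthetic, the helper gower_nominal and all parameter values are assumptions chosen for illustration, and the algorithms are fitted on a one-hot encoding because the scikit-learn implementations of k-means and BIRCH do not accept a precomputed distance matrix. For purely nominal attributes, the Gower distance reduces to the share of attributes on which two records differ.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import OneHotEncoder


def gower_nominal(codes):
    """Gower distance for nominal data: share of attributes that differ."""
    codes = np.asarray(codes)
    # Broadcast to (n, n, p) mismatch flags, then average over the p attributes.
    return (codes[:, None, :] != codes[None, :, :]).mean(axis=2)


# Synthetic stand-in for coded questionnaire answers (assumption, NOT ERCE data):
# two groups of 30 records with 6 nominal attributes each.
rng = np.random.default_rng(0)
group_a = rng.choice([0, 1], size=(30, 6), p=[0.85, 0.15])
group_b = rng.choice([2, 3], size=(30, 6), p=[0.85, 0.15])
codes = np.vstack([group_a, group_b])

# Pairwise Gower distances, used only to evaluate the clusterings.
distances = gower_nominal(codes)
# One-hot features, used to fit the algorithms (illustrative choice).
features = OneHotEncoder().fit_transform(codes).toarray()

models = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "BIRCH": Birch(n_clusters=2),
    "agglomerative": AgglomerativeClustering(n_clusters=2),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(features)
    # Silhouette coefficient computed on the precomputed Gower matrix.
    scores[name] = silhouette_score(distances, labels, metric="precomputed")

for name, score in scores.items():
    print(f"{name}: silhouette = {score:.3f}")
```

The same loop can be extended with further metrics or other numbers of clusters; on real survey data the ranking of the algorithms would depend on the encoding and on those choices.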
For future work, other preprocessing techniques should be
tested on databases composed entirely or mostly of nominal
data, together with different numbers of students and
variables. Furthermore, chatbots, as used in other areas
(Solis-Quispe et al., 2021), or question answering models
(Burga-Gutierrez et al., 2020; Rodriguez et al., 2023) could be
used to improve communication with the students.
REFERENCES
Agasisti, T., Antequera, G., and Delprato, M. (2023).
Technological resources, ICT use and schools efficiency
in Latin America - insights from OECD PISA 2018.
International Journal of Educational Development,
99:102757.
Burga-Gutierrez, E., Vasquez-Chauca, B., and Ugarte, W.
(2020). Comparative analysis of question answering
models for HRI tasks with NAO in Spanish. In SIMBig,
volume 1410 of Communications in Computer
and Information Science, pages 3–17. Springer.
Chang, I.-C., Yu, T.-K., Chang, Y.-J., and Yu, T.-Y. (2021).
Applying text mining, clustering analysis, and latent
Dirichlet allocation techniques for topic classification
of environmental education journals. Sustainability,
13:10856.
Falcon, S., Admiraal, W., and León, J. (2023). Teachers’
engaging messages and the relationship with students’
performance and teachers’ enthusiasm. Learning and
Instruction, 86.
Flores-Mendoza, C., Ardila, R., Gallegos, M., and
Reategui-Colareta, N. (2021). General intelligence
and socioeconomic status as strong predictors of student
performance in Latin American schools: Evidence
from PISA items. Frontiers in Education, 6.
Miguez, D. (2023). ¿por qué varía el desempeño entre
estudiantes de baja condición social? factores escolares y
domésticos asociados al logro en seis países
sudamericanos. Education Policy Analysis Archives, 31.
Moubayed, A., Injadat, M., Shami, A., and Lutfiyya, H.
(2021). Student engagement level in e-learning en-
vironment: Clustering using k-means. J. Distance
Educ., 34.
Rodriguez, R. A., Ferroa-Guzman, J., and Ugarte, W.
(2023). Classification of respiratory diseases us-
ing the NAO robot. In ICPRAM, pages 940–947.
SCITEPRESS.
Rusteholz, G., Mediavilla, M., and Pires Jiménez, L.
(2021). Impact of bullying on academic performance:
A case study for the Community of Madrid. SSRN
Electronic Journal.
S Sani, N., Sani, M. A. A., Abd Rahman, A. H., Nafuri, F.,
and Zainudin, N. (2022). Clustering analysis for clas-
sifying student academic performance in higher edu-
cation. Applied Sciences, 12:9467.
Solis-Quispe, J. M., Quico-Cauti, K. M., and Ugarte, W.
(2021). Chatbot to simplify customer interaction in e-
commerce channels of retail companies. In ICITS (1),
volume 1330 of Advances in Intelligent Systems and
Computing, pages 561–570. Springer.
Talib, N., Majid, N., and Sahran, S. (2023). Identification of
student behavioral patterns in higher education using
k-means clustering and support vector machine. Ap-
plied Sciences, 13:3267.
Wulff, P., Buschhüter, D., Westphal, A., Mientus, L.,
Nowak, A., and Borowski, A. (2022). Bridging the
gap between qualitative and quantitative assessment
in science education research with machine learning
— a case for pretrained language models-based clus-
tering. Journal of Science Education and Technology,
31.
Zuo, J. and Kummer, M. G. C. (2022). A new student be-
havior analysis method based on k-means algorithm
and consumption data of campus smart card. In
FSDM, volume 358 of Frontiers in Artificial Intelli-
gence and Applications, pages 117–125. IOS Press.
Classification of Peruvian Elementary School Students with Low Achievement Problems Using Clustering Algorithms and ERCE Evaluation