related to the purpose of clustering in the context.
Section 4 describes the problem, methodology,
implementation and experimentation. Section 5
describes the results, and finally Section 6
concludes the findings and lists scope for future
research.
2 RELATED WORK
The progress in human-computer interaction has led
to the development of novel approaches for
examining graphical data in a dynamic manner,
allowing users to have adaptable control. Although
the majority of this research focuses on the
presentation of statistical data, there has also been
significant collaboration with advancements in
information visualization as a whole. This is
especially true for the representation of extensive
networks, hierarchies, databases, and text, where the
difficulties of handling massive amounts of data
persistently arise (Hearn & Baker, 2015; Al-Barrak & Al-
Razgan, 2016).
The field of statistical graphics encompasses the
creation of various contemporary methods for
visualizing data, including bar and pie charts,
histograms, line graphs, time-series plots, contour
plots, and other techniques. Thematic cartography
evolved from individual maps to extensive atlases,
which portrayed data on diverse subjects such as
economics, society, ethics, medicine, and physical
features. This advancement also offered innovative
methods of representing information through various
symbols (Hearn & Baker, 2015; Johnson & Wichern,
2007).
Most work on visualizing students’ performance
focuses on user interfaces that let students view
their own performance, rather than on helping the
evaluator find insights in the marks dataset. Such
tools simply display the marks in 2D or 3D without
performing principal component analysis. Some work
on visualizing data mining and prediction of
students’ performance (Al-Barrak & Al-Razgan, 2016;
Misailidis et al., 2018) helps both students and
evaluators. The work by Humphries et al. (2006)
helps students visualize their grades as an
indicator of their performance, while the work by
Deng et al. (2019) is course specific and does not
combine several related or selected courses.
Most of the learning analytics tools discussed in
(Darcy, 2022; Paolucci et al., 2024; Mukred et al., 2024;
Atif et al., 2013) display bar graphs, pie charts, etc.
depicting the distribution of learners’ performance,
including performance improvement (or degradation)
over time, but these tools do not display 3D scalar and
vector plots for the most discriminating courses of study.
3 PURPOSE OF CLUSTER ANALYSIS
Cluster analysis aims to condense a vast dataset into
significant subgroups of individuals or things. The
division is achieved by categorizing the objects based
on their similarity across a predetermined set of
parameters. Anomalies pose a challenge to this
methodology, frequently arising from an excessive
number of extraneous factors. It is essential for the
sample to accurately reflect the population, and it is
preferable for the components to be independent of
each other. There are three primary clustering
techniques: hierarchical, which follows a tree-like
procedure suitable for smaller data sets; non-
hierarchical, which necessitates specifying the
number of clusters in advance; and a hybrid approach
that combines both methods. The development of
clusters is guided by four primary principles:
distinctiveness, accessibility, measurability, and
profitability (sufficiently significant to have an
impact).
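As an illustration of the non-hierarchical family
described above, the following minimal Python sketch
runs a basic k-means procedure, in which the number of
clusters k must be supplied in advance. It is not taken
from the paper: the function name kmeans, the seeding
from the first k points, and the sample marks are all
assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Basic Lloyd-style k-means.

    Assumption: the first k points seed the centroids. This keeps the
    sketch deterministic; it is not a recommended seeding strategy.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical marks in three subjects for four students (illustration only).
marks = [(82, 78, 90), (85, 80, 88), (40, 35, 50), (45, 38, 47)]
labels, centroids = kmeans(marks, k=2)
print(labels, centroids)
```

Here k is fixed at 2 before the data are examined, which
is the defining requirement of the non-hierarchical
approach. A hierarchical method would instead build a
tree of merges and leave the choice of cut to the analyst.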
In the present work, for 3D visualization, we
cluster the marks about the point in 3D space that
represents the mean of the scores in three subjects.
These three subjects are selected by Principal
Component Analysis (Johnson & Wichern, 2007).
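One way this step could be sketched in Python is shown
below. The text does not state how the three subjects
are picked from the PCA result, so the sketch assumes a
simple rule: take the three subjects with the largest
absolute loadings on the first principal component,
found here by power iteration on the covariance matrix.
The marks table, the function names and the selection
rule are all hypothetical.

```python
def covariance_matrix(rows):
    """Return the sample covariance matrix and the column means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return cov, means


def first_principal_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0] * d  # assumed starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Hypothetical marks: one row per student, one column per subject.
marks = [
    (90, 85, 60, 88, 61),
    (40, 45, 62, 35, 60),
    (75, 70, 61, 80, 59),
    (55, 50, 60, 45, 60),
    (85, 90, 59, 92, 61),
    (30, 35, 61, 28, 60),
]

cov, means = covariance_matrix(marks)
pc1 = first_principal_component(cov)

# Assumed rule: keep the three subjects with the largest |loading| on PC1.
ranked = sorted(range(len(pc1)), key=lambda j: abs(pc1[j]), reverse=True)
selected = sorted(ranked[:3])

# The 3D point about which the marks would be clustered: the mean score
# in each of the three selected subjects.
center = [means[j] for j in selected]
print(selected, center)
```

With these invented marks, the two subjects whose scores
barely vary receive near-zero loadings and are dropped,
so the remaining three subjects define the axes of the
3D plot and their mean scores give the central point.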
4 PROBLEM, METHODOLOGY, AND IMPLEMENTATION
The problem dealt with in this paper is a
multivariate one, and its solution can be used to grade
students’ performance. A cluster analysis technique
is employed to solve this problem.
A. Problem Description
The problem is to graphically represent the marks
obtained by different students. We take the case of four
students. Each student registers in three different
subjects and attempts a fixed number of tests,
given by the instructor, in each of the three subjects.
The input for the problem is therefore four text files,
one for each student, namely student1.txt, student2.txt,
student3.txt and student4.txt. In other words, all the data
related to marks obtained by a particular student is