The authors studied academic performance using data
from 210 undergraduate students. These criteria
determine a student's grade: research shows that
considering pre-university marks and first- and
second-year grades can improve graduation
prediction in the final semester (Daud et al, 2017).
Researchers predicted the dropout rate of
technology undergraduates using their samples
(Pradeep et al, 2015). WEKA Attribute Selection
Algorithms may reduce the effects of applied
attributes. Post-enrollment measures like attendance,
class attentiveness, and exam performance are
perhaps the most crucial factors in decision-making.
Higher education institutions collect vast amounts of
student data, which they use in retention programs
and predictive modeling approaches to achieve
effective results. Age, gender, and religion are not
particularly important for academic performance
prediction. The college collects a student's high
school transcript, SAT score, demographic data, and
proof of address when they enroll. The collected data
then reveal the degree recipient's stated major,
academic level, course topics, and grades. The main
course modules are delivered through an online learning
management system (LMS) such as Moodle or Blackboard.
An LMS lets users establish group discussions, view
course materials, and participate in other course
activities, including online queries and other
operations.
3 METHODS
The methodology consists of a data acquisition phase
and a classification phase. Data acquisition involves
collecting student data on the basis of predefined
attributes, and classification is performed using Naive
Bayes, logistic regression, and random forest methods.
Bayes' theorem underlies the Naive Bayes (NB)
classifier (Sokkhey et al, 2020), which results in a
straightforward and efficient approach to the
classification problem. NB classifiers are often used
in the early stages of text retrieval for text
categorization. Simplicity and scalability are two of
the primary advantages of the NB classifier. The
classifier treats all features in the dataset as
conditionally independent of one another given the
class, which reduces the class probability to a
product of per-feature probabilities. The conditional
probabilities are obtained using Bayes' theorem. The
conditional probability of an event is its likelihood
given that one or more other events have occurred.
Evidence may either raise or lower the probability of
an event. Given a hypothesis H and some evidence E,
Bayes' theorem relates the probability of the
hypothesis before receiving the evidence, denoted
P(H), to the probability of the hypothesis after
acquiring the evidence, denoted P(H|E):
\[ P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)} \]
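To make the role of Bayes' theorem in the NB classifier concrete, the following is a minimal sketch rather than the implementation used in this study. It assumes categorical features and add-one (Laplace) smoothing. The function names `train_nb` and `posterior` and the toy attendance and exam data are hypothetical and are introduced only for illustration.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, class)][value] -> count
    value_counts = defaultdict(Counter)
    # distinct values seen per feature, needed for Laplace smoothing
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def posterior(model, row):
    """Return P(class | row) for every class via Bayes' theorem."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    joint = {}
    for label, count in class_counts.items():
        p = count / total  # prior P(H)
        for i, value in enumerate(row):
            # likelihood P(E_i | H) with add-one smoothing; features are
            # multiplied because NB treats them as independent given H
            num = value_counts[(i, label)][value] + 1
            den = count + len(feature_values[i])
            p *= num / den
        joint[label] = p  # P(E | H) * P(H)
    evidence = sum(joint.values())  # P(E)
    return {label: p / evidence for label, p in joint.items()}


# Hypothetical toy data: (attendance, exam performance) -> outcome
rows = [
    ("high", "good"), ("high", "poor"), ("low", "poor"),
    ("low", "good"), ("high", "good"), ("low", "poor"),
]
labels = ["pass", "pass", "fail", "pass", "pass", "fail"]

model = train_nb(rows, labels)
probs = posterior(model, ("high", "good"))
prediction = max(probs, key=probs.get)
```

Here `probs` holds the posterior probability of each class for a student with high attendance and good exam performance, and `prediction` is the class with the largest posterior.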
Logistic Regression (LR) is a widely used statistical
machine learning method. In this model, the input
features are weighted and combined linearly to
produce log-odds (logits). LR is used to solve
problems that involve classifying subjects into two
groups. The classes are presumed to be approximately
linearly separable. LR uses the logistic function,
more commonly known as the sigmoid function, to
turn predictions into probabilities. The occurrence
of a binary event is predicted by means of a logit
function (Peng et al, 2021).
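The relationship between the linear score, the sigmoid function, and the logit can be sketched as follows. This is an illustrative example only: the weights, bias, and feature values are hypothetical and are not fitted to the data of this study, and the function names are assumptions made for the sketch.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability; the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


def predict_proba(weights, bias, features):
    """Linear combination of the features followed by the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights for two features (e.g. attendance rate, exam score)
weights = [2.0, 1.5]
bias = -2.0

p = predict_proba(weights, bias, [0.9, 0.8])
label = 1 if p >= 0.5 else 0  # threshold the probability for a binary decision
```

Applying `logit` to `p` recovers the linear score, which is why the model is linear in the log-odds even though its output is a probability.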
Random forest (RF) is an ensemble classifier that
uses randomization to generate a collection of
decision trees (DTs), which may or may not be
correlated with one another. The method was conceived
by Breiman and Cutler. It builds a forest of decision
trees using the ensemble learning technique. A random
forest first generates a number of decision trees and
then combines them into a single model, which may
help provide accurate results for predictive
modelling. Its primary advantage is that it can be
used to address both classification and regression
problems. The mathematical expression for a random
forest is $h(y, \Theta_k)$, $k = 1, 2, \ldots, M$,
where y is the input variable and $\Theta_k$ is the
random vector that governs the k-th DT. Every DT
selects a subset of the sample data set at random to
serve as its training dataset. An RF is characterized
by the number of DTs, indexed by k; the number of
samples (n) assigned to each DT from the training
dataset (M); and the number of features (m) used by
each DT, where $m \ll M$ (Dhanka et al, 2021).
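The sampling and voting steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the configuration used in the experiments: one-split trees (decision stumps) stand in for full decision trees, and the function names, parameter names, and toy data are hypothetical. The parameters `n_samples` and `n_features` play the roles of n and m in the text.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Fit a one-split tree using only the given feature indices."""
    default = majority(labels)
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            left_label = majority(left)
            right_label = majority(right) if right else default
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees, n_samples, n_features, seed=0):
    """Train stumps on bootstrap samples and random feature subsets."""
    rng = random.Random(seed)
    total_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # bootstrap: draw n_samples row indices with replacement
        idx = [rng.randrange(len(rows)) for _ in range(n_samples)]
        # random subset of the available features for this tree
        features = rng.sample(range(total_features), n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Majority vote over the individual tree predictions."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Hypothetical toy data: three numeric features, label 0 (fail) or 1 (pass)
rows = [(1, 2, 3), (2, 1, 4), (3, 3, 2), (4, 2, 1), (2, 4, 3),
        (7, 8, 6), (8, 6, 9), (6, 9, 7), (9, 7, 8), (7, 6, 8)]
labels = [0] * 5 + [1] * 5

forest = fit_forest(rows, labels, n_trees=25, n_samples=10, n_features=2, seed=42)
pred_low = predict(forest, (0, 0, 0))
pred_high = predict(forest, (10, 10, 10))
```

Using an odd number of trees avoids tied votes in the two-class case. A production model would normally grow deeper trees and resample the feature subset at every split instead of once per tree.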
4 RESULT ANALYSIS
The experiments use a dataset that was collected at
the university and is available at UCI. The dataset
contains 285 samples in total. It has a total of
seventeen attributes that set it apart from others
Prediction of Student Academic Performance in Higher Education Institutions Using Data Mining