Students’ Performance in Learning Management System: An Approach to Key Attributes Identification and Predictive Algorithm Design

Dynil Duch^{1,2,a}, Madeth May^{1,b} and Sébastien George^{1,c}

^1 LIUM, Le Mans Université, 72085 Le Mans, Cedex 9, France

^2 Institute of Digital Research & Innovation, Cambodia Academy of Digital Technology, Phnom Penh, Cambodia

Keywords:

Deep Analytics, Knowledge Extraction, Student Performance Prediction, Moodle, Random Forest Classiﬁer,

Predictive Algorithm, Data Normalization, Student Engagement Activities, Student Achievement.

Abstract:

The study we present in this paper explores the use of learning analytics to predict students’ performance in

Moodle, an online Learning Management System (LMS). The student performance, in our research context,

refers to the measurable outcomes of a student’s academic progress and achievement. Our research effort aims

to help teachers spot and solve problems early on to increase student productivity and success rates. To achieve

this main goal, our study ﬁrst conducts a literature review to identify a broad range of attributes for predicting

students’ performance. Then, based on the identiﬁed attributes, we use an authentic learning situation, lasting

a year, involving 160 students from CADT (Cambodia Academy of Digital Technology), to collect and analyze

data from student engagement activities in Moodle. The collected data include attendance, interaction logs,

submitted quizzes, undertaken tasks, assignments, time spent on courses, and the outcome score. The collected data are then used to train several classifiers, which allows us to identify the Random Forest classifier as the most effective in predicting students’ outcomes. We also propose a predictive algorithm that utilizes

the coefﬁcient values from the classiﬁer to make predictions about students’ performance. Finally, to assess

the efﬁciency of our algorithm, we analyze the correlation between previously identiﬁed attributes and their

impact on the prediction accuracy.

1 INTRODUCTION

In recent years, the use of Learning Management

Systems (LMS) has grown in popularity primarily

for their ability to manage and organize digital in-

formation related to teaching and learning. Educa-

tional institutions use LMS platforms such as Moo-

dle, an open-source system widely adopted in the

education sector, to facilitate the creation, distribu-

tion, and management of online learning materials

(Costaa et al. (2015)). In addition, to support teaching

practices, various studies have employed data min-

ing techniques to predict students’ performance (SP)

in Moodle and other LMS platforms, such as Félix et al. (2018). These studies have demonstrated that

making use of data from a learning environment and

efﬁcient predictive techniques can assist teachers in

evaluating SP. For instance, a teacher can identify

a ORCID: https://orcid.org/0000-0002-7857-5811

b ORCID: https://orcid.org/0000-0002-8527-7345

c ORCID: https://orcid.org/0000-0003-0812-0712

areas where students may be struggling, and imple-

ment targeted interventions to improve student out-

comes. Therefore, the ability to predict SP can sig-

niﬁcantly impact learning practices in general, as it

allows for more personalized and adaptive learning

experiences that can better meet the needs of indi-

vidual students. However, it is challenging to iden-

tify the crucial data attributes (hereafter referred to as

“key attributes” or “attributes”) from online learning

activities because of the large amount of information

in the LMS (Venkatachalam et al. (2011)), especially

when the goal is to obtain a more accurate SP prediction (Albreiki et al. (2021)). Moreover, predicting SP and its associated issues has always been a considerable concern in education (Ramesh et al. (2013)). For example, both the nature and quality of data strongly impact the efficiency of a predictive approach. In addition, the outcomes of using predictive techniques rely heavily on teachers’ technical skills.

The research question in this study is: How can teachers employ predictive techniques to evaluate SP effectively? This question aims to explore the potential benefits and limitations of using predictive analytics in the educational context and identify best practices for implementing these techniques to support student learning. By addressing this research question, educators and researchers can better understand the concrete applications of predictive analytics in the classroom, ultimately contributing to more effective teaching practices and improved student outcomes.

Duch, D., May, M. and George, S.

Students’ Performance in Learning Management System: An Approach to Key Attributes Identification and Predictive Algorithm Design.

DOI: 10.5220/0012754200003756

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 13th International Conference on Data Science, Technology and Applications (DATA 2024), pages 285-292

ISBN: 978-989-758-707-8; ISSN: 2184-285X

In the context of this research, SP refers to

the measurable outcomes of a student’s academic

progress and achievement, particularly in relation to

their use of an online LMS. It includes factors such as

grades, test scores, completion of online activities and assignments, and engagement with digital learning materials and resources within the LMS. It is essential to

distinguish academic performance from performance

in a broader sense. For example, while academic

performance focuses speciﬁcally on a student’s suc-

cess in school or university, performance in a broader

sense could refer to a range of activities or tasks in

various settings. It is also crucial to note that SP is

not solely focused on positive outcomes. In addition

to measuring success, SP can also be used to iden-

tify speciﬁc learning activities or overall experience

in using the LMS so that teachers can take appropri-

ate actions in both assistance and evaluation tasks.

To address this research question, we have two

main objectives:

1. Identify the key attributes in LMS data that are rel-

evant for predicting students’ performance. Many

researchers have identiﬁed the speciﬁc attributes

for predicting SP as presented later in section 2.

However, a wider range of attributes still needs to be identified to help educators or teachers with the selection process. Our research

will conduct a literature review by analyzing var-

ious data points, such as grades, assessment re-

sults, engagement metrics, demographic informa-

tion, and teacher pedagogies, to determine which

factors could be used for predicting SP.

2. Propose an appropriate prediction algorithm to

forecast SP based on the critical attributes identi-

ﬁed in the ﬁrst objective. This includes using pre-

dictive algorithms and data mining techniques to

analyze the data and identify patterns and trends

that can be utilized to predict future SP.

To test our research question, we have developed

the following hypotheses:

1. Hypothesis 1: there is a statistically signiﬁcant

relationship between student attendance (mea-

sured by the number of modules completed) and

academic performance (measured by the ﬁnal

grades) in Moodle. This hypothesis investigates

how consistent attendance inﬂuences SP predic-

tion.

2. Hypothesis 2: the extent of student interaction

with the LMS (measured by the number of inter-

action logs) has a statistically signiﬁcant impact

on academic performance (measured by the ﬁnal

grades). This hypothesis aims to explore whether

higher levels of engagement with the LMS posi-

tively inﬂuence SP.

3. Hypothesis 3: the number of quizzes and tasks

submitted by students statistically impacts their

ﬁnal academic performance. This hypothesis ex-

amines whether students’ active participation in

quizzes and tasks within the LMS signiﬁcantly

predicts their overall performance.

We use data from the Cambodia Academy of Dig-

ital Technology (CADT) to conduct our ﬁrst tests,

as detailed in section 3. It is worth mentioning that

the access to and use of students’ data by all research partners from France are regulated. Our research complies with both Cambodian local regulations and the European GDPR. All personal and sensitive information has been stripped from all data samples we use in our research.

The rest of the paper presents our approach to

identifying a wide range of key attributes and ap-

propriate prediction algorithms. Our hypotheses will

help us better understand the impact of attributes we

have collected from CADT. The second section pro-

vides an overview of the most relevant related works.

We examine critical attributes for predicting SP and explore the technological context and methodological approach in section 3. We discuss our experimental

results in section 4 and conclude this paper by high-

lighting signiﬁcant areas that we have been working

on since completing the tests we have conducted with

our ﬁrst three hypotheses.

2 RELATED WORK

This section is dedicated to a comprehensive review

of current research on predicting student outcomes

using machine learning and data mining techniques

in educational settings that exploit LMS data, such

as Moodle. The studies we mention here explore

various aspects of SP, such as academic performance,

retention, and learning behaviors. The studies also

cover different techniques, including Decision Trees,

Neural Networks, Logistic Regression, Support Vector Machines, Naïve Bayes, and Random Forests.

Two systematic reviews, by Félix et al. (2018) and Namoun and Alshanqiti (2020), examined papers that used data mining and machine learning to predict student outcomes as a proxy for SP. These reviews found that existing studies mainly focused on the course level, using predictors such as previous academic performance, demographic data, and course-related variables. Machine learning algorithms, including Decision Trees, Neural Networks, Support Vector Machines, Naïve Bayes, and Random Forests, were found to predict student outcomes accurately, with some studies reporting prediction accuracies of over 90%. However, the reviews also noted limitations, such as the lack of validation of the models on new datasets, limited explanatory power, lack of standardized evaluation metrics, and potential ethical concerns related to using sensitive student data.

In another study, Felix et al. (2019) used a dataset

of 1,307 students’ activity logs in a course, includ-

ing variables related to student interactions in forums,

chats, quizzes, activities, time spent on the platform,

and grades. They built a predictive model of student

outcomes using Naïve Bayes, Decision Trees, Multilayer Perceptron, and Regression algorithms, with the Naïve Bayes model performing the best with an

accuracy of 87%. The study found that the number

of interactions with the system, attendance, and time

spent on the platform were essential variables in pre-

dicting student outcomes. Nevertheless, the study was

limited to a single course and did not consider other

factors inﬂuencing student outcomes, such as prior

knowledge or motivation.

The study of Hirokawa (2018) used machine

learning methods like Random Forests, Support Vec-

tor Machines, and Decision Trees to forecast aca-

demic achievement. The study unveiled that previous

academic records were essential for predicting aca-

demic performance, followed by the student’s gender

and age. At the same time, other attributes, such as

extracurricular activities and family background, had

a lesser impact. Yet, the study showed some limitations, as it did not focus on data from LMS activities and excluded influencing factors such as teacher pedagogies.

In the same context, Gaftandzhieva et al. (2022)

used machine learning algorithms to predict students’ final grades in an Object-Oriented Programming course using data from Moodle LMS activities.

The study found that the Random Forest algorithm

had the highest prediction accuracy of 78%, and at-

tendance was strongly correlated with ﬁnal grades.

The study’s weaknesses, however, included its limited

sample size and singular course focus. Other stud-

ies have used data mining and machine learning to

predict the likelihood of students dropping out of a

course (Quinn and Gray (2019)), the likelihood of students’ success in a course (Arizmendi et al. (2022)), or student grades using both academic and non-academic factors (Yağcı (2022)). Some studies have also focused on predicting student outcomes from specific data sources, such as interaction logs (Brahim (2022)), assessment grades and online activity data (Alhassan et al. (2020)), or teacher pedagogies (Trindade and Ferreira (2021)).

Thus far, the studies we cover demonstrate the

potential of data mining and machine learning tech-

niques to predict various student outcomes in educa-

tional settings. However, they also highlight chal-

lenges such as the need for extensive and diverse

datasets, the lack of validation of models on new

datasets, the lack of study on predicting SP at the pro-

gram level, and potential ethical concerns related to

using sensitive student data. Furthermore, a focus on single courses does not allow educators to reach a final decision on students’ overall performance at the end of the program or academic year, whereas predictions at the program level are in demand. Based on this observation, our research examines the critical attributes

in LMS data relevant to predicting SP at the program

level. Focusing on the variables collected from Moo-

dle, our research investigates the relationship between

student engagement activities and SP, presented later

in section 3.2.1.

3 METHODOLOGICAL APPROACH

3.1 Attributes Identiﬁcation

The attributes identiﬁed in our study are derived from

an extensive literature review as previously presented

in section 2. We dedicated time to scrutinize vari-

ous papers that use different attributes to predict aca-

demic outcomes, success or dropout rates. We un-

covered six composite attributes commonly used in

predictive models: Demographic Details, Education

Information, Family Expenditure, Habits, College Fa-

cilities, and Teacher Pedagogy, as illustrated in Figure

1. These attributes guide teachers, providing a foun-

dational understanding to navigate the numerous at-

tributes available in an LMS and their relationships.

It is crucial to note that Figure 1 does not represent

an exhaustive list of all possible predictors of SP. In-

stead, it serves as a starting point to facilitate decision-

making for teachers in selecting pertinent attributes

from the multitude available in LMS when aiming to

predict SP. The intention is to offer researchers and institutions the ability to decide when and which attributes to use based on relevance, practicality, and ease of interpretation within their specific context.

Figure 1: The attributes for predicting SP.

After a thorough and rigorous process of attribute

identiﬁcation, we have gained valuable knowledge to

focus our effort on developing our predictive model.

Speciﬁcally, attributes presented later in section 3.2.1

have been prioritized from the broader set deﬁned in

our model. This strategic selection aims to provide a

more practical and targeted approach for both teach-

ers and the research team at CADT and in France.

3.2 Predictive Approach

This research uses data mining techniques and predictive algorithms within the Moodle environment. The methodology adopted for this research is divided into several steps, as shown in Figure 2.

Figure 2: The predictive approach.

3.2.1 Data Extraction and Preprocessing

The dataset used in our research comprises two pri-

mary sources: the Moodle LMS and Google Sheets.

The Moodle data cover around 1,000 students enrolled in both short course training and a bachelor program, totalling around 5 million records. After a preliminary analysis, we kept only the bachelor’s degree

data because the data from the short course training

does not contain any assessment score or ﬁnal score,

which is the critical target variable. Moreover, the stu-

dents from short course training had less interaction

with Moodle. Our study aimed to predict SP at the

program level. Among all registered students, many

were enrolled in short courses or workshops, which

did not provide a comprehensive view of their per-

formance within a program. Short courses typically

focus on speciﬁc topics and may only cover some as-

pects required to evaluate a student’s overall perfor-

mance in a program. To ensure that our analysis and

predictions were based on a thorough understanding

of SP in the learning environment, we focused on 160

students representing a diverse range of courses, in-

cluding Linear Algebra, Discrete Mathematics, Prob-

ability and Statistics, C Programming Language, Vi-

sual Art, Soft Skills, and Information Technology Es-

sentials. Because the short course training data could

not be used, we selected only 2 million records from

the bachelor program. Queries were used to count

the number of records in each attribute to obtain data

at the program level for both the ﬁrst and second


terms of the ﬁrst-year program. Finally, we obtained

a dataset with 310 records from Moodle. Afterwards,

another set of 310 records was collected from Google

Sheets, representing this time the ﬁnal scores of stu-

dents enrolled in both terms of their ﬁrst year of the

bachelor’s degree, aligning with the data collected

from Moodle during the academic year 2022. The

dataset of two terms represents two program levels.

The attributes in the dataset provide information on

various aspects of a student’s engagement and performance in all courses. The attributes collected and counted with queries are attendance, number of interaction log, total quiz submitted, total task submitted, total assignment submitted, time spent on course, and outcome score.
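The counting step described above can be sketched as follows. This is a minimal illustration and not the queries used in the study: the event rows, the field layout (student_id, term, event_type) and the event names are hypothetical stand-ins for the actual Moodle log tables, whose schema is not given in the paper.

```python
from collections import defaultdict

# Hypothetical event rows: (student_id, term, event_type).
# Real Moodle log tables use a different, richer schema.
events = [
    ("s01", 1, "attendance"),
    ("s01", 1, "quiz_submitted"),
    ("s01", 1, "interaction"),
    ("s01", 1, "interaction"),
    ("s01", 2, "interaction"),
    ("s02", 1, "task_submitted"),
    ("s02", 1, "interaction"),
]


def count_events(rows):
    """Return {(student_id, term): {event_type: count}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for student_id, term, event_type in rows:
        counts[(student_id, term)][event_type] += 1
    # Convert the nested defaultdicts to plain dicts for predictable lookups.
    return {key: dict(value) for key, value in counts.items()}


# One record per student and term, mirroring the program-level dataset.
records = count_events(events)
for key in sorted(records):
    print(key, records[key])
```

Grouping by student and term yields one row per student per term, which is consistent with a dataset of a few hundred records derived from millions of raw log entries.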

3.2.2 Feature Selection

Feature selection is an essential step in predicting SP

as it allows for the identiﬁcation of the most rele-

vant attributes that contribute to the outcome. Several

feature selection methods are available, such as the

Pearson correlation coefﬁcient, Spearman’s rank cor-

relation coefﬁcient, mutual information, and recursive

feature elimination. Each technique offers advantages

and disadvantages depending on the dataset and the

problem being addressed.

Our study employs the Pearson correlation coefficient as a feature selection method. We apply it to the attributes of the Education Information category, specifically the engagement activities found in the data collected from CADT’s Moodle. Our study therefore investigates the relationship between student engagement activities and performance using the attributes already mentioned in section 3.2.1. It is worth recalling that these attributes are significant predictors of student outcomes in the existing literature. However, there may be differences between the

attributes discussed in the literature and those in our

dataset. Nonetheless, our primary goal was to identify and analyze attributes that could feasibly be obtained from Moodle while still providing valuable insights into SP. In addition, by focusing on Moodle-based data, we aimed to develop a model that can be easily applied and adapted in similar LMS environments. Moreover, the selected Moodle-based attributes still capture essential aspects of SP. Our findings can contribute to

the broader understanding of factors that impact aca-

demic success in online learning environments.
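As an illustration of Pearson-based feature selection, the sketch below ranks attributes by the absolute value of their correlation with the outcome score. The attribute values are invented toy numbers, and the helper names (pearson, rank_features) are ours, not part of the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column; treat it as uninformative.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Sort attributes by |r| with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data (hypothetical values, five students).
columns = {
    "attendance": [5, 3, 4, 5, 4],
    "number_of_interaction_log": [10, 20, 30, 40, 50],
}
outcome_score = [50, 60, 70, 80, 90]

ranked = rank_features(columns, outcome_score)
for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

Ranking by the absolute value keeps strongly negative correlations, which are as informative for prediction as strongly positive ones.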

3.2.3 Classiﬁer

To predict student outcomes and select the best clas-

siﬁer, several classiﬁcation methods are used with our

dataset for comparison. As found in our review of related work, the most commonly used data mining classifiers include Decision Trees, Random Forest, Neural Networks, Naïve Bayes, and Support Vector Machine. In this research, we compared those five classifiers on our dataset.

3.2.4 Performance Evaluation Measures

To evaluate the performance of our classifier, we employed the confusion matrix, which is widely used as an evaluation measure in machine learning for summarizing classifier performance. It provides information about the number of true positive, true negative, false positive, and false negative predictions made by the classifier. According to Aguiar et al. (2014), this information is presented in a matrix format, where each row represents the actual class, and each column represents the predicted class. The entries in the matrix provide insight into the classifier’s ability to predict each class and its tendency to misclassify instances. This evaluation measure is crucial in unbalanced datasets, such as in this case, where the number of failing students is much smaller than that of successful students (Félix et al. (2018)).
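To make the four counts concrete, here is a minimal sketch of a binary confusion matrix on invented labels. Treating the failing class as the positive class is our assumption for the example; the paper does not state which class it treats as positive.

```python
def confusion_counts(actual, predicted, positive="fail"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1  # a failing student predicted as passing
        elif p == positive:
            fp += 1  # a passing student predicted as failing
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def accuracy(cm):
    total = cm["tp"] + cm["fp"] + cm["tn"] + cm["fn"]
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


def recall(cm):
    positives = cm["tp"] + cm["fn"]
    return cm["tp"] / positives if positives else 0.0


# Toy, imbalanced example: 8 passing and 2 failing students (hypothetical).
actual = ["pass"] * 8 + ["fail"] * 2
predicted = ["pass"] * 7 + ["fail"] + ["fail", "pass"]

cm = confusion_counts(actual, predicted)
print(cm, "accuracy:", accuracy(cm), "recall on failing class:", recall(cm))
```

In this toy case the accuracy is 0.8 while only half of the failing students are detected, which is the kind of gap the confusion matrix exposes on unbalanced data.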

3.2.5 Student’s Performance Predictive Algorithm

We have deﬁned the algorithm formula for predicting

SP as a mathematical equation that uses a combina-

tion of various student attributes and their correspond-

ing coefﬁcient values to calculate a predicted score.

The formula first normalizes each attribute value by subtracting the attribute’s minimum and dividing the result by the range of possible values for that attribute.

Then, each normalized attribute value is multiplied by

its corresponding coefﬁcient value, representing the

weight or importance of that attribute in the overall

prediction. Finally, the sum of all these weighted at-

tribute values is divided by the sum of all the coef-

ﬁcients to arrive at a ﬁnal performance score. This

score can then be used to classify students as likely to

succeed or likely to struggle in their studies.

\[
sp = \sum_{i=1}^{N} C_i \, \frac{x_i - \mathrm{min}_i}{\mathrm{max}_i - \mathrm{min}_i} \tag{1}
\]

where

\[
sp,\ x_i,\ C_i \in \mathbb{R}, \qquad 0 \le sp,\ C_i \le 1, \qquad \sum_{i=1}^{N} C_i = 1
\]

sp: the student’s performance

x_i: the value of attribute i

C_i: the coefficient of attribute i

max_i: the maximum value of attribute i

min_i: the minimum value of attribute i

N: the number of attributes

This approach for predicting SP is based on math-

ematical modeling and statistical analysis principles;

by using a weighted average calculation incorporat-

ing multiple variables, the algorithm can provide a

more accurate and reliable prediction of a student’s

likely performance. Moreover, the normalization of

attribute values ensures that all variables are treated

equally, regardless of their scales or units of measure-

ment.
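Equation (1) can be sketched in a few lines of Python. The coefficient values, attribute bounds and student values below are hypothetical and chosen only so that the arithmetic is easy to follow; clamping out-of-range values to [min, max] is our own assumption, added so that the score stays within [0, 1].

```python
def predict_sp(values, coefficients, bounds):
    """Weighted sum of min-max normalized attributes, as in equation (1).

    values:       {attribute: x_i}
    coefficients: {attribute: C_i}, expected to sum to 1
    bounds:       {attribute: (min_i, max_i)}
    """
    if abs(sum(coefficients.values()) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    score = 0.0
    for name, coefficient in coefficients.items():
        low, high = bounds[name]
        if high == low:
            raise ValueError(f"attribute {name!r} has zero range")
        # Assumption: clamp values outside the observed range.
        x = min(max(values[name], low), high)
        score += coefficient * (x - low) / (high - low)
    return score


# Hypothetical coefficients and ranges, not the values reported in the paper.
coefficients = {
    "number_of_interaction_log": 0.5,
    "time_spent_on_course": 0.3,
    "attendance": 0.2,
}
bounds = {
    "number_of_interaction_log": (0, 1000),
    "time_spent_on_course": (0, 100),
    "attendance": (0, 50),
}
student = {
    "number_of_interaction_log": 500,
    "time_spent_on_course": 50,
    "attendance": 25,
}

sp = predict_sp(student, coefficients, bounds)
print(f"sp = {sp:.2f}")
```

A student sitting at the midpoint of every attribute range obtains sp = 0.5 regardless of the weights, since the coefficients sum to 1.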

4 EXPERIMENTAL RESULTS

The experimental results of our research on assisting

teachers in using predictive techniques to evaluate the

student’s performance at CADT are as follows:

Figure 3: The accuracy of each classiﬁer.

The data from CADT are analyzed with five distinct classifiers (as previously stated in section 3.2.3) to assess each model’s accuracy. Based on the outcome shown in Figure 3, the two best classifiers are the Decision Tree classifier, with an accuracy of 87.22%, and the Random Forest classifier, with a higher accuracy of 89.44%.

While accuracy is a common and intuitive metric for evaluating classifier performance, it has limitations, especially under class imbalance or unequal misclassification costs. Performance evaluation tools such as confusion matrices provide a more detailed and nuanced understanding of how well a classifier is performing. We therefore used the confusion matrix to evaluate the classifiers. The results in Figure 4 show that both Random Forest and Decision Tree offer good accuracy, so we retained these two classifiers for comparison. Of the two, Random Forest was chosen for predicting SP due to its superior performance.

Figure 4: Comparison of the accuracy of the Random Forest and Decision Tree classifiers.

4.1 Attribute Coefﬁcients

At the end of the classifier performance evaluation, we obtained the coefficients of feature values for each attribute to illustrate its importance in predicting SP when using the Decision Tree and Random Forest classifiers, as depicted in Figure 5. The coefficients indicate the weight each attribute has in the classifiers. For example, the coefficient values for number of interaction log and time spent on course are higher for both classifiers, indicating that these attributes are essential and impactful in determining SP. On the other hand, the coefficients for attendance, total quizzes submitted, total tasks submitted, and total assignments submitted are relatively low because they are already factored into the final grade. This indicates that these attributes have little importance in influencing SP prediction, even if some teachers might utilize them outside of Moodle.

Figure 5: The coefficient of each attribute of Decision Tree and Random Forest.

4.2 Verification of Hypotheses Through Regression Analysis Results

The results we have analyzed and presented in the

previous sections can serve as a valuable guide for

teachers, assisting them not only in selecting the key

attributes but also in understanding the relevance, de-

gree of importance and impact of each attribute in pre-

dicting SP. The last step in our study is to examine the

independent variables’ P-value and conﬁdence inter-

val (CI) to verify our hypotheses using the regression

analysis results as shown in Table 1.

Table 1: Regression results.

Hypothesis   t-statistic   P-value   CI (95%)
H1           16.782        0.000     [2.269, 2.870]
H2           32.168        0.000     [5.133, 5.800]
H3           24.816        0.000     [3.899, 4.569]
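For readers who want to see where a t-statistic, P-value and confidence interval of this kind come from, the sketch below fits a simple one-predictor regression on invented data. It is not the analysis behind Table 1: the paper does not specify its regression setup, so the single-predictor model is an assumption, and the interval and P-value use a normal (large-sample) approximation instead of the exact t distribution.

```python
import math
from statistics import NormalDist


def simple_regression(xs, ys, confidence=0.95):
    """Ordinary least squares for y = a + b*x with a large-sample interval for b."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    if std_err == 0:
        raise ValueError("perfect fit: the t-statistic is undefined")
    t_stat = slope / std_err
    normal = NormalDist()
    z = normal.inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_value = 2 * (1 - normal.cdf(abs(t_stat)))  # normal approximation
    return {
        "slope": slope,
        "t": t_stat,
        "p": p_value,
        "ci": (slope - z * std_err, slope + z * std_err),
    }


# Invented data: a predictor (e.g. a count of activities) and a grade.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

result = simple_regression(x, y)
print(result)
```

A confidence interval that lies entirely above zero, together with a small P-value, is what supports reading an effect as positive and statistically significant.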

Hypothesis 1. According to the regression results in Table 1, the P-value for the independent variable attendance with the dependent variable grade is 0.000. It indicates a statistically significant relationship between attendance and SP. On top of that, the confidence interval of [2.269, 2.870] suggests that the effect of attendance on grade is positive and significant.

Hypothesis 2. Based on the regression results in

Table 1, the P-value for the independent variable

number of interaction log with the dependent

variable grade is 0.000, indicating a statistically sig-

niﬁcant relationship between the two variables. Fur-

thermore, the conﬁdence interval of [5.133, 5.800]

shows that the effect of number of interaction

log on grade is positive and signiﬁcant.

Hypothesis 3. As illustrated in Table 1, the P-value for the independent variable time spent on course with the dependent variable grade is 0.000, showing a statistically significant relationship between the two variables. Furthermore, the confidence interval of [3.899, 4.569] indicates that the effect of time spent on course on SP is positive and significant.

In conclusion, all three hypotheses are supported

by the regression results, with all independent vari-

ables showing a statistically signiﬁcant impact on

their respective dependent variables. Our ﬁndings

suggest that the utilization of these key attributes and

the prediction algorithm can assist teachers at CADT

in assessing SP and identifying those at risk of not

performing well. Indeed, the empirical validation we

provided here can contribute to aspects beyond SP

prediction. For instance, by understanding student

performance and obtaining pertinent data for the anal-

ysis process, educators can adapt and personalize in-

terventions and support strategies for a more targeted

approach to student success. We hope that the in-

tegration of these ﬁndings into educational settings,

where data-driven decision-making is a powerful tool,

will transform our teaching practices, leading to better

academic performance and well-being for students.

5 CONCLUSION AND FUTURE WORK

Our research aimed to assist teachers in identifying

key attributes and choosing prediction algorithms to

evaluate students’ performance. Following a rigorous

literature review, our research effort also included an

empirical study. Indeed, data from an authentic learn-

ing situation at CADT has been used in our ﬁrst at-

tempt to gain a better understanding of predictive an-

alytics in education, which has become increasingly

important. In addition, through our analysis, we demonstrated that the identified key attributes, as listed in section 3.2.1, have a statistically significant impact on

student performance. We also determined that Ran-

dom Forest models are effective in predicting student

performance.

The major contribution of our research effort can

be summarized in two aspects. First, whereas pre-

vious studies have primarily emphasized the identi-

ﬁcation of factors inﬂuencing student outcomes, our

research goal is to explore and explain the impact of

the identified key attributes and their complex relationships. Second, the initial iteration of our predictive algorithm is designed for both the course and program levels, while existing approaches from our literature review mainly focused on the course level for predicting students’ outcomes. The first data analysis batches helped us not only determine which attributes matter and how important they are in predicting student performance, but also understand in which areas we can improve teaching practices and support students by predicting their academic performance. However, it is

essential to point out that our research has some limi-

tations and further research and validation are needed.

Current and future work in this area includes:

1. Expanding the Sample Size and Validation: we

are currently conducting a study that covers a big-

ger dataset from our partners in France. We also

expect to validate the results of our study by com-

paring them to other studies and testing the model

on new data to ensure its ability to generalize well

to unseen data.

2. Implementing the Model: our next challenge


will be a real-time application. For that, we are

studying the possibilities of integrating explain-

able AI methods like LIME or SHAP. As a matter

of fact, we are interested in helping our lecturer

colleagues interpret the model predictions.

3. Evaluation Metric: we acknowledge the value of

using Cohen’s Kappa for evaluating classiﬁers on

imbalanced datasets. Thus, we decided to incor-

porate Cohen’s Kappa as an additional evaluation

metric, alongside the confusion matrix. Indeed,

by using both evaluation approaches, we aim to

provide a more robust and comprehensive assess-

ment of the classiﬁer’s performance in predicting

SP.
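As a pointer for the evaluation metric mentioned in item 3, Cohen’s Kappa compares the observed agreement between actual and predicted labels with the agreement expected by chance. The sketch below is a minimal illustration on invented labels, not a result from our dataset.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty label sequences of equal length")
    # Observed agreement: share of identical labels.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement from the marginal label frequencies.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    labels = set(actual_counts) | set(predicted_counts)
    expected = sum(actual_counts[l] * predicted_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        # Both sequences contain one and the same class; kappa is undefined,
        # and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy, imbalanced example: 8 passing and 2 failing students (hypothetical).
actual = ["pass"] * 8 + ["fail"] * 2
predicted = ["pass"] * 7 + ["fail"] + ["fail", "pass"]

kappa = cohen_kappa(actual, predicted)
print(f"kappa = {kappa:.3f}")
```

Here the accuracy is 0.8 but kappa is only 0.375, because a classifier that mostly predicts the majority class already agrees with the labels 68% of the time by chance.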

Overall, our research lays a foundation for ad-

vancing students’ performance prediction, beneﬁting

CADT and French partner universities and potentially

impacting the wider educational community. The

positive results suggest broader implications, inﬂu-

encing global educational practices and fostering a

more data-informed and supportive learning environ-

ment.

REFERENCES

Aguiar, E., Chawla, N., Brockman, J. B., Ambrose, G. A.,

and Goodrich, V. (2014). Engagement vs perfor-

mance: using electronic portfolios to predict ﬁrst

semester engineering student retention. Proceedings

of the Fourth International Conference on Learning

Analytics And Knowledge.

Albreiki, B., Zaki, N., and Alashwal, H. (2021). A sys-

tematic literature review of student’ performance pre-

diction using machine learning techniques. Education

Sciences.

Alhassan, A., Zafar, B. A., and Mueen, A. (2020). Predict

students’ academic performance based on their assess-

ment grades and online activity data. International

Journal of Advanced Computer Science and Applica-

tions, 11.

Arizmendi, C. J., Bernacki, M. L., Raković, M., Plumley,

R., Urban, C. J., Panter, A., Greene, J. A., and Gates,

K. M. (2022). Predicting student outcomes using dig-

ital logs of learning behaviors: Review, current stan-

dards, and suggestions for future work. Behavior Re-

search Methods, 55:3026 – 3054.

Brahim, G. B. (2022). Predicting student performance from

online engagement activities using novel statistical

features. Arabian Journal for Science and Engineer-

ing, 47:10225 – 10243.

Costaa, C., Alvelosa, H., and Teixeiraa, L. (2015). The use of Moodle e-learning platform: a study in a Portuguese university. CENTERIS 2012 - Conference on ENTERprise Information Systems.

Felix, I., Ambrosio, A., Duilio, J., and Simões, E. (2019).

Predicting student outcome in moodle. In Proceed-

ings of the Conference: Academic Success in Higher

Education, Porto, Portugal, pages 14–15.

Félix, I. M., Ambrosio, A. P., da Silva Neves Lima, P., and Brancher, J. D. (2018). Data mining for student outcome prediction on moodle: a systematic mapping. Anais do XXIX Simpósio Brasileiro de Informática na Educação (SBIE 2018).

Gaftandzhieva, S., Talukder, A., Gohain, N., Hussain, S.,

Theodorou, P., Salal, Y. K., and Doneva, R. (2022).

Exploring online activities to predict the ﬁnal grade of

student. Mathematics.

Hirokawa, S. (2018). Key attribute for predicting student

academic performance. In International Conference

on Education Technology and Computer.

Namoun, A. and Alshanqiti, A. (2020). Predicting student

performance using data mining and learning analytics

techniques: A systematic literature review. Applied

Sciences.

Quinn, R. J. and Gray, G. (2019). Prediction of student

academic performance using moodle data from a fur-

ther education setting. Irish Journal of Technology

Enhanced Learning.

Ramesh, V., Parkavi, P., Ramar, K., Thai-Nghe, N., Busche,

A., Schmidt-Thieme, L., Carol, Bastin, P., Thiyagaraj,

Yosuva, S., and Arulkumar, V. (2013). Predicting stu-

dent performance: A statistical and data mining ap-

proach. International Journal of Computer Applica-

tions, 63:35–39.

Trindade, F. R. and Ferreira, D. J. (2021). Student perfor-

mance prediction based on a framework of teacher’s

features. International Journal for Innovation Educa-

tion and Research.

Venkatachalam, S. T., Doraisamy, S. C., Golzari, S.,

Norowi, N. B. M., Sulaiman, M. N. B., Liu, H., Mo-

toda, H., Setiono, R., Zharo, Z., and Maben, A. F.

(2011). Identifying key performance indicators and

predicting the result from student data. International

Journal of Computer Applications, 25:45–48.

Yağcı, M. (2022). Educational data mining: prediction of

students’ academic performance using machine learn-

ing algorithms. Smart Learning Environments, 9.
