How Far Can We Trust the Predictions of Learning Analytics Systems?

Amal Ben Soussia and Anne Boyer

Université de Lorraine, LORIA, France

Keywords:

Learning Analytics, Prediction Systems, Online Learning, Trust Granularities, Trust Index, k-12 Learners.

Abstract:

Prediction systems for teachers based on Machine Learning (ML) models are widely used in the Learning Analytics (LA) field to address the problem of high failure rates in online learning. One objective of these systems is to identify learners at risk of failure so that teachers can intervene effectively with them. Therefore, teachers' trust in the reliability of the predictive performance of these systems is of great importance. However, despite the relevance of this notion of trust, the literature does not propose specific methods to measure the trust to be granted to the system results. In this paper, we develop an approach to measure a teacher's trust in the prediction accuracy of an LA system. To this end, we define three trust granularities: the overall trust, the trust per class label and the trust per prediction. For each trust granularity, we calculate a Trust Index (TI) using the statistical concepts of confidence level and confidence interval. As a proof of concept, we apply this approach to a system using the Random Forest (RF) model and real data of online k-12 learners.

1 INTRODUCTION

Prediction systems based on Machine Learning (ML) models are a widespread solution in the Learning Analytics (LA) literature to identify learners at risk of failure (López Zambrano et al., 2021).

When designed for teachers, these systems are intended to enable effective and accurate instructional interventions with at-risk learners. Indeed, teachers refer to the prediction outcomes of these systems in their pedagogical monitoring and when taking specific corrective actions with the lowest-performing learners. Thus, teachers' trust in the reliability of the accuracy of an LA system's predictions is of great importance. In other words, after identifying the learner's academic situation, teachers also need to know how far they can trust the system's prediction results. This notion of trust matters because it determines whether teachers accept the prediction results and act on them through effective and accurate pedagogical interventions. For example, for a teacher, failing to identify an at-risk learner is worse than identifying a successful learner as at-risk. Since prediction systems are often characterized by the instability and oscillation of their results (Ben Soussia et al., 2022), such a situation is quite common. In such cases, teachers' trust in the system's performance comes into play: it determines how much latitude they grant to the predictive outcomes.

Indeed, (Qin et al., 2020) defines trust in Artificial Intelligence-based educational systems as the willingness of users to receive knowledge, provide personal information and follow suggestions, based on the belief that these systems and their managers or developers will act responsibly. However, the LA literature does not address the question of how far teachers can trust the predictive performance of an educational system. Furthermore, the LA literature usually discusses trust from an ethical or black-box point of view, in relation to the nature and type(s) of the data used. Given the importance of the notion of trust in LA, the main question is: how can we measure the trust index to be granted to the performance of a prediction system?

To answer this question, we focus in this paper on developing an approach to measure a teacher's Trust Index (TI) in the prediction accuracy of an LA system. First, we define three trust granularities in an LA system: (1) the overall trust, (2) the trust per class label of the system and (3) the trust per prediction made by the system. Then, for each trust granularity, we compute a TI using the statistical concepts of confidence level and confidence interval. For the TI of each trust granularity, we propose an algorithm to compute the confidence level to be granted to the system performance. For trust granularities (1) and (2), we compute the confidence intervals using a widely used statistical method called the Normal Approximation Interval Based on a Test Set (Raschka, 2018).

Ben Soussia, A. and Boyer, A.

How Far Can We Trust the Predictions of Learning Analytics Systems?.

DOI: 10.5220/0012057800003470

In Proceedings of the 15th International Conference on Computer Supported Education (CSEDU 2023) - Volume 2, pages 150-157

ISBN: 978-989-758-641-5; ISSN: 2184-5026

© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

To validate this approach, we apply these algorithms to an LA system using the Random Forest (RF) model and real data of k-12 learners enrolled online in a French distance education center (CNED, Centre National d'Enseignement à Distance).

The rest of this paper is organized as follows. Section 2 presents the related work and discusses our contribution with respect to the literature. Section 3 presents the statistical background. Section 4 formalizes the problem and introduces the trust granularities and their TI algorithms. Section 5 describes our case study. Section 6 presents the experiments and the obtained results. Section 7 concludes and introduces perspectives.

2 RELATED WORK

Artificial intelligence (AI) is an emerging science that deals with the simulation of intelligent behavior in computers (Bitkina et al., 2020). However, while very promising, AI has been implicated in trust issues, and concerns have been raised about its use in various initiatives and technologies (Lockey et al., 2021). For these reasons, trust in AI has recently attracted considerable interest. (Siau and Wang, 2018) confirms that the level of trust a person has in someone or something can determine that person's behavior. (Stanton et al., 2021) defines a user's trust in an AI application as based on the perception of its reliability: the actual reliability of the AI is influential only to the extent that it is perceived by the user, so trust is a function of the user's perceptions of technical reliability characteristics. In this context, (Lockey et al., 2021) introduces concepts related to five central AI trust challenges: the ability to know and explain AI, accountability for the accuracy and fairness of system outcomes, system automation and the minimization of direct human involvement, the inclusion of human-like features in an AI's design, and accountability for data privacy.

The LA literature has also taken advantage of AI to propose many systems and dashboards that address learning issues and monitor learners' behaviors and academic situations. The issue of trust is also addressed in LA and is gaining the interest of the actors of the field. (Tsai et al., 2021) discusses trust factors and threats in LA, such as data accuracy, equity of treatment and potential misuse of LA. To convince stakeholders that LA dashboards and systems work in their best interest, building trust, according to (Biedermann et al., 2018), involves several layers, from the integrity and quality of data sources, through secure storage and processing, to the effectiveness of the analytics results. (Baneres et al., 2021) proposes a trustworthy early warning system that detects at-risk learners early to help them pass the course. The infrastructure of this system is built on the requirements of the European Assessment List for Trustworthy Artificial Intelligence, including human agency and oversight, robustness and safety, privacy and data governance, transparency, diversity, fairness and accountability. To monitor students' learning, (Susnjak et al., 2022) proposes a dashboard that incorporates data-driven prescriptive capabilities involving counterfactuals into its display. In addition, the proposed dashboard has a high degree of transparency, as it communicates to the learners the reliability of the predictive models and the key factors driving the predictions, and converts a black-box predictive model into a human-interpretable model so that learners can understand how their prediction is derived. This work demonstrated the importance of predictive model interpretability in building trust with dashboard users, through a transparency that goes beyond black-box predictive models, and, in doing so, in meeting new regulatory requirements.

In summary, the notion of trust is gaining interest in LA. However, most of the work is limited (1) to the ethical aspects of AI (e.g., data protection and privacy) as a way to increase the stakeholders' trust in the systems, (2) to the trust in the data, their quality and relevance, and (3) to the transparency of the models used. Despite the importance of the accuracy and fairness of system outcomes as an AI trust challenge, to our knowledge, no work in LA highlights the importance of measuring trust in system predictions. Several research works emphasize that statistical confidence intervals for the predictions of AI-based systems provide a broader means of validation than accuracy alone. In this paper, we build an approach to measure the trust in the predictions of an LA system for online teachers. We define three trust granularities and, for each one of them, we compute a Trust Index (TI) using the statistical concepts of confidence interval and confidence level.

3 STATISTICAL BACKGROUND

Confidence intervals are useful in modeling and simulation, as they are commonly used for model validation. A confidence interval is an estimated range of values for a parameter of a population, calculated from a sample drawn from that population. A confidence interval has an associated confidence level, which is a percentage between 0% and 100% (Petty, 2012).


In this paper, to create confidence intervals, we use a widely used method that is simple to compute: the Normal Approximation Interval Based on a Test Set (Raschka, 2018). With this method, the confidence interval is calculated from a single train-test split and has the following structure:

$$\text{confidence interval} = [\text{estimated parameter} - \text{margin},\ \text{estimated parameter} + \text{margin}]$$

where the margin is the standard error of the estimated parameter multiplied by a critical value $z$. In our context, the estimated parameter is the accuracy of the ML model, and the margin is calculated as follows:

$$\text{margin} = z \sqrt{\frac{\text{accuracy}_{test} \times (1 - \text{accuracy}_{test})}{n}} \tag{1}$$

where $\text{accuracy}_{test}$ is the accuracy of the predictions, $n$ is the size of the test dataset and $z$ is the critical value of the normal distribution for a given confidence level (Petty, 2012).

In our work, each confidence interval is therefore:

$$\text{confidence interval} = [\text{accuracy} - \text{margin},\ \text{accuracy} + \text{margin}]$$
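Equation (1) and the resulting interval can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the example values are made up, and we assume the usual two-sided critical value z = Φ⁻¹(1 − (1 − confidence level)/2), which gives z ≈ 1.645 at 90% and z ≈ 1.960 at 95%.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    alpha = 1.0 - confidence_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def normal_approx_interval(accuracy: float, n: int, confidence_level: float):
    """Equation (1): accuracy +/- z * sqrt(accuracy * (1 - accuracy) / n)."""
    margin = z_critical(confidence_level) * sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - margin, accuracy + margin


# Made-up example: 80% accuracy on a test set of 200 learners, 95% level.
low, high = normal_approx_interval(accuracy=0.80, n=200, confidence_level=0.95)
print(f"[{low:.3f}, {high:.3f}]")  # → [0.745, 0.855]
```

The standard library's `statistics.NormalDist` is used here so that the sketch needs no third-party package; `scipy.stats.norm.ppf` would give the same critical values.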

In the literature, some confidence levels, such as 90% and 95%, and their corresponding z values are commonly used by default to create confidence intervals. In this work, we consider that the confidence level to be granted to a system depends on its predictions. Therefore, for each trust granularity, we propose an algorithm to compute the confidence level. We can then create the confidence intervals following the Normal Approximation Interval Based on a Test Set method.

4 TRUST INDEX OF EACH TRUST GRANULARITY

In this section, we formalize the problem and introduce the three trust granularities: (1) the overall trust, (2) the trust per class label and (3) the trust per prediction. Then, we present the TI of each trust granularity.

4.1 Problem Formalization

The use of predictive systems by teachers is widespread in LA. By identifying at-risk learners, these systems allow teachers to monitor the activity of their learners and to intervene with those in critical academic situations. Therefore, a teacher's trust in the prediction results is important for the acceptability of the system and thus for a better follow-up of the learners.

Assume that $Y = \{C_1, \ldots, C_m\}$ is the set of class labels. Let $S = \{S_1, \ldots, S_n\}$ be the set of students in the test dataset and $T = \{t_1, \ldots, t_p\}$ be the set of prediction times. At each $t_j \in T$, each $S_i \in S$ is represented by a vector $X_i = \langle f_1, f_2, \ldots, f_q \rangle$, where each $f_l \in \mathbb{R}$ is a learning feature of $S_i$, and by a class label $C_k \in Y$. Let $y_{test} = \{y_1, \ldots, y_n\}$ be the set of real class labels of the learners, where $y_i \in Y$, and let $y_{prediction} = \{y_1^{pred}, \ldots, y_n^{pred}\}$ be the set of predicted class labels of the learners of $S$, where $y_i^{pred} \in Y$. The objective is to have $y_i = y_i^{pred}$ at each prediction time, which is not always achievable in a real context.

4.2 Overall Trust Granularity

In this section, we introduce the overall trust granu-

larity. Then, we present the algorithm for its TI.

4.2.1 Definition

The objective of LA systems is to accurately predict at-risk learners so that teachers can intervene effectively. Therefore, teachers need to be able to trust the overall prediction performance. In this context, we define the overall trust as a TI computed from the confidence level, and then the confidence interval, that can be granted to the accuracy of all the predictions made by the system. Based on this definition, the overall trust granularity tells the teacher how far she/he can trust the overall performance of the system, independently of the existing class labels and of the individual predictions.

4.2.2 Overall Confidence Level Algorithm

In this section, we propose Algorithm 1 to compute the overall confidence level of an LA system. Algorithm 1 takes as input y_test and y_prediction, respectively the sets of real and predicted class labels, and Y, the set of class labels of the system. It returns confidence_level (a percentage), the confidence level to grant to the overall performance of the system. The Algorithm starts by initializing to 0 the variable confidence (Line 1), which accumulates the confidence of all predictions. Then, it iterates over the set of predicted class labels y_prediction (Line 2). It initializes to 0 the variable confidence_i, the confidence value given to the i-th prediction (Line 3), and checks whether the prediction at index i is equal to the value of y_test at the same index (Line 4). If so, it assigns 1 to confidence_i (Line 5). Otherwise, it assigns to confidence_i the value 1/size(Y), where size(Y) is the number of class labels in Y (Line 7). At Line 9, the value of confidence_i is added to confidence. At Line 11, the overall confidence level, confidence_level, is calculated by dividing the value of confidence by the size of y_test.

CSEDU 2023 - 15th International Conference on Computer Supported Education

Algorithm 1: Overall Confidence Level.
Require: y_test, y_prediction, Y
Ensure: confidence_level
1: confidence ← 0
2: for each i in y_prediction do
3:     confidence_i ← 0
4:     if (y_prediction[i] == y_test[i]) then
5:         confidence_i ← 1
6:     else
7:         confidence_i ← 1/size(Y)
8:     end if
9:     confidence ← confidence + confidence_i
10: end for
11: confidence_level = confidence/size(y_test)

Once confidence_level is calculated, we compute the corresponding value of $z$ and then the overall margin ($\text{margin}_{overall}$) of the confidence interval:

$$\text{margin}_{overall} = z \sqrt{\frac{\text{accuracy}_{test} \times (1 - \text{accuracy}_{test})}{\text{size}(y_{test})}} \tag{2}$$

where $\text{size}(y_{test})$ corresponds to the number of data points in the test dataset.
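Algorithm 1 and Equation (2) can be sketched together in Python. This is a minimal sketch under our own assumptions, not the authors' code: the function names are hypothetical, class labels can be any hashable values, and we assume that z is the two-sided critical value for confidence_level, as for the usual 90% or 95% levels. The loop is equivalent to the closed form accuracy + (1 − accuracy)/size(Y).

```python
from math import sqrt
from statistics import NormalDist
from typing import Hashable, Sequence, Tuple


def overall_confidence_level(y_test: Sequence[Hashable],
                             y_prediction: Sequence[Hashable],
                             labels: Sequence[Hashable]) -> float:
    """Algorithm 1: a correct prediction counts 1, a wrong one 1/size(Y)."""
    if len(y_test) != len(y_prediction) or not y_test:
        raise ValueError("y_test and y_prediction must be non-empty and aligned")
    confidence = 0.0                          # Line 1
    for i in range(len(y_prediction)):        # Line 2
        if y_prediction[i] == y_test[i]:      # Line 4
            confidence_i = 1.0                # Line 5
        else:
            confidence_i = 1.0 / len(labels)  # Line 7
        confidence += confidence_i            # Line 9
    return confidence / len(y_test)           # Line 11


def overall_interval(y_test, y_prediction, labels) -> Tuple[float, Tuple[float, float]]:
    """Equation (2), with z taken from the Algorithm 1 confidence level."""
    level = overall_confidence_level(y_test, y_prediction, labels)
    n = len(y_test)
    accuracy = sum(p == t for p, t in zip(y_prediction, y_test)) / n
    if level >= 1.0:
        # Every prediction is correct: accuracy is 1 and the margin is 0 for any z.
        return level, (accuracy, accuracy)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # assumed two-sided z
    margin = z * sqrt(accuracy * (1.0 - accuracy) / n)
    return level, (accuracy - margin, accuracy + margin)


# Hypothetical toy data with three class labels (not the CNED data).
labels = ["success", "medium risk", "high risk"]
y_test = ["success", "success", "medium risk", "high risk"]
y_prediction = ["success", "medium risk", "medium risk", "high risk"]
level, (low, high) = overall_interval(y_test, y_prediction, labels)
print(f"confidence level = {level:.3f}, interval = [{low:.3f}, {high:.3f}]")
```

On such a tiny toy test set, the normal approximation can produce bounds outside [0, 1]; the approximation is only meaningful for reasonably large test sets.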

4.3 Trust per Class Label Granularity

In this section, we introduce the trust per class label granularity. Then, we present the algorithm for its TI.

4.3.1 Definition

In LA systems that predict class labels, learners are usually classified into several classes. Thus, the system could potentially perform differently with each of these classes. Therefore, it is of great interest for the teacher to know how far she/he can trust the performance of the system when it comes to the predictions of a particular class. In this perspective, we define the trust per class label as the TI computed from the confidence level and the confidence interval to be granted to the accuracy of the predictions of a given class label. Such a definition allows a more thorough examination of the reliability of predictions. The purpose of this TI is to enable effective intervention with the learners of each of the system's class labels.

4.3.2 Confidence Level per Class Label Algorithm

In this section, we propose Algorithm 2 to compute the confidence level of each class label, used for the TI of that class label's predictions. Algorithm 2 takes as input the set of class labels Y and T, the set of class probability tables for the data points of the test dataset. It returns confidence, a list of class labels and their corresponding confidence levels. The Algorithm starts by iterating over the class labels in Y (Line 1). For each C_k ∈ Y, it initializes to 0 the variable probability_k, the sum of the prediction probabilities of the data points in C_k (Line 2). Then, it iterates over T_k, the prediction probability table of C_k (Line 3). At Line 4, the value of T_k at index i is added to probability_k. At Line 6, the confidence level of the class label C_k, confidence_level, is calculated by dividing the value of probability_k by the size of T_k. Then, the measured confidence_level is saved in confidence together with its corresponding class label C_k (Line 7).

Algorithm 2: Confidence Level of each class label, TI_class(Y, T).
Require: Y, T
Ensure: confidence
1: for each C_k in Y do
2:     probability_k ← 0
3:     for each i in T_k do
4:         probability_k ← probability_k + T_k[i]
5:     end for
6:     confidence_level ← probability_k/size(T_k)
7:     confidence ← put(C_k, confidence_level)
8: end for

After applying Algorithm 2, we can compute, for each class label $C_k$, the value of $z_k$ and then the margin $\text{margin}_k$ of the confidence interval of $C_k$ as follows:

$$\text{margin}_k = z_k \sqrt{\frac{\text{accuracy}_k \times (1 - \text{accuracy}_k)}{\text{card}_k}} \tag{3}$$

where $\text{accuracy}_k$ is the accuracy of the predictions of this particular class label $C_k$ and $\text{card}_k$ is the number of learners who actually belong to $C_k$.
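Algorithm 2 and Equation (3) might be sketched as follows. The content of the tables T_k is not fully specified in the text, so we assume that T_k holds, for every test point predicted as C_k, the probability the model assigned to C_k (for instance one entry of a Random Forest class-probability output). The helper `build_tables` and all function names are hypothetical, and z_k is assumed to be the two-sided critical value. Because card_k counts the learners actually labeled C_k, accuracy_k as computed here is the per-class recall.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, Hashable, List, Mapping, Sequence, Tuple


def build_tables(y_prediction: Sequence[Hashable],
                 probabilities: Sequence[Mapping[Hashable, float]]
                 ) -> Dict[Hashable, List[float]]:
    """Hypothetical helper: T_k = probabilities of C_k for points predicted as C_k."""
    tables: Dict[Hashable, List[float]] = {}
    for predicted, probs in zip(y_prediction, probabilities):
        tables.setdefault(predicted, []).append(probs[predicted])
    return tables


def class_confidence_levels(labels: Sequence[Hashable],
                            tables: Mapping[Hashable, List[float]]
                            ) -> Dict[Hashable, float]:
    """Algorithm 2: confidence level of C_k = mean of the values in T_k."""
    confidence: Dict[Hashable, float] = {}
    for c_k in labels:
        t_k = tables.get(c_k, [])
        if not t_k:  # no prediction for C_k: size(T_k) would be 0
            continue
        probability_k = sum(t_k)
        confidence[c_k] = probability_k / len(t_k)
    return confidence


def class_interval(c_k: Hashable, y_test: Sequence[Hashable],
                   y_prediction: Sequence[Hashable], level: float
                   ) -> Tuple[float, float]:
    """Equation (3); level must lie strictly between 0 and 1 for inv_cdf."""
    members = [i for i, label in enumerate(y_test) if label == c_k]
    card_k = len(members)  # learners actually labeled C_k (assumed > 0)
    accuracy_k = sum(y_prediction[i] == c_k for i in members) / card_k
    z_k = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # assumed two-sided
    margin_k = z_k * sqrt(accuracy_k * (1.0 - accuracy_k) / card_k)
    return accuracy_k - margin_k, accuracy_k + margin_k


# Hypothetical toy data (not the paper's data).
labels = ["success", "medium risk", "high risk"]
y_test = ["success", "success", "medium risk", "high risk", "medium risk"]
y_prediction = ["success", "success", "high risk", "high risk", "medium risk"]
probabilities = [
    {"success": 0.9, "medium risk": 0.08, "high risk": 0.02},
    {"success": 0.7, "medium risk": 0.2, "high risk": 0.1},
    {"success": 0.1, "medium risk": 0.4, "high risk": 0.5},
    {"success": 0.05, "medium risk": 0.15, "high risk": 0.8},
    {"success": 0.2, "medium risk": 0.6, "high risk": 0.2},
]
levels = class_confidence_levels(labels, build_tables(y_prediction, probabilities))
for c_k, level in levels.items():
    print(c_k, round(level, 3), class_interval(c_k, y_test, y_prediction, level))
```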

4.4 Trust per Prediction Granularity

In this section, we introduce the trust per prediction granularity. Then, we present the algorithm for its TI.

4.4.1 Definition

The objective of LA systems is to predict at-risk learners so that teachers can intervene with them. Accurate results are important for effective and personalized interventions. Therefore, it is pertinent for the teacher to know how far she/he can trust the reliability of each single prediction of the system. We define the trust per prediction as the TI computed from the confidence level to be granted to the accuracy of that prediction, independently of the performance of the system on the rest of the data points. Such a TI gives the teacher the reliability of the system performance for each prediction. This trust granularity fits the goal of personalized pedagogical intervention with each learner of the educational system.

4.4.2 Confidence Level per Prediction Algorithm

In this section, we propose Algorithm 3 to calculate the TI of this trust granularity, which is the confidence level of each prediction made by the system. Algorithm 3 takes as input y_test and y_prediction, respectively the sets of real and predicted class labels. It also requires the set of class labels Y and T, the set of class probability tables for the data points of the test dataset. It returns C_prediction, a list of prediction indexes and their corresponding confidence levels. The Algorithm starts by iterating over the predictions of y_prediction (Line 1). For each prediction, it initializes the variable confidence_i to 0 (Line 2). Then, it checks whether the prediction at index i is the same as the real label of the test set at the same index (Line 3). If so, it assigns 1 to confidence_i (Line 4). Otherwise, the prediction at index i corresponds to a class label C_k among Y (Line 6). At Lines 7 and 8, the Algorithm extracts the confidence level of C_k by calling Algorithm 2 and assigns it to confidence_i. At Line 10, the measured confidence confidence_i is saved along with its corresponding index i in C_prediction.

Algorithm 3: Confidence Level of each prediction.
Require: y_test, y_prediction, Y, T
Ensure: C_prediction
1: for each i in y_prediction do
2:     confidence_i ← 0
3:     if (y_prediction[i] == y_test[i]) then
4:         confidence_i ← 1
5:     else
6:         C_k ← y_prediction[i]
7:         confidence_k ← extract(TI_class(Y, T), C_k)
8:         confidence_i ← confidence_k
9:     end if
10:    C_prediction ← put(i, confidence_i)
11: end for

For this trust granularity, the TI of a prediction is the confidence level calculated using Algorithm 3.
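Algorithm 3 might be sketched as follows. This hypothetical version takes the per-class levels already produced by Algorithm 2 as a dictionary instead of calling TI_class(Y, T) at every iteration, which yields the same values because those levels do not depend on i. The final count with `collections.Counter` is one possible way to obtain per-level counts such as those of Figure 9; it is our illustration, not the authors' code, and the toy data are made up.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Sequence


def prediction_confidence_levels(y_test: Sequence[Hashable],
                                 y_prediction: Sequence[Hashable],
                                 class_levels: Mapping[Hashable, float]
                                 ) -> Dict[int, float]:
    """Algorithm 3: 1 for a correct prediction, else the level of the predicted class."""
    c_prediction: Dict[int, float] = {}
    for i, (predicted, actual) in enumerate(zip(y_prediction, y_test)):
        if predicted == actual:
            confidence_i = 1.0                      # total trust
        else:
            confidence_i = class_levels[predicted]  # level of C_k from Algorithm 2
        c_prediction[i] = confidence_i
    return c_prediction


# Hypothetical per-class levels and toy labels (not the paper's data).
class_levels = {"success": 0.8, "medium risk": 0.6, "high risk": 0.65}
y_test = ["success", "success", "medium risk", "high risk", "medium risk"]
y_prediction = ["success", "success", "high risk", "high risk", "medium risk"]
c_prediction = prediction_confidence_levels(y_test, y_prediction, class_levels)
print(c_prediction)
print(Counter(c_prediction.values()))  # number of predictions per confidence level
```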

5 CASE STUDY

Our case study concerns the k-12 learners enrolled online at CNED. CNED offers multiple fully remote courses to a large number of heterogeneous and physically dispersed learners. In addition, learning there is quite specific; for example, learners of the same cohort do not necessarily start their school year at the same time, and each of them studies at her/his own pace. Given these particularities, CNED records high failure rates among its learners every year. To minimize the failure risk and improve the pedagogical monitoring of teachers, CNED aims to provide teachers with a system based on LA technologies that helps them accurately identify learners at risk of failure. For CNED, gaining the trust of teachers in the reliability of the results of this system is paramount to the success of this LA-based strategy.

Based on the grade average, and according to the French system where marks are out of 20, the learners of each module are classified into 3 classes as follows:

• success (C): when the average is higher than 12
• medium risk of failure (C): when the average is between 8 and 12
• high risk of failure (C): when the average is lower than 8
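The labelling rule above can be written as a small function. This is a sketch with a hypothetical name; the text does not say to which class an average of exactly 8 or exactly 12 belongs, so we assume here that both boundary values fall in the medium-risk class.

```python
def risk_class(average: float) -> str:
    """Map a grade average out of 20 to one of the three classes.

    Assumption: the boundary values 8 and 12 are treated as medium risk.
    """
    if not 0.0 <= average <= 20.0:
        raise ValueError("grade averages are expected on a 0-20 scale")
    if average > 12:
        return "success"
    if average < 8:
        return "high risk of failure"
    return "medium risk of failure"


for avg in (15.0, 12.0, 9.5, 8.0, 6.0):
    print(avg, risk_class(avg))
```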

This system tracks the learners' activity on a weekly basis to allow teachers to monitor their learners regularly. Thus, in each prediction week, each learner is represented by a vector composed of learning features and a class label among the three classes above.

The system uses the learning traces of 647 learners enrolled in the physics-chemistry module over 35 weeks during the 2017-2018 school year, and it is built on the Random Forest (RF) model.

6 EXPERIMENTS AND RESULTS

In this section, we analyze the results of applying our approach.

6.1 Overall TI Results

The TI of the overall trust granularity is important to analyze the reliability of the whole system's performance. Figures 1 and 2 show the confidence intervals of the overall accuracy of the predictions using 90% and 95% confidence levels, respectively, which are commonly used by default in the literature. Thus, at each prediction time, the margins of the confidence intervals in these Figures are calculated from these confidence levels and from the accuracy of the predictions at this time point. We notice that the confidence intervals in these two Figures are quite wide. The margins of the confidence intervals at the 90% confidence level vary between 0.04 and 0.06, while the margins at the 95% confidence level are between 0.04 and 0.08.

Figure 1: Overall confidence intervals with a confidence level = 90%.

Figure 2: Overall confidence intervals with a confidence level = 95%.

To calculate the TI of the overall trust granularity following our approach, we first calculate the overall confidence level at each prediction time using Algorithm 1; its evolution is presented in Figure 3. We can observe that both curves of this Figure have the same shape. In fact, the confidence levels of the system performance evolve according to the system accuracy values. In other words, at each prediction time, we obtain a confidence level that is correlated with the performance of the system, rather than a default confidence level. Given the obtained confidence levels, we compute the confidence intervals to be granted to the overall accuracy of the system following Equation 2. The evolution of the system accuracy values and of their corresponding confidence intervals is given in Figure 4. We remark that, at all prediction times, the confidence intervals of Figure 4 are narrower than those of Figures 1 and 2. In fact, the margins of the intervals around the accuracy values are smaller, varying between 0.04 and 0.05. In statistics, narrow confidence intervals are preferable, as they indicate a more precise estimate: a wide confidence interval suggests that the sample does not provide a precise representation of the estimate, whereas a narrow confidence interval indicates greater precision.

Instead of using default confidence levels, using Algorithm 1 to calculate the confidence level makes it possible to create narrower confidence intervals for the accuracy of the predictions. Thus, this approach measures a more precise TI for this trust granularity and gives the teacher a finer idea of the reliability of the system's predictions.
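The narrowing can be traced to the critical value z. For a given accuracy and test-set size, Equation (2) changes only through z, and Algorithm 1 yields confidence_level = accuracy + (1 − accuracy)/size(Y), which, for three class labels, stays below 90% whenever the accuracy is below 0.85. The sketch below compares the margins under illustrative values of our own choosing (an accuracy of 0.75 on 200 test points, not the paper's data) and assumes a two-sided z; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin(accuracy: float, n: int, level: float) -> float:
    """Equation (2): z * sqrt(accuracy * (1 - accuracy) / n), two-sided z."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return z * sqrt(accuracy * (1.0 - accuracy) / n)


def algorithm1_level(accuracy: float, n_classes: int) -> float:
    """Closed form of Algorithm 1: correct predictions weigh 1, wrong ones 1/size(Y)."""
    return accuracy + (1.0 - accuracy) / n_classes


accuracy, n = 0.75, 200  # illustrative values only
for name, level in [("90%", 0.90), ("95%", 0.95),
                    ("Algorithm 1", algorithm1_level(accuracy, 3))]:
    print(f"{name:>11}: level = {level:.3f}, margin = {margin(accuracy, n, level):.4f}")
```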

Figure 3: Overall confidence level and accuracy evolution.

Figure 4: Overall confidence intervals with a calculated overall confidence level.

6.2 TI per Class Label Results

The TI of the class label trust granularity is interesting, as it makes it possible to follow the reliability of the system performance for each class label.

Figure 5 presents the evolution of the confidence levels of the system performance for each class label. For each of the three class labels, the confidence levels are calculated using Algorithm 2. This figure shows that the confidence levels of the class labels of the same system evolve very differently over time. Indeed, the confidence level of each class label depends on its population and on the ability of the system to correctly predict the data points of that class. The highest confidence levels are recorded with the class of successful learners, while the lowest confidence levels are recorded with the two risk classes, especially C.

Figure 5: Evolution of the confidence levels of all class labels of the system.

Given these confidence levels, Figures 6, 7 and 8 present the evolution over time of the confidence intervals to be granted to the prediction accuracy of the class labels C, C and C, respectively. From these figures, we notice that the confidence intervals for the accuracy of C predictions are narrow: the margins of the intervals around the accuracy values are small and vary between 0.01 and 0.03. We can highly trust the system when it comes to the prediction of C, as it is rarely wrong with this class label. However, if we compare the results of Figures 7 and 8, we remark that the confidence intervals for the accuracy of C are narrower than those for C, especially at some time points such as prediction weeks 10, 22 and 23. This result shows that the accuracy of the system predictions with C is more reliable than with the medium risk class C. Therefore, the system's performance is more trustworthy when it comes to the prediction of the risk class C than of the risk class C, which is the class of learners in a fuzzy learning situation. These results show the relevance of measuring the TI of the per-class trust granularity, as it makes it possible to follow the reliability of the system performance for each class label: the TI differs from one class label to another.

6.3 TI per Prediction Results

The TI of the trust per prediction granularity is useful for tracking the reliability of the system performance for each single data point.

Applying Algorithm 3 to the results of our system, Figure 9 presents the number of data points for each confidence level: the level 1, which corresponds to total trust in the prediction result, and the confidence levels of the three class labels. For simplicity, the results are presented every 5 weeks, from week 5 to week 35. This Figure shows that, as time progresses, the system predicts each data point more correctly.

Figure 6: Confidence intervals of the class label C

Figure 7: Confidence intervals of the class label C

Figure 8: Confidence intervals of the class label C

Indeed, we notice that the number of data points with the total trust level gradually increases from week 5 to week 35. Until week 10, the Algorithm assigns the confidence level of the prediction accuracy of the class label C to a fairly large number of data points. We explain this result by the fact that the learners of this class are the most numerous. Thus, the predictions made by the same system at a given time have different levels of confidence from one another. As time progresses and the system acquires new information, the confidence level of the prediction of a data point changes and can take several values. At each prediction time, based on its TI, we can therefore know whether a single prediction is trustworthy or not.

These results show the importance of measuring the TI at this fine granularity of trust in order to accurately track each learner's prediction and to provide personalized interventions with the lowest-performing learners.


Figure 9: Number of predictions per confidence level.

6.4 Discussion

In summary, the experimental study yielded the following results.

Following Algorithm 1, we obtain confidence levels that are more correlated with the overall accuracy of the system. As a result, we obtain narrower confidence intervals and a more accurate TI of the overall trust granularity, which is more representative of the reliability of the system performance. For the same system and at the same prediction time, the TI of the prediction reliability differs from one class label to another. This TI depends on the population of the class and on the ability of the system to correctly detect the learners of that class. For the same system and at the same prediction time, the TI of the prediction reliability also differs from one data point to another. In addition, the TI of the same data point changes over time and can take several values.

7 CONCLUSION

In this paper, we established an approach to measure how far a teacher can trust the predictive performance of an LA system. We started by defining three trust granularities in an LA system: the overall trust, the trust per class label and the trust per prediction made by the system. Then, for each trust granularity, we calculated a TI using the statistical concepts of confidence level and confidence interval. The experimental results show the importance of going through all these trust granularities to get a detailed idea of the reliability of the system performance. The TI of the overall trust granularity shows the teacher how much she/he can trust the overall performance of a predictive system. The two other trust granularities give a finer idea of how much trust to grant to the system when it comes to a particular class or prediction. Indeed, at the same prediction time, the TI differs from one class label to another and also from one single prediction to another. This notion of trust helps ensure an effective intervention adapted to the situation of each learner or group of learners.

As future work, we intend to propose a global trust index for the whole system, computed from the TIs of the three trust granularities.

REFERENCES

Baneres, D., Guerrero-Roldán, A. E., Rodríguez-González, M. E., and Karadeniz, A. (2021). A predictive analytics infrastructure to support a trustworthy early warning system. Applied Sciences, 11(13):5781.

Ben Soussia, A., Labba, C., Roussanaly, A., and Boyer, A. (2022). Time-dependent metrics to assess performance prediction systems. The International Journal of Information and Learning Technology, 39(5):451–465.

Biedermann, D., Schneider, J., and Drachsler, H. (2018). Implementation and evaluation of a trusted learning analytics dashboard. In EC-TEL (Doctoral Consortium).

Bitkina, O. V., Jeong, H., Lee, B. C., Park, J., Park, J., and Kim, H. K. (2020). Perceived trust in artificial intelligence technologies: A preliminary study. Human Factors and Ergonomics in Manufacturing & Service Industries, 30(4):282–290.

Lockey, S., Gillespie, N., Holm, D., and Someh, I. A. (2021). A review of trust in artificial intelligence: Challenges, vulnerabilities and future directions.

López Zambrano, J., Lara Torralbo, J. A., Romero Morales, C., et al. (2021). Early prediction of student learning performance through data mining: A systematic review. Psicothema.

Petty, M. D. (2012). Calculating and using confidence intervals for model validation. In Proceedings of the Fall 2012 Simulation Interoperability Workshop, pages 10–14.

Qin, F., Li, K., and Yan, J. (2020). Understanding user trust in artificial intelligence-based educational systems: Evidence from China. British Journal of Educational Technology, 51(5):1693–1710.

Raschka, S. (2018). Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808.

Siau, K. and Wang, W. (2018). Building trust in artificial intelligence, machine learning, and robotics. Cutter Business Technology Journal, 31(2):47–53.

Stanton, B., Jensen, T., et al. (2021). Trust and artificial intelligence. Preprint.

Susnjak, T., Ramaswami, G. S., and Mathrani, A. (2022). Learning analytics dashboard: a tool for providing actionable insights to learners. International Journal of Educational Technology in Higher Education, 19(1):12.

Tsai, Y.-S., Whitelock-Wainwright, A., and Gašević, D. (2021). More than figures on your laptop: (Dis)trustful implementation of learning analytics. Journal of Learning Analytics, 8(3):81–100.
