Audits for Trust: An Auditability Framework for AI-Based Learning
Analytics Systems
Linda Fernsel (https://orcid.org/0000-0002-0239-8951), Yannick Kalff (https://orcid.org/0000-0003-1595-175X) and Katharina Simbeck (https://orcid.org/0000-0001-6792-461X)
Computer Science and Society, HTW Berlin University of Applied Sciences, Treskowallee 8, 10318 Berlin, Germany
{linda.fernsel, yannick.kalff, simbeck}@htw-berlin.de
Keywords:
Audit, Auditability, Artificial Intelligence, Learning Analytics.
Abstract:
Audits contribute to the trustworthiness of Learning Analytics (LA) systems that integrate Artificial Intelligence
(AI) and may be legally required in the future. We argue that the efficacy of an audit depends on the auditability
of the audited system. Therefore, systems need to be designed with auditability in mind. We present a framework
for assessing the auditability of AI-integrating systems in education that consists of three parts: (1) verifiable
claims about the validity, utility, and ethics of the system, (2) evidence on subjects (data, models, or the system) of
different types (documentation, raw sources, and logs) to back or refute claims, and (3) means of validating evidence,
such as technical APIs, monitoring tools, or explainable AI principles, which must be accessible to auditors. We apply
the framework to assess the auditability of the Learning Management System Moodle, which supports an AI-
integrating dropout prediction system. Moodle’s auditability is limited by incomplete documentation, insufficient
monitoring capabilities, and a lack of available test data.
1 INTRODUCTION
Artificial Intelligence (AI) significantly impacts the
field of Learning Analytics (LA). LA itself is gain-
ing relevance in higher education (Baek and Doleck,
2023), K-12 classes (Paolucci et al., 2024), pre-school
settings (Crescenzi-Lanna, 2020), and generally for vir-
tual education (Elmoazen et al., 2023; Heikkinen et al.,
2023). AI extends the utility of LA with elements of educational data mining (Baek and Doleck, 2023; Romero and Ventura, 2020), deep learning and machine learning capabilities (Ouyang et al., 2023), predictive and prescriptive analytics (Sghir et al., 2022; Xiong et al., 2024),
or multi-modal models capable of processing complex
physical behavioral data and stimuli (Crescenzi-Lanna,
2020). AI in LA offers data-driven insights into learn-
ing processes and students’ behavior to predict learning
success, risks of failure or drop-out, and to prescribe
proactive measures (Susnjak, 2024). Beyond that, AI
technologies promise opportunities to improve learn-
ing situations and outcomes for all students, especially
those disadvantaged and struggling (Khalil et al., 2023).
However, AI technologies in LA entail ethical is-
sues (Rzepka et al., 2022; Rzepka et al., 2023) and open
questions about their utility or maturity in education
(Drugova et al., 2024). In particular, the potential threat to equality and equity principles in education has raised research and practitioner interest in mitigating discriminatory elements of AI models in LA (Simbeck, 2024; Rzepka et al., 2023). Beyond that, ethical concerns create
the urgency for adequate legislation to counter negative
effects or prevent biased systems before they cause harm. AI
and AI-driven LA face regulation, such as the European
AI Act (European Union, 2024), or frameworks to im-
pose ethical requirements on AI products (Toreini et al.,
2022; Fjeld et al., 2020; Slade and Tait, 2019), and to
mitigate potential negative effects and discriminatory bi-
ases (Baker and Hawn, 2022; Prinsloo and Slade, 2017).
To this end, the AI Act (European Union, 2024)
aims to regulate “high-risk” AI systems that could vio-
late the “health and safety or the fundamental rights of
persons” (European Union, 2024, Recital 52). AI-based
LA systems can be considered “high-risk” because of
their impact on personal educational success, which
directly affects the individual “ability to secure [one’s]
livelihood” (European Union, 2024, Recital 56).
Regulatory frameworks like the AI Act require au-
dits of AI systems that certify their legal and ethical
compliance (Berghoff et al., 2022; Toreini et al., 2022).
The AI Act mandates two types of audits for high-risk
AI systems: conformity assessments before deploy-
ment (European Union, 2024, Art. 43) and post-market
monitoring after system deployment (European Union,
2024, Art. 72). Audits provide accountability and trans-
parency, which is also necessary for establishing trust in
AI systems (Toreini et al., 2022; Williams et al., 2022;
Bose et al., 2019; Springer and Whittaker, 2019). On a
practical level, audits allow stakeholders, such as sys-
tem providers or deploying institutions, regulators, and
subjects of the systems’ decisions, to understand how
the system decides and to identify and correct biases
or errors (Springer and Whittaker, 2019; Nushi et al.,
2018; Rzepka et al., 2023). However, audits struggle
with systems that are inaccessible, opaque, or propri-
etary (Fernsel et al., 2024b).
We argue that AI-based LA systems must be au-
ditable for any audit to be effective. We define AI
systems as software systems that implement methods
of machine learning. Machine learning is “the computa-
tional process of optimizing the parameters of a model
from data, which is a mathematical construct generating
an output based on input data” (European Parliament,
2023, Recital 6a). AI-based LA systems implement
machine learning methods to leverage learning data for
analysis, predictions, and prescriptions in educational
contexts (Baek and Doleck, 2023; Romero and Ventura,
2020; Ouyang et al., 2023).
2 AUDITABILITY OF AI
2.1 Audits of AI Systems
Technical, legal, and ethical reasons make audits of AI
systems necessary to ensure accountability for accurate,
compliant, and fair technological systems (Raji et al.,
2020; Falco et al., 2021; Ayling and Chapman, 2022).
For this purpose, auditing techniques, for example from the field of finance, are adapted for AI systems (Mökander et al., 2022). However, there are no standards for audit quality (Alagić et al., 2021), and residual risks remain uncertain (Knechel et al., 2013).
An audit analyzes whether an AI system complies with
legal regulations, organizational standards, or ethical
values. It compares claims made by stakeholders, like
developers or deployers of AI-based LA tools, to the
system’s actual behavior. Claims concern an AI LA
system’s validity, utility, and ethics: i.e., its models,
components, data sets, or scopes are suitable for the
intended purpose (validity); the system fulfills its
intended use (utility); ethical, moral, or legal standards
are taken into consideration (ethics) (Minkkinen et al.,
2024; Ayling and Chapman, 2022). Auditors then re-
cover “auditable artifacts” (Ayling and Chapman, 2022)
to validate whether an AI system is implemented and
operating as claimed. AI’s functionality, application
field, and associated risks require interdisciplinary com-
petencies and skills in mandating and conducting audits
(Landers and Behrend, 2023). Beyond that, AI systems’
design and implementation principles affect audits and
make processes of assessing claims, actual behavior,
and auditable evidence feasible (Li and Goel, 2024;
Ayling and Chapman, 2022; Falco et al., 2021).
We consider auditability as given when a system is
reviewable independently (Williams et al., 2022; Wol-
nizer, 2006). Weigand et al. (2013) conceptualize auditability as follows: a) the system provides information on how
relevant values should be used or produced (claims),
b) the system generates information on how relevant
values are used or produced (evidence), and c) stake-
holders can validate these claims based on the provided
evidence. The complexity of AI systems creates spe-
cial requirements for audits and, thus, for a system’s
auditability. Li and Goel (2024) propose a framework
for auditability that focuses on training data, underlying
models, and organizational governance processes: “AI
auditability demands more comprehensive information
about the nature, process, quality assurance, and gover-
nance of training data, detailed process and governance
information about AI model commissioning, develop-
ment, deployment, and long-term monitoring, and the
governance structure relevant to developing and manag-
ing the AI system” (Li and Goel, 2024). Berghoff et al. (2022) approach the auditability of AI systems from a cybersecurity perspective, in which increased system complexity impedes auditability. Raji et al. (2020) propose
a joint internal audit process of auditors and auditees,
who provide claims and artifacts as evidence. The joint
product of an audit process should be a remediation
plan to mitigate risks (Raji et al., 2020).
Claims are normative statements on a system’s func-
tionality, scope, and purpose. System providers and
system deployers define claims in system standards,
targeted fields of application, scope and use cases, or
as part of the source code documentation (Stoel et al.,
2012). Another set of claims stems from ethical or
moral standpoints, laws, regulations, and standards that
guide software implementation and use (Brundage et al.,
2020).
In the context of AI-based LA, evidence is “relevant
information about its execution” (Alhajaili and Jhumka,
2019) that allows us to analyze and trace errors in deci-
sions. Auditees should enable evidence collection by
organizational structures and processes that document
a system’s operation (Awwad et al., 2020; Stoel et al.,
2012). Additionally, organizational processes, system
logs, or data provide insights into an AI-based LA system’s func-
tioning and institutional setting.
Auditors with system access and evidence can ver-
ify whether an AI-integrating system meets the derived
claims. However, AI-based LA systems present chal-
lenges when designing test cases and selecting test
data. Unlike other software, AI can handle a broader
range of input data and assume more possible states
(Berghoff et al., 2022). Therefore, a diverse and sub-
stantial amount of appropriate test data is required, but
scarce (Tao et al., 2019; Fernsel et al., 2024b). Balanced test data representing marginalized student groups is especially scarce (Fernsel et al., 2024b). Further, in
pre-deployment audits, predicting all de facto use cases
and, therefore, all possible claims (Tao et al., 2019) is
difficult. After deployment, models can be updated by
learning from new training data or through feedback
loops, introducing bias in the process (Berghoff et al., 2022; Awwad et al., 2020). Therefore, the tests implemented for an AI system do not necessarily remain realistic, which makes repeated audits after deployment necessary (Eitel-Porter, 2021; Mökander and Floridi, 2021).
2.2 Enabling Auditability of
AI-integrating Systems
Because of the limited auditability of AI-integrating
systems, some audits relied on self-audits and required
auditees to answer a set of questions about the design
principles of the system and the measures undertaken
to guarantee functionality and compliance (Raji et al.,
2020). While this is a valid approach, we argue that
future AI-integrating systems, including AI-based LA
systems, must be designed with auditability in mind to
enable independent assurance. Even though these sys-
tems are inherently complex to audit, system providers
and deploying institutions can take steps to enable inde-
pendent auditability. Auditable AI-integrating systems
require planning, documentation, the implementation of
specific functionalities, such as logs, APIs, monitoring
tools or explanations, and sometimes access to the sys-
tem sources, such as program code, model configuration,
and data.
Planning for Auditability. As AI-integrating LA sys-
tems are very complex, sufficient auditability will only
be reached if it is planned for during the system design
process. To help ensure the completeness of evidence, “accountability plans” outline what information should be captured and how (Naja et al., 2022). Plans should also
determine applicable definitions of ethical standards and
the available data for their evaluation (Galdon-Clavell
et al., 2020). Slade and Tait (2019) and Kitto and Knight (2019) discuss relevant ethical standards in LA.
Based on the accountability plan, the system’s organiza-
tional processes (project and risk management, design
and development processes) must be adjusted for au-
ditability. Workflow mechanisms to increase auditabil-
ity include logging of model training and validation
results, storing of model metadata, and continuous mon-
itoring (Kreuzberger et al., 2022).
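As an illustration only (the function and file names below are our own assumptions, not part of any cited MLOps tooling), such a workflow mechanism could log each training and validation run together with model metadata and a hash of the training data, so that auditors can later trace which data and parameters produced which model:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def log_training_run(model_name, params, metrics, data_path, out_dir="audit_logs"):
    """Append one record of a training/validation run to an audit log.

    The record captures what an accountability plan typically asks for:
    which data was used (via a content hash), which hyperparameters were
    set, and which validation results were observed.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "parameters": params,            # e.g. learning rate, number of epochs
        "validation_metrics": metrics,   # e.g. F1-score per student group
        "training_data_sha256": hashlib.sha256(Path(data_path).read_bytes()).hexdigest(),
    }
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    with open(out / "training_runs.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```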
Documentation. The auditability of an AI-integrating
system is further influenced by the completeness of doc-
umentation. The AI Act requires documentation for
high-risk AI systems on the system in general, the mod-
els, and the relevant data (European Union, 2024, Art.
11). System-related documentation should include func-
tionality and limitations of the system (European Union,
2024, Recital 66). The system auditability can be in-
creased by documenting design and implementation
choices, including the policies, external requirements,
and organizational processes like project and risk man-
agement (Raji et al., 2020; Beckstrom, 2021). Model-
related documentation should include information on
algorithms for training, testing, and validation. Model parameters should also be documented, and the documentation should elaborate on model performance
(Beckstrom, 2021; Mitchell et al., 2019) (European
Union, 2024, Art. 13, 3b), which includes providing
complex evaluation metrics like ROC curves or a model-
specific “measure of confidence” for each output (Ash-
more et al., 2022). Data-related documentation should
contain the data structure (Beckstrom, 2021; European
High-Level Expert Group on AI, 2019) and informa-
tion on data provenance, including the data acquisi-
tion method, data transformations, and data processing
(Beckstrom, 2021; Gebru et al., 2021), e.g., for labeling
and feature calculation. Documenting provenance con-
tributes to reproducibility and helps to discover where
biases originate and which data operations (e.g., data
processing steps) influence them (Toreini et al., 2022).
Aspects of data quality such as the balance of classes
in the training, validation, and verification data sets
(Beckstrom, 2021), and data completeness (Ashmore
et al., 2022) should be documented as well. For data
sets, suggested standards for documentation are Data
Sheets (Gebru et al., 2021) and Dataset Nutrition Labels
(Holland et al., 2018).
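For illustration, such documentation can also be kept in machine-readable form alongside the data set; the sketch below is our own simplification, not the Data Sheets or Dataset Nutrition Label schema, and captures only a few of the provenance and quality fields discussed above:

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class DatasetDocumentation:
    """Minimal, illustrative subset of data-related documentation fields."""
    name: str
    acquisition_method: str                                # provenance: how the data was collected
    transformations: list = field(default_factory=list)    # e.g. labeling, feature calculation
    class_balance: dict = field(default_factory=dict)      # data quality: share per class
    completeness_note: str = ""                            # known gaps or missing values

doc = DatasetDocumentation(
    name="course_activity_2024",
    acquisition_method="export of activity logs from the LMS",
    transformations=["dropout labeling from enrolment status", "indicator calculation"],
    class_balance={"dropout": 0.18, "completed": 0.82},
    completeness_note="demographic attributes missing for roughly a third of students",
)
print(json.dumps(asdict(doc), indent=2))  # a documentation artifact an auditor can read
```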
Providing Sources. AI-based LA results often lack
reproducibility (Haim et al., 2023). Auditability can be
increased further by providing the raw sources of an AI-
integrating system, including the system source code (Tagharobi and Simbeck, 2022; Beckstrom, 2021), the model itself with its weights, and the training and test data for evaluation purposes (Beckstrom,
2021). Under some circumstances, raw data cannot be
provided due to privacy issues. For such cases, the audi-
tor could be enabled to collect or create (synthesized)
test data for the audit (El Emam et al., 2020). Synthetic data can also be helpful to protect student privacy (Dorodchi et al., 2019) or when data is scarce, for example, for underrepresented minorities (El Emam et al., 2020).
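A minimal sketch of this idea, assuming tabular LA data and purely random placeholder distributions (realistic synthesis would have to model the dependencies discussed by El Emam et al. (2020)); all column names and values are illustrative:

```python
import numpy as np
import pandas as pd

def synthesize_test_data(n_per_group=500, groups=("A", "B"), seed=0):
    """Create a small synthetic test set with equal representation of each
    (hypothetical) student group, so that per-group performance can be
    evaluated without exporting real student records."""
    rng = np.random.default_rng(seed)
    frames = []
    for g in groups:
        frames.append(pd.DataFrame({
            "group": g,
            "cognitive_depth": rng.normal(loc=2.0, scale=1.0, size=n_per_group),
            "social_breadth": rng.normal(loc=1.5, scale=0.8, size=n_per_group),
            "dropped_out": rng.integers(0, 2, size=n_per_group),
        }))
    return pd.concat(frames, ignore_index=True)

test_data = synthesize_test_data()
print(test_data.groupby("group").size())  # balanced groups by construction
```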
Implementing Auditability. Enabling auditability
requires specific system functionalities for externaliz-
ing system information, such as logging (Eitel-Porter,
2021; Bose et al., 2019), secure access to the system for
auditors (Awwad et al., 2020), monitoring tools (Ash-
more et al., 2022; Bharadhwaj et al., 2021; Eitel-Porter,
2021; Alhajaili and Jhumka, 2019), and explanations for
model behavior (Brundage et al., 2020; Shneiderman,
2020; Guidotti et al., 2018).
Auditors can use logs to understand the data flow
through the system (Falco et al., 2021). Logs record
the production process of any result like predictions
or data sets (Kale et al., 2022) and, thus, enhance the
auditability of AI systems (Brundage et al., 2020; Shnei-
derman, 2020). Secure system access for external audi-
tors, e.g., via APIs, is a prerequisite for an audit (Awwad
et al., 2020). Springer and Whittaker (2019) show that
APIs allow systematic tests of scenarios based on the
system’s claims. An API can also enable secure third-
party access to logs (Alla and Adari, 2021). Monitoring
tools help to analyze performance, detect model be-
havior changes, and recognize violations of (ethical)
constraints (Eitel-Porter, 2021). Monitoring involves tracking
various aspects of the system, such as model input,
the environment of use, internal model properties, and
model output (Ashmore et al., 2022). Constant moni-
toring after deployment is a regular part of the AI life
cycle (Alla and Adari, 2021) and post-market monitor-
ing is a requirement for high-risk AI systems under the
AI Act (European Union, 2024, Art. 72). Explanations,
like feature importance explanations or counterfactual
explanations (Bhatt et al., 2020), which are provided
as part of the user interface, can help users or auditors
understand the system’s output (Shneiderman, 2020).
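As a concrete illustration (our own sketch, not an interface of any cited system), a prediction log that records inputs, outputs, and a feature-importance explanation in an append-only file provides exactly this kind of traceable evidence; an API could then expose such records to authorized auditors:

```python
import json
from datetime import datetime, timezone

def log_prediction(logfile, student_id, features, prediction, importances):
    """Record one prediction together with its inputs and a simple
    feature-importance explanation, so auditors can later trace how
    each result was produced."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "student": student_id,        # pseudonymous identifier
        "features": features,         # model input
        "prediction": prediction,     # model output, e.g. dropout risk
        "explanation": importances,   # per-feature contribution to the output
    }
    with open(logfile, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

# Example: one logged dropout prediction (all values are illustrative)
log_prediction(
    "predictions.jsonl",
    student_id="s_0042",
    features={"cognitive_depth": 1.2, "social_breadth": 0.4},
    prediction={"dropout_risk": 0.81},
    importances={"cognitive_depth": 0.65, "social_breadth": 0.35},
)
```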
3 A FRAMEWORK FOR
ASSESSING AUDITABILITY OF
AI SYSTEMS
Based on our discussion of audits and auditability of
AI systems and methods to enhance the auditability
of AI-integrating systems, we propose a framework to
assess and identify opportunities to improve the au-
ditability of AI-integrating systems. Figure 1 visualizes
the framework for assessing the auditability. Any audit
process has three steps displayed from the bottom to the
top: First, auditors designate verifiable claims about the
system. Then, auditors identify, generate, and collect
suitable evidence. Finally, the auditors validate claims
based on the evidence they retrieve from the AI-based
LA system.
Figure 1: A framework for assessing auditability of AI sys-
tems.
Verifiable Claims. Developers or deploying organizations give assurances about the properties of the AI-based LA system and the processes in which it is applied. Auditors can
derive verifiable claims from such assurance statements,
which form the benchmark for the actual functioning of
AI-based LA. Therefore, claims are the foundation for
any audit (Brundage et al., 2020). Claims can concern
validity: are methods correctly applied in the system,
and is the system output correct? Utility describes whether
the system’s functionality can be considered helpful in
its use case. Finally, adherence to underlying ethical
principles summarizes claims that the audited system
complies with applicable law (GDPR) or latent social,
organizational, or societal norms, e.g., corporate culture,
accessibility, or diversity, equity, and inclusion (DEI)
goals that contribute to fairness or objectivity (Landers
and Behrend, 2023).
Evidence. Once auditors define the claims, they must
identify, create, and collect evidence (Raji et al., 2020).
Evidence comprises different subjects: system, model
and data. It can take various forms: documentation; raw sources such as source code, model weights, and raw data; and logs (Raji et al., 2020; Brundage et al.,
2020; Tagharobi and Simbeck, 2022; Beckstrom, 2021);
(European Union, 2024, Art. 11, Art. 12).
Different evidence subjects address specific aspects
of AI-based LA systems. For the system, evidence
should prove the system’s functionality and its limi-
tations (European Union, 2024, Recital 66). Evidence
must legitimize the underlying design and implementa-
tion choices and organizational processes (Beckstrom,
2021; Raji et al., 2020; European Union, 2024). Evi-
dence for the implemented models contains the algo-
rithms in use, model parameters, and model perfor-
mance indicators (Beckstrom, 2021; Mitchell et al.,
2019; European Union, 2024). Evidence on data in-
forms about structure, provenance, and quality of test,
training, or production data (Beckstrom, 2021; Ash-
more et al., 2022; Gebru et al., 2021; Toreini et al.,
2022; European Union, 2024).
Means of Validation. In the validation step, auditors
access and assess any evidence about the AI-based LA
systems to validate the claims about the system’s valid-
ity, utility, and ethics. Means to validate claims can be
integrated into the AI-based LA system—either as an
interface to access raw data via APIs for further test-
ing, in the form of monitoring tools to observe system
output and parameters and deliver readily interpretable results, or as explainable AI principles on dashboards
(Fernsel et al., 2024b; Eitel-Porter, 2021; Bharadhwaj
et al., 2021). The availability and ease of access to the
evidence to validate claims determine the auditability
of AI-based systems. It can further be an indicator of a
system’s transparency.
Utilizing the Framework. The framework aims to
assess the auditability of an AI-based LA system and
facilitate the design of auditable systems. We discussed
several aspects of auditability, and their respective rele-
vance varies with every audit situation. The identified
claims dictate the required evidence subjects (system,
model, data) and evidence types (sources, documen-
tation, logs). The evidence types, in turn, specify the
technical means of validation. When utilizing our frame-
work to assess a system’s auditability, auditors must
consider which evidence is necessary to prove a stated
claim and judge whether the evidence is sufficiently
available and accessible.
Gathering claims requires a heuristic search, docu-
ment analysis, and Q&A interviews with those in responsible positions to determine relevant claims and their hierarchy.
Gathering evidence depends on subject and type and is
closely related to the technical means of validation. Doc-
umentation is the most essential evidence, as it is the
easiest to access and understand (Beckstrom, 2021). Ar-
guably, the most challenging evidence could be source
codes or logs of proprietary or security-sensitive sys-
tems (Alikhademi et al., 2022). However, evidence in
the form of sources and logs must not be neglected: it
may be necessary to complete information from the
documentation or establish the credibility of the docu-
mentation (Beckstrom, 2021).
For the auditing practice, the framework can be used
to define ex-ante responsibilities in the audited orga-
nization for providing claims and evidence. Further,
it can be operationalized as a checklist to control the
auditability of a system as an initial audit step or in the
development cycle of a system. Since system develop-
ment is an ongoing process, the framework assists in
assessing the auditability on the developers’ side and
offers guardrails for quality assurance measures.
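As a rough sketch of such a checklist (our own illustration; the evidence requirements shown for v1 are simplified from the case study in Section 4), each claim can be recorded together with the evidence it requires and the evidence actually found, yielding a per-claim auditability verdict:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """One verifiable claim and the evidence needed to validate it."""
    claim_id: str
    claim_type: str            # "validity", "utility" or "ethics"
    statement: str
    required_evidence: list    # (subject, type) pairs, e.g. ("model", "logs")
    available_evidence: list = field(default_factory=list)

    def auditable(self) -> bool:
        # a claim counts as auditable only if all required evidence is available
        return all(item in self.available_evidence for item in self.required_evidence)

v1 = Claim(
    claim_id="v1",
    claim_type="validity",
    statement="sufficiently good predictions",
    required_evidence=[("data", "sources"), ("model", "logs"), ("system", "documentation")],
    available_evidence=[("system", "documentation")],
)
print(v1.auditable())  # False: evidence gaps remain, as in the case study
```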
4 CASE STUDY
In this section, we apply the proposed auditability frame-
work to assess the auditability of the dropout prediction
system in Moodle 4.3. Table 1 lists the derived claims,
and Table 2 summarizes the desirable and available evidence for each claim.
Moodle’s dropout prediction system aims to prevent students from dropping out of a course (Monllaó Olivé et al., 2018). The software ships with an untrained machine learning model (a model configuration) that, once trained on a particular Moodle platform, predicts whether a student is likely to drop out of a course (Moodle, 2023a; Monllaó Olivé et al., 2018). A model configuration can be tested in an “evaluation mode” before going live (Moodle, 2023a).
We chose this use case because Moodle is a com-
monly used open-source learning management system
with potentially “high-risk” AI-based LA components
under the AI Act. Additionally, dropout prediction models have repeatedly been shown to work better for majority groups that are well represented in training data (Gardner
et al., 2019; Rzepka et al., 2022). Therefore, there is a
risk that some groups of students benefit less from the
AI-based LA module than others.
4.1 Claims
We consult the documentation of Moodle’s student
dropout prediction system and additional literature to
identify claims. The first claim v1 is that “[t]he accu-
racy and recall of the presented prediction model for
predicting at-risk students are good for a production
system” (Monllaó Olivé et al., 2018). Since the dropout
prediction model design is based on the “Community
of Inquiry” framework (Garrison et al., 1999), a second
claim v2 is that cognitive depth (metric applied in Moo-
dle for “cognitive presence” of a student) and social
breadth (metric applied in Moodle for “social presence”
of a student) are valid indicators for dropout prediction
(Moodle, 2023a).
Table 1: Overview of claims made for Moodle dropout prediction system by type. Bold blue: sufficient evidence available across
types and subjects to validate claim; else: claim cannot be verified, even with some available evidence.
Type Claim
Validity (v1) sufficiently good predictions
(v2) cognitive depth and social breadth are valid indicators
Utility (u1) reduced dropout in online courses
Ethics (e1) AI-created predictions are marked as such
(e2) stakeholders can decide whether to use the system
(e3) GDPR conformity
(e4) equal efficiency for learners of different locations and financial backgrounds
Table 2: Overview of desirable and available evidence for each claim by evidence type and subject. Markers in the table indicate whether the required evidence of a given type and subject is fully available or only incompletely available. Bold blue: sufficient evidence available across types and subjects to validate claim; else: claim cannot be verified, even with some available evidence.
Evidence subject (rows) by evidence type (Documentation, Logs, Sources):
System
Functionality & Limitations: Documentation e1, e2, e3, e4; Sources e3, e4
Design & Implementation choices: Documentation v1, v2, u1, e3, e4
Organizational processes: Documentation v1, u1, e3, e4
Model
Algorithms: Documentation v1, e3, e4; Sources e3, e4
Parameters: Documentation v1, e4; Sources v1, e4
Performance: Documentation e4; Logs v1, u1, e4
Data: Sources v1, v2, u1, e4
Structure: Documentation v1
Provenance: Documentation v1, v2, e4; Logs v2, e4
Quality: Documentation v1, e4
Furthermore, broad references can be found to the
utility of the dropout prediction system (is the system
functionality useful in its specific context?). Moodle’s
LA system should “not only predict events, but change
them to be more positive” (Moodle, 2023a). The Moo-
dle documentation asserts that the dropout prediction
system is most useful for courses that run entirely online
due to features that rely on Moodle activities (Moodle,
2023a). As utility claim u1, we can formulate that the
dropout prediction system reduces dropout rates in on-
line courses.
MoodleHQ, the organization leading the develop-
ment of Moodle, explains which ethical principles drive
the implementation and use of AI in Moodle: Users
should always know when AI is used, stakeholders
should be able to decide which AI components to use,
AI components should preserve users’ data privacy and
security, and AI components should be efficient for all
learners, “regardless of their location or financial sit-
uation” (Moodle, 2023b). Four ethics-related claims
can be derived from these principles to audit Moodle’s
dropout prediction system. Firstly, dropout predictions
are marked as being calculated by an AI (e1). Secondly,
stakeholders can decide whether to use the dropout
prediction system (e2). This implies that institutions
can activate the feature, and learners can opt in or out.
Third, student data collection, processing, and storage by the dropout prediction system follow the EU General Data Protection Regulation (GDPR) (e3). Lastly,
the dropout prediction model is equally efficient for all
learners, regardless of their geographical location and
financial background (e4).
4.2 Required Evidence
We established that evidence in the form of documenta-
tion (Raji et al., 2020; Beckstrom, 2021), raw sources
(Tagharobi and Simbeck, 2022; Beckstrom, 2021) and
logs (Brundage et al., 2020) is suitable to verify claims.
Evidence can concern aspects of Moodle’s dropout
prediction system (the system), the dropout prediction
model, and the underlying data. As noted before, not
all evidence must be available to validate all claims.
This section examines which evidence is required to
validate which claim. The claims and evidence subjects
are indicated in italics.
Validity. To prove v1 (sufficiently good predictions), the most reliable way would be to reproduce the quality assessment conducted by Monllaó Olivé et al. (2018).
To evaluate the dropout prediction configuration, audi-
tors require access to a Moodle system with test data
(data sources) to calculate the model performance. We
call this type of system a “test system”. In the absence
of openly available test data, additional documentation
on data quality requirements is helpful for the acquisi-
tion of suitable test data. In this use case, data can only
be acquired by exporting data from a Moodle platform,
not through synthesis. This is firstly because of the lack
of seed data. Secondly, even if sufficient information
on data properties were available, auditors cannot im-
port data for model input. Instead, the model requires
meaningful related data, which cannot be synthesized
(Fernsel et al., 2024b).
If suitable test data cannot be obtained, at least the
reliability of the evaluation conducted by Monllaó Olivé et al. (2018) can be judged. For that, the auditor needs to know the
details of the quality evaluation (organizational pro-
cesses) and the properties of the used training and test
data (data structure, provenance, and quality). Addi-
tionally, information on model algorithms (training and
testing, including feedback loops) and which model
parameters were chosen and why (system design and
implementation choices) could help to identify erro-
neous implementations that lead to invalid evaluation
results.
To prove v2 (cognitive depth and social breadth are
valid indicators), auditors need to verify whether this
claim is scientifically sound and supported by studies.
This information can be expected in the documenta-
tion on the translation of the “Community of Inquiry”
framework into cognitive depth and social breadth in-
dicators (system design and implementation choices).
Trust can be increased by examining the importance
of each indicator on the predictions made on a Moodle
instance that is already using the dropout prediction
model (data provenance). We call such systems “pro-
duction systems”.
Utility. The utility-related claim (reduced dropout
in online courses) requires evidence that indicates the
system’s impact on student behavior in online courses.
This information could be found in the documentation
on the scientific foundation of the dropout prediction
system and in conducted studies (design and implemen-
tation choices), as well as in applied evidence-based
design methods (organizational processes). If such evi-
dence is unavailable, auditors may verify the system’s
utility by analyzing the feedback given by humans for
predictions; i.e., did a student drop out and did Moodle
predict this correctly (model performance)—provided
that they have access to a production system.
Ethics. To validate e1 (AI-created predictions are
marked as such) and e2 (stakeholders can decide
whether to use the system), documentation on system
functionality and limitations can be helpful. To assess
e3 (GDPR compliance), documentation on system func-
tionality and limitations, design and implementation
choices, algorithms and organizational processes could
show whether data privacy and security mechanisms
have been included. Analyzing the source code could
provide detailed information about the system’s behav-
ior.
Several pieces of evidence can help verify e4 (equal
model performance across groups). The documentation
on system functionality and limitations, design and
implementation choices, and organizational processes
could reveal structural issues that might lead to a biased
system. It could also contain information on how risks
are handled. Evidence on the model algorithms and
parameters can uncover further potential for ethical
issues (Tagharobi and Simbeck, 2022). Documentation
of the model performance by risk group could indicate
the equality of prediction quality for different groups.
Properties of the training and test data (data provenance,
data quality) must be known to ensure the validity of
the performance evaluation. Trust can be increased if
data is available for reproducing or extending quality
measurements.
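Evaluating performance separately per group is essentially a slicing analysis (Gardner et al., 2019); a minimal sketch, assuming the auditor has already joined predictions with demographic attributes into one table (all column names and values below are illustrative):

```python
import pandas as pd
from sklearn.metrics import f1_score

def performance_by_group(df, group_col="group", y_true="dropped_out", y_pred="predicted"):
    """Compute the F1-score separately for each student group."""
    return {
        group: f1_score(part[y_true], part[y_pred])
        for group, part in df.groupby(group_col)
    }

# Example with illustrative predictions joined to demographic data
results = pd.DataFrame({
    "group":       ["A", "A", "A", "A", "B", "B", "B", "B"],
    "dropped_out": [1, 0, 1, 0, 1, 0, 1, 0],
    "predicted":   [1, 0, 1, 0, 1, 0, 0, 1],
})
print(performance_by_group(results))  # large gaps between groups would challenge e4
```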
4.3 Means of Validation
Where evidence is not directly available, it should be
made accessible through means of validation. We ar-
gued that evidence can be made accessible through
APIs, monitoring, or explainable AI mechanisms. In this
subsection, we assess to what extent Moodle’s dropout
prediction system implements interfaces to access and
collect evidence for validating claims.
API. Moodle does not provide an API for secure third-
party access to the dropout prediction system. However,
the internal “Analytics API” may be used to access
and extend the machine learning capabilities of Moodle
with a plugin (Monllaó Olivé et al., 2018), e.g., to add
further monitoring functions. In a different publication,
we used this approach to increase the auditability of
Moodle successfully (Fernsel et al., 2024b).
Monitoring. The primary monitoring capability is the
“evaluation mode” for evaluating model configurations
or models trained on other Moodle instances (Moodle,
2023a). An auditor needs access to a test system to use
this monitoring capability. The “evaluation mode” trains
a new model on some of the data from finished courses
on the platform and then tests it against the remaining
data (Moodle, 2023a). A model trained on a different
platform is evaluated by testing it against the data on
the new platform. When doing so, the model trained
for evaluation is not retained. The “evaluation mode”
returns two values per selected analysis interval: the
weighted F1-score and the standard deviation (Moodle,
2023a). Depending on the chosen machine learning
module, further values such as the Matthews correlation coefficient may be returned (Monllaó Olivé et al., 2018).
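For orientation, these headline metrics can be recomputed outside Moodle with standard tooling, assuming the auditor can somehow obtain the per-interval predictions and true outcomes (which the evaluation mode itself does not retain); the sketch below is our own illustration, not Moodle code:

```python
from sklearn.metrics import f1_score, matthews_corrcoef

def evaluation_summary(y_true, y_pred):
    """Recompute the kind of aggregate values Moodle's evaluation mode
    reports: a weighted F1-score plus the Matthews correlation coefficient."""
    return {
        "weighted_f1": f1_score(y_true, y_pred, average="weighted"),
        "matthews_corrcoef": matthews_corrcoef(y_true, y_pred),
    }

# Illustrative labels and predictions for one analysis interval
print(evaluation_summary(
    y_true=[1, 0, 1, 1, 0, 0, 1, 0],
    y_pred=[1, 0, 0, 1, 0, 1, 1, 0],
))
```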
More limited monitoring capabilities for production systems are also available. Auditors can monitor which
courses cannot be used by the model, which students
have been classified as at risk of dropping out, which
indicators have been calculated for which student, and
which human feedback has been given for the dropout
prediction model: correct, “not applicable” or “incor-
rectly flagged” (Moodle, 2023a).
Explanations. Moodle integrates explanations for AI
outputs in production systems. As mentioned above,
Moodle monitors the results of the dropout prediction
model together with the calculated indicators per stu-
dent. Moodle highlights influential indicators to explain
the model result (Moodle, 2023a).
4.4 Evidence Accessibility
Validity. The preferred way to prove v1 is to repro-
duce the dropout prediction model performance assessments from Monllaó Olivé et al. (2018), which appear to have been made with Moodle’s “evaluation mode” for LA models (Moodle, 2023a). The performance assess-
ment requires a test system to obtain logs of the model
performance. However, as previously mentioned, data
sources for the test system are not publicly available.
If no test system is available, the validity of the
quality evaluation may be estimated by reviewing infor-
mation on the properties and production of the system.
Monllaó Olivé et al. (2018) describe basic information on the structure and provenance of the data used for MoodleHQ’s quality assessment. They also elaborate on the data quantity. The model training algo-
rithms—logistic regression and a feed-forward neural
network—(model algorithm) and the model evaluation
methods are documented as well (organizational pro-
cesses). Auditors will need to analyze relevant parts
of the source code: the values for fixed model parame-
ters (like the number of training epochs, learning rate,
or batch size) are neither documented nor logged but
can only be found in the source code. Also, a source
code analysis (Tagharobi and Simbeck, 2022) found an
undocumented 500MB limit for training data.
To conclude, evidence is insufficiently available and
accessible to fully validate claim v1 that the dropout
prediction model correctly predicts dropout risks. The
documentation on the cognitive depth and social breadth
indicators (system design and implementation choices)
offers a starting point for auditors to validate v2 (Moo-
dle, 2023a; Monllaó Olivé et al., 2018). MoodleHQ
does not provide studies that support their indicator
definitions. If an auditor can access a production sys-
tem, she could review the explanations logged for the
model’s predictions (data provenance) and evaluate the
soundness of the chosen indicators. Since this type of
access could be challenging, we deem the available
evidence insufficient to effectively assess claim v2 that
cognitive depth and social breadth are valid indicators.
Utility. To assess the utility-related claim u1, the first
step is to review the available documentation. The sci-
entific theory behind the choice of model features is
explained thoroughly (Monllaó Olivé et al., 2018), but
no studies on the model’s impact are documented (de-
sign and implementation choices). Project management,
design, and development processes are not documented
either (organizational processes). Production system-
specific utility may be analyzed by viewing aggregated
information about the feedback given by humans for
predictions (model performance). In summary, the avail-
able evidence does not validate the claim u1 that the
dropout prediction system reduces dropout rates in on-
line courses.
Ethics. To validate ethics-based claim e1, documen-
tation on system functionality and limitations can be
reviewed. The screenshots displayed in the Moodle doc-
umentation show that users viewing dropout predictions
are made aware of the uncertainty of predictions (Moo-
dle, 2023a). However, users are not explicitly informed
that the AI-based LA system calculates the predictions.
The available evidence thus allows claim e1 to be rejected.
Documentation on system functionality and limita-
tions could also help validate claim e2. The documen-
tation shows that teachers can decide how to use the
predictions, and administrators can turn the dropout pre-
diction system on or off (Moodle, 2023a). Students do
not appear able to opt in or out of being classified by the
dropout prediction model. In conclusion, the available
evidence indicates that claim e2—stakeholders can de-
cide whether to use the dropout prediction system—can
only partly be confirmed.
Concerning e3, the documentation on system func-
tionality and limitations, design and implementation
choices, and organizational processes does not expli-
cate actions to comply with the GDPR, except that
exportable data is anonymous and access to insights
can be managed (Moodle, 2023a). We conclude that
only a source code analysis (algorithms, system func-
tionality, and limitations) can assess the claim that the
dropout prediction system is GDPR compliant.
To validate e4, documentation on the choice of
model features and their underlying principles (design
and implementation choices), as well as limiting techni-
cal factors (system functionality and limitations) hint at
existing or absent bias in the dropout prediction system
(Monllaó Olivé et al., 2018; Moodle, 2024; Moodle, 2023a). Source code analysis is required to complement
the documentation. However, it cannot rule out any bias
(Tagharobi and Simbeck, 2022). Evaluating the model
performance per group could provide additional evi-
dence on model fairness. Such a quality assessment
is not documented and thus needs to be conducted by
the auditor. Access to a production system (including
the database) and data sources, including demographic
data, is necessary. No evidence could be found that the
risk of model bias was considered in the design and de-
velopment (organizational processes). No information
on the data provenance (e.g., information on data acqui-
sition and pre-processing) or relevant data quality (e.g.,
information on representativeness) for MoodleHQ’s
quality assessment is available (Monllaó Olivé et al., 2018). We conclude that insufficient evidence is avail-
able and accessible for an efficient audit of claim e4
that model performance is equally high across groups.
5 DISCUSSION
We have demonstrated that our auditability assessment
framework is helpful for AI-based LA systems by suc-
cessfully applying it to Moodle’s dropout prediction
feature. (We demonstrated the applicability of the framework to prototype AI-based LA systems in Fernsel et al., 2024a.)
Through the structured analysis of claims, evidence, and means of validation, we anticipated, in particular, the challenges that an audit of Moodle’s dropout prediction feature would face. This can inform the
development of suitable fixes and features that retrofit
auditability (Fernsel et al., 2024b).
Although Moodle is open source, sufficiently docu-
mented, and includes a comprehensive logging system
with explanations, only three of seven identified claims
are effectively auditable. The lack of documentation
depth primarily constrains auditability. More documen-
tation is needed on system design and implementation
choices to justify the validity and ethical design of the
system. When documentation is incomplete or not trust-
worthy enough, additional evidence for an audit of the
dropout prediction system must be collected from the
system, e.g., by monitoring. Two significant challenges
hinder this approach. The first challenge is the inade-
quacy of Moodle’s model monitoring capability. Predic-
tions are not preserved when evaluating a model con-
figuration and are inaccessible to the auditor. Thus, the
auditor cannot verify the model’s performance and has
to rely on the minimal metrics returned by Moodle. Implementing an API to execute individual evaluation steps and retrieve intermediate data, or extending the monitoring capabilities, would address this shortcoming. The second chal-
lenge is the current absence of publicly available test
data. Therefore, data-based audits of Moodle’s dropout
prediction model are very resource-intensive.
The assessment of Moodle shows that future LA
systems need to provide system access to third-party
auditors, e.g., by creating “auditor” roles, recording
data (anonymized training data, predictions), and en-
abling auditors to control evaluation parameters. The
auditability assessment framework bears one caveat
when applied to any AI-integrating system. Assessing and improving auditability costs time and effort
if claims and evidence must be prepared and the tech-
nical means for validation are not readily accessible.
We are confident the improved audit quality will jus-
tify these additional costs. The auditability framework
assists in mitigating challenges and in supporting the ongoing development of more robust and ethically fair
software for “high-risk” application fields.
6 CONCLUSION
AI in education continues to gain in importance. Regular auditing is essential to ensure fair, non-discriminatory applications and, with them, sustainable learning success. De-
spite the increasing demand for auditing AI systems,
auditability is a neglected design requirement for most
AI systems. For this reason, we have sought to define
auditability to improve transparent and traceable audits
of AI-based Learning Analytics in development and
deployment. Lacking auditability impairs independent audits; limiting factors include a lack of documentation, restricted access to the system and its raw sources (code, model weights, or data), and incomprehensible system output (Berghoff et al., 2022; Mökander and Floridi, 2021; Alikhademi et al., 2022; Tagharobi and Simbeck, 2022). Additionally, system-independent fac-
tors, such as heterogeneous ethical standards (Mökander and Floridi, 2021) and the difficulty of achieving test coverage for AI-integrating systems (Berghoff et al., 2022; Tao et al., 2019), diminish auditability.
Following a review of auditability in general, AI au-
dit challenges, and factors enabling AI auditability, we
suggest a framework for a systematic approach to assess
and ensure specific requirements for the auditability of
AI-based LA. Our framework is based on three pillars:
claims, evidence, and means of validation. To make
AI systems auditable, system providers and deployers
must provide certifiable claims about utility, validity,
and ethics (Landers and Behrend, 2023; Brundage et al.,
2020). Depending on the claims and the audit proce-
dure, substantial evidence must be made available to
auditors: evidence types include documentation, raw
sources, and logs (Brundage et al., 2020; Tagharobi and
Simbeck, 2022; Beckstrom, 2021; Raji et al., 2020). Ev-
idence subjects are the overall system, models and data
(European Union, 2024). AI-integrating systems should
provide APIs (Springer and Whittaker, 2019), moni-
toring tools (Ashmore et al., 2022; Bharadhwaj et al.,
2021; European Union, 2024; Eitel-Porter, 2021; Al-
hajaili and Jhumka, 2019) and explanations (Brundage
et al., 2020; Shneiderman, 2020; Guidotti et al., 2018)
to enable the validation of evidence. Requirements and standards for AI audits are still being developed.
However, legislators and standardization bodies must
consider auditability requirements as well. We see this
as an important leverage point where our framework
can be applied to derive process requirements for exter-
nal audits, implement auditability by design in the QA
of system development, and give stakeholders a way
to insist on consistent audits. Finally, the framework
supports developing and maintaining robust, trustwor-
thy AI-based LA systems that foster acceptance among
students and teaching professionals.
We conclude that the proposed framework is use-
ful for auditors and system providers to prepare for
an audit and determine to what extent an AI-integrating LA system is auditable. Moreover, developers of AI-
integrating systems can benefit from the framework by
identifying areas for improving the auditability of their
products. We appeal to developers of AI-integrating
systems to consider auditability right from the start
when designing their systems to ensure trustworthy,
ethical, and future-fit products that comply with current
and upcoming legislation, such as the European AI Act.
Considering that LA can potentially enhance learning
outcomes (Lang et al., 2022), increasing the auditability
of LA systems ultimately leads to an improved learning
experience for a broader audience.
ACKNOWLEDGEMENTS
This publication is part of the research project “Fair
Enough? Investigating the fairness of learning analytics
systems”, which was funded by the German Federal
Ministry of Education and Research (BMBF) Grant
No.: 16DHB4002/3. The authors would like to thank
the various reviewers who provided valuable comments
at different stages of the paper.
REFERENCES
Alagić, A., Turulja, L., and Bajgorić, N. (2021). Identification
of Information System Audit Quality Factors. Journal
of Forensic Accounting Profession, 1(2):1–28.
Alhajaili, S. and Jhumka, A. (2019). Auditability: An Ap-
proach to Ease Debugging of Reliable Distributed Sys-
tems. In 2019 IEEE 24th Pacific Rim International
Symposium on Dependable Computing (PRDC), pages
227–2278, Kyoto, Japan. IEEE.
Alikhademi, K., Drobina, E., Prioleau, D., Richardson, B.,
et al. (2022). A review of predictive policing from the
perspective of fairness. Artificial Intelligence and Law,
30(1):1–17.
Alla, S. and Adari, S. K. (2021). What Is MLOps? In
Beginning MLOps with MLFlow: Deploy Models in
AWS SageMaker, Google Cloud, and Microsoft Azure,
pages 79–124. Apress, Berkeley, CA.
Ashmore, R., Calinescu, R., and Paterson, C. (2022). Assuring
the Machine Learning Lifecycle: Desiderata, Methods,
and Challenges. ACM Computing Surveys, 54(5):1–39.
Awwad, Y., Fletcher, R., Frey, D., Gandhi, A., et al. (2020). Ex-
ploring Fairness in Machine Learning for International
Development. MIT D-Lab, Cambridge.
Ayling, J. and Chapman, A. (2022). Putting AI Ethics to
Work. AI and Ethics, 2(3):405–429. PII: 84.
Baek, C. and Doleck, T. (2023). Educational Data Mining
versus Learning Analytics: A Review of Publications
From 2015 to 2019. Interactive Learning Environments,
31(6):3828–3850.
Baker, R. S. and Hawn, A. (2022). Algorithmic Bias in Edu-
cation. International Journal of Artificial Intelligence
in Education, 32(4):1052–1092.
Beckstrom, J. R. (2021). Auditing machine learning algo-
rithms. A white paper for public auditors. International
Journal of Government Auditing, 48(1):40–41.
Berghoff, C., Böddinghaus, J., Danos, V., Davelaar, G., et al.
(2022). Towards Auditable AI Systems: From Principles
to Practice.
Bharadhwaj, H., Huang, D.-A., Xiao, C., Anandkumar, A.,
et al. (2021). Auditing AI models for Verified Deploy-
ment under Semantic Specifications.
Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., et al.
(2020). Explainable machine learning in deployment.
In Proceedings of the 2020 Conference on Fairness,
Accountability, and Transparency, FAT* ’20, pages 648–
657. ACM.
Bose, R. P. J. C., Singi, K., Kaulgud, V., Phokela, K. K., et al. (2019). Framework for Trustworthy Software
Development. In 2019 34th IEEE/ACM International
Conference on Automated Software Engineering Work-
shop (ASEW), pages 45–48, San Diego, CA. IEEE.
Brundage, M., Avin, S., Wang, J., Belfield, H., et al. (2020).
Toward Trustworthy AI Development: Mechanisms for
Supporting Verifiable Claims.
Crescenzi-Lanna, L. (2020). Multimodal Learning Analytics
Research with young Children: A systematic Review.
British Journal of Educational Technology, 51(5):1485–
1504.
Dorodchi, M., Al-Hossami, E., Benedict, A., and Demeter,
E. (2019). Using synthetic data generators to promote
open science in higher education learning analytics. In
2019 IEEE International Conference on Big Data (Big
Data), pages 4672–4675. IEEE.
Drugova, E., Zhuravleva, I., Zakharova, U., and Latipov, A.
(2024). Learning Analytics driven Improvements in
Learning Design in higher Education: A systematic Lit-
erature Review. Journal of Computer Assisted Learning,
40(2):510–524.
Eitel-Porter, R. (2021). Beyond the promise: implementing
ethical AI. AI and Ethics, 1(1):73–80.
El Emam, K., Mosquera, L., and Hoptroff, R. (2020). Practi-
cal Synthetic Data Generation. O’Reilly Media, Inc.
Elmoazen, R., Saqr, M., Khalil, M., and Wasson, B. (2023).
Learning Analytics in virtual Laboratories: A system-
atic Literature Review of empirical Research. Smart
Learning Environments, 10(1).
European High-Level Expert Group on AI (2019). Ethics
guidelines for trustworthy AI — Shaping Europe’s digi-
tal future.
European Parliament (2023). Amendments adopted by the
European Parliament on 14 June 2023 on the proposal
for a regulation of the European Parliament and of the
Council on laying down harmonised rules on artificial
intelligence (Artificial Intelligence Act) and amending
certain Union legislative acts.
European Union (2024). Regulation (EU) 2024/1689 of the
European Parliament and of the Council of 13 June 2024
laying down harmonised rules on artificial intelligence.
Falco, G., Shneiderman, B., Badger, J., Carrier, R., Dahbura,
A., Danks, D., Eling, M., Goodloe, A., Gupta, J., Hart,
C., Jirotka, M., Johnson, H., LaPointe, C., Llorens, A. J.,
Mackworth, A. K., Maple, C., Pálsson, S. E., Pasquale,
F., Winfield, A., and Yeong, Z. K. (2021). Governing
AI safety through independent audits. Nature Machine
Intelligence, 3(7):566–571.
Fernsel, L., Kalff, Y., and Simbeck, K. (2024a). Assessing
the auditability of ai-integrating systems: A framework
and learning analytics case study. arXiv, 2411.08906.
Fernsel, L., Kalff, Y., and Simbeck, K. (2024b). Where is
the evidence? In Poquet, O., Ortega-Arranz, A., Viberg,
O., Chounta, I.-A., McLaren, B. M., and Jovanovic, J.,
editors, Proceedings of the 16th International Confer-
ence on Computer Supported Education (CSEDU 2024),
volume 2, pages 262–269.
Fjeld, J., Achten, N., Hilligoss, H., Nagy, A., et al. (2020).
Principled Artificial Intelligence: Mapping Consensus
in Ethical and Rights-Based Approaches to Principles
for AI. Berkman Klein Center Research Publication,
(1).
Galdon-Clavell, G., Zamorano, M. M., Castillo, C., Smith, O.,
et al. (2020). Auditing Algorithms: On Lessons Learned
and the Risks of Data Minimization. In Proceedings of
the AAAI/ACM Conference on AI, Ethics, and Society,
pages 265–271, New York. ACM.
Gardner, J., Brooks, C., and Baker, R. (2019). Evaluating the
Fairness of Predictive Student Models Through Slicing
Analysis. In Proceedings of the 9th International Con-
ference on Learning Analytics & Knowledge, LAK19,
pages 225–234, New York. ACM.
Garrison, D. R., Anderson, T., and Archer, W. (1999). Crit-
ical Inquiry in a Text-Based Environment: Computer
Conferencing in Higher Education. The Internet and
Higher Education, 2(2):87–105.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W.,
et al. (2021). Datasheets for datasets. Communications
of the ACM, 64(12):86–92.
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti,
F., and Pedreschi, D. (2018). A Survey of Methods
for Explaining Black Box Models. ACM Computing
Surveys, 51(5):93:1–93:42.
Haim, A., Shaw, S., and Heffernan, N. (2023). How to open
science: A principle and reproducibility review of the
learning analytics and knowledge conference. In LAK23:
13th International Learning Analytics and Knowledge
Conference, pages 156–164.
Heikkinen, S., Saqr, M., Malmberg, J., and Tedre, M. (2023).
Supporting self-regulated Learning with Learning An-
alytics Interventions: A systematic Literature Review.
Education and Information Technologies, 28(3):3059–
3088.
Holland, S., Hosny, A., Newman, S., Joseph, J., and Chmielin-
ski, K. (2018). The Dataset Nutrition Label: A Frame-
work To Drive Higher Data Quality Standards.
Kale, A., Nguyen, T., Harris, Frederick C., Jr., Li, C., Zhang, J.,
and Ma, X. (2022). Provenance documentation to enable
explainable and trustworthy AI: A literature review.
Data Intelligence, pages 1–41.
Khalil, M., Slade, S., and Prinsloo, P. (2023). Learning Ana-
lytics in Support of Inclusiveness and disabled Students:
A systematic Review. Journal of Computing in Higher
Education, pages 202–219.
Kitto, K. and Knight, S. (2019). Practical ethics for build-
ing learning analytics. British Journal of Educational
Technology, 50(6):2855–2870.
Knechel, W. R., Krishnan, G. V., Pevzner, M., Shefchik, L. B.,
and Velury, U. K. (2013). Audit Quality: Insights from
the Academic Literature. AUDITING: A Journal of
Practice & Theory, 32(Supplement 1):385–421.
Kreuzberger, D., Kühl, N., and Hirschl, S. (2022). Machine
Learning Operations (MLOps): Overview, Definition,
and Architecture.
Landers, R. N. and Behrend, T. S. (2023). Auditing the AI
Auditors. A Framework for Evaluating Fairness and
Bias in high stakes AI predictive Models. The American
psychologist, 78(1):36–49.
Lang, C., Siemens, G., Friend Wise, A., Gašević, D., and Mer-
ceron, A., editors (2022). Handbook of Learning Ana-
lytics - Second edition. Society for Learning Analytics
Research (SoLAR), 2 edition.
Li, Y. and Goel, S. (2024). Artificial Intelligence Auditability
and Auditor Readiness for Auditing Artificial Intelli-
gence Systems. SSRN Journal.
Minkkinen, M., Niukkanen, A., and Mäntymäki, M. (2024).
What about investors? ESG analyses as tools for ethics-
based AI auditing. AI & Society, 39(1):329–343. PII:
1415.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L.,
Hutchinson, B., Spitzer, E., Raji, I. D., and Gebru, T.
(2019). Model Cards for Model Reporting. In Pro-
ceedings of the Conference on Fairness, Accountability,
and Transparency, FAT* ’19, pages 220–229, New York.
Association for Computing Machinery.
Mökander, J., Axente, M., Casolari, F., and Floridi, L. (2022).
Conformity Assessments and Post-market Monitoring:
A Guide to the Role of Auditing in the Proposed Euro-
pean AI Regulation. Minds & Machines, 32(2):241–268.
Mökander, J. and Floridi, L. (2021). Ethics-Based Audit-
ing to Develop Trustworthy AI. Minds and Machines,
31(2):323–327.
Monllaó Olivé, D., Du Huynh, Q., Reynolds, M., Dougiamas,
M., et al. (2018). A supervised learning framework
for learning management systems. In Proceedings of
the First International Conference on Data Science, E-
learning and Information Systems, DATA ’18, pages
1–8, New York. Association for Computing Machinery.
Moodle (2023a). Documentation.
Moodle (2023b). Moodle and our AI principles.
Moodle (2024). Analytics API.
Naja, I., Markovic, M., Edwards, P., Pang, W., et al. (2022). Us-
ing Knowledge Graphs to Unlock Practical Collection,
Integration, and Audit of AI Accountability Information.
IEEE Access, 10:74383–74411.
Nushi, B., Kamar, E., and Horvitz, E. (2018). Towards Ac-
countable AI: Hybrid Human-Machine Analyses for
Characterizing System Failure. In Proceedings of the
AAAI Conference on Human Computation and Crowd-
sourcing, volume 6, pages 126–135, Zurich, Switzerland.
AAAI Press.
Ouyang, F., Wu, M., Zheng, L., Zhang, L., et al. (2023). Inte-
gration of artificial intelligence performance prediction
and learning analytics to improve student learning in
online engineering course. International Journal of
Educational Technology in Higher Education, 20(1):4.
Paolucci, C., Vancini, S., Bex Ii, R. T., Cavanaugh, C., Salama,
C., and de Araujo, Z. (2024). A review of learning ana-
lytics opportunities and challenges for K-12 education.
Heliyon, 10(4):e25767.
Prinsloo, P. and Slade, S. (2017). Ethics and Learning Analyt-
ics: Charting the (Un)Charted. In Lang, C., Siemens,
G., Wise, A., Gasevic, D., and University of Edinburgh,
U. K., editors, Handbook of Learning Analytics, pages
49–57. Society for Learning Analytics Research (So-
LAR).
Raji, I. D., Smart, A., White, R. N., Mitchell, M., et al. (2020).
Closing the AI accountability gap: defining an end-to-
end framework for internal algorithmic auditing. In
Proceedings of the 2020 Conference on Fairness, Ac-
countability, and Transparency, FAT* ’20, pages 33–44.
ACM.
Romero, C. and Ventura, S. (2020). Educational data mining
and learning analytics: An updated survey. WIREs Data
Mining and Knowledge Discovery, 10(3).
Rzepka, N., Fernsel, L., Müller, H.-G., Simbeck, K., and
Pinkwart, N. (2023). Unbias me! Mitigating Algorith-
mic Bias for Less-studied Demographic Groups in the
Context of Language Learning Technology. Computer-
Based Learning in Context, 6(1):1–23.
Rzepka, N., Simbeck, K., Müller, H.-G., and Pinkwart, N.
(2022). Fairness of In-session Dropout Prediction. In
Proceedings of the 14th International Conference on
Computer Supported Education (CSEDU), pages 316–
326. Scitepress.
Sghir, N., Adadi, A., and Lahmer, M. (2022). Recent advances
in Predictive Learning Analytics: A decade systematic
review (2012-2022). Education and Information Tech-
nologies, pages 1–35.
Shneiderman, B. (2020). Human-Centered Artificial Intelli-
gence: Three Fresh Ideas. AIS Transactions on Human-
Computer Interaction, pages 109–124.
Simbeck, K. (2024). They shall be fair, transparent, and robust:
auditing learning analytics systems. AI and Ethics, 4.
Slade, S. and Tait, A. (2019). Global guidelines: Ethics in
Learning Analytics. ICDE.
Springer, A. and Whittaker, S. (2019). Making Transparency
Clear: The Dual Importance of Explainability and Au-
ditability. In Joint Proceedings of the ACM IUI 2019
Workshops, page 4, Los Angeles. ACM.
Stoel, D., Havelka, D., and Merhout, J. W. (2012). An analysis
of attributes that impact information technology audit
quality: A study of IT and financial audit practition-
ers. International Journal of Accounting Information
Systems, 13(1):60–79.
Susnjak, T. (2024). Beyond Predictive Learning Analyt-
ics Modelling and onto Explainable Artificial Intelli-
gence with Prescriptive Analytics and ChatGPT. Inter-
national Journal of Artificial Intelligence in Education,
34(2):452–482. PII: 336.
Tagharobi, H. and Simbeck, K. (2022). Introducing a Frame-
work for Code based Fairness Audits of Learning An-
alytics Systems on the Example of Moodle Learning
Analytics. In Proceedings of the 14th International Con-
ference on Computer Supported Education (CSEDU),
volume 2, pages 45–55. Scitepress.
Tao, C., Gao, J., and Wang, T. (2019). Testing and Quality
Validation for AI Software–Perspectives, Issues, and
Practices. IEEE Access, 7:120164–120175.
Toreini, E., Aitken, M., Coopamootoo, K. P. L., Elliott, K.,
Zelaya, V. G., Missier, P., Ng, M., and van Moorsel, A.
(2022). Technologies for Trustworthy Machine Learn-
ing: A Survey in a Socio-Technical Context.
Weigand, H., Johannesson, P., Andersson, B., and Bergholtz,
M. (2013). Conceptualizing Auditability. In Deneckère,
R. and Proper, H. A., editors, Proceedings of CAiSE’13,
page 8, Valencia, Spain. CEUR.
Williams, R., Cloete, R., Cobbe, J., Cottrill, C., et al. (2022).
From transparency to accountability of intelligent sys-
tems: Moving beyond aspirations. Data & Policy,
4(2022).
Wolnizer, P. W. (2006). Auditing as Independent Authentica-
tion. Sydney University Press, Sydney.
Xiong, Z., Li, H., Liu, Z., Chen, Z., et al. (2024). A Review of
Data Mining in Personalized Education: Current Trends and Future Prospects. Frontiers of Digital Education, 1:26–50.