Regaining Control: Enabling Educators to Build Specialized AI Chat Bots with Retrieval Augmented Generation

Barbara Pampel¹ᵃ, Simon Martin¹ᵇ and Ulrike Padó²ᶜ

¹ University of Konstanz, Universitaetsstrasse 10, Konstanz, Germany

² Hochschule für Technik Stuttgart, Schellingstr. 24, 70174 Stuttgart, Germany

Keywords:

Retrieval-Augmented Generation, Large Language Models, Education.

Abstract:

Conversational AI (chat) bots are powerful and helpful tools, but they are not suited for unrestricted use in many classrooms: They may hallucinate, easily veer from the topic of instruction, and are vulnerable to malicious prompting. Retrieval-augmented generation (RAG) is a technique that allows educators to constrain chat bots to a specific area of expertise, reducing hallucinations and vulnerability to misuse. We are working on a low-code solution that enables tech-savvy educators to build such a RAG-based chat bot system themselves, thus retaining full control over the content and behavior of their bot. We present the first version of this system and promising initial feedback from educators and students on its suitability, reliability and flexibility.

1 INTRODUCTION

Recent studies show the growing use of AI chat bots

based on Large Language Models (LLMs) among

school and university students, but raise concerns

about unsupervised and unreﬂective use without con-

sidering ethical or academic risks – not just pla-

giarism, but also a growing dependence on support

systems that may lead to bypassing critical thinking

processes (Medienpädagogischer Forschungsverbund Südwest, 2024; Abbas et al., 2024; Süße and Kobert, 2023; Ng et al., 2024). For example, students might

exploit these tools to complete assignments without

engaging in the learning process (Chang et al., 2023).

Further, the lack of connection between AI systems

and established educational theories creates a gap in

aligning these tools with curriculum-speciﬁc goals

and desired learning paths (Ouyang and Jiao, 2021).

Adding to the known problem of hallucinations in

LLM-generated text (Maynez et al., 2020), the level

of detail in the bot’s output can be inappropriate for

a speciﬁc course or teaching session, as an educator

usually does not have control over the output. This

can lead educators to completely ban AI tools.

On the other hand, there are various benefits that AI systems can offer learners, such as chat bots supporting self-regulated learning (Chang et al., 2023). In addition, we need to promote future-oriented learning. Educators and students must acquire skills in handling cutting-edge technologies and prepare for the demand in higher education and the workplace, where AI systems will increasingly support various tasks. Although several initiatives are currently being implemented, there still is a long road ahead (for Germany, see Budde et al. (2024)).

ᵃ https://orcid.org/0000-0001-6492-0381
ᵇ https://orcid.org/0009-0000-4149-7189
ᶜ https://orcid.org/0009-0000-0664-7487

With the technique of Retrieval-Augmented Gen-

eration (RAG, Lewis et al. 2020), AI systems can be

customized not only to meet students’ support needs

(e.g., learning level, language, subject matter) but also

to reﬂect educators’ decisions about what information

the system should contain and how it should react to

speciﬁc requests (e.g., for solving homework). This

also means that a RAG-based system is much less

vulnerable to malicious queries (prompt injection at-

tacks, e.g., Perez and Ribeiro 2022), because these

will not match the system’s knowledge. Further, using

locally hosted Open Source solutions can safeguard

sensitive student data and reduce cost.

Learners can access knowledge through these AI

systems interactively and without external restric-

tions, which has the potential to increase their engage-

ment and can encourage independent learning and re-

duce educators’ workload on routine questions.

For educators, active involvement in the development of the RAG systems promotes both media literacy and critical engagement with AI systems. Using custom AI systems ensures transparency and control over the information provided to students, and designing system prompts allows tailoring the AI’s role (e.g., tutor vs. discussion partner). We would like educators to be more than passive users and instead to actively shape the future of teaching with and about AI technologies and enrich the teaching landscape. We therefore designed a no-code RAG system template that tech-savvy educators should be able to recreate after a brief introduction, along with example use cases, including the design of the knowledge base and system prompts, that can be adapted to individual needs (see Section 2).

Pampel, B., Martin, S. and Padó, U.
Regaining Control: Enabling Educators to Build Specialized AI Chat Bots with Retrieval Augmented Generation.
DOI: 10.5220/0013425500003932
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 17th International Conference on Computer Supported Education (CSEDU 2025) - Volume 2, pages 371-378
ISBN: 978-989-758-746-7; ISSN: 2184-5026

We built our template with the Will, Skill, Tool,

Pedagogy (WSTP) model by Knezek and Christensen

(2015) in mind, which provides a framework for un-

derstanding the factors that inﬂuence how technology

is used in teaching. The aspect Will relates to the mo-

tivation and positive disposition of educators to adopt

technological innovations. Importantly, it refers to

a general readiness to experiment and test new ap-

proaches. The Skill dimension encompasses the tech-

nical competencies and conﬁdence needed to effec-

tively use and integrate technology, which means pro-

ﬁciency in using tools and adjusting them to differ-

ent educational contexts. The Tool dimension refers

to the availability and accessibility of necessary tech-

nological resources, such as platforms, software, and

devices. Finally, the Pedagogy dimension addresses

the teaching strategies and instructional approaches

that incorporate technology to enhance learning out-

comes. Various studies (Velazquez, 2006; Chris-

tensen and Knezek, 1999; Knezek and Christensen,

2015) found different strengths of these aspects in

predicting Technology Integration using slightly dif-

ferent models for different groups of educators (re-

garding country and skill level). We address all four

aspects: We gather motivated educators in our work-

shops and show them the potential of such systems

(Will), teach them the basic concepts of Generative

AI and RAG and how to use and adapt such systems

(Skill) with prototypes we developed in a no-code

framework using Open Source LLMs (Tool) and rec-

ommend initial typical use-cases (Pedagogy).

Due to the current high interest and growing

demand for AI-powered educational tools, we fol-

low an agile development approach accompanied by

a Design-Based Research methodology (Reimann,

2010). At different stages of development, we have so

far made various iterations of the RAG system avail-

able to students, monitored and evaluated its use, and

collected detailed feedback from participants to guide

further improvements (see Section 3).

Figure 1: Retrieval-Augmented Generation for Dialogue (simplified).

1.1 Retrieval-Augmented Generation (RAG)

LLM-based chat bots have inherent factual knowl-

edge from the training process (Petroni et al., 2019) as

well as knowledge of the desired behavior in conver-

sation through ﬁne-tuning for the dialogue task. How-

ever, the inherent knowledge of LLMs may be out-

dated or at the wrong level of detail for the use case.

In addition, utterances may include plausible halluci-

nations (Maynez et al., 2020) – erroneous statements

that are hard to identify for the user.

Retrieval-Augmented Generation (Lewis et al. 2020; see, e.g., Fan et al. 2024 for an overview) enhances and defines the relevant context for the conversation, thus strictly delimiting the facts used in conversation. Figure 1 shows a (simplified) example:

The user query is initially matched against documents

relevant to the intended use case that have been col-

lected in a Knowledge Base. These are preprocessed

into snippets that can be retrieved, e.g., by specialized

language models (Fan et al., 2024) whenever they are

relevant to the user’s query. The LLM is then given

not just the user query, but also the relevant context

information needed to reliably answer it. This means

that it is only tasked with phrasing the given informa-

tion coherently (and appropriately for the context).
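The retrieve-then-generate loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system presented in this paper: the knowledge base snippets, the stop-word list, the function names and the similarity threshold are all hypothetical, and a simple word-count similarity stands in for the specialized language models that a real retriever would use. The LLM call itself is omitted; the sketch only shows how the query is matched against snippets and how the retrieved context is packed into the prompt.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real one would hold snippets from course material.
KNOWLEDGE_BASE = [
    "Binary numbers represent values using only the digits 0 and 1.",
    "ASCII encodes each character as a number between 0 and 127.",
    "The Caesar cipher shifts every letter by a fixed number of positions.",
]

# Small illustrative stop-word list (an assumption, far from complete).
STOP_WORDS = {"a", "an", "and", "as", "by", "do", "how", "is", "me", "of", "the", "what"}


def bag_of_words(text):
    """Toy stand-in for an embedding: lower-cased word counts without stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, snippets, top_k=2, min_score=0.1):
    """Return up to top_k snippets whose similarity to the query reaches min_score."""
    query_vec = bag_of_words(query)
    scored = sorted(
        ((cosine(query_vec, bag_of_words(s)), s) for s in snippets),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [s for score, s in scored[:top_k] if score >= min_score]


def build_prompt(query, snippets):
    """Combine the retrieved context with the user query; None if nothing matches."""
    context = retrieve(query, snippets)
    if not context:
        return None  # simplification: the caller would send a fallback reply instead
    lines = ["Answer only from the context below.", "Context:"]
    lines += [f"- {s}" for s in context]
    lines.append(f"Question: {query}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("How do binary numbers work?", KNOWLEDGE_BASE))
    print(build_prompt("Give me a pizza recipe", KNOWLEDGE_BASE))
```

An off-topic query finds no matching snippet, so no context is handed to the model. In a full RAG system this decision is usually left to the LLM and its system prompt rather than hard-coded as here.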

In a teaching context, this combined strategy al-

lows the educator to specify reliable and up-to-date

information at the right level of detail. Hallucinations

are signiﬁcantly reduced (Shuster et al., 2021), and

conversation can no longer drift away from the in-

tended topic, avoiding hacking attempts by prompting

or attempts to generate inappropriate output (prompt

injection, Yu et al. 2023) – a real concern in a teach-

ing setting (see Section 3.2). At the same time, the

LLM contributes ﬂuent and coherent output using the

retrieved information snippets.

CSEDU 2025 - 17th International Conference on Computer Supported Education


1.2 RAG in Education

Some projects take ﬁrst steps to integrate RAG in

teaching, but prototypes are rarely ready to use with

students in more than a pilot setting, and even fewer

systematic evaluations can be found in the literature.

Dong et al. (2023) used the OpenAI Assistants

API to build a RAG system where educators can up-

load course materials and tested it with a group of stu-

dents. The feedback for this trial was very positive

regarding ease of use and relevance of the answers,

but mixed regarding the level of detail of the answers.

In addition, the use of a commercial system raised

concerns about costs and data privacy. Mullins et al.

(2024) tried to address this by using Open Source sys-

tems. They tested a RAG system using a Llama LLM

and a Chroma vector database and Reddit as a data

source. This last point was problematic, resulting in a

low correctness rate of the output (below 50%). Kahl

et al. (2024) used chat bots for robotics education and

tried prompt engineering, LLM ﬁne-tuning and RAG

systems to improve factual accuracy. Here, especially

RAG was found to be suitable to improve the educa-

tional utility of LLMs in specialized domains.

In some projects custom RAG-based chat bot pro-

totypes were developed for (Higher) Education and

evaluated by educators. Dakshit (2024), for example, received positive feedback from computer science faculty members on the potential of a RAG system to serve as a teaching aid for lecturers and as a teaching assistant answering students’ questions. Still, the

participants highlighted the need for careful consid-

eration of ethical implications and appropriate safe-

guards to ensure that the implementation of such sys-

tems is responsible and effective.

Most systems we found do not have an interface

for learners yet, so they are not ready to be deployed

as safeguarded AI-teaching assistants. Much further

advanced is the OwlMentor by Thüs et al. (2024),

which has a complex user interface with a chat func-

tion including RAG. It has been evaluated quite exten-

sively with positive results for its usability, but for the

task of helping students with understanding scientiﬁc

papers, no direct correlation between the use of the

system and learning gains was found. Additionally,

it relies on OpenAI’s GPT-3.5 Turbo model, a cost-

incurring component. Similarly, the now regularly

used CS50 Duck, developed by Liu et al. (2024) for

Harvard’s CS50 course, was tested to be very reliable

and received very positive feedback from students.

However, it uses the more expensive OpenAI GPT-

4 model. Furthermore, both OwlMentor and CS50

Duck were developed speciﬁcally for these courses

and, while the results highlight the potential of RAG

in education, they cannot be replicated without exten-

sive technical expertise.

Our focus is not only on the usability and effec-

tiveness of RAG in higher education but also on how

feasible it is for tech-savvy educators to recreate and

adapt a system themselves – this seems as yet implau-

sible for all of the systems mentioned above.

There already are some online platforms (generally requiring paid subscriptions) offering no-code custom chat bots, including the possibility to upload one’s own material and define a system prompt, among them Custom GPT by OpenAI, but the control over their behavior is limited. In our tests, we could easily

lead the chat bots away from the knowledge base, trig-

ger hallucinations, and found them very vulnerable to

prompt injection. Further, using these web interfaces

with just text ﬁelds for prompts and buttons to upload

ﬁles, the structure and functionality of such systems

remain completely hidden, so educators would only

learn how to use them and not how they work.

2 TECHNICAL SETUP

2.1 Langchain, Flowise and Our Flow

The central part of creating a custom RAG system

is setting up the interaction with the LLM of choice.

But direct interaction with LLMs is often not possible

without deeper understanding of programming con-

cepts, API integration, and managing computational

resources to effectively query and deploy these mod-

els in applications.

LangChain is a framework designed for building

applications that integrate LLMs and can signiﬁcantly

simplify several aspects of working with LLMs.

LangChain provides high-level abstractions that re-

duce the need for extensive programming knowl-

edge. Developers can use prebuilt components in-

stead of writing complex logic from scratch. While

LangChain lowers the entry barrier, effective use still

requires substantial programming and computer sci-

ence knowledge, including API conﬁguration, work-

ﬂow design, and integration of external tools.

Flowise is an open-source, low-code tool based on LangChain for creating customized LLM workflows through a drag-and-drop interface. While this interface simplifies the process, foundational computer science knowledge is still required for tasks like configuring APIs, managing data flows, and optimizing workflows. However, with the guidance of experts and a concrete example flow, Flowise can be effectively used by educators to create RAG systems without deep technical expertise.

Custom GPT: https://openai.com/index/introducing-gpts/
LangChain: https://www.langchain.com/
Flowise: https://github.com/FlowiseAI/Flowise

Figure 2: Flowise template for the implementation of a RAG system.

For this purpose, we designed a Flowise template for the implementation of a RAG system that can take on the role of a tutor (shown in Figure 2). The main component of the template is the Conversational Retrieval QA Chain (1). Chains serve as the foundation for building workflows that link inputs, such as user queries, to outputs like responses or retrieved information. A Conversational Retrieval QA Chain combines conversation history with external knowledge retrieval, enabling the system to maintain context while accurately answering questions by pulling relevant information from an external source. To maintain context, information is retrieved from the Buffer Memory (2) that stores the conversation history. To use information from an external source (3) like a PDF, pre-processing is required. The text is extracted from the source and split into chunks using a Recursive Character Text Splitter (4). Then, embeddings (5) for these chunks are created: an embedding model maps each chunk to a high-dimensional numerical vector that captures semantic and contextual information, which enables efficient similarity searches. These vectors are stored in a Vector Store (6), which serves as the knowledge base. For each query, the Conversational Retrieval QA Chain retrieves the most relevant information chunks from the Vector Store based on the stored embeddings. The chosen LLM (7) uses the retrieved information chunks and the conversation context to generate a natural language response according to the configurable system prompt that outlines the bot’s role and goals.

Figure 3: The user interface of SKIT.
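To make the splitting and memory components more concrete, the following sketch re-implements their core ideas in plain Python. It is an illustration under assumptions, not the code behind the Flowise nodes: `split_recursive` is a simplified take on recursive character splitting (it tries coarse separators first and falls back to finer ones, and it omits the chunk overlap that such splitters usually offer), and `ConversationBuffer` is a hypothetical stand-in for a buffer memory that here keeps only a bounded number of recent turns. All names and default values are chosen for this example.

```python
from collections import deque


def split_recursive(text, chunk_size=200, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters.

    Coarse separators (paragraphs) are tried first; parts that are still too
    long are split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # the part still fits into the running chunk
            continue
        if current:
            chunks.append(current)  # flush the running chunk
        if len(part) <= chunk_size:
            current = part
        else:
            chunks.extend(split_recursive(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


class ConversationBuffer:
    """Keeps the most recent question/answer pairs of a conversation."""

    def __init__(self, max_turns=10):
        self._turns = deque(maxlen=max_turns)  # older turns drop out automatically

    def add_turn(self, question, answer):
        self._turns.append((question, answer))

    def as_text(self):
        """Render the stored history so it can be prepended to the next prompt."""
        return "\n".join(f"User: {q}\nBot: {a}" for q, a in self._turns)


if __name__ == "__main__":
    sample = "Binary numbers use base two.\n\n" + " ".join(["word"] * 60)
    for chunk in split_recursive(sample, chunk_size=50):
        print(len(chunk), repr(chunk))
```

Each resulting chunk would then be embedded and stored in the vector store, while the buffer text is passed to the LLM together with the retrieved chunks.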

Different LLMs can be used for this purpose. In

the very early stages of our project, we worked with

GPT-3.5, but later switched to LLMs that can be run

locally, such as Llama, due to data privacy concerns,

increased ﬂexibility, and because it is available at no

cost. At the moment, we primarily use Llama 3.1

SauerkrautLM. Although tests with self-hosted LLMs

were successful, performance was limited due to in-

sufﬁcient computational power on our current virtual

machine. During this research phase, we use the LLMs hosted by Chat AI (Doosthosseini et al., 2024), offered by the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), running on scalable high performance computing systems with secure cloud access and without storing or using any user data.


To evaluate the created RAG systems, we devel-

oped a small web application called SKIT (Spezial-

isierter KI Tutor, Specialized AI Tutor) that makes

bots powered by different Flowise workﬂows acces-

sible to test users online, while user interactions are

logged locally for analysis. The user interface of

SKIT is shown in Figure 3. Note that it gives no ac-

cess to the knowledge base or system prompt and is

therefore appropriate for independent use by students.

2.2 Assessing Feasibility in an Educator Workshop

The template for SKIT is meant to provide an appro-

priate Tool (in the Will, Skill, Tool, Pedagogy frame-

work) that ﬁts the Skills of the intended users. To col-

lect feedback on this goal, the Flowise template (not

including SKIT’s web application) was tested during

a workshop attended by 23 educators and university

members engaged in teacher education. Following

a brief introduction to the fundamental concepts, the

template was presented and participants were given

the opportunity to recreate, test, and customize the

system. At the conclusion of the workshop, participants were invited to provide feedback on their experiences and perspectives. The results of the Likert-scale questions, presented in Figure 4, are very promising: On average, participants found it likely or very likely that a system like SKIT can be recreated and used by tech-savvy educators, that it can be usefully implemented in school and especially in higher education, and that it is more suitable than commercial tools.

Furthermore, participants used a set of open questions regarding the potential and limitations of such systems to propose scenarios in which they could be beneficial. Among the responses (translated from German) were the following: “Review tool; first engagement with a topic when students ask initial questions; its applicability in multilingual teaching would be interesting; differentiation in teaching”, “Assignments can potentially be explained more individually. (Other languages, simplifications, etc.; e.g., when German is not the student’s native language.)” and “Students who don’t have support at home can find help; I think it’s great that the prepared sources can be integrated.” One concern was that “Students prefer open access and often already have ChatGPT installed privately (e.g., in business schools).”

Following the workshop, we got several requests

for the use of the SKIT template, not only from lectur-

ers but also from staff in academic advising and writ-

ing support services, as such systems could also be

very helpful in those areas to reliably and accurately

answer students’ questions. Some staff had already tried commercial systems like OpenAI’s Custom GPT or poe.com but shared our experience that these chat bots easily break the character defined by the system prompts and do not reliably restrict their responses to the provided knowledge base.

Within the Will, Skill, Tool, Pedagogy framework,

these experiences underscore both the Will of edu-

cators and advisors to use reliable, customized chat

bot systems to support their students, and a perceived

good ﬁt of our Tool with the Skill set of future school

teachers and some current staff in Higher Education.

3 INITIAL USE CASES

Further testing regards the Pedagogy aspect of our

guiding framework. In different iterations of the

Design-Based Research cycle, we experimented with

(a) different types of chat bots with different learning

objectives and (b) exposing learners at different stages

of their studies to the chat bot in parallel to their reg-

ular lectures. These experiments show that the chat

bots can be ﬂexibly tailored to the educators’ objec-

tives, are indeed robust and factually reliable in a real-

world setting, and that students are well-equipped and

motivated by previous experience to use them.

3.1 Different Learning Objectives

We tested two types of bots, tailored to different learning objectives, in an asynchronous online course on basic computer science (CS) knowledge for 30 student teachers from non-CS subjects. The first, which we call a knowledge bot, was designed for a unit on the history of CS and therefore focused mainly on knowledge content. The second, which we call an explain bot, was designed primarily to describe and explain procedures and algorithms, i.e., for units on encoding numbers and encryption. It included a warning (specified in the system prompt) whenever it was asked to do calculations, explaining that this is not one of its strengths and that results should be verified carefully. Both bots had a knowledge base extracted from our digital self-learning units for the courses, containing many textual descriptions and explanations, which differ substantially from typical lecture slides.

The knowledge bot was rarely used, so we asked students for reasons (multiple reasons could be selected). The results indicate that 40% of the participants managed very well without assistance and had no questions or problems with the unit. However, 25.71% admitted they simply forgot they could use the bot. Other reasons for not using the bot included not trusting its ability to help (2.86%), concerns about anonymity (5.71%), bad experiences with other chat bots (2.86%), or preferring a familiar chat bot (5.71%). Interestingly, no participants preferred human tutors over the bot. Additionally, 17.14% listed other reasons, such as not having seen the password until later, being unable to log in, not knowing about the bot, attempting the task without help, finding it unnecessary to use the chat bot for the specific unit, or simply working through the material instead. It is worth mentioning that the unit is already designed quite interactively: it mainly consists of a digital interactive self-learning unit, including text, videos, annotated figures and short quizzes, followed by (graded) assignments. The results for these assignments were very good, with an average of 9.06 out of 10 points, so it seems that in fact no additional help was needed.

poe.com: https://poe.com/

Figure 4: Responses to Likert-scale questions from 1 (“not at all”) to 5 (“very likely”). Items: “I think that such a system ... can be recreated by (tech-savvy) teachers”; “... can be used by (tech-savvy) teachers”; “... can be usefully implemented in school lessons”; “... can be usefully implemented in higher education”; “... is more suitable than commercial tools”.

The second bot, equipped to explain procedures and algorithms, was designed for a more difficult unit on encoding numbers and characters. Here we monitored 14 conversations with a total of 56 interactions. The most common queries asked for explanations (36%), but many users directly copied questions from the assignments or quizzes (25%). Only 16% were general questions, such as asking for the goal of the unit or a summary; 14% of the questions were not close enough to the content, and 9% were social interactions like “hi” or “thanks”. Apart from some temporary technical problems with follow-up questions, which resulted in unanswered requests, the quality of answers to successful requests was very good. The bot either gave a correct answer or directed the user to a specific section of the material. It warned the user when doing calculations, but in fact made no mistakes in them. If it could not generate an answer, it responded with “Sorry, I am not sure about this!” or “I am not prepared for this topic”, just as we specified in the system prompt. Unfortunately, this was the case even for some questions about the content, due to the problems with follow-up questions mentioned above, but this has since been solved.

Interestingly, apart from one user asking for a

pizza recipe, none of the users tried to explore fur-

ther abilities of the bot, maliciously distract it from

the knowledge base or hijack it through prompt injec-

tion. Keep in mind that the participants do not study

computer science but only take part in this course for

basic CS knowledge. One user, aware that the conversations would be logged for research, directly addressed “the researchers” in one message.

3.2 Different Learner Groups

Our second use case employs an instance of the

knowledge bot introduced in Section 3.1. We observe

its use in two different classes (and learner groups).

Setup of the bot comprised collecting the relevant

documents for each course (i.e., the existing lecture

slides) and phrasing a welcome message. The stan-

dard SKIT ﬂow was used otherwise, cutting prepara-

tion time for the educator to the minimum.

The students answered a questionnaire about their

previous experience with chat bots and their expecta-

tions of a custom bot, then they had free access to the

bot for several weeks. Participation was voluntary;

we report on the group of participants only.

One learner group consisted of 60 computer science students from a first-semester Java programming class (in this class, all students requested access to the bot, but only 20% of students answered the questionnaire); the other consisted of 34 computer science students in their third and fourth semesters in an AI class. The students were familiar with chat bots (like ChatGPT): more than 80% of the answers from both groups reported using tools like this at least once a week, and many reported more frequent use. However, both groups of students were worried about the reliability of chat bot output: half the participants in each group named this concern (multiple concerns could be given). This aligns with our motivation for preparing easy-to-configure RAG bots to increase the reliability of answers. The most frequently reported bot uses were searching for information and, for the younger students, text generation; the older students listed coding in second place (multiple uses could be named). Our knowledge bot was configured for information search but intentionally offered only limited coding support (no code generation, just information about available methods and syntax).

We analyzed a total of 573 interactions with the

bots. During the initial weeks, students in both groups

explored the bots’ abilities and tried to generate unin-

tended output through prompt injection. There were

200 such interactions in total, and in the AI class,

where students learnt about the technical background

of chat bots, malicious prompts made up 60% of inter-

actions in one week. However, none of these attempts

was successful. Content-based interactions were suc-

cessful (deﬁned as a relevant bot answer to a student

query) in 42% of cases (some of the failures are likely

due to the technical issues reported above). Suc-

cessful bot output was almost always factually cor-

rect (only two of 155 bot answers were found to be

incorrect). Requests for definitions, explanations or lists of items were more successful on average (between 56% and 74%), whereas requests for non-covered meta-information (like the course schedule) or for solutions to the homework sheets always failed. While failure

on deﬁnition or explanation requests is frustrating to

the students, from the educator perspective this be-

havior is preferable to hallucinations.
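The figures reported above can be related to each other with a short calculation. The sketch rests on two assumptions that the text does not state explicitly: that the content-based interactions are the 573 logged interactions minus the 200 exploratory or malicious ones, and that the 155 checked bot answers are exactly the successful ones. Under these assumptions the numbers are consistent with the reported 42% success rate.

```python
# Figures taken from the text above.
total_interactions = 573
injection_attempts = 200
successful_answers = 155   # assumption: the 155 checked answers are the successful ones
incorrect_answers = 2

# Assumption: every interaction that was not an injection attempt is content-based.
content_based = total_interactions - injection_attempts

success_rate = successful_answers / content_based
correct_rate = (successful_answers - incorrect_answers) / successful_answers

print(f"content-based interactions: {content_based}")
print(f"successful among content-based: {success_rate:.1%}")
print(f"factually correct among successful: {correct_rate:.1%}")
```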

3.3 Summary of Observations

These use cases highlight the reliability of the output

and the robustness of the RAG approach towards tam-

pering, two of our motivations for promoting RAG for

educational bots. The data also shows that university

students can be expected to be familiar with chat bots

and their drawbacks, which makes them equipped to

use reliable alternatives. However, this also shapes

their expectations, for example for code generation,

which may not ﬁt the educators’ intentions.

We also see that students’ use of the bots strikingly

differs by learner group. Some groups are very inter-

ested in exploring the bot and its robustness towards

malicious prompts, others only use it as intended. If

a learning unit is engaging and well-explained on its

own, the students may not use the bot at all. In

the second use case, we observed that interest in the

bot waned over time, probably because of initial fail-

ures to answer relevant student requests and because

of the intentionally sparse coding support in the pro-

gramming class. This underscores our intuition that

no single bot is appropriate for all groups, but that ed-

ucators need the freedom to deﬁne specialized bots.

4 CONCLUSIONS

We have presented our work on a low-code Retrieval-

Augmented Generation (RAG) template – a Tool (in

the sense of the Will, Skill, Tool, Pedagogy frame-

work by Knezek and Christensen (2015)) that allows

educators to customize chat bots as resources for their

courses. Customization of the bot is possible both re-

garding the extent and depth of the bot’s knowledge

(through ﬁlling its knowledge base) and regarding the

bot’s behavior in the conversation with students (for

example, as a source of deﬁnitions and explanations

or as a discussion partner). Our goal was to both re-

duce the demands on educators’ skill sets and make it

easier for them to acquire deeper familiarity with AI

tools for teaching, strengthening the Skill dimension.

We have collected ﬁrst feedback on the tool: Ed-

ucators have attested to a good ﬁt with the Skills that

can be expected from tech-savvy teachers, and have

documented their Will to engage with AI tools and use

them for their students’ advantage. We have also col-

lected insights from classroom use of the tool in order

to inform the Pedagogy dimension of the framework.

We ﬁnd that our (university-level) students are famil-

iar with chat bots, but share our concerns about the

reliability and appropriateness of general bots. Encouragingly, the bots themselves proved robust against tampering and accurate in their replies.

Our next goals are to further develop the usage

scenarios, improve technical aspects of the bots, and

also to develop prompting tips for our students to en-

sure they are getting the most out of their chat bot use.

ACKNOWLEDGEMENTS

This work was partially funded by fellowships to Barbara Pampel and Ulrike Padó in the bwDigiFellowship program 2024-2025 of the Stifterverband and the Ministry of Science, Research and Arts Baden-Württemberg and by the Federal Ministry of Education and Research, project “MINT-ProNeD”, grant number 01JA23M02K. We also thank the GWDG for the possibility to use the Chat AI services for our research.


REFERENCES

Abbas, M., Jam, F. A., and Khan, T. I. (2024). Is it harmful

or helpful? Examining the causes and consequences of

generative AI usage among university students. Inter-

national Journal of Educational Technology in Higher

Education, 21(1):10.

Budde, J., Tobor, J., and Friedrich, J. (2024). Künstliche

Intelligenz. Wo stehen die deutschen Hochschulen?

Hochschulforum Digitalisierung.

Chang, D. H., Lin, M. P.-C., Hajian, S., and Wang, Q. Q.

(2023). Educational design principles of using AI chat-

bot that supports self-regulated learning in education:

Goal setting, feedback, and personalization. Sustain-

ability, 15(17):12921.

Christensen, R. and Knezek, G. (1999). Stages of adop-

tion for technology in education. Computers in New

Zealand Schools, 11(3):25–29.

Dakshit, S. (2024). Faculty perspectives on the potential

of RAG in computer science higher education. ACM

SIGITE.

Dong, C., Chen, K., Cheng, S., and Wen, C. (2023). How

to build an AI tutor using LLMs and RAG. Unpublished

Manuscript.

Doosthosseini, A., Decker, J., Nolte, H., and Kunkel, J. M.

(2024). Chat AI: A seamless Slurm-native solution for

HPC-based services.

Fan, W., Ding, Y., Ning, L., Wang, S., Li, H., Yin, D., Chua,

T.-S., and Li, Q. (2024). A survey on RAG meeting LLMs:

Towards retrieval-augmented large language models.

In Proceedings of the 30th ACM SIGKDD Conference

on Knowledge Discovery and Data Mining, KDD ’24,

pages 6491–6501, New York, NY, USA. Association

for Computing Machinery.

Kahl, S., Löffler, F., Maciol, M., Ridder, F., Schmitz,

M., Spanagel, J., Wienkamp, J., Burgahn, C., and

Schilling, M. (2024). Enhancing AI tutoring in

robotics education: Evaluating the effect of retrieval-

augmented generation and fine-tuning on large lan-

guage models. Autonomous Intelligent Systems

Group.

Knezek, G. and Christensen, R. (2015). The will, skill, tool

model of technology integration: Adding pedagogy as

a new model construct. International Association for

Development of the Information Society.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin,

V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t.,

Rocktäschel, T., Riedel, S., and Kiela,

D. (2020). Retrieval-augmented generation for

knowledge-intensive NLP tasks. In Proceedings of

the 34th Conference on Neural Information Process-

ing Systems (NeurIPS 2020), Vancouver, Canada.

Liu, R., Zenke, C., Liu, C., Holmes, A., Thornton, P., and

Malan, D. J. (2024). Teaching CS50 with AI: Leveraging

generative artificial intelligence in computer science

education. In Proceedings of the 55th ACM Techni-

cal Symposium on Computer Science Education V. 1,

pages 750–756.

Maynez, J., Narayan, S., Bohnet, B., and McDonald, R.

(2020). On faithfulness and factuality in abstractive

summarization. In Jurafsky, D., Chai, J., Schluter,

N., and Tetreault, J., editors, Proceedings of the 58th

Annual Meeting of the Association for Computational

Linguistics, pages 1906–1919, Online. Association

for Computational Linguistics.

Medienpädagogischer Forschungsverbund Südwest (2024).

JIM-Studie. Jugend, Information, (Multi-) Media. Ba-

sisstudie zum Medienumgang.

Mullins, E. A., Portillo, A., Ruiz-Rohena, K., and Piplai, A.

(2024). Enhancing classroom teaching with LLMs and

RAG. In Proceedings of SIGITE 2024, El Paso, TX.

ACM.

Ng, D. T. K., Tan, C. W., and Leung, J. K. L. (2024). Em-

powering student self-regulated learning and science

education through ChatGPT: A pioneering pilot study.

British Journal of Educational Technology.

Ouyang, F. and Jiao, P. (2021). Artificial intelligence in

education: The three paradigms. Computers and Ed-

ucation: Artificial Intelligence, 2:100020.

Perez, F. and Ribeiro, I. (2022). Ignore previous prompt:

Attack techniques for language models. In Proceed-

ings of the Workshop on Machine Learning Safety at

NeurIPS.

Petroni, F., Rocktäschel, T., Riedel, S., Lewis, P., Bakhtin,

A., Wu, Y., and Miller, A. (2019). Language models

as knowledge bases? In Inui, K., Jiang, J., Ng, V.,

and Wan, X., editors, Proceedings of the 2019 Con-

ference on Empirical Methods in Natural Language

Processing and the 9th International Joint Conference

on Natural Language Processing (EMNLP-IJCNLP),

pages 2463–2473, Hong Kong, China. Association for

Computational Linguistics.

Reimann, P. (2010). Design-based research. In Method-

ological choice and design: Scholarship, policy and

practice in social and educational research, pages 37–

50. Springer.

Shuster, K., Poff, S., Chen, M., Kiela, D., and Weston, J.

(2021). Retrieval augmentation reduces hallucination

in conversation. In Moens, M.-F., Huang, X., Specia,

L., and Yih, S. W.-t., editors, Findings of the Associ-

ation for Computational Linguistics: EMNLP 2021,

pages 3784–3803, Punta Cana, Dominican Republic.

Association for Computational Linguistics.

Süße, T. and Kobert, M. (2023). Generative AI at school -

insights from a study about German students’ self-

reported usage, the role of students’ action-guiding

characteristics, perceived learning success and the

consideration of contextual factors. Zenodo.

Thüs, D., Malone, S., and Brünken, R. (2024). Explor-

ing generative AI in higher education: A RAG system to

enhance student engagement with scientific literature.

Frontiers in Psychology, 15.

Velazquez, C. M. (2006). Cross-cultural validation of the

will, skill, tool model of technology integration. Uni-

versity of North Texas.

Yu, J., Wu, Y., Shu, D., Jin, M., and Xing, X. (2023). As-

sessing prompt injection risks in 200+ custom GPTs.

arXiv preprint arXiv:2311.11538.

CSEDU 2025 - 17th International Conference on Computer Supported Education
