A Comparison of Different Approaches of Model Editors for Automatic Item Generation (AIG)
Florian Stahr¹ᵃ, Sebastian Kucharski¹ᵇ, Iris Braun¹ᶜ and Gregor Damnik²ᵈ
¹Chair of Distributed and Networked Systems, TUD Dresden University of Technology, Dresden, Germany
²Chair of Didactics of Computer Science, TUD Dresden University of Technology, Dresden, Germany
{florian.stahr, sebastian.kucharski, iris.braun, gregor.damnik}@tu-dresden.de
ᵃ https://orcid.org/0009-0001-2825-3507
ᵇ https://orcid.org/0009-0003-4210-5281
ᶜ https://orcid.org/0009-0000-0900-2158
ᵈ https://orcid.org/0000-0001-9829-6994
Keywords: Automatic Item Generation, AIG, Assessment, Cognitive Model, Item Model.
Abstract: The Automatic Item Generation (AIG) approach allows users to generate tasks or items based on user-defined
knowledge models created with associated editors. The challenge is that these editors typically require a cer-
tain level of technical expertise, which limits the users who can benefit from the AIG approach. To overcome
this, editors can be used with strict user guidance, following a purist approach to avoid feature overload. How-
ever, once users are familiar with AIG, the purist approach may hinder their productivity. This paper examines
the relationship between the users who can benefit from AIG, the AIG model editing approach used, and its
usability aspects. In addition, it tries to identify further perspectives for the development of AIG model editors
that make them accessible to both experienced and novice users. For this purpose, we conceptualized an editor
that allows more modeling freedom and compared it with a previously developed editor that enforces strict
user guidance. Our evaluation shows that the new editor makes more features of the AIG approach usable, but is harder to get used to, and that an appropriate approach may be to dynamically adapt the guidance and features based on the user’s goal and expertise.
1 INTRODUCTION
For the traditional construction of learning tasks and
(self-)test items, two types of knowledge are required
(Figure 1). First, subject-specific knowledge of do-
main experts is needed to specify the content of the
tasks and items (cf. e.g., (Krathwohl, 2002; Proske
et al., 2012)). For example, knowledge of the dif-
ferent characteristics of fish and mammals is required
to construct tasks or items related to the species of
vertebrates. Second, constructing tasks or items re-
quires pedagogical knowledge so that learners can un-
derstand and respond to them and achieve their learn-
ing goals. The goal is either to initiate a learning pro-
cess or to verify that a learning process has been suc-
cessfully completed (e.g., (Krathwohl, 2002; Proske
et al., 2012)). For this purpose, different types of tasks
or items, different operators and different interactive
components such as feedback or assistance have to be
considered (e.g., (Proske et al., 2012)). The fact that
both types of knowledge are rarely found in the same person makes the traditional task or item construction process a complicated one, as a number of experts must come together to construct a substantial task or item pool for learning or testing purposes (Gierl et al., 2012; Damnik et al., 2018).
Figure 1: Knowledge domains required to construct learning tasks and test items in a traditional manner or according to the AIG approach.
Automatic Item Generation (AIG; e.g., (Embret-
son and Yang, 2007; Gierl et al., 2012; Damnik et
al., 2018; Wancham et al., 2023; Kucharski et al.,
2023; Kucharski et al., 2024)) is an alternative ap-
proach to creating tasks or items that can reduce this
high effort per task or item (cf. e.g., (Kosh et al., 2019; Kucharski et al., 2024)).
Figure 2: The stages of the traditional AIG process based on (Kosh et al., 2019) and (Gierl and Lai, 2016) that are connected with the knowledge domains from Figure 1. Adapted from (Kucharski et al., 2024).
With this approach, do-
main experts and pedagogical experts do not need to
come together to construct each task or item individu-
ally. Instead, domain experts first systematically rep-
resent their subject-specific knowledge using a cog-
nitive model. The pedagogical experts then use their
knowledge to specify an item model that defines what
the intended tasks or items should look like. Finally,
a software component uses both models to automati-
cally generate a set of tasks or items (cf. Figure 2 and
(Kucharski et al., 2024)).
One problem with AIG is that although it reduces
the amount of knowledge that needs to be speci-
fied per item by domain experts and pedagogical ex-
perts (cf. e.g., (Kosh et al., 2019; Kucharski et al.,
2024)), it requires a third type of knowledge (see Fig-
ure 1). This is the knowledge of technically experi-
enced users (i.e., computer scientists) that is needed to
translate the systematic representations of the domain
experts and the pedagogical experts into a machine-
readable format, to technically connect the represen-
tations of both experts, and to program the actual al-
gorithms that implement the generation of tasks and
items.
One possible solution to this problem could be a
tool that can be used to specify the required mod-
els, translate them internally into a machine-readable
format, and initiate the generation process. This
tool could reduce the required knowledge of techni-
cally experienced users by encapsulating this type of
knowledge. To do so, it would either have to be self-
explanatory and thus easy to use, or it would have to
include appropriate guidance features such as tutori-
als to make it easy to learn. The AIG Model Editor
(AME) has been developed with the goal to be such a
tool, according to the principle AIG for all (Kucharski
et al., 2023). This freely available editor¹ has al-
ready been used to generate item pools for various
domains such as biology, psychology, and computer
science that are comparable in quality to manually
constructed items (cf. (Kucharski et al., 2024)).
During several rounds of evaluation of this editor,
it was found that although good usability was observed for all evaluated user groups after a reasonable
learning process, the desired direction of further de-
velopment of our editor differed between the consid-
ered groups. On the one hand, there are users with a
low level of technical expertise who appreciate the strict guidance provided or want even more support. The intention behind developing an editor focused on these users is to ensure that everyone can
make use of the AIG approach, which is consistent
with our AIG for all principle. On the other hand,
there are users with a high level of technical exper-
tise who would prefer more freedom and less guid-
ance during the modeling process (cf. Section 3.2).
For this user group the goal is to speed up the pro-
cess of generating items using the AIG approach in
order to make the most of this approach. These two intentions are somewhat contra-
dictory, since more functionality usually means that a
tool is less easy to use, and thus more likely to be used
by fewer people, or only by certain people (i.e., peo-
ple with a high level of technical expertise). Conversely, an approach that allows more modeling freedom
can also be used to take advantage of aspects of the
AIG approach that were previously not applicable because of the strict guidance provided.
¹ https://ame.aig4all.org, November 3, 2024
The purpose of this paper is to analyze this rela-
tionship between the two AIG editing approaches, the
types of people who can use AIG, and the exploitable
aspects of the AIG approach. It is structured as fol-
lows. First, in Section 2 we present related work that
focuses on the study of AIG editors. Then, in Sec-
tion 3 we present the editor introduced earlier and de-
scribe the features implemented to make this editor
available to user groups with different levels of tech-
nical expertise and different levels of prior knowledge.
In addition, in Section 3.2 we analyze the limitations
of this editor in terms of taking full advantage of the
AIG approach, which we have gathered over several
rounds of evaluation. In Section 4 we present a recon-
ceptualized AIG editor that provides more modeling
freedom and is designed to address the identified lim-
itations. We analyze this editor in terms of the ex-
tended capabilities of the AIG approach that it can
exploit, and in terms of the guidance functionalities
that have been elaborated to try to make this editor
still accessible to a wide range of users with different
levels of prior knowledge and technical expertise. In
Section 5 we present the evaluation results of both ed-
itors. In Section 6 we discuss the results of the evalu-
ation of both editors, highlighting what can be seen as
advantages or disadvantages depending on the type of
user. Section 7 concludes the paper and summarizes
insights for future research corresponding to model
editors for the AIG approach that can be derived from
our findings.
2 RELATED WORK
Various related work focuses on making AIG func-
tionality available to end users by providing appro-
priate editors or programming packages. Some rele-
vant approaches are presented below. There are also a
number of commercial solutions, presented in (Christ
et al., 2024), which are not discussed further. The
approaches presented typically focus only on provid-
ing the necessary capabilities to make the AIG ap-
proach available to a focused group of end users who
are occasional users with a low level of technical ex-
pertise. In contrast, this work focuses on investigating
the relationship between two diametrically opposed
AIG modeling approaches, the types of people who
can use AIG, and the exploitable aspects of the AIG
approach. Thus, it focuses on gathering insights on
how to build AIG editors with respect to user groups
and the aspects of AIG approaches that are intended
to be used by these user groups in general.
(Mortimer et al., 2012) present the Item Genera-
tOR (IGOR) system. IGOR is a web-based tool, an
evolution of the previous monolithic desktop appli-
cation, that can be used by a community to work on
an item model as a basis for automatically generated
items. The item model contains all the information
needed to generate items, such as an item stem, re-
sponse options, auxiliary information and variables to
be adjusted during generation. Based on this model,
much research has been done to optimize and stan-
dardize the generation process, including the devel-
opment of a taxonomy of item model types. With the
advancement of the previous monolithic IGOR appli-
cation, research has moved towards the goal of mak-
ing the AIG process accessible not only to individ-
ual researchers developing AIG methods. However,
the comparison of different AIG modeling approaches
and the investigation of the relationship between the
modeling approach and the type of users who can
make use of the AIG approach, as well as the usable
AIG features, is not the focus of this research.
(Merker et al., 2023) introduced an approach
based on allowing the user to program the tasks or
items to be generated using the Python package Py-
Rope² according to the principle of coding, not click-
ing. PyRope provides pre-built functionality for pro-
viding context, feedback, and sample solutions for
generated tasks or items, as well as randomization
functionality during the generation process. The gen-
erated tasks or items can then be presented to the
learner in a variety of ways, such as a Jupyter note-
book. This approach imposes no restrictions on the
aspects of the AIG approach that can be used, but
it does require the user to have basic programming
skills, although the level of programming skills re-
quired is lowered by assembling the required func-
tionality from the provided package. This is a barrier
to usability for users with low technical expertise.
(Christ et al., 2024) present the ALADIN tool.
The goal of ALADIN is to automatically generate
learning tasks whose difficulty can be configured ac-
cording to the learner’s prior knowledge, and to pro-
vide a tool that assists the learner in completing the
tasks by providing hints and solutions (Christ et al.,
2022). A user interface is provided for specifying the
generators to be used, based on a user-friendly drag-
and-drop assembly of defined generator elements in
a graph. Thus, although available to different user
groups, the aspects of the AIG approach that are avail-
able to all users are limited by the available generator elements.
² https://github.com/PyRope-E-Assessment/pyrope, November 10, 2024
In the first step of the AIG process, the devel-
opment of a cognitive model, subject matter experts
should elicit their knowledge as a basis for item and
task development. While the described works present
tools that lead to the generation of items, they do not
provide mechanisms to help subject matter experts
perform the first step well and with the best possible support. This first step can essentially be seen as
a form of knowledge structuring. There are several
forms available that could be used to accomplish such
a task in general, like concept maps, mind maps, and
knowledge graphs, to name a few. (Brade, 2015) has
developed requirements for a tool to assist in mak-
ing an expert’s implicit knowledge explicit, and in
the first iterations of structuring knowledge in gen-
eral. These include the ability to freely sketch and
edit, to freely place, rearrange, and categorize graphi-
cal objects, and to allow for temporary structures and
inconsistencies that can be resolved as the user de-
velops the correct structure. A good AIG tool should
meet these requirements in a way that best supports
subject matter experts in formalizing their knowledge
in the first step of the AIG process, upon which all
subsequent steps are built.
3 AIG FOR ALL
The AIG Model Editor (AME) was first introduced
and described in detail in (Kucharski et al., 2023). It is
a web application¹ that can be accessed by any inter-
ested user without any special system requirements,
such as the need to install specific software. The main
editing view of this editor is divided into two areas.
On the left side, the cognitive model (Gierl and Lai,
2013a) can be modeled. A cognitive model in the
AIG Model Editor is represented by multiple graphs
encapsulated in layers (Kucharski et al., 2023). The
sources of information in the cognitive model that de-
pend on each other are represented as nodes in these
graphs. The dependencies between them and their
features are represented by edges. Edges are speci-
fied by the modeler using drag-and-drop functional-
ity in the graphs. There are two types of layers, and
two types of graphs encapsulated within these lay-
ers. First, there is a base layer. The base layer en-
capsulates the cognitive model problem, which is the
central source of information (i.e., the central issue or
topic of the domain being modeled), and the sources
of information that are directly related to the cognitive
model problem. Second, there are several condition
layers. A condition layer encapsulates conditions be-
tween sources of information that are not the problem
in the form of logical implications.
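To make this structure concrete, the following minimal TypeScript sketch shows one possible data representation of the layered cognitive model; all type and field names are our own illustration and do not reflect AME's actual internal schema.

```typescript
// Illustrative sketch only: names are our own shorthand, not AME's schema.
interface Feature {
  label: string;
}

// A source of information is a node in a layer's graph.
interface SourceOfInformation {
  label: string;
  features: Feature[];
}

// An edge connects features of two sources of information,
// expressing a dependency between them.
interface Edge {
  from: { soi: string; feature: string };
  to: { soi: string; feature: string };
}

// The base layer encapsulates the central problem and the sources
// of information directly related to it.
interface BaseLayer {
  problem: SourceOfInformation;
  sources: SourceOfInformation[];
  edges: Edge[];
}

// A condition layer encapsulates conditions between non-problem
// sources of information in the form of logical implications.
interface ConditionLayer {
  conditions: SourceOfInformation[];
  conclusions: SourceOfInformation[];
  edges: Edge[];
}

// A cognitive model: multiple graphs encapsulated in layers.
interface CognitiveModel {
  base: BaseLayer;
  conditionLayers: ConditionLayer[];
}
```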
On the right side of the editor, the item model can
be edited. The item model (Gierl and Lai, 2013a) con-
sists of several item templates that define the type and
appearance of the items to be generated, as well as
how the items should be generated (i.e., which sources
of information should be considered). Once defined,
the right side of the editor can also be used to trigger
the generation process, either for a selected set of item
templates or for all of them. Items are generated by inserting valid combinations of
features from the sources of information into the item
templates according to the constraints defined in the
cognitive model. The generation process is described
in detail in (Kucharski et al., 2023).
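Conceptually, this generation step filters valid feature combinations and substitutes them into a template. The following sketch assumes a hypothetical {placeholder} template syntax; the editor's real template format and constraint handling are described in (Kucharski et al., 2023) and may differ.

```typescript
// One feature combination maps each SOI label to a chosen feature label.
type FeatureCombination = Record<string, string>;

// Substitute {soi} placeholders in the template with the chosen features.
// The {name} syntax is an assumption for illustration only.
function instantiate(template: string, combo: FeatureCombination): string {
  return template.replace(/\{(\w+)\}/g, (_, soi) => combo[soi] ?? `{${soi}}`);
}

// Keep only combinations that respect the cognitive model's constraints,
// then fill the template with each surviving combination.
function generateItems(
  template: string,
  combos: FeatureCombination[],
  satisfiesConstraints: (c: FeatureCombination) => boolean
): string[] {
  return combos.filter(satisfiesConstraints).map((c) => instantiate(template, c));
}

// Toy example in the paper's vertebrate domain:
const items = generateItems(
  "Which feature is typical of a {animal}? ({feature})",
  [
    { animal: "fish", feature: "gills" },
    { animal: "mammal", feature: "fur" },
  ],
  () => true // stand-in for the real constraint check
);
```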
3.1 User Guidance
As described in Section 1, the AIG Model Editor was
developed with the idea of making the AIG approach
available to everyone. In order to do this, the editor
needs to address two issues that are included in the
notion of technical expertise (cf. Figure 1). First, the
editor must clarify what must be specified to make
the AIG approach work (i.e., which models and how
they interact). Second, the editor must explain how
to use itself, or be self-explanatory so that further ex-
planations are unnecessary. The goal was to imple-
ment both aspects in such a way that users who have
never heard of AIG, users with a lower level of tech-
nical expertise, users who are AIG experts, as well as
computer scientists could use the editor to generate
tasks or items using AIG. The four aspects of the AIG
Model Editor presented in this section have been de-
veloped and refined over several rounds of evaluation
to achieve this goal.
Linear Modeling Approach. The modeling ap-
proach of our editor is implemented through a linear
user interaction flow. In our editor’s implementation
of the AIG approach, we have reduced the number
of times that the user must choose one or more pro-
cessing paths that ultimately lead to the same goal as
proposed by (Baum et al., 2021), where the user has
a choice between a graphical modeling approach and
a textual modeling approach. This avoids the confu-
sion that can result from forcing the user to choose a
processing method in a process they do not yet fully
understand. This simplifies the modeling process, and
the simpler the modeling process, the easier and faster
it can be understood by a non-technical user, and the
easier it is to create a tutorial to explain that process,
as described below.
The modeling process follows the steps of the AIG
AIG 2025 - Special Session on Automatic Item Generation
768
approach, shown in Figure 2. The editor is designed
to natively guide the user through these steps, avoid-
ing the distractions of unnecessary features by striv-
ing for simplicity. To this end, a big plus button as an en-
try point to the modeling process first guides the user
to specify the problem and automatically create the
appropriate source of information (cf. (Kucharski et
al., 2023)). Once the user understands where sources
of information need to be specified, the user can pro-
ceed to specify the additional sources of information
required. Connecting features is then done by drag-
and-drop, which is also supposed to be user friendly
(cf. Section 6). The user can proceed to the item
model on the right once the cognitive model on the
left is complete.
Divide and Conquer Principle. Instead of provid-
ing an overview of all elements in a domain (i.e., all
sources of information in a cognitive model), as pro-
posed by (Baum et al., 2021), we work with multiple
graphs showing only the connections between a re-
duced set of information sources that are related to
a specific subproblem of the overall cognitive model.
The advantage of a single graph is that it provides an
overview of all sources of information and their rela-
tionships at once. However, when there is more than one central problem with connected sources of information, as well as connections between multiple sources of information, special mechanisms such as distinct types of edges are needed to represent this information in a single graph, as suggested by (Baum et al., 2021). Read-
ing this information requires a higher level of techni-
cal expertise, while it has been found that the simpler
graphs resulting from splitting the graph can be read
by users with a lower technical expertise (Kucharski
et al., 2023).
In addition to the partitioning of the cognitive
model applied by AME, the cognitive model itself
is usually only a piece of the corresponding domain.
Typically, a domain consists of more than one problem, and these problems are also interrelated as sources of infor-
mation. Although it is not technically required for the
generation process, we enforce the specification of a
problem for a cognitive model for the following rea-
son. Requiring a problem helps the user to structure
his or her knowledge by first defining a central point
and then gathering information about it, rather than
stumbling into an empty space of possibilities with-
out a clear idea of where to start. This is consistent
with the linear modeling approach described above.
Interactive Tutorial. To get started with AIG mod-
eling, we have built a tutorial into the editor. This
tutorial is automatically offered to new users and can be canceled and repeated at any time.
Figure 3: Welcome dialog of the AME interactive tutorial, which explains the AIG process and the features of the editor.
It interactively
guides the user through the creation of an exemplary
model from the field of biology to ensure that it can be
understood by everyone. The tutorial explains how to
use the editor, the AIG approach itself, and what vari-
ous actions in the editor are used for in the context of
the AIG process, as shown in Figure 3.
It is therefore aimed at users who are new to the
AIG approach, as well as users with a lower under-
standing of how to use the editor technically. Because of
the linear modeling approach described above, this tu-
torial can exhaustively cover the entire modeling pro-
cess, which is the same for each model to be created,
in a reasonable amount of time.
Sample Model. The same sample model created
during the tutorial can also be loaded as a starting
point for creating new custom models. Instead of
starting the modeling process from scratch, the user
can start from a working model or simply modify the
model and see how the generated items differ, to learn
how the AIG process works with the editor. The sam-
ple model was designed to include a minimal set of
most of the available features, including a base layer,
some generic features, case-specific features, and a
condition layer (cf. (Kucharski et al., 2024)).
3.2 Limitations
During the use of the editor by lecturers with experi-
ence in AIG modeling for productive item generation,
some shortcomings of the editor became apparent.
While the editor focuses on modeling a well-
separated, usually bounded cognitive model, the lec-
turers expected to be able to model their entire do-
main. They then wanted to select the relevant parts of
the model by selecting the sources of information in
the item model. Although it is theoretically possible
to model entire domains with the editor, the imple-
mentation of the split representation of information
chosen for the divide and conquer principle is inap-
propriate for this use case. This is because the func-
tionality required for modeling large models quickly
and flexibly is not intended to be supported for rea-
sons of clarity. Specifically, only one layer can be
viewed at a time, sources of information cannot be
moved between layers, features cannot be moved be-
tween sources of information, and the source of in-
formation used as a problem in the base layer and the
source of information used as a condition in a condi-
tion layer cannot be changed. This makes it inconve-
nient to work with large models with different layers
and thus to represent complete knowledge domains.
However, the ability to work with cognitive models
that represent whole domains would allow the gen-
eration of more complex tasks and items, make AIG
more useful in practice, and improve the reusability
of the models created.
In addition, the editor makes heavy use of dialogs
that enforce a predefined modeling path and force the
modeler to focus on a particular part of the editor
at a time. This simplifies the modeling process for
new users by reducing the amount of information pre-
sented at a time and the number of actions that can
be performed. However, once the user has become
familiar with the editor, this lack of flexibility can be
a disadvantage. The need to open a dialog in order
to perform a certain action can cause the user’s train
of thought to be interrupted while using the editor.
In other words, carefully considered model additions
may be partially forgotten when a dialog is opened.
The reason for this is that only one model addition can
be realized at a time and opening a dialog seems to re-
position the focus by preparing the mind to perform
the action the dialog is for, while clearing thought-up
subsequent model additions. These must be mentally
rebuilt after the dialog is closed as the focus is reposi-
tioned, now back on the editor interface. These prob-
lems mean that the requirements developed
by (Brade, 2015) are not being fulfilled, in particu-
lar the ability to freely edit, place, and rearrange as
well as the possibility to resolve temporary inconsis-
tencies step by step. However, addressing these is-
sues would make the AIG process more attractive to
advanced users and more convenient for the rapid cre-
ation of large models.
Finally, the decision to minimize the configuration
options of the AIG process and instead force a con-
crete modeling path for the sake of clarity also limits
the exploitable features of the AIG process and pre-
vents the optimization of the path to the final items
based on the modeler’s experience. For example, con-
figuration options to adjust the grammar (e.g., arti-
cles) based on certain components of the items are
not supported by this editor.
4 OPTIMIZING THE USE OF AIG
WITH MORE MODELING
FREEDOM
To address the identified limitations, a new version of
the AIG Model Editor³ was designed, implemented,
and evaluated. In planning this new release, it be-
came clear that providing greater modeling flexibil-
ity, especially for advanced users, requires more user
guidance during the creation of an AIG model for first
time users. Section 4.1 presents the new concept and
Section 4.2 new features to guide the user.
4.1 Concept of the Revised AIG Editor
With the requirement to be able to modify the parts of
a model more easily, the concept of everything being
a node was introduced. This means that each model
component is a node on a large canvas (Figure 4) that
can be easily moved and rearranged as needed during
the modeling process. It also generalizes all model
components through a common base data structure
and allows for easy expansion in the future. There are
10 kinds of nodes: features, sources of information
(SOIs), and layers for the cognitive model; item tem-
plates; item selections (cf. (Kucharski et al., 2024));
item generation runs; lists for item templates, item
selections, and item generation runs; and a list-based
model description.
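As a sketch, the everything being a node concept can be captured as a discriminated union over a common base data structure, mirroring the enumeration above; the field names are assumptions.

```typescript
// Common base shared by all node kinds: an id and a free position
// on the canvas, so every node can be moved and rearranged.
interface NodeBase {
  id: string;
  position: { x: number; y: number };
}

// The ten node kinds from the text as one discriminated union.
type EditorNode = NodeBase &
  (
    | { kind: "feature"; label: string; isFalseFeature: boolean }
    | {
        kind: "soi";
        label: string;
        soiKind: "Source of Information" | "Problem" | "Condition" | "Conclusion";
      }
    | { kind: "layer"; layerKind: "Base Layer" | "Condition Layer" }
    | { kind: "itemTemplate" }
    | { kind: "itemSelection" }
    | { kind: "itemGenerationRun" }
    | { kind: "itemTemplateList" }
    | { kind: "itemSelectionList" }
    | { kind: "itemGenerationRunList" }
    | { kind: "listBasedModelDescription" }
  );
```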
Feature nodes only have a label and can be tagged
to represent false features. Links can be used to con-
nect features to model constraints. SOI nodes have
a label, are of a specific kind, and can have Feature
nodes attached to them. The available kinds of SOI
nodes are Source of Information, Problem, Condition
and Conclusion, which are based on what a source of
information node represented in the previous editor.
Layer nodes are also of a certain kind (Base Layer or
Condition Layer), which also corresponds to their use
in the previous editor. What is different is that they
are essentially a rectangle that implies a group of SOI nodes when it encloses them.
³ https://ame-v2.aig4all.org, January 16, 2025
Figure 4: Screenshot of the second version of AME. It shows two base layers colored green, a condition layer colored orange, two generic SOI nodes outside of the layers, a multiple choice item template in dark blue (top-right) and an item generation run node in a lighter blue (bottom-right) listing generated items for the multiple choice item template. Within the editor, “depth” is used as a synonym for “case-specific” and “surface” as a synonym for “generic” with regard to the function of features and SOIs.
While each node can be
be moved and arranged as needed during modeling,
a final model structure is expected for item genera-
tion. Therefore, base layers are expected to contain
one or more Source of Information SOI nodes and one
Problem SOI node. Condition layers are expected to
contain one or more Condition SOI nodes and one or
more Conclusion SOI nodes. Source of Information
SOI nodes may also be placed outside of any layer. In
this case, they are expected to have features without
any links as they represent generic features.
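These expectations on the final model structure can be stated as simple predicates; the following is our own formulation of the rules above, not the editor's code.

```typescript
type SoiKind = "Source of Information" | "Problem" | "Condition" | "Conclusion";

interface SoiNode { soiKind: SoiKind }

// A layer "contains" the SOI nodes its rectangle encloses.
interface LayerNode {
  layerKind: "Base Layer" | "Condition Layer";
  contained: SoiNode[];
}

// A base layer must enclose one Problem and at least one plain SOI.
function isValidBaseLayer(l: LayerNode): boolean {
  const problems = l.contained.filter((s) => s.soiKind === "Problem").length;
  const sources = l.contained.filter((s) => s.soiKind === "Source of Information").length;
  return l.layerKind === "Base Layer" && problems === 1 && sources >= 1;
}

// A condition layer must enclose at least one Condition and one Conclusion.
function isValidConditionLayer(l: LayerNode): boolean {
  const conditions = l.contained.filter((s) => s.soiKind === "Condition").length;
  const conclusions = l.contained.filter((s) => s.soiKind === "Conclusion").length;
  return l.layerKind === "Condition Layer" && conditions >= 1 && conclusions >= 1;
}
```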
Item Template nodes and Item Selection nodes are
similar to those in the previous editor in terms of input fields.
The difference is that item templates have an addi-
tional field to specify their cognitive basis, i.e. a selec-
tion of layers and generic sources of information that
should be used exclusively for the generation of items
based on the given item template. In other words, a
model can have many layers and generic sources of
information, but only a subset of them can be used
during a generation for an item template as the user
chooses.
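In effect, the cognitive basis acts as a filter over the model before generation. A minimal sketch, assuming nodes carry ids; the field names are illustrative.

```typescript
// The layers and generic SOIs an item template names as its basis.
interface CognitiveBasis {
  layerIds: string[];
  genericSoiIds: string[];
}

// Restrict the full model to the subset named by the basis, so only
// that subset takes part in generating items for the template.
function restrictModel<L extends { id: string }, S extends { id: string }>(
  layers: L[],
  genericSois: S[],
  basis: CognitiveBasis
): { layers: L[]; genericSois: S[] } {
  const layerSet = new Set(basis.layerIds);
  const soiSet = new Set(basis.genericSoiIds);
  return {
    layers: layers.filter((l) => layerSet.has(l.id)),
    genericSois: genericSois.filter((s) => soiSet.has(s.id)),
  };
}
```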
Item templates and item selections are now also
placed on the canvas. Each of them has a play button
to generate items. These are displayed after genera-
tion within the Item Generation Run nodes. Because
the canvas can get cluttered with many of these nodes,
each can be closed and reopened via a corresponding
list node for each node kind.
While the graphical representation of a model may
be easier to use in some situations, a more textual rep-
resentation may be more valuable in others. There-
fore, List Based Model Description nodes have been
added that allow the user to specify SOI nodes and their fea-
tures, as well as layers and their SOI nodes, in a more
textual list form. Once the model components have
been specified, the textual representation can be used
to generate graphical template nodes, so that a user
only needs to add links as needed, and can rearrange
and move nodes as the initial idea evolves.
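A hypothetical line-based syntax illustrates how such a textual description could be turned into node templates; the editor's actual list format is not specified here and may differ.

```typescript
// Assumed toy syntax: each line "SOI: feature1, feature2" becomes an
// SOI node with attached features, laid out in a simple column that
// the user can then rearrange and link.
interface GeneratedSoiNode {
  label: string;
  features: string[];
  position: { x: number; y: number };
}

function nodesFromListDescription(text: string): GeneratedSoiNode[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line, i) => {
      const [label, featurePart = ""] = line.split(":");
      return {
        label: label.trim(),
        features: featurePart.split(",").map((f) => f.trim()).filter(Boolean),
        position: { x: 0, y: i * 120 }, // stacked; rearranged by the user
      };
    });
}

// Example: two SOI template nodes generated from two lines of text.
const generated = nodesFromListDescription("fish: gills, scales\nmammal: fur, milk");
```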
4.2 User Guidance
Three new user guidance features have been intro-
duced. The first one is the User Guidance Center (Fig-
ure 5) with resources to educate the user. These are a
video giving an introduction to AIG in general and a
video providing a tutorial on how to create an AIG model with the editor.
Figure 5: Screenshot of the User Guidance Center dashboard overlaid on the editor surface.
It also offers the possibility
to start from a sample model. By integrating these re-
sources directly into the editor, users can quickly refer
back to them when needed, as they are only a few clicks away, if not just one.
The second user guidance feature is the User
Guidance Graph (Figure 6). It shows all the steps
involved in creating items, from starting a new model to using the generated items. Its purpose is to give users a quick overview of all the steps so they can quickly assess what they have already done and what remains to be done before they can finally use the items.
The visibility of the User Guidance Graph can be tog-
gled so that the user can see it when needed, or not.
Each step can also be clicked. This will open the tuto-
rial video in the User Guidance Center and jump to the
timestamp of that step. This allows users to quickly
review how to perform the step if they have forgotten
how to do it.
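The step-to-video link amounts to a mapping from guidance steps to video timestamps; the step names and times below are invented for illustration.

```typescript
// Hypothetical mapping from User Guidance Graph steps to positions
// (in seconds) in the tutorial video.
const stepTimestamps: Record<string, number> = {
  "create model": 0,
  "add base layer": 95,
  "add condition layer": 480,
  "generate items": 1320,
};

// Clicking a step opens the tutorial video and seeks to its timestamp.
function onStepClicked(step: string, seekTo: (seconds: number) => void): void {
  const t = stepTimestamps[step];
  if (t !== undefined) seekTo(t);
}
```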
The third new feature is near real-time feedback.
Inspired by how syntax errors are shown in program-
ming editors like VS Code for programming lan-
guages like TypeScript, the editor validates the cur-
rent model in the background every two seconds. If
errors are found, they are displayed at the location in
the model where they occurred. Such a location can
be a link or a specific node like a SOI node, or an
Item Template node, or, more specifically, a particu-
lar input field such as the one for the question of an
item template. While hovering over an error indicator, the
error message is presented containing an explanation
of the error as well as a suggestion for a fix, which
can be applied by clicking a button if programmati-
cally possible. This functionality is intended to re-
duce the effort a user would have to figure out why an
item generation failed and how to fix the error. It is
also a tool to guide the user to arrive at the expected
final model structure, while allowing rapid restructur-
ing of individual parts of the model. In addition, it is
expected that seeing, understanding, and resolving er-
rors in this way should further the user’s understand-
ing of the expected final model structure, as well as
what each node represents and how it is intended to
be used. This is especially important for first time
users who have not yet internalized the knowledge of
AIG.
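In outline, this feedback loop is a periodic background validation that produces diagnostics with optional programmatic fixes. A sketch under assumed names; the editor's actual implementation may differ.

```typescript
// A diagnostic points at a model location, explains the error, and
// may carry a fix the user can apply with one click.
interface Diagnostic {
  location: string;       // e.g. a node id or a specific input field
  message: string;        // explanation plus a suggested fix
  applyFix?: () => void;  // present only if a fix is programmatically possible
}

// Revalidate the model every two seconds and render the results at
// the locations where errors occurred. Returns a stop function.
function startBackgroundValidation(
  validate: () => Diagnostic[],
  render: (diagnostics: Diagnostic[]) => void
): () => void {
  const handle = setInterval(() => render(validate()), 2000);
  return () => clearInterval(handle);
}
```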
5 EVALUATIONS
Both editors were evaluated individually. The focus
lay on how easy they are to use by first time users
who are new to the editor and potentially new to AIG
too. As a quantitative measure, the System Usability
Scale (SUS) developed by (Brooke, 1996) was used in
both cases. It is a questionnaire containing 10 items
with a 5-point Likert scale, which are worded posi-
tively and negatively in alternating order, each consid-
ering a different aspect of usability. A score between
0 and 100 can be obtained by calculation. (Sauro,
2011) has determined that a score above 68 would in-
dicate a user-friendly system on average. In addition
to the SUS score, qualitative feedback was collected
through additional questions and observation of par-
ticipants during the tests.
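For reference, the SUS calculation (Brooke, 1996) maps each of the ten 1-5 responses to a 0-4 contribution (response minus 1 for the positively worded odd items, 5 minus response for the negatively worded even items) and scales the sum by 2.5. A small TypeScript sketch:

```typescript
// Compute the SUS score (0-100) from ten Likert responses (1-5).
function susScore(responses: number[]): number {
  if (responses.length !== 10) throw new Error("SUS needs exactly 10 responses");
  const sum = responses.reduce(
    // Odd items (index 0, 2, ...) are positively worded, even items negatively.
    (acc, r, i) => acc + (i % 2 === 0 ? r - 1 : 5 - r),
    0
  );
  return sum * 2.5;
}

// Fully positive answers on all items yield the maximum score:
console.log(susScore([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])); // 100
```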
5.1 Evaluation of First Editor Version
The test procedure and the quantitative results of the
evaluation of the first version of the editor have al-
ready been presented in (Kucharski et al., 2024). 12
people from different fields participated in the eval-
uation. They were first introduced to the editor both
orally and through the in-editor tutorial, then had to
create an AIG model on their own, generate items
with it, and finally give their feedback through a ques-
tionnaire. The editor achieved an average SUS score
of 81, which is above the threshold of 68 and indi-
cates usability between good and excellent (Bangor
et al., 2009). In terms of qualitative feedback, the par-
ticipants expressed that the interactive tutorial made
it easy for them to learn the features of the editor,
so that after the brief introduction to the AIG process,
they were able to quickly create their own models and
generate their own items, even though they were all
new to AIG. However, as they began to model more
complex scenarios, some of them complained about
a lack of flexibility in the item model, which made
it difficult for them to use the AIG approach effec-
tively. The reason was that in the example they chose
to model, the features of the sources of information
all required different articles (in German).
Due to the principle of purism, to make the editor as
easy to learn as possible and thus accessible to everyone, the editor does not support the configuration of variations corresponding to tense, case, or other linguistic properties. This made it difficult to use the generation process effectively in the chosen use case and still generate linguistically correct sentences.
Figure 6: Screenshot of the User Guidance Graph.
5.2 Evaluation of Second Editor Version
The evaluation of the rebuilt editor was slightly dif-
ferent. A total of nine people participated in the eval-
uation. They had different experiences with AIG in
the past, including none, research in the field, par-
ticipation in a previous evaluation of an AIG model
editor, or having used an AIG model editor in some
other way. The procedure was chosen to best re-
flect a real first user experience, where a new user
would both discover the editor and create an AIG
model completely on their own. Therefore, partic-
ipants were not given a verbal introduction, but in-
stead were instructed to watch the two introductory
videos in the User Guidance Center. They were then
given a schematic description of a cognitive model
and an item model similar to the visualization cho-
sen by (Gierl et al., 2012). They were instructed to
model them in the editor. After that, they were to create an item selection and use it to generate
items. This whole process was also shown in the tu-
torial video. Finally, the participants were asked to
extend the model on their own and to give their feed-
back via a questionnaire. The evaluation took place
via video conference, during which the participants
were observed both verbally and via their shared screen with
the editor open. Participants only received verbal help
if they got stuck and could not proceed on their own.
The rebuilt editor reached a SUS score of 66,
which is just below the threshold of 68 and indicates a
usability between ok and good (Bangor et al., 2009).
Regarding the editor concept, eight out of nine partic-
ipants agreed or strongly agreed that they liked the
concept of everything being a node in one canvas
(Figure 7). The User Guidance Center was also per-
ceived to be very helpful. Participants were less clear
about the helpfulness of the User Guidance Graph and
the near real-time feedback. One explanation for this
is that these features were not well integrated into the
editor or that their value was not clearly communi-
cated. On the other hand, it is plausible that they were
not needed during the evaluation because the focus
was on first time users and a predefined model had to
be realized. Their value may only become apparent
when an AIG model has to be created for a topic cho-
sen by the user alone, as it would be the case in the
real world.
In terms of user guidance and user education, all
participants agreed or strongly agreed that they un-
derstood the concept of AIG, but agreed less with
statements about the understanding of how individ-
ual model components or parts are realized within the
editor. This was especially true for the different sub-
models (i.e., cognitive models and item models) and
their relationship, the purpose of layers, why there is
not one graph but multiple, and the purpose of surface
SOIs.
As participants were observed during the test, this
problem was also present in the way the participants
solved the tasks. They re-watched parts of the tu-
torial video while modeling, with some jumping back
and forth between re-watching and modeling. Also,
the videos were played at double speed by default. The intention was to make them more compact, as the AIG introduction video was originally about 5 min 30 s long, and the tutorial video about 27 min long. This decision backfired, as all participants expressed that the videos were too fast by default. Although there
were controls to slow them down, the audio quality
deteriorated in this case because the audio tracks were
not prepared for it.
Another challenge was deciding what kind of
layer to add. The video mentioned when to use which
layer. However, it did not explain how to determine
this based on the graphical model description given to
the participants. This was also the case when conditions needed to be modeled. In addition, keeping track of the impact of a new connection on all the conditions already modeled in a condition layer was seen as an additional difficulty.
Figure 7: Results of the evaluation of the second editor version.
Overall, a subset of participants felt well informed
and supported by both the tutorial video and the user
guidance graph about what steps to take and in what
order. Other participants felt overwhelmed from the
beginning or at times. They cited the amount of in-
formation contained in the videos, uncertainty about
what step they were in, and the need to know the AIG
approach as reasons for this perception. To mitigate
these reasons, recommendations were made such as
more clearly indicating the current step of the AIG
process, reducing the number of possible actions to
those required for the step, and better explaining when
to use which kind of SOI or layer.
6 DISCUSSION
The evaluations showed that there are big differences
between the two editor versions. The first version
achieved a SUS score of 81 and was therefore per-
ceived as good to excellent to use. It was expected
that it would be a challenge for the second version of
the editor to reach this score, especially since there
are fewer restrictions. The evaluation confirmed this
expectation, revealing a SUS score of only 66, just be-
low the threshold of 68. It showed that while the con-
cept of everything being a node was generally liked,
testers felt overwhelmed, e.g. by the amount of new
information they had to comprehend, or were some-
times not sure what action to take next. While it
should be noted that the process of both evaluations
was different, it is still possible to draw a number of
lessons from both evaluations.
For the first editor, it was decided to structure
the editor in a way that encourages users to follow
the predefined order of the steps of the AIG process.
This order is also communicated throughout the tu-
torial. This means that first time users can comfort-
ably work through a sequence of outlined steps and
get to know the AIG process as well as the editor. In
this way, the editor enforces that the model is always
in a valid state and ready to generate items. Conse-
quently, novice users of the editor or the AIG pro-
cess in general can rapidly initiate the process, as the
scope for errors is minimized, akin to the provision
of pre-built functional blocks by ALADIN (Christ et
al., 2022). Where such an approach works less effec-
tively is when users want to deviate from the prede-
fined functional blocks or sequence of steps, perform
steps only halfway, leaving the model in an invalid
state, and finish them non-linearly as ideas for extend-
ing the model emerge.
For the second editor, it was decided to general-
ize all model components in the form of nodes that
can be arranged and moved around as needed. This
allows for editor states that are invalid with respect to
the final expected model structure. It allows users to deviate more easily from the linearity of the AIG
process and build a model more non-linearly, espe-
cially in the case of adding and moving features be-
tween sources of information, sources of information
between layers, and changing the kind of a SOI (i.e.
Problem, Source of Information, Condition, and Con-
clusion). This correlates with the freedom provided
by PyRope (Merker et al., 2023), which comes from
the ability to extend the generation functionality with
custom programming code. As a disadvantage of
more modeling freedom, the evaluation of the second
editor emphasized that without sufficient guidance,
especially first time users can become overwhelmed,
not knowing what to do next or what is left to do.
Another aspect by which the two editors can be
compared is the way in which they support cogni-
tive model content to be used for generation. The
first editor version allows many different sources of
information to be used in many different layers. But
for the item generation, all available sources of in-
formation and layers make up the whole cognitive
model. It is not possible to exclude certain layers
or generic sources of information from the genera-
tion. To achieve this behavior, they would have to be
deleted. The second editor version has changed this
by allowing users to specify within the cognitive basis
of an item template which layers and generic sources
of information should be considered when generating
items based on it.
This enables two additional editor features, espe-
cially for advanced users. First, a broader knowledge
domain can be modeled within one model, but selec-
tively used for generation. Users can create multi-
ple different layers about different problems and if-
then relationships. But for a given item template, they
can choose which layers and generic sources of information the items generated by the template should be based on, without having to delete all the others or to
create multiple models. Second, having everything in
one model should also facilitate the reuse, combina-
tion, and remix of existing model components, further
expanding the knowledge domain represented. What
this means is that while the first version of the edi-
tor encourages having multiple smaller task-specific
models, the second version of the editor allows espe-
cially advanced users to create larger domain-specific
models that essentially encompass multiple smaller
task-specific models.
7 CONCLUSION AND OUTLOOK
In this paper, we presented two versions of an AIG
model editor, both with different user personas in
mind. The first version focuses on providing a clear
path for first time users to create an AIG model. It
provides users with an interactive tutorial and a sam-
ple model to get them started quickly. The second
version re-conceptualizes the editor with the goal of
giving more capabilities especially to advanced users.
It introduces the concept of everything being a node
that can be freely moved and arranged on one large
canvas. It also adds user guidance features such as
near real-time feedback, the User Guidance Center,
and the User Guidance Graph to provide guardrails
for the additional freedom.
Both editors were evaluated individually using
slightly different methods. The first editor version
achieved a SUS score of 81 and the second version
a score of 66. While the scores are not directly com-
parable due to the different evaluation procedures, the
qualitative feedback collected reveals the themes of
the problems with each approach. Participants who
were new to AIG and evaluated the first editor ver-
sion, were able to quickly create an AIG model. The
restrictive nature of the editor only became apparent
as they moved beyond the initial learning phase and
attempted to model more complex scenarios.
Participants in the second evaluation were also
able to successfully create an AIG model and gener-
ate items, but the learning curve was steeper. This
was attributed to the amount of information contained
in the introductory videos, uncertainty about which
current AIG process step they were in, and the need
to know the AIG approach. As a mitigation, they sug-
gested that more guidance was needed.
All in all, the main questions that follow from this
work are: if there are different types of users with dif-
ferent types of needs and goals, sometimes in conflict,
should they all be accommodated within a single “su-
per AIG model editor”, and if so, how? Or should
there be a pool of multiple, more specialized editors
from which users can choose?
Our current view is that there is a case for both.
However, what seems to be an interesting step to in-
vestigate next is an editor for AIG model creation that
would combine both worlds of the presented editors.
This could work as follows.
What the decisions made during the development
of both editors amount to is a set of restrictions for
each editor. From a technical point of view, it is ar-
gued that the second version is less restrictive than
the first, but could be made to behave like it by simulating its restrictions. For such a simulation, the current guidance
and modeling needs of the user need to be determined.
Based on this, a mechanism could dynamically adapt
the restrictions to be enforced by the editor when the
user needs either more guidance or more freedom in
modeling. The nature and feasibility of such a mech-
anism should be investigated by future work.
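One way to frame such a mechanism is as a set of toggleable restrictions, where enforcing all of them simulates the first editor and lifting them yields the second. The restriction names and the guidance-need estimate below are hypothetical.

```typescript
// Hypothetical restrictions distilled from the first editor's design.
type Restriction =
  | "linear-modeling-path"
  | "dialog-based-editing"
  | "single-layer-view"
  | "no-node-rearranging";

// Map an estimate of the user's current guidance need to the set of
// restrictions the editor should enforce at that moment.
function restrictionsFor(guidanceNeed: "high" | "low"): Set<Restriction> {
  return guidanceNeed === "high"
    ? new Set<Restriction>([
        "linear-modeling-path",
        "dialog-based-editing",
        "single-layer-view",
        "no-node-rearranging",
      ])
    : new Set<Restriction>(); // full freedom, as in the second editor
}
```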
To accomplish this, the use of large language mod-
els as reasoning engines that could decide about the
user’s current guidance and modeling needs and sug-
gest and/or apply adjustments seems like a promising
possibility that should be explored.
REFERENCES
Bangor, A., Kortum, P., & Miller, J. (2009). De-
termining what individual SUS scores mean:
Adding an adjective rating scale. Journal of Us-
ability Studies, 4(3), 114–123.
Baum, H., Damnik, G., Gierl, M. & Braun, I. (2021).
A Shift in automatic Item Generation towards
more complex Tasks, INTED2021 Proceedings,
pp. 3235-3241.
Brade, M. (2015). Visualization methods for the interactive acquisition and structuring of information in the context of free-form knowledge modeling [Visualisierungsmethoden für das interaktive Erfassen und Strukturieren von Informationen im Kontext der Freiform-Wissensmodellierung]. Doctoral Dissertation, TUD Dresden University of Technology.
Brooke, J. B. (1996). SUS: A ’Quick and Dirty’ Us-
ability Scale.
Christ, P., Laue, R., & Munkelt, T. (2022). ALADIN Generator for Tasks and Solution (Hints) in Computer Science and Related Fields [ALADIN Generator für Aufgaben und Lösung(shilf)en aus der Informatik und angrenzenden Disziplinen].
Christ, P. L., Munkelt, T., & Haake, J. M. (2024). An Authoring Tool for the Graphical Configuration of Item Generators [Ein Autorenwerkzeug zur grafischen Konfiguration von Aufgabengeneratoren]. https://doi.org/10.13140/RG.2.2.33616.11528
Damnik, G., Gierl, M., Proske, A., Körndle, H., & Narciss, S. (2018). Automatic Item Generation as a Means to Increase Interactivity and Adaptivity in Digital Learning Resources [Automatische Erzeugung von Aufgaben als Mittel zur Erhöhung von Interaktivität und Adaptivität in digitalen Lernressourcen]. In E-Learning Symposium 2018 (pp. 5-16). Universitätsverlag Potsdam.
Embretson, S. E., & Yang, X. (2007). Automatic item
generation and cognitive psychology. In C. R.
Rao & S. Sinharay (Eds.), Handbook of statis-
tics: Psychometrics, Volume 26 (pp. 747–768).
Amsterdam, The Netherlands: Elsevier.
Gierl, M. J., Lai, H., & Turner, S. (2012). Using
automatic item generation to create multiple-
choice items for assessments in medical educa-
tion. Medical Education, 46, 757–765.
Gierl, M. J., & Haladyna, T. M. (Eds.). (2013). Au-
tomatic item generation: Theory and practice.
Routledge.
Gierl, M. J., & Lai, H. (2013a). Evaluating the qual-
ity of medical multiple-choice items created with
automated processes. Medical education, 47(7),
726-733.
Gierl, M. J., & Lai, H. (2016). Automatic item gener-
ation. In S. Lane, M. R. Raymond, & T.M. Hala-
dyna (Eds.), Handbook of test development (2nd
ed., pp. 410–429). New York, NY: Routledge.
Kosh, A. E., Simpson, M. A., Bickel, L., Kellogg,
M., & Sanford-Moore, E. (2019). A cost–benefit
analysis of automatic item generation. Educa-
tional Measurement: Issues and Practice, 38(1),
48-53.
Krathwohl, D. R. (2002). A Revision of Bloom’s Taxon-
omy: An Overview. Theory into Practice, 41(4),
212-218.
Kucharski, S., Damnik, G., Stahr, F., & Braun, I.
(2023). Revision of the AIG Software Toolkit:
A Contribute to More User Friendliness and
Algorithmic Efficiency. In J. Jovanovic, I.-A.
Chounta, J. Uhomoibhi, & B. McLaren: Pro-
ceedings of the 15th International Conference
on Computer Supported Education - Volume 2:
CSEDU. SciTePress, pages 410-417.
Kucharski, S., Stahr, F., Braun, I. & Damnik, G.
(2024). Overcoming Student Passivity with Au-
tomatic Item Generation. In O. Poquet, A.
Ortega-Arranz, O. Viberg, I.-A. Chounta, B.
McLaren and J. Jovanovic: Proceedings of the
16th International Conference on Computer Sup-
ported Education. SciTePress, pages 789-798.
Merker, J., Hain, H., Schöbel, K., & Brassel, P. (2023). E-Assessment in STEM Fields: Coding Exercises with Python & Jupyter [E-Assessment in MINT-Fächern: Coden von Übungsaufgaben mit Python & Jupyter]. Digitale Lehre im Rahmen der Grundlagenausbildung in MINT-Fächern an Hochschulen, 96.
Mortimer, T., Stroulia, E., & Yazdchi, M. V. (2012).
IGOR: A Web-Based Automatic Item Genera-
tion Tool. In Automatic Item Generation (pp.
217-230). Routledge.
Proske, A., Körndle, H., & Narciss, S. (2012). Interactive learning tasks. In N. M. Seel (Ed.), Encyclopedia of the Sciences of Learning (pp. 1606-1610). New York: Springer.
Sauro, J. (2011). Measuring usability with the Sys-
tem Usability Scale (SUS). Retrieved from
https://measuringu.com/sus/. Last accessed: Jan-
uary 13, 2025.
Wancham, K., Tangdhanakanond, K., & Kan-
janawasee, S. (2023). Development of the auto-
matic item generation system for the diagnosis
of misconceptions about force and laws of mo-
tion. Eurasia Journal of Mathematics, Science
and Technology Education, 19(6), em2282.