Automatic Item Generation Integrated into the E-Assessment-System JACK

Michael Striewe (https://orcid.org/0000-0001-8866-6971)
Trier University of Applied Sciences, Department of Computer Science, Trier, Germany
Keywords: Automatic Item Generation, E-Assessment System, Educational Technology Engineering.
Abstract: Automatic item generation (AIG) can save time in the production of high-quality assessment items, but requires creating and maintaining appropriate software tools that fit into a larger context in which the generated items are to be used. Hence, sustainable AIG solutions not only require sophisticated item generation capabilities, but also appropriate software design. This paper presents a concept for AIG that is integrated into an e-assessment system and promotes reusability and extensibility as its main software quality properties. The paper demonstrates the practicality of the concept and discusses the underlying software structure.
1 INTRODUCTION
Automatic item generation (AIG) is well known for saving the time required to produce high-quality assessment items with defined properties, but it comes at the cost of the time required for item modeling and tool development (Kosh et al., 2019). Item modeling is an integral part of a particular approach to automatic item generation (Gierl and Lai, 2012a) and thus cannot be eliminated if that approach is to be used. The need for tool development, in contrast, can be reduced by proper software engineering with a focus on reusability and integration. However, many existing tools are single-purpose tools that are not specifically designed to be reusable or integrated. Yet using such a tool in practice sometimes “requires [...] digital ecosystem around the tool” (Kıyak and Kononowicz, 2024), which is a strong argument for focusing not only on generation capabilities but also on software design quality in automatic item generation.

Reusability is improved by universal AIG tools like IGOR (Mortimer et al., 2012), which is nevertheless a standalone tool. Another universal AIG tool is CAFA (Choi and Zhang, 2019), which is not only reusable but also offers integration, as it can serve as a platform that can be used by different client systems, such as item management systems or interactive workbooks (Choi et al., 2018).
In addition to reusability and integration, the extensibility of item generators is a third concern.
The current state of the art in automatic item generation comprises many different approaches, although it is dominated by “template or rule-based approaches as the primary method for creating item models” (Circi et al., 2023). These may be somewhat limited with respect to the complexity of the generated items (Baum et al., 2021). One reason may be that existing tools are not well designed to be extended by emerging approaches that could help to add complexity. This inevitably leads to situations in which new tools are designed (rather than existing ones extended) as soon as an important delta between existing methods and the specific requirements of a particular domain is detected (Christ et al., 2024).
Extensibility is improved by AIG frameworks like SARAC (Liu, 2009), which uses individual components for the parameterizable elements that occur within an item model.
This paper presents an entirely different approach, in which item generation features have been added to an existing e-assessment system. In this way, automatic item generation is integrated directly into a tool that actually uses the items, reusable across all items created in that tool in any domain of study, and extensible through clearly defined interfaces that allow more functionality to be added in the future.
The remainder of this paper is organized as follows: Section 2 describes the concept of item generation used within the system. Section 3 provides two examples of how to use the concept in practice. Section 4 explains the software structure that is used to implement the concept, as well as the software interfaces that allow item generation to be extended. Section 5 discusses limitations of the approach, and Section 6 concludes the paper.
2 CONCEPT OF ITEM GENERATION
The e-assessment system JACK is a web-based, general-purpose assessment system that can be used in virtually any domain of study (Striewe, 2016). Thus, its item types are not limited to those typically used with automatic item generation, but range from universal types like multiple choice and fill-in-the-blanks to domain-specific types that ask students to write program code or draw molecules. The system does not provide a fixed set of items or item templates, but includes authoring features that allow teachers to define their own assessment items. In addition, actual assessment items can consist of several parts, where each part may be of a different type and where the sequence of parts is determined adaptively based on previous student input. For most item types, automatic grading and feedback generation are available. In fact, the ability to provide detailed and elaborated feedback for complex item types is one of JACK's most prominent features. Again, the system does not do so based on fixed algorithms in most cases, but provides authoring features so that teachers can prepare detailed feedback definitions.
Consequently, the concept of item generation in JACK had to fulfill several requirements:

1. Item generation must happen online at runtime, each time a student starts to interact with a new part of an assessment item.

2. Item generation must be based on functions that are independent of the actual type of the generated items, so that it can be used with any type, including types that may be added in the future.

3. Item generation must cover both the item contents shown to students and the internal parts of an item that are used during automated grading and feedback generation.
Based on these requirements, JACK uses a template-based approach built around so-called item variables. For each assessment item (which may consist of several parts, as mentioned above), authors can define an arbitrary number of such variables. Each variable definition must include a so-called evaluator function that determines its value. These functions may refer to other variables, so that constrained or dependent variables are possible. The actual contents of item variables are typically numbers or strings, but may also be complex mathematical formulas. In addition to item variables, authors may also define so-called dynamic objects that render graphical elements in the final output, such as mathematical graphs or chemical molecules. The method for defining dynamic objects is less generic than the evaluator functions for item variables, but may still refer to item variables, so that dynamic objects can be constrained and dependent as well. Finally, authors can use placeholders in virtually any part of their item definition (i.e. in all texts prepared for students and in all rules for grading and feedback generation) to turn an item into a template from which different instances can be derived at runtime. In summary, the only pre-defined parts provided by the e-assessment system are general item types, types of dynamic objects, and evaluator functions for the generation of variable values, but no pre-defined item templates, item models, or other kinds of structures. Authors are free to combine any of these provided parts in any way they like to define item templates with placeholders that are filled by variable values. Authors are also free to define arbitrarily complex combinations of functions to define variable values.
Notably, it is left to item authors to decide whether they apply a strong theory approach (Gierl and Lai, 2012b) and derive the item variables from an item model they manage externally, or whether they apply a weak theory approach and just turn a fixed item in JACK into a parent item by adding some variables. Since it is not mandatory to use each and every item variable in at least one placeholder, it is also possible to define a set of item variables that actually encodes a complex item model directly in JACK. Whether this is meaningful depends on the complexity of the model and on the availability of appropriate evaluator functions to express the relations between the model elements.
3 EXAMPLES
This section illustrates the concept of item generation
by two examples. Both examples involve dynamic
objects of different kind. The first example is a fill-
in-the-blanks item that includes a dynamic object and
involves automatic grading. The second example is
a simple multiple choice item that also features a dy-
namic object and automatic grading, but additionally
involves the use of external data sources during item
generation.
Figure 1: Student view on a sample instance of an item on the forms of parabola equations.
3.1 Example 1: Parabola Curves
The first example comes from mathematics education. The item is a fill-in-the-blanks item and asks students to convert a parabola equation from standard form to vertex form (see Figure 1). The item template is divided into two parts to cover both the item presentation and the automated grading. Both parts of the template refer to item variables. For item presentation, authors define the item stem, which includes not only references to item variables, but also a placeholder for the curve plot as well as three input fields (see Figure 2). Authors can decide to use LaTeX formatting for the formula and can equip the input fields with a formula editor, so that students can enter rational numbers and the like in a convenient manner. For automated grading, authors specify rules that refer to the input fields and compare their values to expected values that may be fixed or stored in item variables (see Figure 3 for an example with one rule for the correct answer). Note that rules can refer to any input fields, so that there can be more than one correct solution per input field. Similarly, there can be multiple rules referring to the same input fields, resulting in elaborate feedback for different (wrong) inputs.
Figure 2: Authoring view on the item template for the item shown in Figure 1.

Figure 3: Authoring view on one of the feedback rules for the item shown in Figure 1.

The complete list of item variable definitions is given in Figure 4. The variables are based on a mathematical model that makes sure that the equation is neither trivial nor too hard to calculate. Consequently, the first four variables (“az”, “an”, “d”, and “e”) draw random values from carefully selected ranges. Note that the value 0 is excluded in all cases, since it could cause invalid results. The next nine variables all compute derived values based on the previous variables. All of these variables are used internally, either to prepare the curve plot (see Figure 5) or the rules for grading and feedback generation. In particular, some of the variables capture typical miscalculations students can make, so that student input can be compared to these values as well to generate specific feedback for these cases. The final four variables (“output1” to “output4”) are used within the item template to display information to students. They make use of a special evaluator function “rational”, which makes sure that its parameters are printed as rational numbers and not as decimal numbers.

Figure 4: Authoring view on the item variables used in conjunction with the item template in Figure 2 and Figure 3.

Figure 5: Authoring view on the dynamic object that renders the parabola in Figure 1.
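The variable definitions themselves are visible only in the figures, but the underlying algebra can be sketched. Assuming, as the variable names suggest (though this reading is not stated explicitly in the paper), that “az” and “an” hold the numerator and denominator of the leading coefficient, a plausible item model is:

```latex
% Sketch of the presumed item model; a = az/an is an assumption.
a = \frac{az}{an}, \qquad d, e \in \mathbb{Z} \setminus \{0\}
% Standard form shown to students (the expanded parent equation):
y = a x^{2} - 2ad\,x + \left(ad^{2} + e\right)
% Vertex form expected as the answer:
y = a\left(x - d\right)^{2} + e
```

Under this reading, the derived variables would hold values such as $-2ad$ and $ad^{2}+e$ for the stem and the plot, while variants with typical sign errors (e.g. $+d$ instead of $-d$) would feed the feedback rules mentioned above.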
3.2 Example 2: Molecules
The second examples comes from chemistry educa-
tion. The item asks students to indicate the correct
name for a given chemical compound (see Figure 6).
The corresponding template is quite simple with an
item stem and four answer options (see Figure 7). The
item stem includes a placeholder referring to a dy-
namic object of type “molecule” and all four answer
options refer to item variables.
The complete list of item variable definitions is given in Figure 8. It starts with drawing a random value for the variable “size” that roughly determines the complexity of the molecules that will appear in the item. Note that this is a simple heuristic that does not match academic standards in item writing, but is used here just for the sake of brevity. The main complexity of the item generation is encoded in the second variable, “tuples”, which makes use of a call to an external data source. Precisely, a SPARQL query is issued against the Wikidata database to retrieve chemical compounds, their names, and their structural identifiers. Using SPARQL queries to generate items has been explored earlier (Foulonneau and Ras, 2013) and is thus not a unique feature of automatic item generation with JACK.
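The paper does not reproduce the actual query, but a minimal sketch of what such an evaluator function might do is given below. The query shape, the Wikidata vocabulary used (P31 “instance of”, Q11173 “chemical compound”, P233 “canonical SMILES”), and the result limit are illustrative assumptions, not JACK's actual implementation:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: issues a SPARQL query against Wikidata to
// retrieve chemical compounds with an English label and a structural
// identifier (here: canonical SMILES; which identifier JACK uses is
// not named in the paper).
public class WikidataCompoundQuery {
    public static void main(String[] args) throws Exception {
        String query = """
            SELECT ?compound ?label ?smiles WHERE {
              ?compound wdt:P31 wd:Q11173 ;   # instance of: chemical compound
                        wdt:P233 ?smiles ;    # canonical SMILES
                        rdfs:label ?label .
              FILTER(LANG(?label) = "en")
            } LIMIT 200""";
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://query.wikidata.org/sparql?format=json&query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8)))
            .header("Accept", "application/sparql-results+json")
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        // JSON result: a list of (compound, label, smiles) bindings
        System.out.println(response.body());
    }
}
```

In JACK, the resulting list of (compound, name, identifier) tuples would be stored in the “tuples” variable, from which the subsequent variables draw random entries.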
The remaining variables read four random compounds from the query result (“c1” to “c4”) and extract their names (“p1” to “p4”) as well as the structural identifier for one of them in the variable “n1”. The latter is used subsequently in the definition of a dynamic object that turns the identifier into its graphical representation (see Figure 9).
Figure 8: Authoring view on the item variables used in conjunction with the item template in Figure 7.

Figure 9: Authoring view on the dynamic object that renders the graphical representation of the molecule in Figure 6.

A similar visual result could be achieved by using image links that are also available in Wikidata and that point to images of chemical compounds located at Wikimedia Commons. However, that approach has several major drawbacks: First, image links are ultimately resolved by the client browser and are thus independent of the e-assessment system. Any failure in resolving the links may result in an incomplete item on the client side without any chance for the e-assessment system to notice. With dynamic objects, the whole output is prepared by the e-assessment system and any failure in rendering the output can be logged. Second, Wikimedia may block large numbers of requests for any reason, especially if they appear within a short period of time from a single IP address, as would be the case in exam situations. With dynamic objects, no third-party server is involved that could block requests. Third, images in Wikimedia Commons may or may not show the molecule in the expected notation. With dynamic objects, item authors can be sure that all molecules are shown in the same notation. Finally, files loaded from Wikimedia or any other third-party server may have file names that reveal solutions to students who are able to inspect the file names with their browser tools. With dynamic objects, molecules and the like are rendered directly within the final output and no file names can be seen.
4 SOFTWARE STRUCTURE
Item variables and dynamic objects are handled differently within the e-assessment system for technical reasons, although both can ultimately be used to fill placeholders in an item template.
4.1 Data Types
There is only one generic data type for item variables, which holds the variable name and the corresponding evaluator function. The latter is a string in an expression language that allows references to function names, variable names, and constants. At runtime, these strings are passed to a dedicated evaluator component that parses the string, calls the required functions, and returns the resulting value.

Dynamic objects are represented by individual data types due to their larger complexity. While they share a name, a width, and a height as common attributes, their individual content definitions can be very different.
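A minimal sketch of these two kinds of data types might look as follows; all names and fields are invented for illustration, as the actual JACK types are not published in this paper:

```java
import java.util.Map;

// Illustrative sketch of the two kinds of data types described above.

// Item variables share one generic type: a name plus an evaluator
// expression that is parsed and computed at runtime.
record ItemVariable(String name, String evaluatorExpression) { }

// Dynamic objects share name, width, and height, but differ in their
// content definition, so each type gets its own class.
abstract class DynamicObject {
    String name;
    int width, height;
    abstract String render(Map<String, Object> variableValues);
}

class FunctionPlot extends DynamicObject {
    String term;  // plot term that may reference item variables
    @Override
    String render(Map<String, Object> variableValues) {
        // resolve variable references in 'term', then produce e.g. SVG
        return "<svg><!-- rendered plot --></svg>";  // placeholder
    }
}
```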
4.2 Template Instantiation Process
When a student starts to interact with an assessment item, a new instance of an object representing that item is created (even if it is a static item where no item generation takes place). Then, the evaluator functions for all variables are called, and the resulting values are stored within the object representing the item. The object is persisted, so all values are available for later manual inspection if necessary.

Then, the item contents that need to be presented to the student are looked up in the item definition. Which contents are relevant here depends on the item type. Usually, there will be some item stem or prompt. Multiple choice items will also contain answer options that are not present in other item types. Other item types may instead provide additional files for the student to download.
In a first pass, each content element is scanned for placeholders that refer to dynamic objects. Each occurrence of a placeholder is replaced by a new instance of the respective dynamic object. Notably, the content definition for a dynamic object may include references to item variables, which need to be resolved first.

In a second pass, each content element is scanned for remaining placeholders that refer to item variables. These are replaced by the respective variable values, including graphical conversions like LaTeX formatting for mathematical formulas. Authors can add flags to each placeholder to indicate which kind of conversion they would like to use in each individual place.
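The two passes can be sketched as a simple placeholder-replacement loop. The placeholder syntax `[dynobj=...]` and `[var=...]` is invented for illustration and may differ from JACK's actual markup:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative two-pass instantiation of one content element.
class TemplateInstantiator {
    private static final Pattern DYN = Pattern.compile("\\[dynobj=(\\w+)\\]");
    private static final Pattern VAR = Pattern.compile("\\[var=(\\w+)\\]");

    String instantiate(String content,
                       Map<String, String> renderedDynamicObjects,
                       Map<String, String> variableValues) {
        // First pass: replace dynamic-object placeholders with freshly
        // rendered instances. Their definitions may themselves reference
        // item variables, which must be resolved before rendering.
        content = replaceAll(DYN, content, renderedDynamicObjects);
        // Second pass: replace remaining variable placeholders, possibly
        // applying conversions such as LaTeX formatting (omitted here).
        return replaceAll(VAR, content, variableValues);
    }

    private String replaceAll(Pattern p, String content, Map<String, String> values) {
        Matcher m = p.matcher(content);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // keep the placeholder verbatim if no value is known for it
            m.appendReplacement(sb, Matcher.quoteReplacement(
                values.getOrDefault(m.group(1), m.group(0))));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```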
4.3 Extensibility
The software structure presented above includes three dedicated interfaces for extensions. First, new evaluator functions can be added that define new ways to compute or derive variable values. Second, new types of dynamic objects can be added that render new types of graphical output. Third, new conversion flags can be added that allow variable values to be presented in different, potentially domain-specific formatting. Following the software structure, these interfaces are used in three distinct places.
Evaluator functions are added within the evaluator component. Each new function is implemented in a separate code file and registered in a dictionary of available function names. Functions may be simple with just one line of code or may be complex, e.g. including calls to external software libraries. As demonstrated above, functions can even make calls to external databases, and could call AI-powered services and the like in a similar manner. Since they are implemented in the dedicated evaluator component, they can be added and updated independently of the actual e-assessment system. Thus, new functions for item generation can be added on the fly without the need to update the entire system.
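A registration mechanism of this kind might look as follows; this is a sketch under the assumption of a simple name-to-implementation dictionary, with all identifiers invented for illustration:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the evaluator component's function dictionary.
interface EvaluatorFunction {
    Object apply(List<Object> arguments);
}

class FunctionRegistry {
    private final Map<String, EvaluatorFunction> functions = new HashMap<>();

    void register(String name, EvaluatorFunction fn) {
        functions.put(name, fn);
    }

    Object call(String name, List<Object> arguments) {
        EvaluatorFunction fn = functions.get(name);
        if (fn == null) throw new IllegalArgumentException("Unknown function: " + name);
        return fn.apply(arguments);
    }
}

// A new one-line function can then be added without touching the core system:
class Setup {
    static FunctionRegistry defaultRegistry() {
        FunctionRegistry registry = new FunctionRegistry();
        registry.register("add", args ->
            ((Number) args.get(0)).doubleValue() + ((Number) args.get(1)).doubleValue());
        return registry;
    }
}
```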
Dynamic objects are added within the core of the e-assessment system, since they are more complex and provide individual settings that must be stored in the system's database. Adding a new type of dynamic object thus requires implementing a technical interface, defining the object's properties, and defining the output rendering for that object. In addition, input elements in the authoring interface must be defined so that teachers can actually make use of the new objects. Consequently, adding new dynamic objects is not as easy as adding new evaluator functions and requires updating and restarting the entire e-assessment system.
Conversion flags are added within the converter component, which is a separate component similar to the evaluator component. However, flags are not independent of each other, and thus there is a single conversion algorithm that handles all flags at once. That algorithm needs to be extended if new flags are required. Similar to the evaluator component, the converter component can be updated independently of the actual e-assessment system and can thus provide new conversions at any time.
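A sketch of such a single conversion algorithm, with all flag names invented for illustration:

```java
import java.util.Set;

// Illustrative sketch of the converter component: one algorithm that
// interprets all flags on a placeholder together, since flags are not
// independent of each other. Adding a new flag means extending this method.
class ValueConverter {
    String convert(Object value, Set<String> flags) {
        String text = String.valueOf(value);
        if (flags.contains("latex")) {
            // wrap for LaTeX rendering of mathematical content
            text = "\\(" + text + "\\)";
        }
        if (flags.contains("uppercase")) {
            text = text.toUpperCase();
        }
        // further, potentially domain-specific flags would be handled here
        return text;
    }
}
```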
5 CURRENT USAGE AND LIMITATIONS
The approach is fully operational and in use. Among others, the mathematics department at the University of Duisburg-Essen currently maintains a pool of about 300 item templates that are used for item generation for homework exercises. The economics department of the same university maintains smaller pools of item templates for various assignments in microeconomics as well as for homework exercises and exams in statistics. These pools have already been used with an older version of JACK (Massing et al., 2018) and helped to design the current software structure.

Although the approach is fully operational, it comes with three notable limitations.
On the technical level, item variables are somewhat limited in the amount of data they can handle efficiently. While it is technically possible to create an evaluator function that e.g. generates a large image and returns it in Base64 encoding, it is not efficient to do so, because the variable value would then be a very long string that must be stored in a single field in the system's database. Using compressible SVG may help here and adds additional options for styling the image on the client side. Still, that does not solve the general problem that a single database field is not the optimal choice for storing hundreds of kilobytes of data in a single string. That applies not only to images, but also to evaluator functions that may e.g. return thousands of result entries for a single query to an external database.
The latter example may also have an impact on user experience. As defined in the requirements in Section 2, item generation happens online right before an item is displayed to a student. Hence, item generation is limited to cases in which the processing of evaluator functions can be completed in a reasonably short period of time. This applies both to the complexity of calculations and to the delay created by connecting to external data sources or the like. The more item variables are defined in an item, the faster each variable value must be computed to achieve an overall acceptable user experience. Experience shows that waiting about 5 seconds for an item to load is acceptable for most students in homework exercises, but already causes distress for some students in exam situations.
On the conceptual level, the whole process of item generation is performed without manual supervision. There is no intermediate step in which generated items are reviewed by human experts before they are displayed to students. Hence, the applicability of the approach is limited to cases in which either a high item quality can be guaranteed by careful item template design and careful checking of all data sources, or in which a reduced item quality is acceptable, as may be the case in some specific formative assessment scenarios. In particular, students and teachers must be aware that low-quality items can appear, and students must have options to skip such items if they cannot be answered in a meaningful way. Notably, all generated items, including all variable values, are available in the e-assessment system for later manual inspection, so that quality checks can be made in retrospect. Based on these checks, item definitions can be amended in order to improve item generation.
6 CONCLUSIONS AND FUTURE WORK
This paper presented the automatic item generation concepts within the e-assessment system JACK. The focus was not on new capabilities in item generation, but on software design. In particular, the concept can be considered more sustainable than common standalone, single-purpose solutions, as it is integrated, reusable, and extensible:

- The concept is integrated, because item generation happens directly within an e-assessment system that is also responsible for delivering assessment items to students and for automatic grading and feedback generation. Thus, there is no need to create an ecosystem around the tool, as might be the case with other approaches.

- The concept supports reusability, because each of its features is realized in a distinct component, and these components can be combined freely. The core concepts of item variables and dynamic objects are generic for assessment items and can be used regardless of the item types of their parts.

- The concept supports extensibility, because each component can be updated without interfering with the remaining system. There is no need to build a new system just for new types of dependencies between item variables; instead, evaluator functions can be added as needed. There is also no need to build a new system just for new types of content elements; instead, evaluator functions or dynamic objects can be added as needed. Nor is a new system needed just for different representations; instead, dynamic objects or output conversions can be added as needed.
Besides a general extension of JACK's item generation capabilities in terms of new evaluator functions, future work particularly includes the addition of a new type of dynamic object for data structures in computer science education. While visualizing data structures is in general not a hard problem, it puts more emphasis on the internal structure of item variables (e.g. it might become important in what order a list contains its elements).
A current research project tackles the automatic generation of questions on program code submitted by students in response to programming assignments. While the research project is concerned with more fundamental aspects of asking questions about program code, the creation of practical demonstrators may require extending the current item generation capabilities of JACK with specific features for handling program code.
REFERENCES
Baum, H., Damnik, G., Gierl, M., and Braun, I. (2021). A shift in automatic item generation towards more complex tasks. In INTED2021 Proceedings, 15th International Technology, Education and Development Conference, pages 3235–3241. IATED.

Choi, J., Kim, H., and Pak, S. (2018). Evaluation of Automatic Item Generation Utilities in Formative Assessment Application for Korean High School Students. Journal of Educational Issues, 4(1).

Choi, J. and Zhang, X. (2019). Computerized Item Modeling Practices using Computer Adaptive Formative Assessment Automatic Item Generation System: A Tutorial. The Quantitative Methods for Psychology, 15(3):214–225.

Christ, P., Munkelt, T., and Haake, J. M. (2024). Generalized Automatic Item Generation for Graphical Conceptual Modeling Tasks. In Proceedings of the 16th International Conference on Computer Supported Education, CSEDU 2024, Angers, France, May 2-4, 2024, Volume 1, pages 807–818. SCITEPRESS.

Circi, R., Hicks, J., and Sikali, E. (2023). Automatic item generation: foundations and machine learning-based approaches for assessments. Frontiers in Education, 8.

Foulonneau, M. and Ras, E. (2013). Using Educational Domain Models for Automatic Item Generation Beyond Factual Knowledge Assessment. In Proceedings of EC-TEL 2013: Scaling up Learning for Sustained Impact, pages 442–447. Springer Berlin Heidelberg.

Gierl, M. J. and Lai, H. (2012a). The Role of Item Models in Automatic Item Generation. International Journal of Testing, 12(3):273–298.

Gierl, M. J. and Lai, H. (2012b). Using Weak and Strong Theory to Create Item Models for Automatic Item Generation: Some Practical Guidelines with Examples. In Automatic Item Generation. Routledge.

Kosh, A. E., Simpson, M. A., Bickel, L., Kellogg, M., and Sanford-Moore, E. (2019). A Cost–Benefit Analysis of Automatic Item Generation. Educational Measurement: Issues and Practice, 38(1):48–53.

Kıyak, Y. S. and Kononowicz, A. A. (2024). Case-based MCQ generator: A custom ChatGPT based on published prompts in the literature for automatic item generation. Medical Teacher, 46(8):1018–1020. PMID: 38340312.

Liu, B. (2009). SARAC: A Framework for Automatic Item Generation. In 2009 Ninth IEEE International Conference on Advanced Learning Technologies, pages 556–558.

Massing, T., Schwinning, N., Striewe, M., Hanck, C., and Goedicke, M. (2018). E-Assessment Using Variable-Content Exercises in Mathematical Statistics. Journal of Statistics Education, 26(3):174–189.

Mortimer, T., Stroulia, E., and Yazdchi, M. V. (2012). IGOR: A Web-Based Automatic Item Generation Tool. In Automatic Item Generation. Routledge.

Striewe, M. (2016). An architecture for modular grading and feedback generation for complex exercises. Science of Computer Programming, 129:35–47.