solving skills can be used to guide our practice and
feedback strategies when implementing formative
assessments and to evaluate competency and
proficiency when implementing summative
assessments. Traditional single-answer, multiple-choice items, while common in educational testing, cannot be used to measure the steps required to solve complex, often context-dependent problems because this format limits each problem to a single, stand-alone item.
Testlets can be used to overcome this limitation.
A testlet is a set of two or more items based on the
same scenario. Testlets are effective at measuring complex problem-solving skills because the set of items tied to a common scenario ensures that different aspects or components of the problem, as it relates to the scenario, can be evaluated. Unfortunately, testlets are challenging to write, and large numbers of them are required for both formative and summative assessments.
To address this item development problem,
testlets can be integrated into the three-step AIG
process by creating a testlet item model. A testlet item
model is unique because it contains two types of
variables. Global variables can be used throughout
the testlet to vary the content of the generated items.
Global variables are typically introduced in the
scenario so they can be used to link the content in the
item model to the content in the scenario. Global
variables can also be used to link content across two
or more item models. Local variables are also used in
testlet-based AIG. A local variable is specific to each
item model and therefore cannot be used throughout
the testlet. A local variable is used to help ensure the
content in each item model in the testlet is unique. We
provided four illustrative cases to demonstrate how
global and local variables can be used independently
or combined with one another to generate items.
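To make the distinction concrete, the following sketch shows one simple way a testlet item model with global and local variables could be represented and expanded in software. It is a minimal illustration written in Python; the scenario text, variable names, and the generate_testlets helper are hypothetical examples for this discussion and are not taken from the authors' item models or from an operational AIG system.

from itertools import product

# Hypothetical testlet item model. Global variables are shared by the scenario
# and every item model; local variables belong to a single item model.
# All names and values are illustrative only.
scenario = "A {age}-year-old patient presents with {symptom}."
global_vars = {"age": ["45", "72"],
               "symptom": ["chest pain", "shortness of breath"]}

item_models = [
    {"stem": "Given the patient's {symptom}, what is the most likely {dx_focus}?",
     "local": {"dx_focus": ["diagnosis", "underlying cause"]}},
    {"stem": "Which {test_type} should be ordered first for this patient?",
     "local": {"test_type": ["laboratory test", "imaging study"]}},
]

def generate_testlets(scenario, global_vars, item_models):
    """Yield one testlet (scenario plus item stems) per combination of global values."""
    g_names = list(global_vars)
    for g_values in product(*(global_vars[n] for n in g_names)):
        g = dict(zip(g_names, g_values))
        items = []
        for model in item_models:
            l_names = list(model["local"])
            for l_values in product(*(model["local"][n] for n in l_names)):
                l = dict(zip(l_names, l_values))
                # Global values link each stem back to the scenario; local values
                # keep each item model's content distinct within the testlet.
                items.append(model["stem"].format(**g, **l))
        yield {"scenario": scenario.format(**g), "items": items}

for testlet in generate_testlets(scenario, global_vars, item_models):
    print(testlet["scenario"])
    for stem in testlet["items"]:
        print("  -", stem)

Expanding the combinations of global and local values in this way simply mirrors the idea that one testlet item model can yield many generated testlets; operational AIG systems would, of course, add the constraints and content-validation steps described in the three-step framework.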
To conclude, testlet-based AIG is a new method
for scaling the item development process. It allows
content specialists to create sets of items linked to a
common scenario. The context for the scenario is
limitless, meaning the scenario can be short or long, it
can contain a small or a large amount of content, it
can contain a small or a large number of global
variables, and it can be in any content area. In other
words, the scenario in a testlet can be created to
accommodate any problem-solving situation. Testlet-
based AIG can also be used to measure a range of
knowledge and skills because the length of the item
set is flexible. A testlet can contain a small number of items (e.g., 2) or a large number (e.g., >5), thereby allowing
the content specialist to measure many different types
of knowledge and skills as they relate to the content
in the scenario. Finally, testlet-based AIG is
embedded within a well-established item
development framework associated with AIG (Gierl
& Lai, 2016b). This framework is structured using a
three-step process, where the testlet item model is
created in step 2. The framework also includes a
method for validating the content, thereby ensuring
the generated items in the testlet accurately measure
the intended curricular and cognitive outcomes on the
computerized test of interest.
REFERENCES
Andrade, H., Bennett, R., & Cizek, G. (2019). Handbook of
Formative Assessment in the Disciplines. Boca Raton,
FL: CRC Press.
Auld, E. & Morris, P. (2019). The OECD and IELS:
Redefining early childhood education for the 21st
century. Policy Futures in Education, 17, 11-26.
Black, P., & Wiliam, D. (1998). Assessment and classroom
learning. Assessment in Education: Principles, Policy,
& Practice, 5, 7-74.
Black, P. & Wiliam, D. (2010). Inside the black box:
Raising standards through classroom assessment. Phi
Delta Kappan, 92, 81-90.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A
Bayesian random effects model for testlets.
Psychometrika, 64, 153-168.
Chu, S., Reynolds, R., Notari, M. & Lee, C. (2017). 21st
Century Skills Development through Inquiry-Based
Learning. New York: Springer.
Clauser, B. E., Margolis, M. J., & Case, S. M. (2006).
Testing for licensure and certification in the
professions. Educational Measurement, 4, 701-731.
Daniel, M., Rencic, J., Durning, S. J., Holmboe, E., Santen,
S. A., Lang, V., ... & Gruppen, L. D. (2019). Clinical
reasoning assessment methods: a scoping review and
practical guidance. Academic Medicine, 94, 902-
912.
Gierl, M. J., Bulut, O., & Zhang, X. (2018). Using
computerized formative testing to support personalized
learning in higher education: An application of two
assessment technologies. In R. Zheng (Ed.), Digital
Technologies and Instructional Design for
Personalized Learning (pp. 99-119). Hershey, PA: IGI
Global.
Gierl, M. J., & Haladyna, T. (2013). Automatic Item
Generation: Theory and Practice. New York:
Routledge.
Gierl, M. J. & Lai, H. (2016a). Automatic item generation.
In S. Lane, M. Raymond, & T. Haladyna (Eds.),
Handbook of Test Development (2nd edition, pp. 410-
429). New York: Routledge.
Gierl, M. J. & Lai, H. (2016b). A process for reviewing
and evaluating generated test items. Educational
Measurement: Issues and Practice, 35, 6–20.
Gierl, M. J. & Lai, H. (2017). The role of cognitive models
in automatic item generation. In A. Rupp & J. Leighton