ACKNOWLEDGEMENTS
I am grateful to the teachers of XXI Liceum Ogólnokształcące im. św. Stanisława Kostki in Lublin, Poland, who administered the vocabulary test, and to the students who completed it.
REFERENCES
Arslan, E. A., Yildirim, K., Bisen, I., & Yildirim, Y. (2021).
Reimagining education with artificial intelligence.
Eurasian Journal of Higher Education, 2(4), 32–46.
Attali, Y., & Fraenkel, T. (2000). The point-biserial as a
discrimination index for distractors in multiple-choice
items: Deficiencies in usage and an alternative. Journal
of Educational Measurement, 37(1), 77–86.
Attali, Y., Runge, A., LaFlair, G. T., Yancey, K., Goodwin,
S., Park, Y., & von Davier, A. A. (2022). The
interactive reading task: Transformer-based automatic
item generation. Frontiers in Artificial Intelligence, 5,
903077. doi:10.3389/frai.2022.903077
Bachman, L. F. (2004). Statistical Analyses for Language
Assessment. Cambridge: Cambridge University Press.
Bezirhan, U., & von Davier, M. (2023). Automated reading
passage generation with OpenAI’s large language
model. Computers and Education: Artificial
Intelligence, 5, 100161.
doi:10.1016/j.caeai.2023.100161
Bonner, E., Lege, R., & Frazier, E. (2023). Large language model-based artificial intelligence in the language classroom: Practical ideas for teaching. Teaching English with Technology, 23(1), 23–41.
Bruno, J. E., & Dirkzwager, A. (1995). Determining the
optimal number of alternatives to a multiple-choice test
item: An information theoretic perspective.
Educational and Psychological Measurement, 55, 959–
966.
Circi, R., Hicks, J., & Sikali, E. (2023). Automatic item
generation: Foundations and machine learning-based
approaches for assessments. Frontiers in Education, 8,
858273. doi:10.3389/feduc.2023.858273
Clare, A., & Wilson, J. (2012). Speakout Advanced:
Students’ Book. Harlow: Pearson.
Franco, V. R., & de Francisco Carvalho, L. (2023). A
tutorial on how to use ChatGPT to generate items
following a binary tree structure. PsyArXiv Preprints.
doi:10.31234/osf.io/5hnkz
Fulcher, G. (2010). Practical Language Testing. London:
Hodder Education.
Gardner, J., O’Leary, M., & Yuan, L. (2021). Artificial
intelligence in educational assessment: ‘Breakthrough?
Or buncombe and ballyhoo?’. Journal of Computer
Assisted Learning, 37(5), 1207–1216. doi:10.1111/jcal.12577
Gierl, M. J., Bulut, O., Guo, Q., & Zhang, X. (2017).
Developing, analyzing, and using distractors for
multiple-choice tests in education: A comprehensive
review. Review of Educational Research, 87(6), 1082–
1116.
Haladyna, T. M. (2016). Item analysis for selected-response
items. In S. Lane, M. R. Raymond, & T. M. Haladyna
(Eds.), Handbook of Test Development (2nd ed., pp.
392–407). New York, NY: Routledge.
Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37–50.
Haladyna, T. M., & Downing, S. M. (1993). How many
options is enough for a multiple-choice test item?
Educational and Psychological Measurement, 53, 999–
1010.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C.
(2002). A review of multiple-choice item-writing
guidelines for classroom assessment. Applied
Measurement in Education, 15, 309–334.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing
and Validating Test Items. New York, NY: Routledge.
Hoffmann, S., & Evert, S. (2006). BNCweb (CQP-edition):
The marriage of two corpus tools. In S. Braun, K. Kohn,
& J. Mukherjee (Eds.), Corpus Technology and
Language Pedagogy: New Resources, New Tools, New
Methods (pp. 177–195). Frankfurt am Main: Peter
Lang.
Holmes, W., Bialik, M., & Fadel, C. (2019). Artificial
Intelligence in Education: Promises and Implications
for Teaching and Learning. Boston, MA: Center for
Curriculum Redesign.
Hoshino, Y. (2013). Relationship between types of
distractor and difficulty of multiple-choice vocabulary
tests in sentential context. Language Testing in Asia,
3(1), 16. doi:10.1186/2229-0443-3-16
Khademi, A. (2023). Can ChatGPT and Bard generate
aligned assessment items? A reliability analysis against
human performance. Journal of Applied Learning &
Teaching, 6(1), 75–80.
Kıyak, Y. S., Coşkun, Ö., Budakoğlu, I. İ., & Uluoğlu, C.
(2024). ChatGPT for generating multiple-choice
questions: Evidence on the use of artificial intelligence
in automatic item generation for a rational
pharmacotherapy exam. European Journal of Clinical
Pharmacology. doi:10.1007/s00228-024-03649-x
Kumar, A. P., Nayak, A., Shenoy K, M., Goyal, S., & Chaitanya. (2023). A novel approach to generate distractors for multiple choice questions. Expert Systems with Applications, 225, 120022. doi:10.1016/j.eswa.2023.120022
Ludewig, U., Schwerter, J., & McElvany, N. (2023). The
features of plausible but incorrect options: Distractor
plausibility in synonym-based vocabulary tests.
Journal of Psychoeducational Assessment, 41(7), 711–
731. doi:10.1177/07342829231167892
Malec, W., & Krzemińska-Adamek, M. (2020). A practical
comparison of selected methods of evaluating multiple-
choice options through classical item analysis.
Practical Assessment, Research, and Evaluation, 25(1), Article 7, 1–14.
March, D. M., Perrett, D., & Hubbard, C. (2021). An
evidence-based approach to distractor generation in