Student Open Responses in Mathematics. Proceedings
of the International Conference on Educational Data Mining.
Basu, S., Jacobs, C., & Vanderwende, L. (2013).
Powergrading: A Clustering Approach to Amplify
Human Effort for Short Answer Grading. Transactions
of the Association for Computational Linguistics, 1,
391–402. https://doi.org/10.1162/tacl_a_00236
Borji, A. (2023). A categorical archive of ChatGPT failures.
arXiv Preprint arXiv:2302.03494.
Burrows, S., Gurevych, I., & Stein, B. (2015). The eras and
trends of automatic short answer grading. International
Journal of Artificial Intelligence in Education.
Choi, J. H., Hickman, K. E., Monahan, A., & Schwarcz, D.
(2023). ChatGPT goes to law school. Available at SSRN.
Condor, A., Litster, M., & Pardos, Z. (2021). Automatic
short answer grading with BERT on out-of-sample
questions. Proceedings of the 14th International
Conference on Educational Data Mining.
Dikli, S. (2010). The nature of automated essay scoring
feedback. Calico Journal, 28(1), 99–134.
EU. (2020). White Paper on Artificial Intelligence: A
European Approach to Excellence and Trust.
Frieder, S., Pinchetti, L., Griffiths, R.-R., Salvatori, T.,
Lukasiewicz, T., Petersen, P. C., Chevalier, A., &
Berner, J. (2023). Mathematical capabilities of ChatGPT.
arXiv Preprint arXiv:2301.13867.
Future of Life. (2023). Pause Giant AI Experiments: An
Open Letter.
https://futureoflife.org/open-letter/pause-giant-ai-experiments/
Haller, S., Aldea, A., Seifert, C., & Strisciuglio, N. (2022).
Survey on automated short answer grading with deep
learning: From word embeddings to transformers.
arXiv Preprint arXiv:2204.03503.
Kortemeyer, G. (2023). Could an Artificial-Intelligence
agent pass an introductory physics course? arXiv
Preprint arXiv:2301.12127.
Kumar, V., & Boulanger, D. (2020). Explainable automated
essay scoring: Deep learning really has pedagogical
value. Frontiers in Education, 5, 186.
Li, T. W., Hsu, S., Fowler, M., Zhang, Z., Zilles, C., &
Karahalios, K. (2023). Am I Wrong, or Is the
Autograder Wrong? Effects of AI Grading Mistakes on
Learning. Proceedings of the Conference on
International Computing Education Research.
Longo, L., Brcic, M., Cabitza, F., Choi, J., Confalonieri, R.,
Del Ser, J., Guidotti, R., Hayashi, Y., Herrera, F.,
Holzinger, A., & others. (2023). Explainable artificial
intelligence (XAI) 2.0: A manifesto of open challenges
and interdisciplinary research directions. arXiv Preprint
arXiv:2310.19775.
Madnani, N., Loukina, A., Von Davier, A., Burstein, J., &
Cahill, A. (2017). Building better open-source tools to
support fairness in automated scoring. Proceedings of the
Workshop on Ethics in Natural Language Processing.
Meske, C., Bunde, E., Schneider, J., & Gersch, M. (2022).
Explainable artificial intelligence: Objectives,
stakeholders, and future research opportunities.
Information Systems Management, 39(1), 53–63.
Saha, S., Dhamecha, T. I., Marvaniya, S., Sindhgatta, R., &
Sengupta, B. (2018). Sentence Level or Token Level
Features for Automatic Short Answer Grading?: Use
Both. In C. Penstein Rosé & others (Eds.), Artificial
Intelligence in Education (Vol. 10947).
Schneider, J., & Apruzzese, G. (2023). Dual adversarial
attacks: Fooling humans and classifiers. Journal of
Information Security and Applications, 75, 103502.
Schneider, J., Kruse, L. C., & Seeber, I. (2024).
Validity Claims in Children-AI Discourse: Experiment
with ChatGPT. Proceedings of the International
Conference on Computer Supported Education
(CSEDU 2024).
Schneider, J., Meske, C., & Kuss, P. (2024). Foundation
Models. Business & Information Systems Engineering.
Schneider, J., Richner, R., & Riser, M. (2023). Towards
trustworthy autograding of short, multi-lingual, multi-
type answers. International Journal of Artificial
Intelligence in Education, 33(1), 88–118.
Seufert, S., Niklaus, C., & Handschuh, S. (2022). Review
of AI-Enabled Assessments in Higher Education.
https://www.alexandria.unisg.ch/handle/20.500.14171/109523
Sung, C., Dhamecha, T., Saha, S., Ma, T., Reddy, V., &
Arora, R. (2019). Pre-training BERT on domain
resources for short answer grading. Proceedings of the
Conference on Empirical Methods in Natural Language
Processing and the International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP).
Tang, G. (2023). Letter to editor: Academic journals should
clarify the proportion of NLP-generated content in
papers. Accountability in Research.
https://doi.org/10.1080/08989621.2023.2180359
van Dis, E. A., Bollen, J., Zuidema, W., van Rooij, R., &
Bockting, C. L. (2023). ChatGPT: five priorities for
research. Nature, 614(7947), 224–226.
Vittorini, P., Menini, S., & Tonelli, S. (2020). An AI-Based
System for Formative and Summative Assessment in
Data Science Courses. International Journal of
Artificial Intelligence in Education, 1–27.
Yan, L., Sha, L., Zhao, L., Li, Y., Martinez-Maldonado, R.,
Chen, G., Li, X., Jin, Y., & Gašević, D. (2023).
Practical and ethical challenges of large language
models in education: A systematic scoping review.
British Journal of Educational Technology.
Zhang, L., Huang, Y., Yang, X., Yu, S., & Zhuang, F.
(2022). An automatic short-answer grading model for
semi-open-ended questions. Interactive Learning
Environments, 30(1), 177–190.