user-friendly enough so that instructors from a wide
range of technical disciplines feel confident that they
can adopt it without the aid of developers. It must be
deployable as a web application or as a standalone
tool that does not require a programming environment.
Thirdly, it must be fast, cost-effective, trustworthy,
scalable, and generalizable so that the system keeps
pace with technological advances.
7 CONCLUSIONS
In this paper, we provided an overview of recent
deep learning-based solutions for the task of
Automatic Short Answer Grading (ASAG) and its
relevance in the educational setting. We also reviewed
the available benchmark datasets and evaluation
metrics and discussed their major shortcomings. We
showed that recently adopted transfer learning and
transformer-based models outperform earlier neural
network-based models used for ASAG. Nonetheless,
the application of the latest transformer-based
models, such as GPT-2, GPT-3, T5, and XLNET,
remains largely unexplored in this context. We also
identified several open challenges that may hinder
wide-scale deployment, together with possible
future directions. In particular, we highlighted the
interests of the various stakeholders, the need for
new dataset curation guidelines, and the pressing
need for more user-friendly interfaces to enable the
adoption of ASAG systems.