
taker responses to both types of questions on the same
topic.
REFERENCES
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I.,
Aleman, F. L., Almeida, D., Altenschmidt, J., Altman,
S., Anadkat, S., et al. (2023). GPT-4 technical report.
CoRR, abs/2303.08774.
Alsubait, T., Parsia, B., and Sattler, U. (2015). Ontology-
based multiple choice question generation. KI - Künstliche Intelligenz, 30.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D.,
Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,
Askell, A., Agarwal, S., Herbert-Voss, A., Krueger,
G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.,
Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E.,
Litwin, M., Gray, S., Chess, B., Clark, J., Berner,
C., McCandlish, S., Radford, A., Sutskever, I., and
Amodei, D. (2020). Language models are few-shot
learners. In Larochelle, H., Ranzato, M., Hadsell, R.,
Balcan, M., and Lin, H., editors, Advances in Neu-
ral Information Processing Systems, volume 33, pages
1877–1901. Curran Associates, Inc.
Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S.,
and Amodei, D. (2023). Deep reinforcement learning
from human preferences. CoRR, abs/1706.03741.
Faraby, S. A., Adiwijaya, A., and Romadhony, A. (2023).
Review on neural question generation for education
purposes. International Journal of Artificial Intelli-
gence in Education, pages 1–38.
Gao, Y., Bing, L., Chen, W., Lyu, M., and King, I. (2019).
Difficulty controllable generation of reading comprehension questions.
In Proceedings of the Twenty-Eighth International Joint Conference on
Artificial Intelligence (IJCAI-19), pages 4968–4974.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang,
S., Wang, L., and Chen, W. (2021). LoRA: Low-
rank adaptation of large language models. CoRR,
abs/2106.09685.
Kumar, G., Banchs, R., and D’Haro, L. (2015). Au-
tomatic fill-the-blank question generator for student
self-assessment. pages 1–3.
Kurdi, G., Leo, J., Parsia, B., Sattler, U., and Al-Emari, S.
(2020). A systematic review of automatic question
generation for educational purposes. International
Journal of Artificial Intelligence in Education, 30:121–204.
Lee, M., Nakamura, F., Shing, M., McCann, P., Akiba, T.,
and Orii, N. (2023). Japanese StableLM Instruct Alpha
7B v2.
Li, B., Hou, Y., and Che, W. (2022). Data augmentation
approaches in natural language processing: A survey.
AI Open, 3:71–90.
Lin, C.-Y. and Hovy, E. (2003). Automatic evaluation of
summaries using n-gram co-occurrence statistics. In
Proceedings of the 2003 Human Language Technol-
ogy Conference of the North American Chapter of the
Association for Computational Linguistics, pages 71–
78.
Liu, M., Calvo, R. A., and Rus, V. (2010). Automatic ques-
tion generation for literature review writing support.
In International Conference on Intelligent Tutoring
Systems.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig,
G. (2023a). Pre-train, prompt, and predict: A system-
atic survey of prompting methods in natural language
processing. ACM Comput. Surv., 55(9).
Liu, Y., Han, T., Ma, S., Zhang, J., Yang, Y., Tian, J., He, H.,
Li, A., He, M., Liu, Z., Wu, Z., Zhao, L., Zhu, D., Li,
X., Qiang, N., Shen, D., Liu, T., and Ge, B. (2023b).
Summary of ChatGPT-related research and perspective
towards the future of large language models. Meta-
Radiology, 1(2):100017.
Oh, S., Go, H., Moon, H., Lee, Y., Jeong, M., Lee, H. S.,
and Choi, S. (2023). Evaluation of question gener-
ation needs more references. In Rogers, A., Boyd-
Graber, J., and Okazaki, N., editors, Findings of
the Association for Computational Linguistics: ACL
2023, pages 6358–6367, Toronto, Canada. Associa-
tion for Computational Linguistics.
Patil, R., Boit, S., Gudivada, V., and Nandigam, J. (2023).
A survey of text representation and embedding tech-
niques in NLP. IEEE Access, 11:36120–36146.
Perkoff, E. M., Bhattacharyya, A., Cai, J. Z., and Cao, J.
(2023). Comparing neural question generation archi-
tectures for reading comprehension. In Proceedings
of the 18th Workshop on Innovative Use of NLP for
Building Educational Applications (BEA 2023), pages
556–566.
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016).
SQuAD: 100,000+ questions for machine comprehen-
sion of text. In Su, J., Duh, K., and Carreras, X., ed-
itors, Proceedings of the 2016 Conference on Empir-
ical Methods in Natural Language Processing, pages
2383–2392, Austin, Texas. Association for Computa-
tional Linguistics.
Shin, D. and Lee, J. H. (2023). Can ChatGPT make reading
comprehension testing items on par with human ex-
perts? Language Learning & Technology, 27(3):27–
40.
Susanti, Y., Tokunaga, T., Nishikawa, H., and Obari, H.
(2017). Evaluation of automatically generated English
vocabulary questions. Research and Practice in Tech-
nology Enhanced Learning, 12(11):1–12.
Wang, L., Yang, N., Huang, X., Jiao, B., Yang, L., Jiang, D.,
Majumder, R., and Wei, F. (2022). Text embeddings
by weakly-supervised contrastive pre-training. CoRR,
abs/2212.03533.
Yuan, X., Wang, T., Wang, Y.-H., Fine, E., Abdelghani,
R., Sauzéon, H., and Oudeyer, P.-Y. (2023). Selecting
better samples from pre-trained LLMs: A case study
on question generation. In Rogers, A., Boyd-Graber,
J., and Okazaki, N., editors, Findings of the Asso-
ciation for Computational Linguistics: ACL 2023,
pages 12952–12965, Toronto, Canada. Association
for Computational Linguistics.