Authors:
Carlo Bellettini 1; Michael Lodi 1,2,3; Violetta Lonati 1,3; Mattia Monga 1,3 and Anna Morpurgo 1,3
Affiliations:
1 Università degli Studi di Milano, Milan, Italy; 2 Alma Mater Studiorum, Università di Bologna, Bologna, Italy; 3 Laboratorio Nazionale CINI ‘Informatica e Scuola’, Rome, Italy
Keyword(s):
Bebras, GPT-3, Large Language Models, Computer Science Education.
Abstract:
In this paper we study the problem-solving ability of the Large Language Model known as GPT-3 (codename DaVinci) by considering its performance on tasks proposed in the “Bebras International Challenge on Informatics and Computational Thinking”. In our experiment, GPT-3 produced a majority of correct answers for about one third of the Bebras tasks we submitted to it. The linguistic fluency of GPT-3 is impressive and, on a first reading, its explanations sound coherent, on-topic, and authoritative; however, the answers it produced are in fact erratic, and the explanations are often questionable or plainly wrong. The tasks on which the system performs best are those that describe a procedure and ask to execute it on a specific instance of the problem. Tasks solvable with simple, one-step deductive reasoning are more likely to obtain better answers and explanations. Synthesis tasks, and tasks that require more complex logical consistency, yield the most incorrect answers.
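To make the experimental setup concrete, the sketch below shows one plausible way to submit a Bebras-style task to a DaVinci-class GPT-3 model and collect repeated completions from which a majority answer can be extracted. It is a minimal illustration, not the authors' actual protocol: the model name, prompt wording, sampling parameters, and the placeholder task text are all assumptions, and it uses OpenAI's legacy Completions endpoint (pre-1.0 Python SDK).

```python
# Hedged sketch: querying a DaVinci-class GPT-3 model with a Bebras-style task.
# All specifics (model name, prompt, temperature, task text) are illustrative
# assumptions, not the setup reported in the paper.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

# Hypothetical Bebras-style task statement (not an official Bebras task).
TASK = (
    "A beaver stacks logs one on top of another, always placing the shortest "
    "remaining log next. The logs measure 3, 1, 4, and 2 metres.\n"
    "Question: which log ends up on top of the stack?\n"
    "Answer, then explain your reasoning:"
)

def ask_gpt3(prompt: str, n: int = 5) -> list[str]:
    """Request n completions so a majority answer can be tallied afterwards."""
    response = openai.Completion.create(
        model="text-davinci-003",  # assumption: any DaVinci-class model
        prompt=prompt,
        max_tokens=256,
        temperature=0.7,  # non-zero temperature makes repeated runs vary
        n=n,
    )
    return [choice["text"].strip() for choice in response["choices"]]

if __name__ == "__main__":
    for i, answer in enumerate(ask_gpt3(TASK), start=1):
        print(f"--- completion {i} ---\n{answer}\n")
```

Collecting several completions per task, as above, is one way to operationalise the paper's notion of a task being answered "with a majority of correct answers" across repeated runs.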