Large Language Models for Student Code Evaluation: Insights and Accuracy

Alfonso Piscitelli; Mattia De Rosa; Vittorio Fuccella; Gennaro Costagliola

doi:10.5220/0013287500003932

2025

Abstract

The improved capabilities of Large Language Models (LLMs) enable their use in various fields, including education. Teachers and students already use LLMs to support teaching and learning. In this study, we measure the accuracy of the LLMs gpt-3.5, gpt-4o, claude-sonnet-20241022, and llama3 in correcting and evaluating students’ programming assignments. Seven assessments completed by 50 students were evaluated with each LLM using three different prompting strategies. We then compared the generated grades with those assigned by the teacher, who had corrected the assessments manually throughout the year. The results showed that models such as llama3 and gpt-4o obtained low percentages of generated evaluations, while gpt-3.5 and claude-sonnet-20241022 obtained interesting results when given at least one example evaluation.

Paper Citation

in Harvard Style

Piscitelli A., De Rosa M., Fuccella V. and Costagliola G. (2025). Large Language Models for Student Code Evaluation: Insights and Accuracy. In Proceedings of the 17th International Conference on Computer Supported Education - Volume 2: CSEDU; ISBN 978-989-758-746-7, SciTePress, pages 534-544. DOI: 10.5220/0013287500003932

in Bibtex Style

@conference{csedu25,
author={Alfonso Piscitelli and Mattia De Rosa and Vittorio Fuccella and Gennaro Costagliola},
title={Large Language Models for Student Code Evaluation: Insights and Accuracy},
booktitle={Proceedings of the 17th International Conference on Computer Supported Education - Volume 2: CSEDU},
year={2025},
pages={534-544},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013287500003932},
isbn={978-989-758-746-7},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Computer Supported Education - Volume 2: CSEDU
TI - Large Language Models for Student Code Evaluation: Insights and Accuracy
SN - 978-989-758-746-7
AU - Piscitelli A.
AU - De Rosa M.
AU - Fuccella V.
AU - Costagliola G.
PY - 2025
SP - 534
EP - 544
DO - 10.5220/0013287500003932
PB - SciTePress