Paper

Authors: Ashraf Elnashar ; Max Moundas ; Douglas Schmidt ; Jesse Spencer-Smith and Jules White

Affiliation: Department of Computer Science, Vanderbilt University, Nashville, Tennessee, U.S.A.

Keyword(s): Large Language Models (LLMs), Automated Code Generation, ChatGPT-4 vs. AutoGen Performance, Software Development Efficiency, Stack Overflow Solution Analysis, Computer Science Education, Prompt Engineering in AI Code, Quality Assessment, Runtime Performance Benchmarking, Dynamic Testing Environments.

Abstract: In the domain of software development, making informed decisions about the utilization of large language models (LLMs) requires a thorough examination of their advantages, disadvantages, and associated risks. This paper provides several contributions to such analyses. It first conducts a comparative analysis, pitting the best-performing code solutions selected from a pool of 100 generated by ChatGPT-4 against the highest-rated human-produced code on Stack Overflow. Our findings reveal that, across a spectrum of problems we examined, choosing from ChatGPT-4's top 100 solutions proves competitive with or superior to the best human solutions on Stack Overflow. We next delve into the AutoGen framework, which harnesses multiple LLM-based agents that collaborate to tackle tasks. We employ prompt engineering to dynamically generate test cases for 50 common computer science problems, both evaluating the solution robustness of AutoGen vs. ChatGPT-4 and showcasing AutoGen's effectiveness in challenging tasks and ChatGPT-4's proficiency in basic scenarios. Our findings demonstrate the suitability of generative AI in computer science education and underscore the subtleties of its problem-solving capabilities and its potential impact on the evolution of educational technology and pedagogical practices.

CC BY-NC-ND 4.0

Paper citation in several formats:
Elnashar, A.; Moundas, M.; Schmidt, D.; Spencer-Smith, J. and White, J. (2024). Evaluating the Performance of LLM-Generated Code for ChatGPT-4 and AutoGen Along with Top-Rated Human Solutions. In Proceedings of the 19th International Conference on Software Technologies - ICSOFT; ISBN 978-989-758-706-1; ISSN 2184-2833, SciTePress, pages 258-270. DOI: 10.5220/0012820600003753

@conference{icsoft24,
author={Ashraf Elnashar and Max Moundas and Douglas Schmidt and Jesse Spencer{-}Smith and Jules White},
title={Evaluating the Performance of LLM-Generated Code for ChatGPT-4 and AutoGen Along with Top-Rated Human Solutions},
booktitle={Proceedings of the 19th International Conference on Software Technologies - ICSOFT},
year={2024},
pages={258--270},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012820600003753},
isbn={978-989-758-706-1},
issn={2184-2833},
}

TY - CONF
JO - Proceedings of the 19th International Conference on Software Technologies - ICSOFT
TI - Evaluating the Performance of LLM-Generated Code for ChatGPT-4 and AutoGen Along with Top-Rated Human Solutions
SN - 978-989-758-706-1
IS - 2184-2833
AU - Elnashar, A.
AU - Moundas, M.
AU - Schmidt, D.
AU - Spencer-Smith, J.
AU - White, J.
PY - 2024
SP - 258
EP - 270
DO - 10.5220/0012820600003753
PB - SciTePress