Authors:
Ashraf Elnashar, Max Moundas, Douglas Schmidt, Jesse Spencer-Smith and Jules White
Affiliation:
Department of Computer Science, Vanderbilt University, Nashville, Tennessee, U.S.A.
Keyword(s):
Large Language Models (LLMs), Automated Code Generation, ChatGPT-4 vs. AutoGen Performance, Software Development Efficiency, Stack Overflow Solution Analysis, Computer Science Education, Prompt Engineering in AI Code, Quality Assessment, Runtime Performance Benchmarking, Dynamic Testing Environments.
Abstract:
In the domain of software development, making informed decisions about the utilization of large language models (LLMs) requires a thorough examination of their advantages, disadvantages, and associated risks. This paper provides several contributions to such analyses. It first conducts a comparative analysis, pitting the best-performing code solutions selected from a pool of 100 generated by ChatGPT-4 against the highest-rated human-produced code on Stack Overflow. Our findings reveal that, across a spectrum of problems we examined, choosing from ChatGPT-4's top 100 solutions proves competitive with or superior to the best human solutions on Stack Overflow.
We next delve into the AutoGen framework, which harnesses multiple LLM-based agents that collaborate to tackle tasks. We employ prompt engineering to dynamically generate test cases for 50 common computer science problems, both evaluating the solution robustness of AutoGen vs. ChatGPT-4 and showcasing AutoGen's effectiveness on challenging tasks and ChatGPT-4's proficiency in basic scenarios. Our findings demonstrate the suitability of generative AI in computer science education and underscore the subtleties of these tools' problem-solving capabilities and their potential impact on the evolution of educational technology and pedagogical practices.
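As an illustrative sketch only (not taken from the paper), the dynamic test-case generation described above might be driven by a prompt such as the one below; the model name, prompt wording, and use of the OpenAI Python client are assumptions made for illustration.

# Illustrative sketch: generating test cases for a problem via prompt engineering.
# The model name, prompt wording, and OpenAI client usage are assumptions,
# not the paper's actual experimental setup.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_test_cases(problem_statement: str, n_cases: int = 10) -> list[dict]:
    """Ask an LLM to propose input/expected-output pairs for a given problem."""
    prompt = (
        f"Generate {n_cases} diverse test cases for this programming problem:\n"
        f"{problem_statement}\n"
        'Return a JSON list of objects with "input" and "expected_output" keys.'
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    # The model is asked to return JSON; a production harness would validate this.
    return json.loads(response.choices[0].message.content)

# Hypothetical usage: feed each generated case to a candidate solution
# (run_candidate_solution is a placeholder test harness, not a real function):
# for case in generate_test_cases("Reverse a singly linked list."):
#     run_candidate_solution(case["input"], case["expected_output"])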