
Integration with human developers in a blended workflow, rather than wholesale replacement, will provide important insights.
Our future work will also consider whether and how other code quality metrics can be integrated, so that multiple dimensions of code quality beyond runtime performance are taken into account. In particular, security and functional correctness are clearly important, but assessing them requires additional analyses. Likewise, other quality attributes, such as memory consumption, long-term maintainability, and modularity, should be examined. As LLMs continue to mature, understanding how they can support higher-level software creation and complement human programmers offers promising new frontiers.
ACKNOWLEDGEMENTS
We used ChatGPT-4's Advanced Data Analysis capability to generate code for the data visualizations and to filter the data sets.
REFERENCES
GitHub Copilot. https://github.com/features/copilot.
Arachchi, S. and Perera, I. (2018). Continuous integration
and continuous delivery pipeline automation for ag-
ile software project management. In 2018 Moratuwa
Engineering Research Conference (MERCon), pages
156–161.
Asare, O., Nagappan, M., and Asokan, N. (2022).
Is github’s copilot as bad as humans at intro-
ducing vulnerabilities in code? arXiv preprint
arXiv:2204.04741.
Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie,
B., Lovenia, H., Ji, Z., Yu, T., Chung, W., et al. (2023).
A multitask, multilingual, multimodal evaluation of
chatgpt on reasoning, hallucination, and interactivity.
arXiv preprint arXiv:2302.04023.
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R.,
Arora, S., von Arx, S., Bernstein, M. S., Bohg, J.,
Bosselut, A., Brunskill, E., et al. (2021). On the
opportunities and risks of foundation models. arXiv
preprint arXiv:2108.07258.
Borji, A. (2023). A categorical archive of chatgpt failures.
arXiv preprint arXiv:2302.03494.
Carleton, A., Klein, M. H., Robert, J. E., Harper, E., Cun-
ningham, R. K., de Niz, D., Foreman, J. T., Goode-
nough, J. B., Herbsleb, J. D., Ozkaya, I., and Schmidt,
D. C. (2022). Architecting the future of software en-
gineering. Computer, 55(9):89–93.
Chen, B., Zhang, Z., Langrené, N., and Zhu, S. (2023). Unleashing the potential of prompt engineering in large language models: a comprehensive review.
De Vito, G., Lambiase, S., Palomba, F., Ferrucci, F., et al.
(2023). Meet c4se: Your new collaborator for soft-
ware engineering tasks. In 2023 49th Euromicro Con-
ference on Software Engineering and Advanced Ap-
plications (SEAA), pages 235–238.
Elnashar, A., Moundas, M., Schmidt, D. C., Spencer-Smith, J., and White, J. Prompt engineering of chatgpt to improve generated code & runtime performance compared with the top-voted human solutions.
Espejel, J. L., Ettifouri, E. H., Alassan, M. S. Y., Chouham,
E. M., and Dahhane, W. (2023). Gpt-3.5, gpt-4, or
bard? evaluating llms reasoning ability in zero-shot
setting and performance boosting through prompts.
Natural Language Processing Journal, 5:100032.
Frieder, S., Pinchetti, L., Griffiths, R.-R., Salvatori, T.,
Lukasiewicz, T., Petersen, P. C., Chevalier, A., and
Berner, J. (2023). Mathematical capabilities of chat-
gpt. arXiv preprint arXiv:2301.13867.
Giray, L. (2023). Prompt engineering with chatgpt: A guide for academic writers. Annals of Biomedical Engineering, 51(12):2629–2633.
Jalil, S., Rafi, S., LaToza, T. D., Moran, K., and Lam,
W. (2023). Chatgpt and software testing education:
Promises & perils. arXiv preprint arXiv:2302.03287.
Krochmalski, J. (2014). IntelliJ IDEA Essentials. Packt
Publishing Ltd.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig,
G. (2023). Pre-train, prompt, and predict: A system-
atic survey of prompting methods in natural language
processing. ACM Computing Surveys, 55(9):1–35.
Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., and Karri,
R. (2022). Asleep at the keyboard? assessing the se-
curity of github copilot’s code contributions. In 2022
IEEE Symposium on Security and Privacy (SP), pages
754–768. IEEE.
Porsdam Mann, S., Earp, B. D., Møller, N., Vynn, S., and
Savulescu, J. (2023). Autogen: A personalized large
language model for academic enhancement—ethics
and proof of principle. The American Journal of
Bioethics, 23(10):28–41.
van Dis, E. A., Bollen, J., Zuidema, W., van Rooij, R., and
Bockting, C. L. (2023). Chatgpt: five priorities for
research. Nature, 614(7947):224–226.
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert,
H., Elnashar, A., Spencer-Smith, J., and Schmidt,
D. C. (2023). A prompt pattern catalog to enhance
prompt engineering with chatgpt. arXiv preprint
arXiv:2302.11382.
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan,
K., and Cao, Y. (2022). React: Synergizing reason-
ing and acting in language models. arXiv preprint
arXiv:2210.03629.
Zhang, Z., Zhang, A., Li, M., and Smola, A. (2022). Au-
tomatic chain of thought prompting in large language
models. arXiv preprint arXiv:2210.03493.