
REFERENCES
Abdalkareem, R., Shihab, E., and Rilling, J. (2017). What do developers use the crowd for? A study using Stack Overflow. IEEE Softw., 34(2):53–60.
Barke, S., James, M. B., and Polikarpova, N. (2023). Grounded Copilot: How programmers interact with code-generating models. Proc. ACM Program. Lang., 7(OOPSLA1):85–111.
Bradley, N. C., Fritz, T., and Holmes, R. (2018). Context-aware conversational developer assistants. In ICSE, pages 993–1003. ACM.
Brambilla, M., Cabot, J., and Wimmer, M. (2017). Model-driven software engineering in practice, 2nd edition. Synthesis Lectures on Software Engineering. Morgan & Claypool Publishers.
Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H. P., Kaplan, J., et al. (2021). Evaluating large language models trained on code. CoRR, abs/2107.03374.
Chen, Y., Fu, Q., Yuan, Y., Wen, Z., Fan, G., Liu, D., Zhang, D., Li, Z., and Xiao, Y. (2023). Hallucination detection: Robustly discerning reliable answers in large language models. In CIKM, pages 245–255. ACM.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76:378–382.
Fowler, M. (1999). Refactoring - Improving the Design of Existing Code. Addison Wesley object technology series. Addison-Wesley.
Gamma, E., Helm, R., Johnson, R., and Vlissides, J. M. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, 1st edition.
Gasparic, M. and Ricci, F. (2020). IDE interaction support with command recommender systems. IEEE Access, 8:19256–19270.
Guerra, E. M., Cardoso, M., Silva, J. O., and Fernandes, C. T. (2010). Idioms for code annotations in the Java language. In SugarLoafPLoP, pages 7:1–7:14. ACM.
Kang, K., Cohen, S., Hess, J., Novak, W., and Peterson, A. (1990). Feature-oriented domain analysis (FODA) feasibility study. Technical Report CMU/SEI-90-TR-021, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA.
Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33:159–174.
Li, R. et al. (2023). StarCoder: May the source be with you! CoRR, abs/2305.06161. See also https://huggingface.co/blog/starcoder.
Liu, K., Kim, D., Bissyandé, T. F., Kim, T., Kim, K., Koyuncu, A., Kim, S., and Le Traon, Y. (2019). Learning to spot and refactor inconsistent method names. In ICSE, pages 1–12. IEEE / ACM.
Ozkaya, I. (2023a). Application of large language models to software engineering tasks: Opportunities, risks, and implications. IEEE Softw., 40(3):4–8.
Ozkaya, I. (2023b). The next frontier in software development: AI-augmented software development processes. IEEE Softw., 40(4):4–9.
Pérez-Soler, S., Guerra, E., de Lara, J., and Jurado, F. (2017). The rise of the (modelling) bots: Towards assisted modelling via social networks. In ASE, pages 723–728. IEEE Computer Society.
Pérez-Soler, S., Juárez-Puerta, S., Guerra, E., and de Lara, J. (2021). Choosing a chatbot development tool. IEEE Softw., 38(4):94–103.
Rich, C. and Waters, R. C. (1988). The programmer’s apprentice: A research overview. Computer, 21(11):10–25.
Robe, P. and Kuttal, S. K. (2022). Designing PairBuddy – A conversational agent for pair programming. ACM Trans. Comput.-Hum. Interact., 29(4).
Ross, S. I., Martinez, F., Houde, S., Muller, M., and Weisz, J. D. (2023). The programmer’s assistant: Conversational interaction with a large language model for software development. In IUI, pages 491–514. ACM.
Savary-Leblanc, M., Burgueño, L., Cabot, J., Le Pallec, X., and Gérard, S. (2023). Software assistants in software engineering: A systematic mapping study. Softw. Pract. Exp., 53(3):856–892.
Steinberg, D., Budinsky, F., Merks, E., and Paternostro, M. (2008). EMF: Eclipse Modeling Framework, 2nd edition. Pearson Education.
Wasowski, A. and Berger, T. (2023). Domain-specific languages - Effective modeling, automation, and reuse. Springer.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83.
Xu, F. F., Alon, U., Neubig, G., and Hellendoorn, V. J. (2022a). A systematic evaluation of large language models of code. In MAPS@PLDI, pages 1–10. ACM.
Xu, F. F., Vasilescu, B., and Neubig, G. (2022b). In-IDE code generation from natural language: Promise and challenges. ACM Trans. Softw. Eng. Methodol., 31(2).
Yang, Y., Xia, X., Lo, D., and Grundy, J. C. (2022). A survey on deep learning for software engineering. ACM Comput. Surv., 54(10s):206:1–206:73.
Zhang, J., Luo, J., Liang, J., Gong, L., and Huang, Z. (2023). An accurate identifier renaming prediction and suggestion approach. ACM Trans. Softw. Eng. Methodol., 32(6):148:1–148:51.
Zhao, W. X. et al. (2023). A survey of large language models. https://arxiv.org/abs/2303.18223.
ENASE 2024 - 19th International Conference on Evaluation of Novel Approaches to Software Engineering