Leveraging chatgpt for text data augmentation. arXiv
preprint arXiv:2302.13007.
Di Rocco, J., Di Ruscio, D., Di Sipio, C., Nguyen, P., and
Rubei, R. (2020). Topfilter: an approach to recom-
mend relevant github topics. In Proceedings of the
14th ACM/IEEE International Symposium on Empiri-
cal Software Engineering and Measurement (ESEM),
pages 1–11.
Di Sipio, C., Rubei, R., Di Ruscio, D., and Nguyen, P. T.
(2020). A multinomial naïve bayesian (mnb) network
to automatically recommend topics for github reposi-
tories. In Proceedings of the Evaluation and Assess-
ment in Software Engineering, pages 71–80.
Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong,
M., Shou, L., Qin, B., Liu, T., Jiang, D., et al. (2020).
Codebert: A pre-trained model for programming and
natural languages. arXiv preprint arXiv:2002.08155.
Gao, J., Zhao, H., Yu, C., and Xu, R. (2023). Exploring
the feasibility of chatgpt for event extraction. arXiv
preprint arXiv:2303.03836.
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B.,
De Laroussilhe, Q., Gesmundo, A., Attariyan, M., and
Gelly, S. (2019). Parameter-efficient transfer learn-
ing for nlp. In International Conference on Machine
Learning, pages 2790–2799. PMLR.
Husain, H., Wu, H.-H., Gazit, T., Allamanis, M., and
Brockschmidt, M. (2019). Codesearchnet challenge:
Evaluating the state of semantic code search. arXiv
preprint arXiv:1909.09436.
Izadi, M., Heydarnoori, A., and Gousios, G. (2021).
Topic recommendation for software repositories using
multi-label classification algorithms. Empirical Soft-
ware Engineering, 26:1–33.
Koch, G., Zemel, R., Salakhutdinov, R., et al. (2015).
Siamese neural networks for one-shot image recog-
nition. In ICML deep learning workshop, volume 2.
Lille.
Kuzman, T., Mozetic, I., and Ljubešic, N. (2023). Chat-
gpt: Beginning of an end of manual linguistic data an-
notation? use case of automatic genre identification.
ArXiv, abs/2303.03953.
LeClair, A., Eberhart, Z., and McMillan, C. (2018). Adapt-
ing neural text classification for improved software
categorization. In 2018 IEEE international confer-
ence on software maintenance and evolution (IC-
SME), pages 461–472. IEEE.
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mo-
hamed, A., Levy, O., Stoyanov, V., and Zettle-
moyer, L. (2019). Bart: Denoising sequence-to-
sequence pre-training for natural language genera-
tion, translation, and comprehension. arXiv preprint
arXiv:1910.13461.
Linares-Vásquez, M., McMillan, C., Poshyvanyk, D., and
Grechanik, M. (2014). On using machine learning to
automatically classify software applications into do-
main categories. Empirical Software Engineering,
19:582–618.
Megahed, F. M., Chen, Y.-J., Ferris, J. A., Knoth, S., and
Jones-Farmer, L. A. (2023). How generative ai mod-
els such as chatgpt can be (mis)used in spc practice,
education, and research? an exploratory study. Qual-
ity Engineering, pages 1–29.
OpenAI (2023). Gpt-4 technical report. arXiv.
Pfeiffer, J., Vulić, I., Gurevych, I., and Ruder, S.
(2020). Mad-x: An adapter-based framework for
multi-task cross-lingual transfer. arXiv preprint
arXiv:2005.00052.
Polak, M. P. and Morgan, D. (2023). Extracting accu-
rate materials data from research papers with con-
versational language models and prompt engineering–
example of chatgpt. arXiv preprint arXiv:2303.05352.
Sas, C. and Capiluppi, A. (2021). Labelgit: A
dataset for software repositories classification us-
ing attributed dependency graphs. arXiv preprint
arXiv:2103.08890.
Sharma, A., Thung, F., Kochhar, P. S., Sulistya, A., and Lo,
D. (2017). Cataloging github repositories. In Proceed-
ings of the 21st International Conference on Evalua-
tion and Assessment in Software Engineering, pages
314–319.
Sobania, D., Briesch, M., Hanna, C., and Petke, J. (2023).
An analysis of the automatic bug fixing performance
of chatgpt. arXiv preprint arXiv:2301.08653.
Soll, M. and Vosgerau, M. (2017). Classifyhub: an al-
gorithm to classify github repositories. In KI 2017:
Advances in Artificial Intelligence: 40th Annual Ger-
man Conference on AI, Dortmund, Germany, Septem-
ber 25–29, 2017, Proceedings 40, pages 373–379.
Springer.
Thung, F., Lo, D., and Jiang, L. (2012). Detecting similar
applications with collaborative tagging. In 2012 28th
IEEE International Conference on Software Mainte-
nance (ICSM), pages 600–603. IEEE.
Tian, K., Revelle, M., and Poshyvanyk, D. (2009). Using
latent dirichlet allocation for automatic categorization
of software. In 2009 6th IEEE International Working
Conference on Mining Software Repositories, pages
163–166. IEEE.
Treude, C. (2023). Navigating complexity in software en-
gineering: A prototype for comparing gpt-n solutions.
arXiv preprint arXiv:2301.12169.
Tunstall, L., Reimers, N., Jo, U. E. S., Bates, L., Korat,
D., Wasserblat, M., and Pereg, O. (2022). Efficient
few-shot learning without prompts. arXiv preprint
arXiv:2209.11055.
Üstün, A., Bisazza, A., Bouma, G., and van Noord,
G. (2020). Udapter: Language adaptation for
truly universal dependency parsing. arXiv preprint
arXiv:2004.14327.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in neural
information processing systems, 30.
Wang, T., Yin, G., Li, X., and Wang, H. (2012). Labeled
topic detection of open source software from min-
ing mass textual project profiles. In Proceedings of
the First International Workshop on Software Mining,
pages 17–24.
Wei, X., Cui, X., Cheng, N., Wang, X., Zhang, X., Huang,
S., Xie, P., Xu, J., Chen, Y., Zhang, M., et al. (2023).
Subject Classification of Software Repository