Table 2: Comparison of models on the variable misuse detection task, evaluated at the method level. The base rate of functions containing a misuse is exactly 50%. (Vector dim. = dimension of vector representations; Epochs = number of training epochs; accuracy is reported for the final test and the best test checkpoint.)

Model                       Vector dim.  Layers  Pooling method     Epochs  Final test acc.  Best test acc.
RGCN                        100          3       Average pooling    10      72.90            73.53
HGT                         100          3       Average pooling    10      66.12            66.44
HGT                         300          5       Average pooling    20      70.96            71.01
RGCN                        300          5       Average pooling    20      77.36            77.36
RGCN                        300          5       U-Net pooling      20      76.83            76.83
RGCN                        300          3       Attention pooling  20      77.88            78.36
RGCN                        300          3       U-Net pooling      20      74.95            76.00
CodeBERT (fine-tuned)       768          8       -                  20      95.97            95.97
CodeBERT (not fine-tuned)   768          8       -                  20      58.55            58.55
of graph neural networks, two variations of the problem of detecting misused variables are considered: (1) classification of individual variables and (2) classification of functions (or methods) according to whether their bodies contain a misused variable. We measured the classification accuracy of functions with respect to the presence of misused variables. The graph neural network models achieve significantly lower classification accuracy than the CodeBERT model. In future research, we plan to investigate the training process of graph neural networks in order to improve performance, focusing on the more recent heterogeneous graph format, primarily HGT, with better-optimized training parameters. In addition to tuning the HGT architecture for better performance in variable misuse detection, we aim to reduce the complexity of the input graphs fed to the HGT model.
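For concreteness, the sketch below shows how the second task variant, method-level classification with average pooling, can be set up. This is an illustrative reconstruction, not the authors' code: it assumes a PyTorch Geometric-style graph input (node-type IDs, typed edges, and a batch vector), and the class name, dataset sizes, and all hyperparameters other than the 300-dimensional, 3-layer RGCN configuration from Table 2 are placeholders.

# Minimal sketch (not the authors' implementation): an RGCN encoder
# followed by average pooling and a binary head that predicts whether
# a method body contains a misused variable.
import torch
import torch.nn.functional as F
from torch_geometric.nn import RGCNConv, global_mean_pool

class RGCNMethodClassifier(torch.nn.Module):
    def __init__(self, num_node_types, num_relations,
                 hidden_dim=300, num_layers=3):
        super().__init__()
        # Learned embedding for each node (graph token) type.
        self.embed = torch.nn.Embedding(num_node_types, hidden_dim)
        # Stack of relational graph convolutions, one per layer.
        self.convs = torch.nn.ModuleList(
            [RGCNConv(hidden_dim, hidden_dim, num_relations)
             for _ in range(num_layers)]
        )
        # Binary head: misused variable present / absent.
        self.head = torch.nn.Linear(hidden_dim, 2)

    def forward(self, node_type, edge_index, edge_type, batch):
        x = self.embed(node_type)
        for conv in self.convs:
            x = F.relu(conv(x, edge_index, edge_type))
        # Average pooling collapses node states into one graph vector.
        g = global_mean_pool(x, batch)
        return self.head(g)

# Example: one tiny graph with 4 nodes and 3 typed edges.
model = RGCNMethodClassifier(num_node_types=50, num_relations=6)
node_type = torch.tensor([0, 3, 7, 7])
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
edge_type = torch.tensor([0, 2, 5])
batch = torch.zeros(4, dtype=torch.long)
logits = model(node_type, edge_index, edge_type, batch)  # shape [1, 2]

Swapping global_mean_pool for an attention or U-Net readout would correspond to the other pooling variants in Table 2.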
ACKNOWLEDGEMENTS
The study was funded by Russian Science Foundation grant number 22-21-00493, https://rscf.ru/en/project/22-21-00493/.
REFERENCES
Allamanis, M., Brockschmidt, M., and Khademi, M. (2017). Learning to represent programs with graphs. arXiv preprint arXiv:1711.00740.
Chirkova, N. (2020). On the embeddings of variables in recurrent neural networks for source code. arXiv preprint arXiv:2010.12693.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C., and Solorio, T., editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186. Association for Computational Linguistics.
Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., and Zhou, M. (2020). CodeBERT: A pre-trained model for programming and natural languages. In Cohn, T., He, Y., and Liu, Y., editors, Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, volume EMNLP 2020 of Findings of ACL, pages 1536–1547. Association for Computational Linguistics.
Gao, H. and Ji, S. (2019). Graph U-Nets. In International Conference on Machine Learning, pages 2083–2092. PMLR.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). Neural message passing for quantum chemistry. In International Conference on Machine Learning, pages 1263–1272. PMLR.
Gori, M., Monfardini, G., and Scarselli, F. (2005). A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 2, pages 729–734.
Hellendoorn, V. J., Sutton, C., Singh, R., Maniatis, P., and Bieber, D. (2019). Global relational models of source code. In International Conference on Learning Representations.
Hu, Z., Dong, Y., Wang, K., and Sun, Y. (2020). Heterogeneous graph transformer. In Proceedings of The Web Conference 2020, pages 2704–2710.
Kanade, A., Maniatis, P., Balakrishnan, G., and Shi, K. (2020). Learning and evaluating contextual embedding of source code. In International Conference on Machine Learning, pages 5110–5121. PMLR.
Kudo, T. and Richardson, J. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv preprint arXiv:1808.06226.
Ling, X., Wu, L., Wang, S., Pan, G., Ma, T., Xu, F., Liu, A. X., Wu, C., and Ji, S. (2021). Deep graph matching and searching for semantic code retrieval. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(5):1–21.