Table 2: Comparison of models on the variable misuse detection task, evaluated at the method level. The base rate of functions containing a misuse is exactly 50%. (Vector dim. = dimension of vector representations; Epochs = number of training epochs; accuracy is reported for the final test and the best test checkpoint.)

Model                       Vector dim.  Layers  Pooling method     Epochs  Final test acc.  Best test acc.
RGCN                        100          3       Average pooling    10      72.90            73.53
HGT                         100          3       Average pooling    10      66.12            66.44
HGT                         300          5       Average pooling    20      70.96            71.01
RGCN                        300          5       Average pooling    20      77.36            77.36
RGCN                        300          5       U-Net pooling      20      76.83            76.83
RGCN                        300          3       Attention pooling  20      77.88            78.36
RGCN                        300          3       U-Net pooling      20      74.95            76.00
CodeBERT (fine-tuned)       768          8       -                  20      95.97            95.97
CodeBERT (not fine-tuned)   768          8       -                  20      58.55            58.55
of graph neural networks, two variations of the problem of detecting misused variables are considered: (1) classification of individual variables and (2) classification of functions (or methods) according to whether their bodies contain a misused variable. We measured the classification accuracy of functions with respect to the presence of misused variables. The graph neural network models achieve significantly lower classification accuracy than the CodeBERT model. In future research, we plan to investigate the training process of graph neural networks in order to improve performance, focusing on the more recent heterogeneous graph format, primarily HGT, with better-optimized training parameters. In addition to tuning the HGT architecture for better performance in variable misuse detection, we aim to reduce the complexity of the input graphs fed to the HGT model.
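For concreteness, the sketch below shows how the second task variant, method-level classification with average pooling, can be set up. This is an illustrative reconstruction, not the authors' code: it assumes a PyTorch Geometric-style graph input (node-type IDs, typed edges, and a batch vector), and the class name, dataset sizes, and all hyperparameters other than the 300-dimensional, 3-layer RGCN configuration from Table 2 are placeholders.

# Minimal sketch (not the authors' implementation): an RGCN encoder
# followed by average pooling and a binary head that predicts whether
# a method body contains a misused variable.
import torch
import torch.nn.functional as F
from torch_geometric.nn import RGCNConv, global_mean_pool

class RGCNMethodClassifier(torch.nn.Module):
    def __init__(self, num_node_types, num_relations,
                 hidden_dim=300, num_layers=3):
        super().__init__()
        # Learned embedding for each node (graph token) type.
        self.embed = torch.nn.Embedding(num_node_types, hidden_dim)
        # Stack of relational graph convolutions, one per layer.
        self.convs = torch.nn.ModuleList(
            [RGCNConv(hidden_dim, hidden_dim, num_relations)
             for _ in range(num_layers)]
        )
        # Binary head: misused variable present / absent.
        self.head = torch.nn.Linear(hidden_dim, 2)

    def forward(self, node_type, edge_index, edge_type, batch):
        x = self.embed(node_type)
        for conv in self.convs:
            x = F.relu(conv(x, edge_index, edge_type))
        # Average pooling collapses node states into one graph vector.
        g = global_mean_pool(x, batch)
        return self.head(g)

# Example: one tiny graph with 4 nodes and 3 typed edges.
model = RGCNMethodClassifier(num_node_types=50, num_relations=6)
node_type = torch.tensor([0, 3, 7, 7])
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
edge_type = torch.tensor([0, 2, 5])
batch = torch.zeros(4, dtype=torch.long)
logits = model(node_type, edge_index, edge_type, batch)  # shape [1, 2]

Swapping global_mean_pool for an attention or U-Net readout would correspond to the other pooling variants in Table 2.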
ACKNOWLEDGEMENTS
The study was funded by Russian Science Foundation grant number 22-21-00493, https://rscf.ru/en/project/22-21-00493/.
REFERENCES
Allamanis, M., Brockschmidt, M., and Khademi, M. (2017). Learning to represent programs with graphs. arXiv preprint arXiv:1711.00740.
Chirkova, N. (2020). On the embeddings of variables in recurrent neural networks for source code. arXiv preprint arXiv:2010.12693.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C., and Solorio, T., editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186. Association for Computational Linguistics.
Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., and Zhou, M. (2020). CodeBERT: A pre-trained model for programming and natural languages. In Cohn, T., He, Y., and Liu, Y., editors, Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, volume EMNLP 2020 of Findings of ACL, pages 1536–1547. Association for Computational Linguistics.
Gao, H. and Ji, S. (2019). Graph U-Nets. In International Conference on Machine Learning, pages 2083–2092. PMLR.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). Neural message passing for quantum chemistry. In International Conference on Machine Learning, pages 1263–1272. PMLR.
Gori, M., Monfardini, G., and Scarselli, F. (2005). A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 2, pages 729–734.
Hellendoorn, V. J., Sutton, C., Singh, R., Maniatis, P., and Bieber, D. (2019). Global relational models of source code. In International Conference on Learning Representations.
Hu, Z., Dong, Y., Wang, K., and Sun, Y. (2020). Heterogeneous graph transformer. In Proceedings of The Web Conference 2020, pages 2704–2710.
Kanade, A., Maniatis, P., Balakrishnan, G., and Shi, K. (2020). Learning and evaluating contextual embedding of source code. In International Conference on Machine Learning, pages 5110–5121. PMLR.
Kudo, T. and Richardson, J. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv preprint arXiv:1808.06226.
Ling, X., Wu, L., Wang, S., Pan, G., Ma, T., Xu, F., Liu, A. X., Wu, C., and Ji, S. (2021). Deep graph matching and searching for semantic code retrieval. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(5):1–21.