
GraphQL has five default scalars: Int, Float, String, Boolean, and ID. Additionally, users can define custom scalars; typical examples include dates, emails, and UUIDs.
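As an illustration, a schema can declare custom scalars alongside the built-in ones. The names below (Date, Email) are a hypothetical example, not taken from this paper:

```graphql
# Hypothetical schema fragment: Date and Email are user-defined scalars.
scalar Date
scalar Email

type User {
  id: ID!          # built-in ID scalar
  email: Email     # custom scalar
  createdAt: Date  # custom scalar
}
```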
2.2 Q-Learning
Reinforcement Learning (RL) is a type of machine learning in which an agent learns by interacting with its environment (Richard S. Sutton, 2018). The agent selects actions in various states and receives rewards from the environment based on those actions. Through trial and error in these interactions, where actions lead to changes in state, the agent learns behavior that maximizes cumulative reward.
Q-learning is a reinforcement learning algorithm used to estimate an optimal action-value function (Q-function) (Christopher J. C. H. Watkins, 1992). The Q-function represents the immediate reward obtained by an agent executing action a in state s, plus the cumulative future reward expected when following an optimal policy thereafter; these future rewards are discounted by a discount factor γ. In Q-learning, the Q-values of the actions taken in each state are recorded in a Q-table and updated using Equation (1).
Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]    (1)
Here α is the learning rate, γ is the discount factor, r is the immediate reward, s' is the state reached after taking action a, and max_{a'} Q(s', a') is the maximum Q-value over the actions a' available in state s'.
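The tabular update in Equation (1) can be sketched in a few lines; this is a minimal illustration, and the state/action names are placeholders rather than anything from the paper:

```python
# Minimal sketch of the tabular Q-learning update in Equation (1).
from collections import defaultdict

ALPHA = 0.1   # learning rate alpha
GAMMA = 0.9   # discount factor gamma

# Q-table: maps (state, action) pairs to Q-values, defaulting to 0.0.
Q = defaultdict(float)

def update(state, action, reward, next_state, next_actions):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    best_next = max(Q[(next_state, a)] for a in next_actions) if next_actions else 0.0
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One update: reward 1.0 for taking action "a0" in state "s0".
update("s0", "a0", 1.0, "s1", ["a0", "a1"])
```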
In reinforcement learning, methods such as ε-greedy are often used to balance exploration and exploitation (Richard S. Sutton, 2018). In Q-learning, an exploration rate ε is set such that with probability 1 − ε the action with the highest Q-value under current knowledge is chosen (exploitation), while with probability ε a random action is selected (exploration).
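An ε-greedy selection over a Q-table can be sketched as follows; the variable names are illustrative assumptions, not taken from the paper:

```python
# Illustrative epsilon-greedy action selection over a Q-table.
import random

EPSILON = 0.1  # exploration rate

def select_action(Q, state, actions, rng=random):
    # With probability epsilon: explore with a uniformly random action.
    if rng.random() < EPSILON:
        return rng.choice(actions)
    # Otherwise exploit: pick the action with the highest current Q-value.
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

Q = {("s0", "a0"): 0.2, ("s0", "a1"): 0.5}
print(select_action(Q, "s0", ["a0", "a1"]))
```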
In automated API testing, requests are sent to the system under test (SUT), and rewards obtained from the responses are used to update the Q-values.
2.3 API Testing
AutoGraphQL, proposed by Zetterlund et al. (Louise Zetterlund, 2022), is one existing approach for automated testing of GraphQL APIs. This tool automatically generates test cases from GraphQL queries issued by users in production environments. Through oracles that verify whether responses conform to the GraphQL schema, it can detect schema violations.
Vargas et al. (Daniela Meneses Vargas, 2018) proposed deviation testing for GraphQL APIs. This approach automatically generates tests with small deviations from manually created test cases and detects failures by comparing their execution results.
A property-based method for generating black-box test cases from schemas was proposed by Karlsson et al. (Stefan Karlsson, 2021). This method takes a GraphQL schema as input and randomly generates GraphQL queries and argument values using property-based techniques. It represents one of the first studies aimed at fully automating test case generation for GraphQL APIs.
Additionally, Belhadi et al. (Asma Belhadi, 2024) proposed methods using evolutionary computation alongside random-based methods. If the source code of a GraphQL API is available and written in a supported programming language, test cases are generated through evolutionary search. For black-box testing without source code access, random search is used to generate queries. These approaches have been integrated into EvoMaster, an open-source tool for automated testing of APIs (Andrea Arcuri, 2021).
For REST APIs specifically, ARAT-RL uses reinforcement learning for automated testing (Myeongsoo Kim, 2023a). This method efficiently explores the vast input space of API testing by dynamically analyzing request-response data with Q-learning to prioritize API operations and parameters. Each API operation corresponds to a state; selecting a specific parameter or scalar value mapping source corresponds to an action, handled through separate Q-tables. Rewards based on response status codes are then used to update the Q-values.
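The mapping described above can be pictured with simple data structures; the names and reward values below are assumptions for illustration, not ARAT-RL's actual code:

```python
# Illustrative data structures for the ARAT-RL description above.
from collections import defaultdict

q_operations = defaultdict(float)  # one Q-value per API operation (state)
q_parameters = defaultdict(float)  # Q-values for parameter choices (actions)

def reward_for(status_code):
    # Assumed reward shape: responses suggesting a server-side fault are
    # rewarded; ARAT-RL derives its rewards from response status codes.
    return 1.0 if status_code >= 500 else -1.0

# Update one operation's Q-value after observing a 500 response.
q_operations["createUser"] += 0.1 * (reward_for(500) - q_operations["createUser"])
```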
3 METHOD
In this study, we propose GQL-QL, an adaptation of ARAT-RL with improvements that make it suitable for GraphQL APIs.
First, the schema of the SUT is retrieved using an introspection query, and an initial Q-value is assigned to each field and argument. Next, requests are generated by referencing the Q-values of fields and arguments, prioritizing API operations while selecting fields and arguments. Then, based on the content of the response, rewards are assigned and the corresponding Q-values are updated, and this request-generation cycle is repeated.
The details of each step are described below.
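The overall loop can be sketched as follows. This is a simplified, bandit-style sketch under assumed reward values and helper names, not the authors' implementation; a full treatment would use the Q-learning update from Equation (1) with its discounted next-state term:

```python
# Simplified sketch of the loop: initialize Q-values over schema fields,
# pick fields epsilon-greedily, send requests, and update from rewards.
import random
from collections import defaultdict

EPSILON, ALPHA = 0.1, 0.1  # exploration rate and learning rate

def run(episodes, schema_fields, send):
    # Step 1: every field starts with an initial Q-value of 0.0.
    Q = defaultdict(float)
    for _ in range(episodes):
        # Step 2: prefer high-Q fields, exploring with probability epsilon.
        if random.random() < EPSILON:
            field = random.choice(schema_fields)
        else:
            field = max(schema_fields, key=lambda f: Q[f])
        # Step 3: send a request using the chosen field and score the response
        # (assumed reward shape: server errors are rewarded).
        status = send(field)
        reward = 1.0 if status >= 500 else -1.0
        Q[field] += ALPHA * (reward - Q[field])
    return Q
```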
Proposal of an Automated Testing Method for GraphQL APIs Using Reinforcement Learning