SCWAD: Automated Pentesting of Web Applications
Natan Talon^4, Valérie Viet Triem Tong^1, Gilles Guette^3, Yufei Han^2 and Youssef Laarouchi^4
^1 CentraleSupélec, Rennes, France
^2 Inria, Rennes, France
^3 Université de Rennes, Rennes, France
^4 Hackuity, Lyon, France
Keywords:
Pentest Automation, Web Application.
Abstract:
A wide array of techniques and tools can be employed for web application security assessment. Some meth-
ods, such as fuzzers and scanners, are partially or fully automated, offering speed and cost-effectiveness.
However, these tools often fall short in detecting specific vulnerabilities like broken access control and are
prone to generating false positives. On the other hand, manual processes like penetration testing, though more
time-consuming and necessitating expertise, provide a more comprehensive risk assessment. To overcome the
limitations of automated tools, these techniques are frequently combined. Fuzzers and scanners, despite their
ease of use and quick results, require the expertise of penetration testing experts to address their limitations.
By integrating these approaches, a more robust and nuanced security assessment can be achieved. This article
presents SCWAD, an automated and customizable penetration testing framework designed to assess vulnera-
bilities in web applications.
1 INTRODUCTION
Penetration testing (pentest) is an audit technique em-
ployed to assess the security risk of an information
system, conducted by skilled security experts (pen-
testers). These pentesters not only identify vulner-
abilities within the targeted system but also exploit
these vulnerabilities to illustrate their potential impact
when woven into an attack scenario. In the realm of
web application pentesting, pentesters face a myriad
of frameworks, architectures, and programming lan-
guages that underpin these applications.
Effectively organizing web application pentesting poses two main challenges. First, web applications face multiple types of vulnerabilities. Second, a manually organized web application pentest is a costly and time-consuming undertaking. While automated tools like fuzzers and scanners can assist in vulnerability assessments, their capability is limited to a fraction of vulnerabilities.
In response to existing limitations, our study
presents SCWAD as an automated pentesting frame-
work tailored for effective evaluation of web applica-
tions. The fundamental concept behind SCWAD lies
in framing automated web pentesting as a sequential
decision-making process.
The major contributions of our study can be summarized as follows.
• We conceptualize pentesting web applications as a sequential decision-making challenge and introduce SCWAD as an automated framework for web application pentesting.
• We structure the information acquired during a pentest campaign in a knowledge base. We propose a method for highlighting the vulnerabilities by querying this base.
• We organise a comparative study between SCWAD and state-of-the-art fuzzers and scanners for web applications, including PortSwigger's BurpSuitePro scanner (PortSwigger, 2023), OWASP's ZAP scanner (OWASP, 2023b) and Wapiti's fuzzer (Surribas, 2023).
• We demonstrate that SCWAD is capable of handling large modern applications. At the same time, we point out that these applications are too large to be manually pentested.
In the following, Section 2 surveys the pentesting tools related to our work. Section 3 details the different modules constituting SCWAD and presents how a pentesting task is organised iteratively with the three modules of SCWAD. Section 4 instantiates the exploration process for three vulnerability types on web applications, which demonstrates the use of SCWAD for automated pentesting. It then compares SCWAD with three popular fuzzer/scanner tools on dedicated web applications and challenges SCWAD on online applications.
Talon, N., Tong, V., Guette, G., Han, Y. and Laarouchi, Y.
SCWAD: Automated Pentesting of Web Applications.
DOI: 10.5220/0012721000003767
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 21st International Conference on Security and Cryptography (SECRYPT 2024), pages 424-433
ISBN: 978-989-758-709-2; ISSN: 2184-7711
Proceedings Copyright © 2024 by SCITEPRESS Science and Technology Publications, Lda.
2 BACKGROUND
2.1 Web Applications and Their
Associated Risks
The term web application denotes application soft-
ware accessible through the web and executed via a
web browser. A web application can be a straight-
forward static website where content is delivered
exactly as stored on the server for each user. In
contrast, dynamic web applications are tailored to
individual users and rely on a three-tier architec-
ture, separating presentation, application processing,
and data management functions. Web applications
find utility across various domains, from basic email
clients to complex e-commerce platforms and online
games. Their widespread adoption is due to their user-
friendly interface and global accessibility, allowing
users to engage from anywhere. However, this ac-
cessibility also renders web applications vulnerable,
making them prime targets for attacks, ranging from
simple denial-of-service to more sophisticated cam-
paigns. Attacks on web applications can grant at-
tackers unauthorized access, potentially compromis-
ing the entire system. Furthermore, web applications
may serve as vectors for information leakage.
In 2021, the Open Web Application Security
Project (OWASP, 2021) released its updated top 10
risks associated with designing or implementing web
applications. Topping the list is the vulnerability of
Broken Access Control, a flaw enabling users to
act beyond their designated permissions without prior
authentication. Exploiting this flaw could result in
information disclosure, modification, or destruction
of essential business functions. Another critical vulnerability, ranking third, is Injection, which allows attackers to manipulate the system's interpretation of commands. One widely known instance is the reflected Cross-Site Scripting (XSS) attack, which occurs when malicious code is sent to a victim end user through a web application.
During a reflected XSS attack, victims are deceived into executing a malicious payload, often embedded in a link or a crafted form. The injected code appears legitimate, camouflaged as a payload from a trusted server, and ultimately makes the victim's web browser the target. These attacks exploit
improper sanitization of user input within the web ap-
plication, allowing the injected code to end up in the
HTML output. Reflected server XSS exploits em-
ploy various techniques to bypass insufficient input
sanitization checks, often leading to significant data
leaks in web applications, as revealed by (Buyukkay-
han et al., 2020).
2.2 Vulnerability Hunting in Web
Applications
To bolster the security of web applications, a systematic search for vulnerabilities is often conducted. The literature (Zhang et al., 2022) provides two types of solutions for this search. The first is static analysis, which focuses on analyzing the source code of the web application. This method offers the advantage of scrutinizing code without executing programs. Numerous
tools have been developed to address PHP scripts and
taint-based vulnerabilities, such as Pixy (Jovanovic
et al., 2006) and its object-oriented extension, OOP-
ixy (Nashaat et al., 2017), phpSAFE (Nunes et al.,
2015), and RIPS (Dahse and Holz, 2014). Pixy, for in-
stance, employs a flow-sensitive, interprocedural, and
context-sensitive data flow analysis to uncover vulner-
abilities like SQL injection, cross-site scripting, and
command injection, albeit with a false positive rate
of around 50%. phpSAFE serves as a source code
analyzer for PHP-based plugins, capable of detecting
Cross Site Scripting and SQL Injection vulnerabili-
ties. RIPS, on the other hand, utilizes the abstract
syntax tree of PHP scripts and employs backward-
directed taint analysis to identify taint-based vulnera-
bilities. In a benchmark study by (Nunes et al., 2018),
which featured 134 WordPress plugins with real vul-
nerabilities, Pixy and other static analysis tools were
assessed for XSS and SQLi vulnerabilities. The re-
sults indicated that there is not a one-size-fits-all tool
for all scenarios and classes of vulnerabilities.
Static analysis is inherently limited to a single
programming language, typically PHP or JavaScript
in the tools available today. This restriction means
that vulnerabilities in source code written in other
programming languages go undetected by the static
analysis method. Furthermore, static analysis cannot
reveal values or content that only become apparent
through the dynamic analysis of the target web ap-
plication, such as cookie values.
The second type of solution is dynamic analysis
that focuses on discovering vulnerabilities while the
web application is running and observes the applica-
tion’s output behavior in response to specific inputs.
These analyzers are often referred to as ’black-box
scanners’ (Kals et al., ; Drakonakis et al., 2023; Eriks-
son et al., 2021; Pellegrino et al., 2015) because they
assume that the application’s internals are not observ-
able. Traditional black-box scanners aim to enumer-
ate all reachable pages and then fuzz input data, in-
cluding URL parameters, form values, and cookies,
to provoke vulnerabilities. In their evaluation of 11
black-box web vulnerability scanners, (Doupé et al., 2010) emphasized the importance of deep crawling to discover all vulnerabilities in an application. However, as noted in (Doupé et al., 2012), these scanners ignore a key aspect of modern web applications: any request can change the state of the web application. Doupé et al.'s work is credited with pioneering the
use of a state machine to guide state-aware fuzzing
of web applications, resulting in improved code cov-
erage compared to traditional scanners.
Finally, the security of web applications can be as-
sessed through pentest campaigns, which simulate at-
tacks to determine system security. Web pentesting is
typically conducted by human experts who manually
explore the application to identify vulnerabilities, of-
ten aided by one or more web application scanners. In
this context, measuring test coverage and reproducing
the test campaign can be challenging. In our study, we respond to this challenge by proposing SCWAD as an automated and customizable vulnerability exploration tool. Security analysts and service owners of web applications can use SCWAD to obtain comprehensive coverage of possible vulnerabilities and thus an accurate vulnerability assessment. In the next section, we describe how we model a pentest.
3 SCWAD FRAMEWORK
We introduce SCWAD, an autonomous pentesting framework to spotlight vulnerabilities within web applications. SCWAD employs an attack-based assessment strategy, simulating the role of an attacker. It
systematically explores potential vulnerabilities in a
given web application, exploiting identified weak-
nesses to trigger data breaches. Moreover, SCWAD
is equipped with the capability to register and replay
pentest campaigns.
SCWAD operates through three key components that interact in a continuous loop.
Update knowledge base encompasses defining and updating a Knowledge Base specific to each web application tested within SCWAD. This knowledge base encapsulates insights gathered by a pentester across multiple sessions, assuming various user roles (see Section 3.1). Act on the app involves an automated Agent responsible for devising and executing pertinent actions on the web application. This agent's primary objective is to comprehensively explore the application, attempt simulated attacks, and enrich the knowledge base as extensively as possible (see Section 3.2). Check for new vulnerabilities is performed by a Verifier, which is tasked with confirming the presence of vulnerabilities within a web application (see Section 3.4). SCWAD's Agent and Verifier may require expert knowledge, either to build relevant actions or to look for specific events while checking vulnerabilities. This expert knowledge is provided on request by an additional component of SCWAD, the Oracle (see Section 3.2). The SCWAD process unfolds in structured rounds, with successive calls to the Agent, the Knowledge Base update, and the Verifier, as illustrated in Figure 1.
Figure 1: Components of SCWAD and their interaction.
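The round structure described above can be sketched as a simple loop. This is only an illustration under stated assumptions, not SCWAD's implementation: the names `run_campaign`, `agent_step`, `update_kb` and `verify` are hypothetical, the knowledge base is reduced to a dictionary, and an action is reduced to a string label.

```python
from typing import Callable, Dict, List, Optional

KnowledgeBase = Dict[str, set]  # hypothetical stand-in for the real knowledge base
Action = str                    # hypothetical: an action is just a label here


def run_campaign(
    kb: KnowledgeBase,
    agent_step: Callable[[KnowledgeBase], Optional[Action]],
    update_kb: Callable[[KnowledgeBase, Action], None],
    verify: Callable[[KnowledgeBase], List[str]],
    max_rounds: int = 100,
) -> List[str]:
    """Run rounds of Agent -> Knowledge Base update -> Verifier."""
    findings: List[str] = []
    for _ in range(max_rounds):
        action = agent_step(kb)      # "Act on the app"
        if action is None:           # no feasible action is left
            break
        update_kb(kb, action)        # "Update knowledge base"
        for vuln in verify(kb):      # "Check for new vulnerabilities"
            if vuln not in findings:
                findings.append(vuln)
    return findings
```

Passing the three steps as functions keeps the loop independent of how actions are chosen, which is where the strategies of Section 3.2 would plug in.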
3.1 SCWAD Knowledge Base
In SCWAD, the knowledge base of a web application
stores in a structured format all information gathered
by SCWAD about the application.
A web application is represented by an IP address
or a domain name, enabling the identification of the
web application’s location and by the collection of in-
formation obtained while acting in various user roles
within the application. Each user u is further de-
scribed by:
u.login: the user name.
u.credentials: the user's credentials for the application.
u.cookies: the user's set of cookies, separated into active and inactive ones.
u.pages: the pages already visited by the user and those to which the user knows a link. The record pages provides details about the content of all of these pages. A page is modeled in the database through its path in the web application, the HTTP methods, headers, and parameters used to access the page, and the code of the page, which offers information about its interactive elements such as clickable buttons, HTML forms, and more. A page must have its path defined. However, its access method and its code can be unknown if the page has not been visited yet; in such cases, their values are set to none.
u.current_page: the page currently visited by the user.
u.allowed_paths: all the links to the pages reachable by the user. These links are those found on the pages as the application is explored under the user's account. They include anchor references, image and script sources, and form actions.
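The records above could be modeled with plain data classes, as in the following sketch. The field names follow the description in this section; the class names, the split of cookies into two dictionaries, and the helper `learn_link` are assumptions made for illustration and may differ from SCWAD's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Page:
    path: str                                 # always defined
    method: Optional[str] = None              # None until the page is visited
    headers: Optional[Dict[str, str]] = None
    params: Optional[Dict[str, str]] = None
    code: Optional[str] = None                # page source, None if not visited

    @property
    def visited(self) -> bool:
        return self.code is not None


@dataclass
class User:
    login: str
    credentials: str
    active_cookies: Dict[str, str] = field(default_factory=dict)
    inactive_cookies: Dict[str, str] = field(default_factory=dict)
    pages: Dict[str, Page] = field(default_factory=dict)  # keyed by path
    current_page: Optional[str] = None
    allowed_paths: Set[str] = field(default_factory=set)

    def learn_link(self, path: str) -> None:
        """Record a link found on a page; the target may not be visited yet."""
        self.allowed_paths.add(path)
        self.pages.setdefault(path, Page(path=path))


@dataclass
class KnowledgeBase:
    target: str                               # IP address or domain name
    users: Dict[str, User] = field(default_factory=dict)
```

A page that is known only through a link keeps `None` for everything except its path, which mirrors the rule that only the path is mandatory.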
3.2 SCWAD Agent
The SCWAD agent is responsible for extending the vulnerability exploration of a web application and enriching the knowledge base about that application as far as possible. To accomplish this objective, the SCWAD agent undertakes three key tasks: it compiles a list of potential actions from the current state of the knowledge base, selects an action from this list, and executes it. The execution of the chosen action triggers a knowledge base update followed by a call to the SCWAD verifier, concluding the current round of exploration.
Set of Actions. The SCWAD agent supports a range of potential actions based on four generic commands: AccessWebPage, SendHttpForm, SearchData, and SetCookie. Details regarding these commands, their parameters, and anticipated outcomes can be found in Table 1. An action is constructed from a command and concrete values $v_1, \dots, v_n$ representing elements like URLs, cookies, or specific form values. These values are extracted either from the knowledge base or from expert values served by the oracle. The SCWAD oracle is simply a string generator designed to highlight XSS vulnerabilities. It is largely inspired by the XSS Filter Evasion Cheat Sheet (OWASP, 2023a) provided by OWASP.
Finally, the set of possible actions is determined by all commands and all the possible combinations of concrete values.
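As an illustration of how such a set can be enumerated, the sketch below builds every combination of candidate values for each command. The function `build_actions` and the shape of `candidates` are hypothetical, and form inputs are simplified to a single string rather than the key/value dictionary listed in Table 1.

```python
from itertools import product
from typing import Dict, List, Tuple

Action = Tuple[str, Tuple[str, ...]]  # (command, concrete values)


def build_actions(candidates: Dict[str, List[List[str]]]) -> List[Action]:
    """Enumerate actions; `candidates` maps a command to one list of
    candidate values per parameter of that command."""
    actions: List[Action] = []
    for command, per_param in candidates.items():
        for values in product(*per_param):
            actions.append((command, values))
    return actions


# Values that would come from the knowledge base (paths, XPaths)
# or from the oracle (the XSS test string).
candidates = {
    "AccessWebPage": [["/", "/cart"]],
    "SendHttpForm": [["//form[1]"], ["hello", "<script>alert(1)</script>"]],
}
actions = build_actions(candidates)
```

Because the set grows with the product of the candidate lists, the selection strategy matters as much as the enumeration itself.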
Pentesting Strategy. From the actions applicable in the current knowledge base state, the SCWAD agent selects one to execute. This choice of action determines the crawling strategy and, consequently, the tool's performance in vulnerability identification. In our work, we introduce four strategies. Random selection picks an action at random. This strategy serves as a baseline for the three others. Explore first prioritizes actions that lead to new web pages. This strategy tries to gather all possible navigation information before attempting any exploitation. Fill forms first (called exploit first in Section 4) prioritizes actions that fill new forms. This strategy attempts to exploit injection points as soon as they are found. Finally, vulnerability tracking consists in following specific sequences of actions designed from the behavior of human pentesters.
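The first three strategies can be pictured as functions that return one element from the list of feasible actions. The sketch below is an assumption-laden simplification: it assumes the list only contains actions that have not been executed yet, so that "new" pages and forms need no extra bookkeeping, and the helper `_prefer` is hypothetical.

```python
import random
from typing import Callable, List, Tuple

Action = Tuple[str, Tuple[str, ...]]  # (command, concrete values)
Strategy = Callable[[List[Action]], Action]


def random_selection(actions: List[Action], rng=random) -> Action:
    """Baseline: pick any feasible action."""
    return rng.choice(actions)


def _prefer(command: str) -> Strategy:
    """Build a strategy that favors one command and otherwise
    falls back to the first feasible action."""
    def strategy(actions: List[Action]) -> Action:
        preferred = [a for a in actions if a[0] == command]
        return (preferred or actions)[0]
    return strategy


explore_first = _prefer("AccessWebPage")    # navigate before exploiting
fill_forms_first = _prefer("SendHttpForm")  # exploit injection points early
```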
3.3 Update Knowledge Base
SCWAD performs actions on the current page of a known user in the knowledge base. The execution of an action succeeds when the server agrees to perform it and fails when the server refuses to perform it.
When the action succeeds, SCWAD checks whether the user changed (e.g. through a login action) and enriches the database with the new knowledge (pages, cookies) gathered by both potential users.
3.4 Checking Vulnerabilities
A pentest campaign conducted with SCWAD starts with a knowledge base denoted as $K_0$ and progresses to an updated knowledge base $K_i$ after executing a sequence of actions $X : a_0 \dots a_{i-1}$. SCWAD's verifier checks for new vulnerabilities each time the knowledge base changes. In this paper, we propose to translate vulnerabilities into Boolean formulas in first-order logic. When the evaluation of such a formula holds true on a knowledge base $K_j$, we assert that the sequence of actions from $K_0$ to $K_j$ has led to the discovery of a vulnerability, formally characterized by the formula. In
this study, we showcase how SCWAD evaluates po-
tential security risks in web applications. We focus
on exploring three vulnerability types as per (OWASP,
2021) standards: A01:2021-Broken Access Control
(referred to as BAC in the rest of the paper), XSS
injection, categorized under A03:2021-Injection, and
technical information disclosure (referred to as TID),
which falls under A05:2021-Security Misconfigura-
tion.
A Broken Access Control (BAC) vulnerability appears when a user can access data or functionality of another user when they should not. For example, in a shop web application, a user should not access the cart of another user. A reflected XSS vulnerability appears when an attacker can provide JavaScript code as input to the server instead of genuine data. This code must then be reflected to the client and executed.
Table 1: List of commands and corresponding parameters in SCWAD.
Command       | Parameters                                | Expected behavior on the web application
AccessWebPage | u: url                                    | change the current page for the current user
SendHttpForm  | x: xpath, input: dict { key: value }      | send the form identified by x and filled with input to the web application
SearchData    | data: string                              | return true if data has been found in the current page
SetCookie     | cookie: dict { key: value }               | add cookie to the user's cookies if this cookie does not exist and replace it otherwise
Finally, Technical Information Disclosure (TID) vulnerabilities refer to the disclosure of any technical information about the server. This can take many forms, such as service banners indicating what technology runs the server or stack traces shown when the server encounters errors.
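To make the idea of a vulnerability formula concrete, the sketch below evaluates one possible BAC condition over a flattened view of the knowledge base. The formula is an assumption chosen for illustration and is not necessarily the one used by SCWAD: it holds when some user has visited a path that is not among that user's own allowed paths but is among another user's.

```python
from typing import Dict, Set

# Hypothetical, flattened view of the knowledge base: for each user,
# the paths successfully visited and the paths linked from their own pages.
Visited = Dict[str, Set[str]]
Allowed = Dict[str, Set[str]]


def bac_holds(visited: Visited, allowed: Allowed) -> bool:
    """Assumed formula: exists u, v, p such that u != v,
    p in visited[u], p not in allowed[u], and p in allowed[v]."""
    return any(
        p not in allowed.get(u, set()) and p in allowed.get(v, set())
        for u, pages in visited.items()
        for p in pages
        for v in allowed
        if v != u
    )
```

In the shop example, the formula becomes true as soon as one user's visited pages contain the cart path that only the other user is linked to.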
4 EXPERIMENTS
4.1 SCWAD Framework
The SCWAD framework is designed as a Python package and uses the Playwright library (Microsoft, 2024) to instrument a web browser. Action selection for the automated agent follows a predefined strategy, implemented as functions returning elements from a list of feasible actions. For pentesting, the knowledge base can be initialized with only a starting URL, a (possibly empty) set of users, and the session cookie name; alternatively, a pre-filled database can be supplied.
SCWAD also includes a replay mode, where it reads
a filled database and replays each action sequentially.
In this mode, SCWAD creates a new knowledge base
in parallel to the input knowledge base. This allows
for ensuring the new knowledge base contains the
same information as the original knowledge base by
the end of the replay, and comparing vulnerabilities
found during the replay to the initial knowledge base
to detect effective patches. Furthermore, the replay
mode checks that actions it must do can be built from
the new knowledge base as the impossibility to do so
would indicate significant changes in the web appli-
cation.
4.2 Pentesting UVVU with SCWAD
We developed the UVVU (Talon et al., 2023) web application as a testbed to evaluate the effectiveness of the proposed SCWAD framework^1. This application is intentionally poorly designed in order to embed vulnerabilities and mimic highly vulnerable real-world applications.
^1 SCWAD will be made available after the publication.
Experimental Setup: In these experiments, we embed in UVVU the three types of vulnerabilities described in Section 3.4: 7 technical information disclosures, 2 broken access controls and 1 XSS. We test the 4 different strategies for choosing actions (Section 3.2). The first three strategies serve as baselines against which vulnerability tracking is compared. All 4 search strategies draw from the same pool of possible actions at each step of the pentest process. For these strategies, an action is referred to as a possible action when the arguments of its command can be filled with records from the Knowledge Base or directly by the Oracle (see Figure 1). Each strategy selects one action to execute from all the possible actions based on its own criteria.
The first three search strategies directly select actions from the pool of all possible actions. In contrast, the vulnerability tracking strategy uses sequences of actions. These sequences are manually coded by human pentesters as a generalization, based on their pentest experience, of sequences that may trigger the different types of vulnerabilities (BAC, XSS, TID). Moreover, these sequences were defined without knowledge of the target web applications they would be tested on. These sequences are referred to as action templates in the following. For each step t of the pentest process, the vulnerability tracking module first compares the actions executed at the previous two steps (t − 1 and t) with the corresponding steps in the action templates. If it finds an exact match in the action templates, this module uses the next action, at t + 1, suggested by the matched action template. Otherwise, it adopts the exploit-first selection strategy. We compare the first three strategies to the vulnerability tracking module to demonstrate that fuzzers (random strategy) and automated scanners (explore-first and exploit-first strategies) cannot match the performance of pentesters in exploiting vulnerabilities that require executing several actions in a chain.
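The template lookup described above can be sketched as follows. Actions are reduced to string labels, the template content is invented for the example, and the extra check that the suggested action is currently feasible is an assumption of this sketch rather than something stated in the paper.

```python
from typing import Callable, List, Optional, Sequence


def next_from_templates(
    history: Sequence[str], templates: Sequence[Sequence[str]]
) -> Optional[str]:
    """Return the action that follows the last two executed actions
    in the first template containing them, or None if none matches."""
    if len(history) < 2:
        return None
    last_two = tuple(history[-2:])
    for template in templates:
        for i in range(len(template) - 2):
            if tuple(template[i:i + 2]) == last_two:
                return template[i + 2]
    return None


def vulnerability_tracking(
    history: Sequence[str],
    templates: Sequence[Sequence[str]],
    feasible: List[str],
    fallback: Callable[[List[str]], str],
) -> str:
    """Follow a matching template, otherwise use the fallback strategy
    (exploit first in the paper)."""
    suggestion = next_from_templates(history, templates)
    if suggestion is not None and suggestion in feasible:
        return suggestion
    return fallback(feasible)
```

For instance, a hypothetical template `["send_marker", "search_marker", "send_xss_payload"]` would suggest `send_xss_payload` once the last two executed actions were `send_marker` and `search_marker`.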
We propose 5 metrics to measure the performance of the four search strategies applied in our framework. First, we calculate the number of updates to the knowledge base resulting from the actions (referred to as update). It indicates how many times new information was added to the knowledge base after the execution of an action completed with a success flag. More frequent updates correspond to richer information about the target application, thus indicating better coverage of the potential vulnerabilities in the pentested application. With this setting, a better search strategy is expected to induce more frequent updates, allowing security analysts to gain better knowledge about the target application. Second, we compute the numbers of successful and failed actions. An action is considered successful when the web application executes it; otherwise it is considered failed. We emphasize that an action, whether it succeeds or fails, can bring the pentester more knowledge about the target application. For example, the success or failure of data injection into a URL and/or form indicates whether XSS payload injection into the target server is feasible, which guides the choice of the following actions to execute. One search strategy is better than another if it can perform more actions (successful or failed) within the time limit, since either outcome can improve the pentester's knowledge about the target web application. Third, we also record the execution time needed by SCWAD. Finally, we check and record as a success rate how many of the 10 vulnerabilities each strategy manages to unveil. Table 2 presents the results of the four different strategies used in SCWAD to pentest UVVU.
Table 2: SCWAD performances on UVVU application.
Strategy               | update | successful actions | failed actions | Duration | Success rate
Random selection       | 55     | 54                 | 6              | 4 min    | 50%
Explore first          | 69     | 68                 | 4              | 5 min    | 70%
Exploit first          | 63     | 64                 | 2              | 5 min    | 70%
Vulnerability tracking | 73     | 74                 | 17             | 6 min    | 100%
The random selection strategy randomly selects an
action from the list of possible actions, resulting in the
poorest performance with fewer states reached by the
knowledge base. Due to its inefficient exploration,
it can only identify fifty percent of the vulnerabili-
ties intended to be uncovered in the web application.
The vulnerabilities it failed to uncover were related
to technical information disclosure, which it couldn’t
reach due to inadequate state coverage.
The explore-first strategy navigates through all
permitted pages before attempting any exploitation,
while the exploit-first strategy immediately seeks to
exploit vulnerabilities upon encountering a poten-
tially exploitable state. This entails injecting forms
and accessing restricted pages whenever they are in-
corporated into the knowledge base. Both strategies
exhibited comparable performance, uncovering sev-
enty percent of the vulnerabilities. Their broader cov-
erage compared to the random selection strategy al-
lowed them to gather more insights into the web ap-
plication’s configuration, reaching a greater number
of states in the knowledge base. However, they also
missed instances of technical information disclosure.
The suboptimal coverage of these three strategies
underscores the significance of sequencing actions ef-
fectively in an uncontrolled environment, where each
action may precipitate irreversible consequences for
subsequent actions.
The vulnerability tracking strategy enhances its
repertoire of potential actions by not only scrutinizing
previously executed actions but also prioritizing them
based on the exploit-first selection strategy. This ap-
proach constructs a logical sequence of actions akin to
those performed by human pentesters, leading to vul-
nerability discovery. For instance, instead of directly
attempting injections into a form or HTTP parameter,
it initially searches for reflected values within these
elements. Moreover, it endeavors to bypass identified
server errors when exploiting vulnerabilities by ex-
perimenting with alternative payloads on the same in-
jection point. By reaching states inaccessible to other
strategies, this approach can uncover all vulnerabili-
ties, leveraging more sophisticated exploitation tech-
niques that provoke server errors in the UVVU appli-
cation, thereby disclosing technical information such
as stack traces. Figure 2 illustrates how this strategy
streamlines the array of potential actions compared to
others, resulting in superior outcomes. We empha-
size that the correlation between the actions taken in
the pentest process is crucial in efficiently discover-
ing vulnerabilities. In a pentest process, strategically
selecting actions that yield comprehensive informa-
tion about the target application in earlier steps can
swiftly limit the available actions in subsequent steps,
leading to efficient vulnerability discovery with mini-
mal actions. Consequently, an effective pentest search
method is anticipated to involve a minimal number
of actions throughout the process. Simultaneously,
this method should rapidly decrease the number of
potential actions for later stages of the pentest. Note that the vulnerability tracking strategy has the highest number of failed actions. This is because the strategy leverages such failures to cut action sequences short, which helps to rapidly decrease the number of possible actions. Two examples demonstrate how vulnerability tracking shrinks the number of potentially available actions. First, in comparison to the explore-first strategy, trying to exploit an XSS may trigger server errors that redirect the client to an error page. This means that failed exploitations may result in successful exploration, avoiding the need to specifically explore for these error pages. The second example compares it to the exploit-first strategy. Reflected XSS vulnerabilities can only occur if data entered by a user is reflected (directly or after some transformation) by the server. Trying XSS injections on endpoints that do not reflect any data is therefore useless. While the exploit-first strategy acts like a fuzzer and tries every possible payload, the vulnerability tracking strategy first checks whether any data is reflected. If no data is reflected, the vulnerability tracking strategy cuts short the search for XSS on that endpoint, effectively removing several possible actions that would try to inject payloads.
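The second example amounts to filtering the action pool once a reflection check has failed. The sketch below shows that pruning step under assumed names: the `kind` and `endpoint` keys and the `reflects` map are hypothetical, and endpoints whose reflection status is still unknown are deliberately kept.

```python
from typing import Dict, List


def prune_xss_actions(actions: List[dict], reflects: Dict[str, bool]) -> List[dict]:
    """Drop XSS payload injections on endpoints already known
    not to reflect user input; keep everything else."""
    return [
        a for a in actions
        if not (a["kind"] == "inject_xss" and reflects.get(a["endpoint"]) is False)
    ]


pool = [
    {"kind": "inject_xss", "endpoint": "/search"},
    {"kind": "inject_xss", "endpoint": "/login"},
    {"kind": "access", "endpoint": "/login"},
]
# Outcome of earlier reflection checks: /search echoes input, /login does not.
remaining = prune_xss_actions(pool, {"/search": True, "/login": False})
```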
4.3 Comparison with State-of-the-Art
Tools
Experimental Setup: To evaluate the performances
of SCWAD compared to the state-of-the-art tools, we
execute these tools against three different web appli-
cations. The first web application is Damn Vulnera-
ble Web Application (DVWA, 2023) designed to as-
sist security professionals in testing their skills and
tools. This web application encompasses most major web application vulnerability types, offering three difficulty levels. We conduct tests at the "medium" difficulty level for all the scenarios. However, DVWA does not mirror a real-life web application accurately. Vulnerabilities are entirely isolated from one another and lack integration into authentic application features. The second web application is WackoPicko, introduced in (Doupé et al., 2010). It simulates a genuine commercial web application, akin to UVVU. Although it mimics a real application, it employs different technologies and was last updated in 2017. The last web application we examined is UVVU, detailed in Section 4.2. All three web applications implement at least one vulnerability of each type: technical information disclosure (TID), broken access control (BAC) and reflected XSS.
DVWA has 1 BAC, 18 reflected XSS and 17 TID. WackoPicko has 1 BAC, 2 reflected XSS and 2 TID. UVVU has 2 BAC, 1 reflected XSS and 7 TID. These numbers were given by the authors of each web application, and we manually checked that the vulnerabilities are exploitable before running our experiments. They serve as the ground truth of the vulnerabilities that the tested tools should find when scanning or pentesting the applications.
We compare SCWAD to 3 state-of-the-art scanners that serve as baselines for our experiments. The first tool is the scanner of BurpSuite Pro, a proprietary software developed by PortSwigger that is widely used by pentesters. The second is the scanner of ZAP, an open-source tool from OWASP that is also widely used by pentesters. The last tool is Wapiti, an open-source web scanner (Surribas, 2023). All these scanners were launched with their default configuration, and best efforts were made to allow them to perform authenticated scans.
To assess the performance of both SCWAD and
the baseline scanners, we calculated three metrics.
Firstly, the True Positive Rate (TPR) represents the
ratio of correctly identified vulnerabilities to the total
number of expected alerts. The number of expected
alerts is determined by the pre-defined ground truth
of vulnerabilities for each web application, while true
positive alerts indicate instances where the tool cor-
rectly identifies a vulnerability corresponding to one
of the ground truth vulnerabilities. Secondly, the
False Positive Rate (FPR) measures the ratio of in-
correctly identified vulnerabilities to the total number
of expected alerts. False positive alerts are those that
do not correspond to any ground truth vulnerability.
Lastly, the False Negative Rate (FNR) represents the
ratio of missed vulnerabilities to the total number of
expected alerts. False negatives occur when the tool
fails to detect a ground truth vulnerability, resulting in
no alert being raised.
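As a minimal sketch of the three metrics as worded above, the following Python snippet normalizes every rate by the number of expected alerts. The class name `ScanOutcome`, the function name `rates` and the input numbers are illustrative assumptions, not part of SCWAD.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ScanOutcome:
    """Hypothetical summary of one tool on one application and one vulnerability type."""
    expected: int         # ground-truth vulnerabilities (expected alerts)
    true_positives: int   # alerts matching a ground-truth vulnerability
    false_positives: int  # alerts matching no ground-truth vulnerability


def rates(outcome: ScanOutcome) -> Dict[str, float]:
    """Return TPR, FPR and FNR, each divided by the number of expected alerts."""
    if outcome.expected <= 0:
        raise ValueError("the ground truth must contain at least one vulnerability")
    false_negatives = outcome.expected - outcome.true_positives
    return {
        "TPR": outcome.true_positives / outcome.expected,
        "FPR": outcome.false_positives / outcome.expected,
        "FNR": false_negatives / outcome.expected,
    }


# Illustrative input: 4 expected vulnerabilities, 1 found, 1 spurious alert.
example = rates(ScanOutcome(expected=4, true_positives=1, false_positives=1))
print({name: f"{value:.0%}" for name, value in example.items()})
# prints {'TPR': '25%', 'FPR': '25%', 'FNR': '75%'}
```

With these definitions TPR and FNR always sum to 100%, whereas FPR is not bounded by 100% because a tool may raise more spurious alerts than there are expected alerts.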
Each tool aims to detect as many of the embedded
vulnerabilities as possible within a time limit of two
hours. All 4 tools finished before the time limit.
However, it is worth noting that the Wapiti fuzzer
failed to scan DVWA because it mishandles anti-CSRF
tokens during the login phase. The results of each scan
are presented in Tables 3, 4 and 5.
The results indicate that SCWAD is the only tool
that detects Broken Access Control vulnerabilities, and
it does so with a 100% TPR. SCWAD is designed to
manage multiple users and compare their interactions,
which enables it to identify such vulnerabilities. In
contrast, the other tools cannot unveil this type of
vulnerability because they only handle sessions to
broaden their crawling of the web application.
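To illustrate the idea of comparing what several users can access, here is a minimal sketch. It is not SCWAD's actual algorithm: the `Observation` record, the `find_bac_candidates` function, the user names and the URLs are all hypothetical, and the decision rule (same content returned to a user who never reached the page through their own navigation) is an assumed simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Observation:
    """Hypothetical record of one replayed request."""
    user: str    # session that issued the request
    url: str     # requested resource
    status: int  # HTTP status code of the response
    body: str    # response body


def find_bac_candidates(
    discovered: Dict[str, Set[str]],
    replays: List[Observation],
    reference: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Flag (user, url) pairs where a user obtains a page they never reached
    through their own navigation, with the same content as the legitimate owner."""
    candidates = []
    for obs in replays:
        reached_legitimately = obs.url in discovered.get(obs.user, set())
        same_content = reference.get(obs.url) == obs.body
        if not reached_legitimately and obs.status == 200 and same_content:
            candidates.append((obs.user, obs.url))
    return candidates


# Three users, as in a two-account setup plus the unauthenticated visitor.
discovered = {
    "alice": {"/home", "/profile/alice"},
    "bob": {"/home", "/profile/bob"},
    "anonymous": {"/home"},
}
reference = {"/profile/alice": "private data of alice"}
replays = [
    Observation("bob", "/profile/alice", 200, "private data of alice"),
    Observation("anonymous", "/profile/alice", 302, ""),
]
print(find_bac_candidates(discovered, replays, reference))
# prints [('bob', '/profile/alice')]
```

A scanner that keeps a single session has no second user to compare against, so it has nothing to feed into a check of this kind.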
Figure 2: Number of possible actions built by SCWAD in each round for different strategies during a pentest over UVVU.

Table 3: XSS detection: comparison between state-of-the-art tools. "crashed" means that the tool has crashed on the target app.

| Tool | Metric | DVWA | WackoPicko | UVVU |
|---|---|---|---|---|
| (ground truth) | Existing vulns | 18 | 2 | 1 |
| BurpSuite | TPR | 100% | 100% | 100% |
| BurpSuite | FPR | 10% | 33% | 0% |
| BurpSuite | FNR | 0% | 0% | 0% |
| Wapiti | TPR | crashed | 50% | 100% |
| Wapiti | FPR | crashed | 0% | 0% |
| Wapiti | FNR | crashed | 50% | 0% |
| ZAP | TPR | 61% | 50% | 0% |
| ZAP | FPR | 0% | 0% | 0% |
| ZAP | FNR | 39% | 50% | 100% |
| SCWAD | TPR | 72% | 100% | 100% |
| SCWAD | FPR | 0% | 33% | 0% |
| SCWAD | FNR | 28% | 0% | 0% |

Table 4: Broken Access Control detection: comparison between state-of-the-art tools.

| Tool | Metric | DVWA | WackoPicko | UVVU |
|---|---|---|---|---|
| (ground truth) | Existing vulns | 1 | 1 | 2 |
| BurpSuite | - | none discovered | none discovered | none discovered |
| Wapiti | - | crashed | none discovered | none discovered |
| ZAP | - | none discovered | none discovered | none discovered |
| SCWAD | TPR | 100% | 100% | 100% |
| SCWAD | FPR | 0% | 0% | 0% |
| SCWAD | FNR | 0% | 0% | 0% |

Table 5: Technical Information Disclosure detection: comparison between state-of-the-art tools.

| Tool | Metric | DVWA | WackoPicko | UVVU |
|---|---|---|---|---|
| (ground truth) | Existing vulns | 17 | 2 | 7 |
| BurpSuite | - | none discovered | none discovered | none discovered |
| Wapiti | - | crashed | none discovered | none discovered |
| ZAP | TPR | 100% | 100% | 14% |
| ZAP | FPR | 0% | 80% | 0% |
| ZAP | FNR | 0% | 0% | 86% |
| SCWAD | TPR | 18% | 100% | 100% |
| SCWAD | FPR | 0% | 0% | 0% |
| SCWAD | FNR | 83% | 0% | 0% |

Regarding the reflected XSS vulnerabilities, SCWAD
performs similarly to BurpSuite Pro (the tool with the
highest TPR and the lowest FNR on every application).
On DVWA, SCWAD has a higher FNR than BurpSuite (28%
versus 0%). This discrepancy arises because SCWAD
cannot access certain parts of the application that are
reachable only through JavaScript events triggered by
button clicks, a feature not yet implemented in SCWAD.
However, BurpSuite has a higher FPR on DVWA (10%
versus 0% for SCWAD): it reports a reflected XSS
whenever it can inject a script tag, but the Content
Security Policy prevents the injected script from being
executed, which demonstrates the effectiveness of this
protection mechanism. In the case of WackoPicko, ZAP
and Wapiti fail to identify all the reflected XSS,
while SCWAD and BurpSuite detect an extra one (FPR of
33%). The additional reflected XSS identified by SCWAD
and BurpSuite is a misclassified stored XSS.
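The difference between a payload that is merely reflected and one that can actually execute can be sketched as follows. The text does not detail how SCWAD decides that a payload executed, so this is only one assumed way to avoid the false positive described above. The function names are hypothetical, and the Content Security Policy parsing is deliberately simplified: it ignores `script-src-elem`, `strict-dynamic`, report-only policies and multiple headers.

```python
from typing import Dict, List, Optional

NONCE_OR_HASH_PREFIXES = ("'nonce-", "'sha256-", "'sha384-", "'sha512-")


def inline_scripts_blocked(csp_header: Optional[str]) -> bool:
    """Simplified check: does the policy stop an injected inline <script>?"""
    if not csp_header:
        return False  # no policy at all: inline scripts run
    directives: Dict[str, List[str]] = {}
    for part in csp_header.split(";"):
        tokens = part.strip().lower().split()
        if tokens:
            # Only the first occurrence of a directive is taken into account.
            directives.setdefault(tokens[0], tokens[1:])
    # script-src governs scripts; default-src is its fallback.
    sources = directives.get("script-src", directives.get("default-src"))
    if sources is None:
        return False  # no directive governs scripts
    has_nonce_or_hash = any(s.startswith(NONCE_OR_HASH_PREFIXES) for s in sources)
    # Browsers ignore 'unsafe-inline' when a nonce or a hash is present.
    return "'unsafe-inline'" not in sources or has_nonce_or_hash


def classify_reflection(payload: str, body: str, csp_header: Optional[str]) -> str:
    """Separate 'the payload is reflected' from 'the payload can execute'."""
    if payload not in body:
        return "not reflected"
    if inline_scripts_blocked(csp_header):
        return "reflected, blocked by CSP"
    return "reflected, likely executable"


payload = "<script>alert(1)</script>"
body = "<p>Hello <script>alert(1)</script></p>"
print(classify_reflection(payload, body, "script-src 'self'"))
# prints reflected, blocked by CSP
print(classify_reflection(payload, body, None))
# prints reflected, likely executable
```

A tool that stops at the first test (the payload appears in the response) reports both cases as reflected XSS, which matches the false positives attributed to BurpSuite on DVWA.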
Regarding technical information disclosure
vulnerabilities, SCWAD outperforms the other tools
overall. BurpSuite and Wapiti cannot generate reports
for such vulnerabilities. ZAP yields inconsistent
results, with a TPR of 100% on DVWA but only 14% on
UVVU, and an FPR of 80% on WackoPicko (ZAP mistakes
stack traces for absolute paths on the server due to a
directory traversal vulnerability). Even though SCWAD
has a low TPR on DVWA (due to inaccessible parts of the
web application and TID in errors that were not
triggered), it triggers errors with stack traces on
UVVU that ZAP does not identify.
Our experiment demonstrates that SCWAD can
identify BAC vulnerabilities that the other tools miss.
Additionally, SCWAD produces fewer reports than
traditional scanners, focusing on genuine
vulnerabilities and largely avoiding false positives
(its only false positive is the misclassified stored
XSS on WackoPicko).
We also tested these tools on ginandjuice.shop
(Ginandjuice.shop) to see how they behave on an
uncontrolled web application. Ginandjuice.shop is a web
application designed by PortSwigger (BurpSuite's
vendor) to test scanners. Because scanners are not able
to detect broken access control vulnerabilities, this
website does not implement any. It does not implement
technical information disclosure either, so only XSS
vulnerabilities remain to be found. Unsurprisingly,
BurpSuite finds all XSS vulnerabilities, whereas Wapiti
finds none (0% TPR) and ZAP finds only one (25% TPR)
and generates a false positive (25% FPR). SCWAD, for
its part, finds two of the four XSS vulnerabilities
(50% TPR). The other two are missed because none of the
four payloads generated by the Oracle triggers them;
enriching the Oracle would therefore be enough to find
these vulnerabilities.
4.4 Pentesting Real World Applications with SCWAD
Before testing SCWAD on real world applications, we
tried it on truly black-box web applications by
launching it against challenge web applications from
the Root Me platform (RootMe, 2023). SCWAD discovered
stored XSS vulnerabilities in 4 different challenges.
To scan real world web applications without
disturbing them, we prevented SCWAD from performing
possibly harmful actions: we removed every call to the
Oracle and therefore every payload injection. As a
result, we were only looking for Broken Access Control
and Technical Information Disclosure vulnerabilities.
In addition to removing payload injections, we limited
the rate of SCWAD (at most 1 action every 0.8 seconds)
to avoid overloading the server and causing an
unintended denial of service.
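A rate limit of this kind can be sketched as a loop that enforces a minimum interval between the starts of two consecutive actions. The function `run_throttled` and its parameters are hypothetical and do not come from SCWAD; only the 0.8 second interval is taken from the text. The clock and sleep functions are injectable so that the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Iterable


def run_throttled(
    actions: Iterable[Callable[[], None]],
    min_interval: float = 0.8,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run actions one by one, starting at most one every `min_interval` seconds.

    Returns the number of executed actions.
    """
    executed = 0
    next_allowed = clock()
    for action in actions:
        delay = next_allowed - clock()
        if delay > 0:
            sleep(delay)  # wait until the next start is allowed
        next_allowed = clock() + min_interval
        action()
        executed += 1
    return executed
```

Measuring the interval between action starts, rather than sleeping a fixed 0.8 seconds after each action, means that time spent waiting for a slow page already counts towards the interval.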
We selected 3 web applications on which we were
able to create two accounts. We thus have three users
on each web application: two different authenticated
users and the unauthenticated one. This is necessary
to compare what these users can access and to find BAC
vulnerabilities, if any.
In one hour, we are able to execute between 250
and 300 actions, depending on the loading time of the
website.
With more time we can execute more actions, but
server-side protections limited our rate (at most 766
actions within 7 hours). We collect a very large number
of pages (2.1K, 3.9K), which makes our processing time
very high, but our way of defining pages is necessary
to find BAC. To reduce our processing time, we believe
we need to perform page clustering and "understand" the
data.
We found a technical information disclosure on one
web application (service banner). This result is not
surprising, as such real world web applications are
usually pentested and patched before being released in
production. Technical information disclosure
vulnerabilities are not critical and are often not
considered harmful by application owners, who may not
put effort into patching them.
Ethical Considerations. During this work, we did not
upload any payload to the target real-world web
applications. We only submitted queries and observed
the returned values. Our aim was to evaluate whether
SCWAD can be used in practice while avoiding any
poisoning of the real-world web applications. We
uncovered a TID vulnerability in the Crazygames
application and reported it to the application owner,
explaining that our experiments are designed for
scientific research only.
5 CONCLUSION
In this study, we have introduced SCWAD, an auto-
mated web application pentesting framework. Cen-
tral to our approach is the structured and quantifi-
able representation of a target web application’s el-
ements, referred to as the knowledge base in this con-
text. The introduction of the knowledge base con-
cept brings forth significant advantages. Firstly, it
enables the pentest process to be modeled as a se-
quential decision-making problem. SCWAD incor-
porates an automated pentest agent that selects vi-
able vulnerability exploration actions based on the at-
tributes defined in the knowledge base. Additionally,
the pentest agent within SCWAD can enhance the un-
derstanding of the target web application by updating
attribute values in the knowledge base, thereby guid-
ing subsequent vulnerability exploration actions. Sec-
ondly, the design of the knowledge base allows vul-
nerabilities to be encoded as logic expressions involv-
ing the attributes of the knowledge base. This feature
facilitates interaction between the automated pentest
agent and human oracles; the agent can assess po-
tential vulnerabilities’ feasibility by matching knowl-
edge base attributes with encoded logic expressions
representing vulnerability signatures. Looking ahead,
our future research aims to integrate reinforcement
learning-based agents, enhancing adaptability and ef-
ficiency in vulnerability exploration. The pentest
policies learned through interactions with diverse web
applications can empower human security analysts
to uncover novel vulnerability exploitation methods.
This knowledge, in turn, can inform proactive mea-
sures for strengthening the security posture of target
web applications.
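The knowledge-base idea summarized above can be illustrated with a small sketch under stated assumptions. The attribute names (`form_found`, `stack_trace_seen`, and so on), the `Action` type and the two signatures are hypothetical and are not SCWAD's actual schema; the action effect simulates an observed response instead of contacting a server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

KnowledgeBase = Dict[str, bool]
Predicate = Callable[[KnowledgeBase], bool]


@dataclass(frozen=True)
class Action:
    name: str
    precondition: Predicate                  # when the action is viable
    effect: Callable[[KnowledgeBase], None]  # how it updates attribute values


# Hypothetical signatures: one logic expression over attributes per vulnerability.
SIGNATURES: Dict[str, Predicate] = {
    "reflected_xss": lambda kb: kb.get("payload_reflected", False)
    and kb.get("payload_executed", False),
    "technical_information_disclosure": lambda kb: kb.get("stack_trace_seen", False)
    or kb.get("server_banner_seen", False),
}


def run_round(kb: KnowledgeBase, actions: List[Action]) -> List[str]:
    """Select the actions whose precondition holds, apply them, return their names."""
    viable = [a for a in actions if a.precondition(kb)]
    for action in viable:
        action.effect(kb)
    return [a.name for a in viable]


def matched_signatures(kb: KnowledgeBase) -> List[str]:
    """Names of the vulnerabilities whose logic expression holds in the knowledge base."""
    return sorted(name for name, holds in SIGNATURES.items() if holds(kb))


kb: KnowledgeBase = {"form_found": True}
actions = [
    Action(
        name="send_malformed_input",
        precondition=lambda kb: kb.get("form_found", False)
        and not kb.get("error_probed", False),
        # Simulated observation: the response contained a stack trace.
        effect=lambda kb: kb.update(error_probed=True, stack_trace_seen=True),
    ),
]
first = run_round(kb, actions)
second = run_round(kb, actions)
print(first, second, matched_signatures(kb))
# prints ['send_malformed_input'] [] ['technical_information_disclosure']
```

The second round selects nothing because the first one updated the knowledge base, which is the sequential decision-making aspect: each action changes the attributes that determine which actions are viable next.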
REFERENCES
Ginandjuice.shop. https://ginandjuice.shop.
Buyukkayhan, A. S., Gemicioglu, C., Lauinger, T., Oprea,
A., Robertson, W., and Kirda, E. (2020). What’s in an
exploit? an empirical analysis of reflected server XSS
exploitation techniques. In RAID 2020.
Dahse, J. and Holz, T. (2014). Simulation of built-in php
features for precise static code analysis. In NDSS
2014.
Doupé, A., Cavedon, L., Kruegel, C., and Vigna, G. (2012).
Enemy of the state: A state-aware black-box web vul-
nerability scanner. In USENIX Security.
Doupé, A., Cova, M., and Vigna, G. (2010). Why Johnny
can’t pentest: An analysis of black-box web vulnera-
bility scanners. In Detection of Intrusions and Mal-
ware, and Vulnerability Assessment, pages 111–131.
Drakonakis, K., Ioannidis, S., and Polakis, J. (2023). Res-
can: A middleware framework for realistic and robust
black-box web application scanning. In NDSS.
DVWA (2023). Damn vulnerable web application.
https://github.com/digininja/DVWA.
Eriksson, B., Pellegrino, G., and Sabelfeld, A. (2021).
Black widow: Blackbox data-driven web scanning. In
IEEE S&P 2021.
Jovanovic, N., Kruegel, C., and Kirda, E. (2006). Pixy: a
static analysis tool for detecting web application vul-
nerabilities. In IEEE S&P 2006.
Kals, S., Kirda, E., Kruegel, C., and Jovanovic, N. Secu-
bat: A web vulnerability scanner. In World Wide Web
Conference.
Microsoft (2024). Playwright.
Nashaat, M., Ali, K., and Miller, J. (2017). Detecting secu-
rity vulnerabilities in object-oriented php programs. In
SCAM 2017.
Nunes, P., Medeiros, I., Fonseca, J. C., Neves, N., Correia,
M., and Vieira, M. (2018). Benchmarking static anal-
ysis tools for web security. IEEE Transactions on Re-
liability.
Nunes, P. J. C., Fonseca, J., and Vieira, M. (2015). php-
SAFE: A Security Analysis Tool for OOP Web Appli-
cation Plugins. In DSN 2015.
OWASP (2021). Open source foundation for application
security top 10 - 2021. https://owasp.org/Top10/.
OWASP (2023a). Owasp.
https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html.
OWASP (2023b). Zed attack proxy (zap).
https://www.zaproxy.org/.
Pellegrino, G., Tschürtz, C., Bodden, E., and Rossow, C.
(2015). jÄk: Using dynamic analysis to crawl and
test modern web applications. In RAID 2015.
PortSwigger (2023). Burp suite’s web vulnerability scanner.
https://portswigger.net/burp/vulnerability-scanner.
RootMe (2023). Root me. https://www.root-me.org.
Surribas, N. (2023). Wapiti.
https://wapiti-scanner.github.io/.
Talon, N., Viet Triem Tong, V., Guette, G., and Han, Y.
(2023). Uvvu. https://github.com/scwaduvvu/uvvu.
Zhang, B., Li, J., Ren, J., and Huang, G. (2022). Efficiency
and Effectiveness of Web Application Vulnerability
Detection Approaches. ACM Computing Surveys.