Scrooge: Detection of Changes in Web Applications to Enhance Security Testing

Fabio Büsser¹, Jan Kressebuch¹, Martín Ochoa¹, Valentin Zahnd² and Ariane Trammell¹

¹Institute of Computer Science, Zurich University of Applied Sciences, Winterthur, Switzerland
²Secuteer GmbH, Zurich, Switzerland
Keywords: Security Testing, Black-Box Testing, Software Evolution.

Abstract:
Due to the complexity of modern web applications, security testing is a time-consuming process that heavily
relies on manual interaction with various analysis tools. This process often needs to be repeated for newer
versions of previously tested applications, as new functionalities frequently introduce security vulnerabilities.
This paper introduces scrooge, a tool that automates change detection in web application functionality to
enhance the efficiency and focus of the security testing process. We evaluate scrooge on various platforms,
demonstrating its ability to reliably detect a range of changes. Scrooge successfully identifies different types
of changes, showcasing its applicability across diverse scenarios with high accuracy.
1 INTRODUCTION
The ever-evolving landscape of web applications
presents a continuous challenge for security testing.
Among various testing strategies, black-box testing
is commonly performed, where the source code of a
web application is unknown, simulating external at-
tackers (Bau et al., 2010). In black-box testing, a
combination of automatic and manual tasks are of-
ten employed to achieve high coverage of the ap-
plication under test, as automatic crawling has well-
known limitations (Doupé et al., 2010). Additionally,
a combination of manual and automatic testing is typ-
ically needed to test for both well-known vulnerabili-
ties (such as various injection types) and logical vul-
nerabilities that are inherent to the application and are
more difficult to automate (Pellegrino and Balzarotti,
2014).
Testing web applications is time-consuming and
resource-intensive. Furthermore, when an application
evolves due to new functionality, the security testing
process needs to be repeated. It has been observed
that changes to software tend to introduce new vulner-
abilities to existing versions (Mitropoulos et al., 2012;
Williams et al., 2018). Therefore, it is advantageous
for security testers to quickly identify which parts of
a web application have changed since the last security
testing, to prioritize their efforts accordingly.
This paper introduces scrooge, a tool designed to
automatically identify changes between two versions
of the same web application in a black-box fashion.
By automating change detection, scrooge has the po-
tential to significantly improve the efficiency and ef-
fectiveness of security testing.
We propose a data-gathering architecture aimed at
abstracting key interfaces and the state of a web ap-
plication. The resulting graph data structure allows
us to perform various change analyses, from parame-
ter changes to changes in HTML or JSON responses,
to changes in how a given page is visually rendered.
We evaluate our approach on three web applications
(two popular e-shop applications and one application
used in security training and teaching) and measure
its accuracy with respect to changes, either artificially
introduced or resulting from enabling new functional
modules. Our evaluation shows that scrooge can pre-
cisely detect changes without false positives. Reach-
ing all changes may depend on the selection of the
crawling strategy, which we also evaluate using two
popular crawlers and manual crawling. Our tool is
open source (Kressebuch and Büsser, 2024).
The rest of this paper is structured as follows: We
introduce our approach in Section 2. We then dis-
cuss our implementation in Section 3. Section 4 con-
tains our evaluation. We discuss limitations and fu-
ture work in Section 5. Related work is summarized
in Section 6, and we conclude in Section 7.
2 APPROACH
The overall integration of scrooge in the penetration
testing tool flow is shown in Fig. 1. A penetration
tester performs a test of a web application in a given
version x. If a new version x + 1 is available, she uses
scrooge to determine the changes between the version
she tested last and the new version. Scrooge analyses the versions in question and determines the changes, which can be represented as an annotated graph or a text report. The penetration tester reviews the change analysis and plans the test of the new version so as to analyse the detected changes.
While this approach has the potential to increase the efficiency of security testing, it is crucial that changes are detected reliably. If a change is not detected, the penetration tester might overlook relevant new functionality, and a vulnerability might therefore remain undetected. The approach must also be as precise as possible, because false positives would need to be manually inspected and discarded, which would require additional work.
Figure 1: Integration of scrooge in Penetration Testing Flow.
In the following we describe the architecture of
scrooge and how it was designed to accurately detect
various families of changes in a black-box fashion.
2.1 Architecture
The overall architecture of scrooge is shown in Fig. 2.
Figure 2: Design Overview of the Overall System.
Functionality in modern web applications is com-
plex, given that typically one part of the logic is de-
fined in the front-end (via JavaScript) and another
part on the back-end or external services (via syn-
chronous or asynchronous requests). In order to identify changes between two versions of a web application, we should ideally be able to exercise as much of the application's logic as possible. Since manual navigation by a human would be time-consuming and possibly incomplete, it is natural to involve an automatic crawler in this process. This crawler navigates the website independently and tries to reach as many endpoints as possible. Note, however, that our architecture is independent of the type of crawler used and is compatible with manual crawling, as we will see in Section 4.
The crawler accesses the target web application through a proxy in order to dump all requests and received responses. This data is stored in so-called HAR files (HTTP Archive format), a JSON-formatted archive that stores all data relevant to the interaction of a web browser with a web site. This data is then analysed and filtered so that only relevant information is kept, in a data structure suitable for comparing different versions of the same web application.
This data structure is a graph where nodes are aggregated URLs and edges represent links or asynchronous calls to other URLs. Moreover, for each node we store the sent parameters (if any) and the responses. These graphs are referred to as snapshots. Two snapshots can be compared using various metrics to detect changes. The detected changes are stored as text or as graphical files, which can be used by a penetration tester to identify the changed areas of a web application. The comparison algorithms to detect changes are described in the next section.
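To make the data model concrete, the following is a minimal sketch of how such a snapshot graph could be built from a HAR dump. The class, function and field names are ours and purely illustrative (they are not scrooge's internals); the HAR field names follow the standard HAR 1.2 layout.

```python
import json
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Node:
    """One aggregated URL with the observed parameter names and responses."""
    identifier: str
    param_names: set = field(default_factory=set)
    responses: list = field(default_factory=list)
    edges: set = field(default_factory=set)  # identifiers of linked or asynchronously called URLs


def snapshot_from_har(har_path: str) -> dict:
    """Build a snapshot (identifier -> Node) from a HAR file recorded by the proxy."""
    with open(har_path, encoding="utf-8") as fh:
        entries = json.load(fh)["log"]["entries"]  # standard HAR 1.2 layout
    nodes: dict = {}
    for entry in entries:
        request, response = entry["request"], entry["response"]
        # Simplified identifier (method + path); Section 3.2 refines this with parameter names.
        ident = request["method"] + urlparse(request["url"]).path
        node = nodes.setdefault(ident, Node(ident))
        node.param_names.update(q["name"] for q in request.get("queryString", []))
        node.responses.append(response.get("content", {}).get("text", ""))
    return nodes
```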
2.2 Detection of Changes Between
Snapshots
In order to detect different types of changes in a web
application, several detection mechanisms need to be
combined. Scrooge combines approaches from the
following three different areas:
Changes in Parameters and Requests. It is analysed whether the crawler generates different requests to the web application, indicating that the scope of the web application has changed.
Changes in the Graphical Representation. Changes are detected by comparing screenshots of supposedly similar web pages.
Changes in the HTML Structure. The structure of supposedly similar HTML pages is compared to detect differences.
The comparison algorithms used are described in Sec-
tion 3. At the basis of the comparison algorithms
we use well-known visual, textual and tree comparison algorithms, which we briefly recap here. Note that these algorithms give a similarity score and
are not binary (equal or different). This is a common practice when assessing the similarity of web sites (Mallawaarachchi et al., 2020), given that a strict comparison (i.e. using hashes of the HTML) would potentially yield false positives in common situations such as dynamic content (banners, state of the database). Non-binary scores allow users to configure thresholds and adjust accuracy to various scenarios. We nevertheless assume that in a penetration testing scenario, the application under test has a more or less stable or minimal database content across the two versions (for instance, an e-shop has roughly the same inventory in both versions), as we will discuss in the evaluation in Section 4.
DHash. The dHash algorithm (difference hash)
(Krawetz, 2013) is a hashing function for recognizing
image similarities. It converts an image into a hash
value that represents the visual appearance of the im-
age. This hash value can be used to quickly compare
images to determine if there is a similarity.
Jaccard Similarity. The Jaccard similarity (Jadeja, 2022) describes the similarity of two sets as the size of their intersection divided by the size of their union. A large intersection relative to the union results in a high similarity, while a small intersection results in a low similarity.
Tree Edit Distance. The tree edit distance describes the similarity of two data structures arranged as trees. Starting from the root node, the trees to be compared are searched for differences in the underlying nodes, and the number of edit steps required to transform one tree into the other is calculated.
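To illustrate the kind of non-binary score these algorithms produce, the following is a compact dHash sketch based on Pillow; it is a generic illustration of the algorithm, not scrooge's implementation, and the function names are ours. (Sketches of the structural comparisons follow in Section 3.2.3.)

```python
from PIL import Image  # Pillow


def dhash(image_path: str, n: int = 8) -> int:
    """Difference hash: compare horizontally adjacent pixels of an (n+1) x n grayscale thumbnail."""
    img = Image.open(image_path).convert("L").resize((n + 1, n))
    pixels = list(img.getdata())
    bits = 0
    for row in range(n):
        for col in range(n):
            left = pixels[row * (n + 1) + col]
            right = pixels[row * (n + 1) + col + 1]
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hash_difference(hash_a: int, hash_b: int, n: int = 8) -> float:
    """Fraction of differing bits (Hamming distance over the n*n hash bits)."""
    return bin(hash_a ^ hash_b).count("1") / (n * n)
```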
3 IMPLEMENTATION
We implemented scrooge as a prototype in Python, which is available as an open-source project (Kressebuch and Büsser, 2024). The following sections describe the most important implementation aspects.
3.1 Crawler and Proxy
As discussed previously, in order to detect as many
changes as possible, it is important to fully navigate
the web application under test. Therefore, in our work we used scrooge with the following three web crawler approaches. For our evaluation, we have chosen two state-of-the-art crawlers, one following a deterministic and one a randomized strategy, as well as manual crawling. Note that scrooge can be integrated with any other crawler.
CrawlJax (Mesbah et al., 2012) is a determin-
istic crawler that automatically analyzes user in-
terface state changes in the web browser. The
crawler scans the DOM tree of the application and
identifies elements that can potentially change the
state of the application. These elements are then
automatically activated by triggering events such
as clicks. This gradually creates a state machine
that models the various navigation paths and states
within the web application.
Black Widow (Eriksson et al., 2021) is a random-
ized crawler based on a data-driven approach to
crawling and scanning web applications. The tool
addresses three central pillars for in-depth crawl-
ing and scanning: Navigational Modeling, Traver-
sal, and State Dependency Tracking. The effec-
tiveness of Black Widow is illustrated by signifi-
cant improvements in code coverage compared to
other crawlers, with coverage between 63% and
280% higher depending on the application (Eriks-
son et al., 2021).
Manual Crawling was used as a baseline to
evaluate the performance of the two automated
crawlers.
As a proxy server, we use mitmproxy (@maximilianhils et al., 2013), which saves all observed connections as a HAR file.
3.2 Detection of Changes
In the snapshot data structure, nodes are aggregated
URLs. We have chosen to aggregate URLs with the
same identifier in order to abstract the state of the ap-
plication. We are aware that this abstraction may have an impact on precision, depending on how the application under test is coded, but it has advantages in terms of efficiency. In the following, an example of the generation of an identifier for a GET request is shown.
GET request:  https://example.domain/product?cart=2&quantity=5
Identifier:   GET/product?cart&quantity
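A minimal sketch of this aggregation is shown below; the exact normalization (e.g. the sorting of parameter names) is an assumption on our side, but it reproduces the example above.

```python
from urllib.parse import urlparse, parse_qsl


def request_identifier(method: str, url: str) -> str:
    """Aggregate a concrete URL into an identifier: method, path and parameter names without values."""
    parsed = urlparse(url)
    names = sorted({name for name, _ in parse_qsl(parsed.query, keep_blank_values=True)})
    return f"{method}{parsed.path}" + ("?" + "&".join(names) if names else "")


assert request_identifier(
    "GET", "https://example.domain/product?cart=2&quantity=5"
) == "GET/product?cart&quantity"
```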
The detection of changes between snapshots is
performed by a set of comparison algorithms. As introduced in Section 2.2, the different approaches can be di-
vided into metrics that are based on changes in the pa-
rameters and requests, changes that are based on the
graphical representation and changes that are based
on the structure of the HTML file. In the following
paragraphs we describe the chosen metrics more pre-
cisely.
3.2.1 Detection of Changes in Parameters and
Requests
The following metrics were used to analyze changes
in the parameters or in the requests themselves.
ParameterChange is a method for detecting
changes by comparing the parameters sent in dif-
ferent requests. It works by creating an alphabet-
ically sorted list of parameters for each snapshot
and then comparing the lists. If the lists differ,
it is considered a change. This method is useful
because changes in the parameters often indicate
changes in the way the application is working.
MissingRequestChange is a method for de-
tecting changes by identifying requests that are
present in one snapshot but not another. It works
by iterating over all requests in Snapshot 1 and
checking if each request is also present in Snap-
shot 2. If a request is not found in Snapshot 2,
it is considered a change. This method is useful
for identifying changes in the flow of a web appli-
cation, as the removal of a request can indicate a
change in the way the application is structured.
NewRequestChange is a method for detecting
changes by identifying requests that are present in
one snapshot but not another. It works by iterat-
ing over all requests in Snapshot 2 and checking if
each request is also present in Snapshot 1. If a re-
quest is not found in Snapshot 1, it is considered
a change. This method is useful for identifying
changes in the flow of a web application, as the
addition of a request can indicate a change in the
way the application is structured.
AsyncRequestsChange is a method for detecting
changes by identifying differences in the execu-
tion of asynchronous requests associated with a
static page. It does not consider the parameters
or responses of asynchronous requests but instead
focuses on whether new requests are being exe-
cuted or previous requests are no longer being ex-
ecuted. This can indicate that new features have
been added to the corresponding static page.
AsyncRequestParamChange is a method for de-
tecting changes by identifying differences in the
JSON schema of asynchronous request param-
eters. It works by generating a JSON schema
from the request body (Content-Type JSON) for
each snapshot and then comparing the schemas
using a standard library (xlwings, 2024). If dif-
ferences are found, it is considered a change.
Only asynchronous requests with the Content-
Type application/json are supported. This
method is useful for identifying changes in the
data that is being passed between the web appli-
cation and the client, as changes in the request pa-
rameters can indicate changes in the way the ap-
plication is interacting with the user.
AsyncResponseChange is a method for detect-
ing changes by identifying differences in the
JSON schema of asynchronous response bod-
ies. It works by generating a JSON schema
from the response body (Content-Type JSON) for
each snapshot and then comparing the schemas
(again using (xlwings, 2024)). If differences
are found, it is considered a change. Only
asynchronous responses with the Content-Type
application/json are supported. This method
is useful for identifying changes in the data that
is being returned from the web application to the
client, as changes in the response body can indi-
cate changes in the way the application is provid-
ing information to the user.
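A sketch of these request-level comparisons is shown below. It reuses the hypothetical Node/snapshot structure sketched in Section 2.1 and the jsondiff library cited above; the rough type-skeleton "schema" is our own simplification of the schema-generation step, not scrooge's actual code.

```python
from jsondiff import diff  # the JSON comparison library cited above


def request_changes(snapshot1: dict, snapshot2: dict) -> dict:
    """MissingRequestChange / NewRequestChange: set comparison of aggregated identifiers."""
    ids1, ids2 = set(snapshot1), set(snapshot2)
    return {
        "MissingRequestChange": sorted(ids1 - ids2),  # present in Snapshot 1 only
        "NewRequestChange": sorted(ids2 - ids1),      # present in Snapshot 2 only
    }


def parameter_changes(snapshot1: dict, snapshot2: dict) -> list:
    """ParameterChange: compare the alphabetically sorted parameter names per identifier."""
    return [
        ident for ident in set(snapshot1) & set(snapshot2)
        if sorted(snapshot1[ident].param_names) != sorted(snapshot2[ident].param_names)
    ]


def rough_schema(body):
    """Keep the key structure of a JSON body, replacing leaf values by their type names."""
    if isinstance(body, dict):
        return {key: rough_schema(value) for key, value in body.items()}
    if isinstance(body, list):
        return [rough_schema(value) for value in body[:1]]  # one representative element
    return type(body).__name__


def async_body_changed(body1, body2) -> bool:
    """AsyncRequestParamChange / AsyncResponseChange style check on two JSON bodies."""
    return bool(diff(rough_schema(body1), rough_schema(body2)))
```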
3.2.2 Detection of Changes in the Graphical
Representation
The following metric was used to detect changes in
the graphical representation of a webpage.
The DHashStructureChange detects changes by
comparing the visual structure of a static request
using a difference hashing algorithm. It works by
loading the HTML structure for a request with an
identical identifier from both snapshots into a web
browser and taking a screenshot of the window.
The resulting hash from the images is then com-
pared, and the change rate is determined based on
the difference in the hash. The value N = 8 was
chosen for calculating the DHash.
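For illustration, rendering and screenshotting a stored HTML response could look as follows; Playwright is our assumption here (any headless browser would do), and the change rate is then the bit difference of the two screenshot hashes, as in the dHash sketch from Section 2.2.

```python
from playwright.sync_api import sync_playwright


def screenshot_of_html(html: str, path: str, width: int = 1280, height: int = 1024) -> None:
    """Render the HTML captured in a snapshot in a headless browser and save a screenshot."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.set_content(html)
        page.screenshot(path=path)
        browser.close()


# e.g. rate = hash_difference(dhash("original.png"), dhash("modified.png"), n=8)
```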
3.2.3 Detection of Changes in the HTML
Structure
The following two metrics were used to detect
changes in the structure of the HTML code.
JaccardStructureChange is a method for detect-
ing changes in web application functionality by
comparing the HTML structure using the Jaccard
Similarity algorithm. It works by creating a set for
each HTML document from Snapshot 1 and Snap-
shot 2, and then comparing the sets using the Jac-
card Similarity algorithm to determine a change
rate.
TreeDifferenceStructureChange is a method for
detecting changes in web application functional-
ity by comparing the HTML structure using a tree
edit distance algorithm. It works by converting the
HTML structure into a node tree for each snap-
shot and then traversing the trees synchronously.
An edit distance is counted throughout the traver-
sal. For each node, it is checked whether there
is a corresponding node in the opposite snapshot.
If the corresponding node is missing, the edit dis-
tance is increased by one. Similarly, the number
of child tags is checked, and if there is a differ-
ence, the edit distance is also increased by this
difference. This method is useful for identify-
ing changes in the hierarchical structure of a web
page, as changes in the HTML structure can indi-
cate changes in the organization of the content on
the page.
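To make these two structural metrics concrete, the sketches below show one possible realization. The paper does not specify how the Jaccard set for an HTML document is built, so the token set (tag names plus attribute names) is our assumption, and the tree traversal is a simplified recursive variant of the procedure described above; the function names are ours.

```python
from bs4 import BeautifulSoup


def html_token_set(html: str) -> set:
    """One possible set representation of an HTML document: tag names plus their attribute names."""
    soup = BeautifulSoup(html, "html.parser")
    tokens = set()
    for tag in soup.find_all(True):
        tokens.add(tag.name)
        tokens.update(f"{tag.name}@{attr}" for attr in tag.attrs)
    return tokens


def jaccard_change_rate(html1: str, html2: str) -> float:
    """JaccardStructureChange-style rate: 1 - Jaccard similarity of the two token sets."""
    set1, set2 = html_token_set(html1), html_token_set(html2)
    union = set1 | set2
    return (1.0 - len(set1 & set2) / len(union)) if union else 0.0


def tree_edit_distance(node1, node2) -> int:
    """Synchronous traversal: count child tags without a counterpart and differences
    in the number of child tags, as described for TreeDifferenceStructureChange."""
    children1 = node1.find_all(True, recursive=False)
    children2 = node2.find_all(True, recursive=False)
    distance = abs(len(children1) - len(children2))
    for child1, child2 in zip(children1, children2):
        if child1.name != child2.name:  # no corresponding node in the opposite snapshot
            distance += 1
        distance += tree_edit_distance(child1, child2)
    return distance


def tree_change_rate(html1: str, html2: str) -> float:
    """Normalize the edit distance by the total number of tags in both documents."""
    soup1 = BeautifulSoup(html1, "html.parser")
    soup2 = BeautifulSoup(html2, "html.parser")
    total = len(soup1.find_all(True)) + len(soup2.find_all(True))
    return tree_edit_distance(soup1, soup2) / total if total else 0.0
```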
4 EVALUATION
In the previous sections, we introduced the architec-
ture and overall approach of scrooge. Given the com-
plexity of modern web applications, accurately esti-
mating the performance of our methodology is chal-
lenging. Our change detection features could po-
tentially produce false positives (being too sensitive
to apparent changes) or false negatives (failing to
detect certain types of functional changes). There-
fore, in this section, we apply our methodology to
two versions of three different popular web applica-
tions, which offer sufficient complexity to preliminar-
ily evaluate our approach in terms of accuracy.
Note that the following choices have been made for the three applications: the state of the application database has been preserved between the two versions of an application as much as possible (modulo necessary changes for the new functionality). We have decided not to set a predetermined configuration for the thresholds of the non-binary similarity metrics, but to manually inspect every difference detected by scrooge and assess whether it was the result of an intended change in the new version or a false positive.
4.1 Evaluation on WordPress &
WooCommerce
WooCommerce (WooCommerce, 2024) is a widely-
used open-source e-commerce plugin for WordPress.
It allows users to create and manage online stores
within their WordPress sites, offering features such
as product management, order processing, payment
gateways, and shipping options. WordPress (version
6.4.3) with WooCommerce (version 8.7.0) as the in-
stalled e-commerce plugin was used as the test ap-
plication. For the evaluation, the installation was
cloned and targeted functional changes were made
to the cloned version. Both instances have the same
version numbers and the same data status. The un-
changed version is hereinafter referred to as Original,
while the modified version is referred to as Modified.
The following is a list of introduced changes in order
to evaluate the effectiveness of scrooge at detecting
them.
C1 - Additional Parameter in POST Request "Add to Cart". The form used to add a product to the cart was supplemented with an additional parameter.
Goal: Detect parameter changes in static requests (POST).
C2 - Additional Parameter in AJAX Request "Add to Cart". The asynchronous addition to the cart was supplemented with an additional parameter.
Goal: Detect parameter changes in asynchronous requests.
C3 - Additional Field in AJAX Response "Add to Cart". The response for the asynchronous "Add to Cart" was supplemented with a new field.
Goal: Detect field changes in asynchronous responses.
C4 - Additional Step in Checkout Process. The normal checkout process proceeds as follows:
Cart → Checkout → Order Confirmation
In Modified, an additional step was added:
Cart → Cross-Selling → Checkout → Order Confirmation
Goal: Detect a flow change in a process.
C5 - New AJAX Request on Product Page. Product pages now send an additional asynchronous request.
Goal: Detect new AJAX requests.
C6 - Template Adjustment on Product Page. The HTML structure on the product page was adjusted. New HTML elements were added to the "Add to Cart" form.
Goal: Detect a structural change.
C7 - Content Page "Sample Page" Deleted. The content page "Sample Page" was deleted in Modified.
Goal: Detect that content/functions are missing.
Snapshot Comparison and Interpretation. Figure 3 illustrates the output of our tool when comparing the
two versions of the web application. Green nodes
represent new functionality and red nodes removed
functionality. Table 1 summarizes the introduced
changes vs. the change detection algorithms that were
able to detect them. All changes C1-C7 were de-
tected by one or more change detection algorithms.
For instance, C6 “Template Adjustment on Prod-
uct Page” triggered a JaccardStructureChange,
a TreeDifferenceStructureChange and a visual
DHashStructureChange on values above the pre-
configured thresholds.
Figure 3: Evaluation Graph WordPress Original - Modified (CrawlJax). Green nodes represent new functionality and red nodes removed functionality.
In sum, all changes were detected and no false positives were found; that is, all indicators of change could be associated with one of the changes C1-C7.
From a crawling perspective, for the WooCommerce
webshop, both CrawlJax and Black Widow worked
well. Some fine-tuning was required to ensure the
complete run of the crawlers. For example, CrawlJax
gets stuck on the WordPress search form. If this form is removed, CrawlJax runs completely through all pages.
4.2 Evaluation on PrestaShop
PrestaShop (version 8.1.5) is a popular e-commerce
platform (PrestaShop, 2024). In order to evaluate
our approach, a copy of a webshop was created, with
some new modules activated in the cloned version.
The unchanged version is hereinafter referred to as
Original, while the modified version is referred to as
Modified. The goal is now to check if the newly active
modules can be detected using the comparison algo-
rithms.
Activated Modules in the Modified Version. The following modules were activated in the modified version in order to test the change detection capabilities of our approach. Note that the changes introduced are therefore not under our control, which should represent a more realistic situation compared to the previously described evaluation on WordPress.
M1 Wishlist: Allows customers to create wishlists to save their favorite products for later.
M2 Customer "Sign in" link: Allows customers to easily register for the webshop.
M3 Contact form: Adds a contact form to the contact page.
M4 Newsletter subscription: Allows customers to sign up for the newsletter in the footer of the webshop.
Snapshot Comparison and Interpretation. The result of running scrooge on both versions is depicted in Fig. 4. For instance, the modification M3 (Contact form) was detected at GET/contact-us using JaccardStructureChange, TreeDifferenceStructureChange, and DHashStructureChange. The new request POST/contact-us also reveals the new form on the contact page. Figure 5 illustrates the before and after rendering of this website, which explains why DHashStructureChange detected this change.
Insights PrestaShop. The newly introduced modules in the modified version were identified accurately. Since these are entirely new modules and no existing functions were modified, the NewRequestChange and AsyncRequestsChange functions are particularly useful in this case.
As with the WooCommerce evaluation in the previous subsection, automatic testing using crawlers proved challenging. The constant crashes or hang-ups
Table 1: Evaluation of detected changes.
Evaluation of introduced changes
Change C1 C2 C3 C4 C5 C6 C7 False-Positives
MissingRequestChange X 0
NewRequestChange X 0
AsyncRequestsChange X 0
JaccardStructureChange X X 0
DHashStructureChange X 0
TreeDifferenceStructureChange X X X 0
ParamChange X 0
AsyncRequestParamChange X 0
AsyncResponseChange X 0
Table 2: Evaluation of found changes depending on crawler used.
Crawler | C1 | C2 | C3 | C4 | C5 | C6 | C7 | False Positives
CrawlJax | X | X | X | X | X | X | X | 0
BlackWidow | X | X | X | X | X | X | X | 0
Manual Crawling | X | X | X | X | X | X | X | 0
Figure 4: Evaluation Graph PrestaShop Original - Modified (Manual Crawling).
of the crawlers required numerous adjustments to both the crawler and the test application. No clear pattern was identified as to where the crawlers fail. Ultimately, once those issues were fixed, however, both crawlers were able to run through the entire shop and detect all changes without false positives.
4.3 Evaluation on DVWA
To test the functionality of change detection on gen-
eral websites, a local version of the Damn Vulnera-
ble Web Application (DVWA) (dvw, 2010) was in-
stalled. This application illustrates various vulnera-
bilities through customizable difficulty levels. Each
difficulty level has a slightly different implementation
Table 3: Evaluation of Changes per Module.
Evaluation of Types of Changes
Type of Change M1 M2 M3 M4 False-Positives
MissingRequestChange 0
NewRequestChange X X X X 0
AsyncRequestsChange X X 0
JaccardStructureChange X X X 0
DHashStructureChange X X X 0
TreeDifferenceStructureChange X X X 0
ParamChange 0
AsyncRequestParamChange 0
AsyncResponseChange 0
Table 4: Evaluation of Detected Modules by Crawler.
Crawler | M1 | M2 | M3 | M4 | False Positives
CrawlJax | X | X | X | X | 0
BlackWidow | X | X | X | X | 0
Manual Crawling | X | X | X | X | 0
Figure 5: Comparison "Contact us": (a) M3 deactivated, (b) M3 activated.
which makes it suitable for evaluating our approach.
Thus, for testing purposes, the modified functionality
was defined by adjusting the difficulty level.
Insights from DVWA Crawler Comparison. The DVWA application was crawled three times. Initially, the application was navigated manually, and the resulting requests were collected via the proxy. In the second step, Black Widow was used as the crawler, followed by CrawlJax in the third step.
Using CrawlJax as the crawler for DVWA did not
yield the desired results in the tests. CrawlJax consis-
tently terminated without accessing any subpages. In
tests using Black Widow, the crawler navigated au-
tonomously through the menus. Visual inspection of Black Widow's navigation confirmed that it accessed all subpages reachable through the website's navigation menu. However, visible forms were only partially filled out and submitted, so not all functionalities of the website were tested with Black Widow.
A baseline was formed from the union of all detected request identifiers in Snapshots 1, 2, and 3, which were derived from manual interaction, crawling with Black Widow, and crawling with CrawlJax, respectively. Table 5 (in
Appendix) displays the existing URLs in the snap-
shots. The resulting baseline includes 35 URLs,
with 34 URLs reached through manual interaction.
Crawling with Black Widow as the crawler reached
23 URLs. It became apparent that Black Widow
reached all subpages, but did not test all forms, thus
missing requests for form submissions in Snapshot
2. The request GET/instructions.php?doc was
reached in the crawl with Black Widow, but not in
manual interaction. Crawling with CrawlJax as the
crawler resulted in coverage of only three URLs, in-
cluding the login page GET/login.php, the landing
page GET/index.php, and the home page GET/.
Due to Black Widow’s inability to correctly fill
out all forms, it was decided to manually navigate DVWA for evaluating the differences between the security levels Low and High. Precisely the changes in behavior during form submission are expected to provide insights into changes in the application's implementation; therefore, as many of the existing forms as possible must be filled out for evaluation purposes.
In a direct comparison of the security levels Low and High, several differences in the website's behavior were identified, as depicted in Fig. 6.
Insights DVWA. The differences between the security levels Low and High that were automatically identified by scrooge were manually analyzed in order to determine their accuracy. All differences could be explained by functionality changes between these two levels. For instance, the page GET/vulnerabilities/sqli/ was implemented fundamentally differently in the two security levels. While an input field was used directly for input at security level Low, at level High a link points to a pop-up window that contains the input mask for the ID. This leads to structural changes in the direct comparison and to new requests that open and submit the pop-up window. From a manual inspection of the exercised functionality, it seems that scrooge was able to detect most changes between the two security levels. However, we did not have a ground-truth definition of all implemented changes, so there could potentially be false negatives in the automatic analysis.
5 DISCUSSION
As we have seen in the previous section, scrooge ex-
hibits promising capabilities for detecting changes in
the functional scope of web applications. However,
several limitations hinder its broader applicability and
require further investigation.
Abstraction of Functionality and Application’s
State. By construction, our graph data structure ag-
gregates URLs with the same identifier but differ-
ent parameter values. Depending on the application’s
logic this abstraction may affect the precision of our
analysis. Similarly, the state of persistent storage may
affect accuracy of a comparison (i.e. an empty e-
shop vs. a shop with several items in inventory).
We believe however that our preliminary evaluation
is promising in the sense that for the evaluated appli-
cations the achieved accuracy of the comparison was
high. A more thorough evaluation on other applica-
tions and architectures constitutes interesting future
work.
Crawler Challenges: The current reliance on ex-
ternal crawlers introduces a significant limitation.
These crawlers exhibit inconsistencies in perfor-
mance across different web applications. Some func-
tion flawlessly, while others struggle entirely. To
address this, future work should explore several av-
enues. Combining multiple crawlers can leverage
their strengths and mitigate weaknesses by creating
snapshots from various tools, ensuring comprehen-
sive analysis. Developing a dedicated crawler op-
timized for scrooge, particularly for detecting spe-
cific change types and handling Single Page Applica-
tions (SPAs), would enhance precision and effective-
ness. Additionally, utilizing parallel crawling tech-
niques can significantly reduce analysis time, making
scrooge more efficient for large-scale applications by
running multiple crawlers simultaneously for exten-
sive coverage in a shorter period.
SPA Integration Hurdles: Single-page applications
(SPAs) pose a unique challenge due to their dynamic
nature. Scrooge currently struggles to effectively cap-
ture changes within SPAs. To improve this, iden-
tifying page changes within a Single Page Applica-
tion (SPA) is crucial. This requires developing logic
in the crawling process to detect when a new page
loads, which can be achieved by continuously moni-
toring the current URI and its history to recognize ad-
justments triggered by AJAX requests. Additionally,
exploring alternative storage solutions beyond HAR
files for capturing the dynamic state of SPAs is nec-
essary. Evaluating the feasibility and effectiveness of
these alternative storage methods will be essential for
improving the analysis process.
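As a rough illustration of the URI-monitoring idea, the sketch below polls the browser's current location with Playwright while a page is being exercised; this is our own assumption of how such monitoring could be wired in, not an existing scrooge feature, and all names are hypothetical.

```python
import time
from playwright.sync_api import sync_playwright


def watch_spa_locations(start_url: str, duration_s: float = 30.0, poll_s: float = 0.5) -> list:
    """Record in-app 'page' changes that do not trigger full document loads
    (e.g. history.pushState after an AJAX request) by polling location.href."""
    seen = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)
        deadline = time.time() + duration_s
        while time.time() < deadline:
            current = page.evaluate("location.href")
            if not seen or seen[-1] != current:
                seen.append(current)  # a new SPA view was reached
            time.sleep(poll_s)
        browser.close()
    return seen
```

In a real integration, the crawler would drive the same page object, and each recorded location could be stored as an additional node in the snapshot.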
Addressing these limitations constitutes interest-
ing future work, as well as a more comprehensive
evaluation of the tool’s precision.
6 RELATED WORK
There are various research topics closely related to our work, such as change detection, software evolution, web crawling, and automated black-box testing. In the following, we give an overview of the works most related to ours and how we compare against them.
Change Detection in Web Pages. Closest
to our work are studies and implementations
of change detection in web pages (see for in-
stance (Mallawaarachchi et al., 2020) for a survey in
this domain). This line of work has been inspired
by the practical need to track changes in web sites
to get notifications related to important updates (e.g. news, government announcements) or potential attacks (e.g. defacements). Today there exist sev-
eral closed-source change detection services, such as
Google Alerts (Google, 2024) and some open source
Figure 6: Evaluation Diagram DVWA Crawling with Low and High Level.
ones such as changedetection.io (dgtlmoon, 2024).
However, note that differently from that line of work, we are concerned with changes in an application as a whole, and not only in individual web pages. The graph data structure defined in our work is more general and allows one to reason, for instance, about changes in navigation paths, and thus about more abstract application control flows. Lastly, our implementation can run locally, which is important in order to guarantee the privacy of the system under test.
Software Evolution. Software evolution research
provides valuable insights into understanding soft-
ware changes and predicting future development.
The work of D'Ambros et al. (D'Ambros et al., 2008) demonstrates a detailed approach for analyzing software repositories to gain insights into software evolution. The work of Mitropoulos et al. (Mitropoulos et al., 2012) highlights the importance of tracking software evolution to identify security-related bugs. However, most works in this domain are white-box approaches that assume the source code is known, whereas our work treats the application under test as a black box.
Web Crawling. As illustrated in our work, the effectiveness of web crawling is crucial for identifying
changes in web applications. Stafeev and Pellegrino’s
(Stafeev and Pellegrino, 2024) work provides a com-
prehensive survey of web crawling algorithms and
their effectiveness for web security measurements. In
our work, we do not claim a contribution in the crawl-
ing domain, since the proposed approach in this paper
is designed to be independent of the specific crawler
used.
Automated Black-Box Testing. Automated black-
box testing approaches, such as EvoMaster (Arcuri,
2021) and RestTestGen (Corradini et al., 2022), provide techniques for testing RESTful APIs. However,
these methods typically require API documentation.
This work addresses this limitation by generating API
documentation for the identified endpoints, enabling
more comprehensive black-box testing.
Overall, the proposed approach extends existing
research by combining web crawling, change detec-
tion and software evolution concepts, and black-box
testing techniques to effectively detect changes in web
application functionality, particularly in the context of
black-box testing scenarios.
7 CONCLUSION
In this work, we present scrooge, a prototype tool de-
signed to identify changes in web application func-
tionality. Scrooge aims to improve security test-
ing efficiency by detecting differences between web
page versions. We evaluated scrooge on e-commerce
platforms and a security application, demonstrating
its ability to reliably detect changes. The effectiveness, however, relied on the data generation method.
Manual interaction provided the most consistent re-
sults, highlighting the need for crawler optimization.
Furthermore, scrooge successfully identified various
change types, showcasing its applicability across di-
verse scenarios. Our findings suggest that combin-
ing crawling methods could improve coverage, and
future work should focus on crawler optimization and
single-page application compatibility. Additionally,
implementing more differentiation algorithms could
increase the number of detectable webpage features.
Overall, scrooge demonstrates the potential of auto-
matic change detection for improved security testing
efficiency, laying a foundation for further research
and development in this domain.
ACKNOWLEDGEMENTS
DeepL and ChatGPT were used to translate some sec-
tions of this work. Gemini was used to shorten the
text.
REFERENCES
(2010). Damn vulnerable web application (DVWA). http://www.dvwa.co.uk/. Accessed: 2024-07-01.
Arcuri, A. (2021). Automated black- and white-box testing
of restful apis with evomaster. IEEE Software, 38.
Bau, J., Bursztein, E., Gupta, D., and Mitchell, J. (2010).
State of the art: Automated black-box web applica-
tion vulnerability testing. In 2010 IEEE symposium
on security and privacy, pages 332–345. IEEE.
Corradini, D., Zampieri, A., Pasqua, M., and Ceccato, M.
(2022). Resttestgen: An extensible framework for au-
tomated black-box testing of restful apis.
D’Ambros, M., Gall, H., Lanza, M., and Pinzger, M.
(2008). Analysing software repositories to understand
software evolution.
dgtlmoon (2024). changedetection.io. https://github.com/
dgtlmoon/changedetection.io. Accessed: 2024-07-01.
Doupé, A., Cova, M., and Vigna, G. (2010). Why Johnny can't pentest: An analysis of black-box web vulnerability scanners. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pages 111–131. Springer.
Eriksson, B., Pellegrino, G., and Sabelfeld, A. (2021). Black Widow: Blackbox data-driven web scanning. In 2021 IEEE Symposium on Security and Privacy (SP).
Google (2024). Google alerts. https://www.google.com/
alerts. Accessed: 2024-07-01.
Jadeja, M. (2022). Jaccard similarity made simple: A be-
ginner’s guide to data comparison. Accessed: 2024-
05-26.
Krawetz, N. (2013). Kind of like that. The Hacker Factor
Blog.
Kressebuch, J. and Büsser, F. (2024). Scrooge source code. https://github.com/secuteer/scrooge.
Mallawaarachchi, V., Meegahapola, L., Madhushanka, R.,
Heshan, E., Meedeniya, D., and Jayarathna, S. (2020).
Change detection and notification of web pages: A
survey. ACM Computing Surveys (CSUR), 53(1):1–
35.
@maximilianhils, @raumfresser, and @cortesi (2013).
mitmproxy/mitmproxy.
Mesbah, A., Deursen, A. V., and Lenselink, S. (2012).
Crawling ajax-based web applications through dy-
namic analysis of user interface state changes. ACM
Transactions on the Web, 6.
Mitropoulos, D., Gousios, G., and Spinellis, D. (2012).
Measuring the occurrence of security-related bugs
through software evolution. In 2012 16th Panhellenic
Conference on Informatics, pages 117–122.
Pellegrino, G. and Balzarotti, D. (2014). Toward black-box
detection of logic flaws in web applications. In NDSS,
volume 14, pages 23–26.
PrestaShop (2024). Prestashop. https://prestashop.com/.
Accessed: 2024-07-01.
Stafeev, A. and Pellegrino, G. (2024). SoK: State of the Krawlers - Evaluating the effectiveness of crawling algorithms for web security measurements.
Williams, M. A., Dey, S., Barranco, R. C., Naim, S. M.,
Hossain, M. S., and Akbar, M. (2018). Analyzing
evolving trends of vulnerabilities in national vulner-
ability database. In 2018 IEEE International Con-
ference on Big Data (Big Data), pages 3011–3020.
IEEE.
WooCommerce (2024). Woocommerce. https://
woocommerce.com/. Accessed: 2024-07-01.
xlwings (2024). jsondiff. https://github.com/xlwings/
jsondiff. Accessed: 2024-07-03.
APPENDIX
Table 5: Found Request Identifiers per Snapshot in DVWA.
URLs | Manual | BlackWidow | CrawlJax
GET/ | X | X | X
GET/about.php | X | X | -
GET/index.php | X | X | X
GET/instructions.php?doc | - | X | -
GET/login.php | X | X | X
GET/phpinfo.php | X | X | -
GET/vulnerabilities/brute/ | X | X | -
GET/vulnerabilities/brute/?username&password&Login | X | X | -
GET/vulnerabilities/captcha/ | X | X | -
GET/vulnerabilities/csp/ | X | X | -
POST/vulnerabilities/csp/ | X | - | -
GET/vulnerabilities/csp/nope | X | - | -
GET/vulnerabilities/csrf/ | X | X | -
GET/vulnerabilities/csrf/?password_new&password_conf&Change | X | - | -
GET/vulnerabilities/exec/ | X | X | -
POST/vulnerabilities/exec/ | X | - | -
GET/vulnerabilities/fi/?page | X | X | -
GET/vulnerabilities/javascript/ | X | X | -
POST/vulnerabilities/javascript/ | X | - | -
GET/vulnerabilities/sqli/ | X | X | -
GET/vulnerabilities/sqli/?id&Submit | X | - | -
GET/vulnerabilities/sqli_blind/ | X | X | -
GET/vulnerabilities/sqli_blind/?id&Submit | X | - | -
GET/vulnerabilities/upload/ | X | X | -
POST/vulnerabilities/upload/ | X | X | -
GET/vulnerabilities/view_help.php?id&security | X | - | -
GET/vulnerabilities/view_source.php?id&security | X | - | -
GET/vulnerabilities/weak_id/ | X | X | -
POST/vulnerabilities/weak_id/ | X | - | -
GET/vulnerabilities/xss_d/ | X | X | -
GET/vulnerabilities/xss_d/?default | X | - | -
GET/vulnerabilities/xss_r/ | X | X | -
GET/vulnerabilities/xss_r/?name | X | - | -
GET/vulnerabilities/xss_s/ | X | X | -
POST/vulnerabilities/xss_s/ | X | X | -
35 URLs | 34 (97.1%) | 23 (65.7%) | 3 (8.6%)