described in more detail in this subsection.
There are different approaches to compare the contents of responses for similarity. One option is to directly compare the contents of two responses for equality. While very fast and simple, this has the major drawback that even the smallest difference between two responses causes them to be considered different. In practice, when sending the same request twice, there is often no guarantee that the responses are exactly the same because of, e.g., dynamic elements such as server-generated CSRF tokens or timestamps. Therefore, a better approach is required to cope with such differences.
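To illustrate why a plain equality check is too strict, the following minimal sketch (with hypothetical response contents) shows two responses to the same request that differ only in a server-generated CSRF token:

    # Hypothetical responses: identical pages except for the CSRF token.
    response_a = '<html><body><input name="csrf" value="a1b2c3"><p>My products</p></body></html>'
    response_b = '<html><body><input name="csrf" value="d4e5f6"><p>My products</p></body></html>'

    # A direct equality check considers them different, although the pages
    # are effectively the same.
    print(response_a == response_b)  # False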
An approach that can cope with small differences is to compare only the relevant parts of the contents for equality. For example, when comparing two HTML documents, one would first remove all elements from the documents that are not interesting for the comparison (e.g., meta tags, static content, scripts, footers, etc.) and then compare the stripped documents. The benefit of this solution is that it allows filtering out all elements that could taint a comparison for equality. The big drawback, however, is that one has to define and maintain a list of elements that should be excluded or allowed. In practice, this is difficult and time-consuming, especially as there are major differences between web frameworks.
We therefore use a more general approach that does not require application-specific configuration. Instead of comparing two contents directly, fuzzy hashes of each content are computed and compared for similarity. To do this, two different fuzzy hashing algorithms are used: ssdeep (Kornblum, 2006) and tlsh (Oliver et al., 2013). These algorithms were chosen because tlsh has been shown to perform well when comparing HTML documents (Oliver et al., 2014), while ssdeep is one of the best-known and most widely used fuzzy hashing algorithms.
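Both algorithms are available as libraries. As a minimal sketch, assuming the python-ssdeep and py-tlsh bindings, the two hashes can be computed and compared as follows; note that the algorithms use opposite conventions: ssdeep reports a similarity score (0-100, higher means more similar), whereas tlsh reports a distance (0 means identical, higher means more different):

    import ssdeep
    import tlsh

    # Hypothetical contents; tlsh requires a minimum input length (roughly
    # 50 bytes) with some variation, so very short toy strings may not hash.
    content_a = ("<html><body>" + " ".join("product %d" % i for i in range(40)) + "</body></html>").encode()
    content_b = ("<html><body>" + " ".join("gadget %d" % i for i in range(40)) + "</body></html>").encode()

    ssdeep_score = ssdeep.compare(ssdeep.hash(content_a), ssdeep.hash(content_b))  # 0-100
    tlsh_distance = tlsh.diff(tlsh.hash(content_a), tlsh.hash(content_b))          # >= 0
    print(ssdeep_score, tlsh_distance)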
The two fuzzy hashing algorithms are used as follows when comparing two contents: First, the ssdeep and tlsh hashes of both contents are computed and compared with each other. If the resulting comparison score is above a certain configurable threshold for either ssdeep or tlsh, the contents are processed by a filtering step in which elements that are not relevant for the comparison are removed. Then, the fuzzy hashes are computed again, this time of the filtered contents. If the comparison scores for both ssdeep and tlsh are above the threshold, the contents are considered similar; otherwise, they are considered different.
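The following sketch outlines this two-step procedure. The threshold values are hypothetical, as no concrete defaults are given here, and the filter_content() helper is defined in a later sketch; since tlsh reports a distance, "above the threshold" corresponds to a distance below a cut-off:

    import ssdeep
    import tlsh

    SSDEEP_THRESHOLD = 60   # assumed similarity score threshold (0-100)
    TLSH_CUTOFF = 100       # assumed maximum distance still counted as similar

    def similar_ssdeep(a: bytes, b: bytes) -> bool:
        return ssdeep.compare(ssdeep.hash(a), ssdeep.hash(b)) >= SSDEEP_THRESHOLD

    def similar_tlsh(a: bytes, b: bytes) -> bool:
        return tlsh.diff(tlsh.hash(a), tlsh.hash(b)) <= TLSH_CUTOFF

    def contents_similar(a: bytes, b: bytes) -> bool:
        # Step 1: compare fuzzy hashes of the raw contents; if neither
        # algorithm reports similarity, the contents are different.
        if not (similar_ssdeep(a, b) or similar_tlsh(a, b)):
            return False
        # Step 2: remove irrelevant elements and compare again; now BOTH
        # algorithms must report similarity.
        fa, fb = filter_content(a), filter_content(b)
        return similar_ssdeep(fa, fb) and similar_tlsh(fa, fb)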
To illustrate this procedure, we again use the example scenario introduced in Section 3.5, where U1 and U2 are two sellers in an e-shop that can both access a resource that lists their own products. The contents of the HTTP responses they receive are different, and it must now be determined how similar these two contents are to reach a final vulnerability verdict. In step one, the fuzzy hashes of both contents are computed and compared with each other. Let us assume that the two contents differ mainly in the textual information about the listed products, while most of the other page elements are the same in both cases due to standard web page components such as scripts, navigation, header, footer, etc. Therefore, it is quite likely that the comparison score is above the threshold for at least one of the fuzzy hash algorithms, which would wrongly indicate a vulnerability. To prevent such false positives, the second step is used, where as many standard components as possible are removed from the contents. In the case of HTML content, a small list of predefined tags is used to define which elements should be removed; it currently contains scripts and meta tags. This list is based on an analysis of our test applications (see Section 4) and is subject to change once more data is available. As a result of this removal, the actual differences (the included products) in the two contents become more apparent, which results in fuzzy hashes that are much less similar than before and therefore most likely leads to the correct verdict of not vulnerable.
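As a minimal sketch of this filtering step, assuming BeautifulSoup as the HTML parser (the text only fixes the tag list, not the parser), the predefined tags can be stripped as follows:

    from bs4 import BeautifulSoup

    REMOVED_TAGS = ["script", "meta"]  # tag list from the text; subject to change

    def filter_content(html: bytes) -> bytes:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(REMOVED_TAGS):
            tag.decompose()  # remove the tag together with its contents
        return str(soup).encode()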
3.7 Testing of Multiple Users and Roles
The workflow described above can be used to detect access control vulnerabilities based on two users. However, there are often more than two users and roles that should be considered. To support this, the workflow is simply used repeatedly for each pair of users or roles that should be tested, where one of the users can also be the anonymous (unauthenticated) user. For example, if an application provides the three user roles administrator (A), vip user (V) and standard user (S), if the anonymous user (Y) should also be considered, and if one wants to find vulnerabilities between each pair of distinct users or roles, then six runs of the entire workflow would be done, based on the pairs (A,V), (A,S), (A,Y), (V,S), (V,Y) and (S,Y). Note that if one of the two users is the anonymous user, the entire workflow works as described above, except that crawling is only done with the one authenticated user and the both users content filter is omitted.
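These pairs correspond exactly to the 2-combinations of the configured users and roles, as the following small sketch (with the role labels from the example) illustrates:

    from itertools import combinations

    roles = ["A", "V", "S", "Y"]  # administrator, vip, standard, anonymous

    for u1, u2 in combinations(roles, 2):
        # Each pair triggers one full run of the workflow; a hypothetical
        # run_workflow(u1, u2) would go here.
        print(u1, u2)  # yields (A,V), (A,S), (A,Y), (V,S), (V,Y), (S,Y)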
3.8 Configuration Example
To give an idea of the configuration that is required for the solution to detect vulnerabilities based on two users, Figure 5 shows the default configuration file. The first two sections, target and auth, are self-explanatory and must be specifically configured. As