e l i f r e . f i n d a l l ( r ” I F (1 = 1 , ’ t r u e ’ , ’ f a l s e ’ ) ” , s t r ( row ) ) :
# D e t e c t i f
Through the expressions illustrated in Listing 1,
expressions such as 1=1 or ’a’=’a’ that are often
used, in the Boolean Based Blind SQL technique are
quickly identified, allowing to alert the administrator
of a possible injection attack in progress. However,
as mentioned, there are several ways to carry out SQL
injection attacks, in this way, to increase the success
of detection, a second check, the queries, is performed
in cases where the first scan does not detect any mali-
cious expression.
As already mentioned, the first check was de-
signed to identify the most used SQL injection tech-
niques more quickly. The second verification is based
on the concept of fuzzy string matching (Kuruvilla,
2022; Kalyanathaya et al., 2019; Kostanyan, 2017)
for the detection of malicious expressions. This con-
cept corresponds to the identification of two strings,
string characters or entries that are approximately
similar but are not exactly the same. It can be imple-
mented using different techniques, in our solution, the
Levenshtein Distance(Sharma, 2022; Yujian and Bo,
2007; Haldar and Mukhopadhyay, 2011) is used. This
approach provides a measure of the number of single
character insertions, exclusions or substitutions that
vary between the sequences under analysis. In a real
application context, the execution of a trusted query
may vary. One of the main reasons for this variation is
the content of the parameters inserted in the clause(s)
where for filtering, which can be different from exe-
cution to execution. Thus, the solution presented, in
the analysis of queries, removes, using regular expres-
sions, the content of these parameters, reducing the
occurrence of false positives. This removal allows, in
the context of the proposed solution, that the Fuzzy
string matching technique presents more reliable re-
sults.
Thus, the queries of the application are numbered
(in the current state, this numbering is performed by
the programmer). The Listing 2 shows an example
of this numbering. In the definition of the query, the
identifier that is intended to be assigned to the query is
added as the SQL comment in the begin of the query.
Listing 2: Query Numeration Example.
#1
s e l e c t
*
from t a b l e where f i e l d = ’ f i e l d ’
In addition to the numbering, an application query
table is created, where all queries that should be moni-
tored by the system are recorded. This table is a JSON
file (see listing 3) where the query identifier and its
contents are registered.
Listing 3: Queries table example.
[
{” i d ” : 1 , ” c o n t e n t ” : s e l e c t
*
from t a b l e wher e f i e l d ”}
]
The main function of this table is to save an ex-
ample of the execution (without the contents of the
parameters of the clause(s) where) of the queries that
will be monitored. This example does not contain any
malicious expression and is used, as will be detailed,
by the Fuzzy string matching technique in the analysis
of the queries recorded in the logs.
The analysis process of the second mechanism be-
gins with the search for the identifier that was as-
signed to the query under analysis. After this process,
through regular expressions, is removed all the con-
tents of the parameters of the clause(s) where. With
the identifier found, it is obtained, through the query
table, the secure content of the query under analy-
sis. From this point, a comparison is performed, using
python’s fuzzywuzzy (Foundation, 2022) library, be-
tween the secure content and the contents of the pre-
processed logs of the query in processing. The query
execution is classified as malicious when the differ-
ence obtained is greater than 10%.
The definition of this value is justified because, in
the testing phase, when querys are created dynami-
cally (common in data filtering mechanisms), the In-
ner joins performed between the different tables can
be different. Thus, the definition of Threshold (10%)
was necessary to reduce the rate of false positives.
In cases where the difference is greater than 10%,
as in the first check, the administrator is alerted. For
both analysis processes existing in the proposed solu-
tion, in some contexts, namely, when more than one
execution with suspected SQL injection is detected,
in a short time (default 1m) the database server is
blocked/shutdown preventively.
4 PRELIMINARY RESULTS
False positives/false negatives was one of the prob-
lems identified in the development phase. Thus, to
try to reduce the occurrence of these events, a simpler
test scenario was defined to perform the first valida-
tion of the solution. Using a web platform (iamvinitk,
2022), changed intentionally, to enable the exploita-
tion of SQL injection attacks, along with sqlmap (G
and Stampar, 2022; Bizimana and Belkhouja, 2017;
Baklizi et al., 2022), which is a penetration testing
tool that automates the process of detecting and ex-
ploiting SQL injection failures, it was defined that the
information to be obtained in attacks on the test plat-
form would be:
Query Log Analysis for SQL Injection Detection
473