Towards a SQL Injection Vulnerability Detector Based on Session Types
Ant
´
onio Silvestre
a
, Ib
´
eria Medeiros
b
and Andreia Mordido
c
LASIGE, Departamento de Inform
´
atica, Faculdade de Ci
ˆ
encias, Universidade de Lisboa, Portugal
Keywords:
SQL Injection Vulnerabilities, Session Types, Type Checking, Static Analysis, Software Security.
Abstract:
Vulnerabilities in web applications pose a risk for organisations. Among them, SQL injections (SQLi) give
the attacker access to private data by submitting malicious SQL queries to the database via invalidated entry
points. Although there are various techniques for detecting SQLi, static analysis is widely used as it inspects
the application code without executing it. However, static analysis tools are not always precise. In this work,
we explore an avenue that links the detection of SQLi to type checking, thus providing stronger guarantees of
their existence. We propose a novel approach which consists of interpreting the behaviour of a web application
as if it was a communication protocol and uses session types to specify this behaviour. We leverage FreeST,
a functional programming language for session types, to implement FREESQLI, a seminal detector of SQLi
in PHP web applications. The tool translates the PHP code into FreeST code and capitalizes on FreeST’s type
checker to verify protocol adherence and detect inconsistencies associated with the presence of SQLi.
1 INTRODUCTION
Despite efforts to prevent vulnerabilities in web appli-
cations, the number of detected vulnerabilities has in-
creased yearly and continues to be a security threat for
organizations. SQL injections (SQLi) are among the
top three vulnerabilities listed in the OWASP Top 10
Web Application Security Risks
1
. SQLi attacks oc-
cur when a system fails to sanitize or validate user in-
puts correctly, which allows malicious SQL queries to
be submitted to the database. According to CVEDe-
tails
2
, the number of reported SQLi has increased
from 740 in 2021 to 1790 in 2022 and is still ris-
ing. This data highlights the importance (and growing
prevalence) of this vulnerability on web applications,
despite being a well-researched topic.
Over the years, some programming languages
have developed mechanisms to prevent these attacks,
but they are still susceptible to human error (Halfond
et al., 2006). For this reason, there has been an ef-
fort to develop tools to automatically detect vulnera-
bilities by inspecting the application code and assist-
ing developers in identifying exploitable code frag-
ments. Yet, these tools can produce many false posi-
a
https://orcid.org/0000-0003-4227-2752
b
https://orcid.org/0000-0003-4478-8680
c
https://orcid.org/0000-0002-1547-0692
1
https://owasp.org/Top10/
2
https://www.cvedetails.com/
tives, leading to developers spending time correcting
nonexistent vulnerabilities.
PHP is the most widely used language on
the server-side of web applications, accounting for
roughly 76.5%
3
. It is a dynamically typed lan-
guage since variable data types are determined in run-
time, and the programmer does not need to specify
them (Achour et al., 2023). As PHP does not require
type specifications, it can pose security risks to ap-
plications if the intended types do not match. The
absence of strict type enforcement makes data vali-
dation from external sources (e.g., user inputs) more
difficult. For instance, if a numeric field is not cor-
rectly validated, an attacker can inject a handcrafted
string containing suspicious characters, leading to se-
curity flaws within the application.
Unlike PHP, strongly typed languages require the
programmer to specify types that govern the program
specification explicitly and throw an exception or an
error at compile time (i.e., before execution) whenever
the program does not meet the specified type. Fol-
lowing the numeric field example, in these languages,
a type mismatch is identified when the field receives
a string. Although type systems alone can not elim-
inate the need for content validation, (e.g., user in-
puts), they tend to lead to fewer fault errors than dy-
namic languages.
3
https://w3techs.com/technologies/overview/
programming language
Silvestre, A., Medeiros, I. and Mordido, A.
Towards a SQL Injection Vulnerability Detector Based on Session Types.
DOI: 10.5220/0012732500003687
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2024), pages 711-718
ISBN: 978-989-758-696-5; ISSN: 2184-4895
Proceedings Copyright © 2024 by SCITEPRESS Science and Technology Publications, Lda.
711
In this work, we resort to more expressive types
the so-called session types which extend the tra-
ditional notion of data types and enable the specifi-
cation of communication protocols between different
parties by specifying types of data exchange and their
directions (Vasconcelos, 2012; Thiemann and Vas-
concelos, 2016). Session type systems provide ad-
ditional guarantees of protocol adherence, which is
paramount for our approach to the detection of SQLi:
using session types we can, for instance, specify mes-
sages exchanged between the web application and the
database and detect if the latter is receiving unsani-
tised queries from the former, which indicates the
presence of a vulnerability for SQLi.
In this paper, we propose an approach to de-
tect SQLi vulnerabilities in PHP web applications
through a novel static analysis technique based on
session types to define the nature of interactions
between servers, clients, and databases. The ap-
proach leverages a strongly typed language called
FreeST (Almeida et al., 2019), which has a robust
type system based on session types. More specifi-
cally, the approach converts PHP code into FreeST
code, which must include the translation of the pro-
gram and the inference of the expected type. Trans-
lation includes an analysis of the application’s be-
haviour (over its interaction with the database and the
user) and the specification of a communication proto-
col (read, session type) that specifies this behaviour.
Once the code has been translated, the FreeST com-
piler identifies inconsistencies between the expected
types and the respective values. These inconsistencies
allow us to identify vulnerabilities for SQLi, which
will then be appropriately collected and reported to
the developer. So far we have implemented a proof of
concept of our approach in the FREESQLI prototype.
The main contributions of this paper are: (1) A
novel static analysis approach that uses session types
to detect vulnerabilities for SQL injections; (2) The
FREESQLI tool that identifies SQLi vulnerabilities in
PHP code, using the FreeST language.
2 BACKGROUND
Our work lives at the intersection of vulnerability de-
tection for SQLi and programming languages. So, in
this section we introduce some background related to
SQLi and detection mechanisms, but also related to
programming languages, session types and the lan-
guage that underpins our work – FreeST.
SQL Injections. An SQLi is a security vulnerabil-
ity that occurs when an attacker injects SQL com-
mands and meta-characters into the application’s un-
sanitised entry points, such as user forms, cookies and
server variables (Halfond et al., 2006). These meta-
characters when inserted in an existing SQL query,
they can change its structure so that it is interpreted as
a different SQL query, allowing the attacker to bypass
authentication, access the database, view, change, or
delete sensitive data, or even take control of adminis-
trative functions. One example of a vulnerable PHP
code to SQLi can be seen in Listing 1.
1 $u = $_GET [ username ];
2 $p = $_GET [ password ];
3 $qry = " S E L E CT * FROM us e r s WHERE un a m e = $u A ND
pas s w d = $p ";
4 $r = m y sql i _ que r y ( $conn , $qry );
Listing 1: Example of PHP code vulnerable to SQLi.
This example verifies if a user belongs to the
database after he provides their credentials on the
login page and through two entry points (lines 1
and 2). Both entry points are injectable and do not
go through sanitisation. An attacker can access all
users’ information by simply providing the string
OR 1=1;-- on the username field. This input al-
ters the final query to:
SELECT * FROM users WHERE uname=’’ OR 1=1;--
AND passwd=’’
The use of -- meta-characters serves as a com-
ment delimiter, which prevents the remaining content
from being treated as part of the query. Consequently,
the interpreted query would be:
SELECT * FROM users WHERE uname=’’ OR 1=1;
The tautology ’’ OR 1=1 in the WHERE clause
makes the query always true, which is logically equiv-
alent to the query SELECT * FROM users;. Next,
this query is sent to the database through the sensitive
sink mysqli_query to be executed. In this context,
a sensitive sink is a function that when executed with
abnormal data (like injected SQL code) can return un-
expected values.
According to Halfond et al. (Halfond et al., 2006),
different types of SQLi have distinct goals and char-
acteristics. However, they occur essentially because
of unsanitised input fields, which made languages
adopt mechanisms to prevent them. PHP was en-
dowed with input sanitisation functions and prepared
statements to mitigate the lack of validation on entry
points (Achour et al., 2023). Nonetheless, since these
mechanisms are applied by the developer, they are
prone to human error. For this reason, there has also
been an effort to develop tools to detect or prevent this
kind of vulnerability (Gould et al., 2004; Halfond and
Orso, 2005; Medeiros et al., 2016b; Medeiros et al.,
2016a).
ENASE 2024 - 19th International Conference on Evaluation of Novel Approaches to Software Engineering
712
Static Analysis Tools. Although programming lan-
guages provide defensive mechanisms to prevent vul-
nerabilities, such as the sanitization functions (e.g.,
mysqli_real_escape_string in PHP), they can not
guarantee that the software produced is secure since
human errors remain inevitable. Supplementary tools
are, therefore, essential for validating the code pro-
duced and supporting developers throughout the de-
velopment process.
Static analysis is a strategy that involves analysing
the application source, intermediate or binary code
without executing it to detect errors and vulnerabil-
ities. There is a wide variety of static analysis tools,
but usually, they follow a similar approach. They start
by taking the code and building a model that repre-
sents it. This model is created based on different ap-
proaches, often inspired by compiler strategies. Once
the model is complete, they analyse it based on the
knowledge and rules they have of what they are pro-
grammed to and generate a report containing the re-
sults (Chess and West, 2007). However, as these tools
depend on programmed knowledge, they may pro-
duce false positives (FP) by detecting nonexistent vul-
nerabilities and occasionally missing actual vulnera-
bilities, resulting in false negatives (FN). In security,
we know that FP and FN always have a negative effect
on the programmer’s confidence. It is thus crucial to
minimise both and maximise true positives when de-
veloping a static analyser.
Types and Type Systems. Modern software engi-
neering recognises a wide range of formal methods
to ensure the desired and correct behaviour of a sys-
tem. Of all formal methods, type systems are best
established and prevalent (Pierce, 2002). Type sys-
tems, often associated with programming languages,
try to prevent errors from occurring during the exe-
cution of a program by detecting inconsistencies be-
tween types and values at compile time. When de-
signed, they should be easily verifiable and transpar-
ent, which means they must be able to verify if a pro-
gram is typed correctly, and their behaviour should be
predictable. Type systems enable the establishment
of the relationship between the program and the types
used to annotate them (Cardelli, 1996).
Programming languages can be categorised based
on their type systems. PHP is a weakly and dynam-
ically typed language. This means that while each
variable has a specific type, the programmer does not
need to declare their type because it is inferred at run-
time. Due to this dynamic nature and the lack of strict
type enforcement, PHP becomes more susceptible to
potential vulnerabilities, like attackers injecting mali-
cious code in a string to be used in a numeric field.
As previously stated, types function as annota-
tions within computer programs. While different pro-
gramming languages have unique types, certain types,
such as integers, strings, booleans, and others, are
commonly used across multiple languages. Some lan-
guages can feature more complex types like arrays,
objects, or session types, which are especially perti-
nent to this work.
Session Types and FreeST. Session types enable
the specification of heterogenously typed channels,
defining communication protocols between two par-
ties (Vasconcelos, 2012). Session types can be seen as
sequences of input and output operations defining the
messages being exchanged: !T represents the sending
of a value of type T, whereas ?T represents the recep-
tion of a value of type T . Session types can also of-
fer some flexibility in the form of internal or external
choices: +{ : T
}
L
represents the internal choice of
a label , followed by the execution according to T
,
whereas &{ : T
}
L
offers a set of choices labelled
with L and with continuation T
. This structure
of sequential operations makes them ideal for mod-
elling protocols in distributed scenarios (Dardha et al.,
2012).
FreeST is an experimental concurrent program-
ming language
4
that uses a core of linear functional
programming and enables the creation of channels for
inter-thread communication. The most notable fea-
ture of FreeST is its robust type system founded on
context-free session types that regulate communica-
tion on channels (Almeida et al., 2019). Context-
free session types (Thiemann and Vasconcelos, 2016)
enable the specification of communication protocols
whose traces are defined by context-free languages,
while regular session types only enable the specifica-
tion of tail recursive protocols. The detail “context-
free” vs “regular” is not important in our setting, so
we will use FreeST for its implementation of session
types (although we could choose any other language
that implements session types
56
).
Table 1 presents the different session types and
primitives within the FreeST language. These types
allow the definition of communication protocols be-
tween threads, with each type being linked to a spe-
cific primitive. FreeST features several communica-
tion primitives: send and receive are used for trans-
mitting and receiving values, select is used to make
choices within the channel, and match is used to align
with the selected option. The close primitive is used
for channel termination, while the wait primitive is
4
https://freest-lang.github.io/
5
https://rss.rd.ciencias.ulisboa.pt/tools/sepi/
6
https://docs.rs/session types/latest/session types/
Towards a SQL Injection Vulnerability Detector Based on Session Types
713
Table 1: FreeST session type constructor and primitives.
Session Type
Constructor
Behaviour
FreeST
Primitive
!T
send value of type T
send
?T
receive value of type T
receive
+{l: T, ..} select a choice select
&{l: T, ..}
offer a set of choices
match
T ; U do T, then U -
Close close the channel close
Wait
wait for channel to be closed
wait
Skip neutral element of ; -
used to wait for channel termination. The Skip type
is the neutral element of the sequential composition ;.
For example, consider a scenario where a server
thread is responsible for receiving two integers,
checking if their sum exceeds ten and providing the
result. The communication between the client and
server threads can be read as: send the first number,
send the second number, and receive a boolean value.
This sequence can be translated into a FreeST proto-
col using the type constructors presented in Table 1,
as shown below:
type protocol = !Int; !Int; ?Bool; Close
Type Checking. Type checking is one of the most
(involuntarily) used static analysis techniques by de-
velopers. It involves checking and validating the types
of values in a programming language and ensuring, at
compile time, that the variables or expressions used in
operations are compatible with their expected types.
Although most programmers are familiar with type-
checking, they may not pay much attention to it since
the rules are already defined by the programming lan-
guage and enforced by the compiler, transparently to
the user. Regardless, type checking, like any other
static analysis method, is not foolproof and can pro-
duce FP and FN. Programmers often overlook these
imperfections, yet they can significantly impact vul-
nerability detection (Chess and West, 2007).
3 FREESQLI PROTOTYPE
We aim to develop a static analysis tool to detect SQLi
in PHP code using session types offered by FreeST.
To enable and guarantee the use and practicality of
our tool, session types, FreeST and its compiler are
used in a way that is completely transparent to the
user, so that FreeST goes unnoticed.
We call our tool FREESQLI, and its approach op-
erates in two distinct phases: the translation and the
vulnerability detection, represented in Figure 1. The
translation phase translates the PHP code into FreeST
PHP
Source
Code
Intermediate Language
Code Translator
PHP-IL
Mapping
FreeST Code
Translator
IL-FreeST
Lines Mapping
FreeST Compiler
SQL Injection
Vulnerability
knowledge
FreeST Syntatic
Rules
Vulnerability Detector
Vulnerabilies
Reported
Type
Checking
Rules
FreeSQLi
Translation Phase
Vulnerability Detection Phase
Intermediate
Language (IL)
FreeSTCode
FreeST
Compiler Errors
Figure 1: Architecture of the proposed solution.
code, passing through an intermediate language (IL).
The conversion of PHP into IL enables us to remove
irrelevant syntax from the PHP and simplify the trans-
lation to FreeST. The IL code is then transformed into
FreeST code, including the program definition and the
protocol specification (i.e. the type that should govern
the program). This translation registers the mapping
of the IL lines into FreeST lines to facilitate the gen-
eration of a comprehensive report in the next phase.
Once we have the FreeST code, we are able to
compile it and proceed to the vulnerability detection
phase. The FreeST compiler checks for protocol com-
pliance and identifies type mismatches, generating a
list of errors in case of any inconsistencies. These er-
rors, the source code, and the corresponding mapping
are then used to pinpoint the vulnerabilities in PHP
code and develop a comprehensive report.
We now give a more detailed explanation of
each phase, illustrated by a running example of how
FREESQLI detects SQLi vulnerabilities.
Running Example. Consider the PHP code frag-
ment in Listing 2, which consists of an SQL query
where we limit the number of returned results to 20
and establish an offset, specified by the user. The code
aims to divide a result set into pages, but it overlooks
ENASE 2024 - 19th International Conference on Evaluation of Novel Approaches to Software Engineering
714
the proper sanitation of the $offset variable. This
oversight creates an opportunity for an attacker to in-
ject malicious code, and since the query relies on this
variable, the code becomes vulnerable to SQLi.
1 $offs e t = $_GET[ offset ];
2 $q = " S E L ECT id , n a m e F ROM pro d u cts O R D E R B Y
name L I M I T 2 0 O F FSET $offset ;";
3 $r = m y sql i _ que r y ( $conn , $q);
Listing 2: Example of a PHP code vulnerable to SQLi.
Translation Phase. In this phase, we focus on con-
verting the PHP source code into FreeST code. In
PHP (as in other languages), some functions behave
similarly. For example, the entry point functions, like
$_GET and $_POST, invoke different HTTP request
methods, but both consist of receiving user inputs.
With this in mind and to simplify the translation pro-
cess, we start this phase by converting source code
into an IL. The IL allows us to remove unnecessary
PHP code (e.g., comments) without oversimplifying
the program, which could prevent us from detecting
certain vulnerabilities.
The Intermediate Language Code Translator,
shown in Figure 1, converts the PHP code into IL
through a lexer. This translation assigns tokens to in-
dividual PHP functions based on their behaviour, us-
ing the IL-PHP mapping. Since we are translating
to FreeST, we focus on the behaviour of the func-
tions, and the IL already establishes the connection to
the communication protocols (later specified through
session types). Always taking the application’s point
of view, PHP entry points functions will be trans-
lated into the receive token, PHP functions that con-
sist of sending data (queries) to the database (e.g.,
mysql_query) will be represented in IL by the send
token. We also need tokens that allow us to repre-
sent sanitisation functions (sanit_f), type checking
(typechk) and other functions that do not fall into
the previous categories (e.g., replace_str). Table 2
presents an excerpt of the mapping between these to-
kens and the anticipated behaviour corresponding to
each function. (Note that we are still working with a
small fragment of the PHP language and that we will
extend the translation to other primitives soon.)
Variable names are not altered by the translation
and do not have a specific token since it could lead
us to lose information. SQL queries are represented
as the token query and its variables since we assume
their safeness depends on the variables they use.
Coming back to our running example, the $_GET
and mysqli_query functions are represented as the
receive and send tokens, respectively, meaning that
the protocol will receive data provided by $_GET
and it will send data (to the database) through
mysqli_query function. Therefore, line 1 of the code
would be translated to $offset receive, while line
3 would become $r send $conn $q. The query on
line 2 would also become $q query $offset. The
complete IL conversion is shown in Listing 3.
1 $offs e t recei v e
2 $q q u e r y $offset
3 $r se n d $conn $q
Listing 3: Conversion of PHP code of Listing 2 to IL.
Once we obtain the IL code, the translation into
FreeST occurs through the FreeST Code Translator,
involving two essential steps: (1) specification of the
(translated) FreeST program, and (2) inference of the
expected protocol, by means of a session type.
The first step leverages the IL tokens, and the
translator parses the lines of IL code into FreeST us-
ing predefined templates based on the FreeST syntac-
tic rules. In the second step, the translator identi-
fies and specifies protocol definitions. Here, we use
session types to describe the protocol of the server’s
communication with the user and the database. When
a server receives data from the user via entry points,
he receives unsafe data since it comes from an un-
trusted source. On the other hand, when the server
queries a database, it should send safe data to avoid
possible attacks. For this purpose, we endow the
FreeST language with two new types: Safe and
Unsafe, similar to tainted and untainted variables
used in the literature (Shankar et al., 2001; Dahse and
Holz, 2014). These new types represent whether a
variable can be tampered with. For example, user-
input variables fall under the unsafe category, whereas
data supplied to a sensitive sink should fall under
the safe category. The sanitisation functions (e.g.,
mysqli_real_escape_string) modify the variable
type from Unsafe to Safe. We chose the terms Safe
and Unsafe over tainted and untainted for clarity and
their potential use in FreeST features beyond identi-
fying vulnerabilities. Considering this, we can ob-
serve in Table 3 the correlation between the IL to-
kens that involve communication and their expected
behaviour as a FreeST session type. These correla-
tions allow us to define the correct communication
protocol based on the ILs tokens. For example, the
IL receive token represents user input, categorised
as Unsafe due to its untrusted source. This means
that the FreeST session type corresponds to ?Unsafe
(i.e. “receive unsafe”), which means we will use the
primitive receive when translating to FreeST code,
as indicated in Table 1.
In our running example, each IL instruction of
Listing 3 is transformed following a template, result-
ing in the FreeST code of Listing 4.
Towards a SQL Injection Vulnerability Detector Based on Session Types
715
Table 2: PHP-IL Mapping: IL tokens and respective PHP functions.
IL token Description Example of PHP primitives/functions
receive entry point $ GET, $ POST, $ COOKIE, $ REQUEST
send sensitive sink
mysql query, mysql unbuffered query, mysql db query,
mysqli query, mysqli real query
sanit f sanitization function mysql escape string, mysql real escape string
replace str replace string function str replace, str ireplace, substr replace, preg replace, strtr,
typechk type checking function is array, is bool, is callable, is float, is int, is null, is string
query SQL query -
Table 3: Correlation between IL and FreeST session types.
FreeST FreeST
IL token Description of expected behaviour session type primitive
receive Receive unsafe input from a user ?Unsafe receive
send Make a safe query to the database !Safe send
1 let ( offset , p rotoc o l ) = re c e ive p r otoco l in
2 let q = q u e r y offse t in
3 let p rotoc o l = send q p r otoco l i n
Listing 4: Code in Listing 3 translated to a FreeST program.
1 type Pr o t ocol = ? U n safe ; ! Safe ; C l o s e
2
3 serve r : Pr o t ocol - > ()
4 serve r prot o c ol =
5 let ( offset , pr o t ocol ) = r e ceive p rotoc o l in
6 let q = q u ery off s e t i n
7 let pro t o col = send q p rotoc o l in
8 close p rotoc o l
Listing 5: The resulting FreeST code of Listing 2.
It is important to note that in FreeST, the receive
primitive returns a tuple with the received result and
the continuation type (e.g., (offset, protocol) in
Listing 4, line 1), while the send primitive only re-
turns the continuation type. In the second step, we
specify a communication protocol (i.e., a type). The
type is defined based on specific communication to-
kens like send and receive. For instance, a receive
token appears in line 1, followed by a send token
in line 3. In this particular case, the communication
involves receiving an Unsafe type, sending a Safe
type, and closing the channel. Consequently, the type
for this program would be:
type Protocol = ?Unsafe; !Safe; Close
Once the protocol is defined and the lines are con-
verted based on templates, we add other necessary
FreeST syntax to allow the program compilation. As
the translated instructions are not in the same line as
in the IL, we need to annotate the program with the
type. So, to correctly report the vulnerable part of the
code, we store a mapping between the IL and FreeST
lines for future reference (IL-FreeST Lines Mapping
in Figure 1). The final FreeST code for our running
example is depicted in Listing 5.
Vulnerability Detection Phase. In this phase, we
detect SQLi vulnerabilities by relying on the FreeST
compiler to verify if the generated FreeST code fol-
lows the inferred protocol. Using its type checking
rules, the FreeST compiler evaluates if the protocol
is being followed by the (translated) FreeST program
and generates a list of errors in case of type mis-
matches. These errors may occur when, for exam-
ple, the server sends Unsafe data (to the database) us-
ing the primitive send while the protocol requires the
server to do it with Safe values. Such discrepancies
suggest the existence of SQLi vulnerabilities, indicat-
ing the presence of one or more unsanitised variables.
As part of our protocol adherence verification, we
have modified the compiler to include types Safe and
Unsafe and have developed a FreeST module that
specifies the functions needed to properly infer the
type of queries based on the types of their arguments.
This module specifies, for instance, that a query with
one argument with type Unsafe, has type Unsafe:
query : Unsafe -> Unsafe
However, if all arguments in a query are Safe, the
query has type Safe. In the case were the query has
two arguments, we have:
query : Safe -> Safe -> Safe
In short, if some variable in a query statement has
type Unsafe, the query has type Unsafe. Sanitisation
functions transform Unsafe values into Safe values:
sanitise : Unsafe -> Safe
Once the FreeST errors are collected and the map-
ping between IL and FreeST lines is identified, a re-
port with all SQLi detected is generated. The report
includes queries vulnerable to injections alongside the
unsanitised variables used within these queries. Also,
it specifies the exact line where these vulnerabilities
occur for clear identification and resolution.
Coming back to our running example, we need
to compile the resulting FreeST code in Listing 5.
Following the program execution, we notice that the
offset variable is assigned type Unsafe; as q is de-
fined from a query with an argument of type Unsafe,
it also has type Unsafe. And so, in line 7 of the pro-
gram, we send a value of type Unsafe, when we were
ENASE 2024 - 19th International Conference on Evaluation of Novel Approaches to Software Engineering
716
expecting to send a value of type Safe (as indicated
in the type in line 1). As a result, the compiler detects
this error and generates the following message: Could
not match expected type Safe with actual type Unsafe
for expression q, providing details in which lines this
error appears. This message indicates that we tried to
send an Unsafe query to the database when it should
be Safe. Because the query uses the offset variable,
which is an entry point that was not sanitised, we are
in the presence of an SQLi. Based on this compila-
tion error, the vulnerability detector detects this vul-
nerability and generates a report to the user with the
necessary information to correct it:
-----------------------------------------
SQL injections detected: 1
Queries: "SELECT id , name FROM products
ORDER BY name LIMIT 20 OFFSET $offset;"
Vulnerable variables: $offset
-----------------------------------------
4 USAGE EXAMPLE
Listing 6 contains a code example that conventional
static analysis tools may wrongly identify it as vul-
nerable to SQLi. The input received from the user un-
dergoes the substr function, which extracts the first
four characters. This substring is not dangerous be-
cause it contains less than five characters (Medeiros
et al., 2016a) and does not compromise the query,
making this example a false positive (FP) for certain
tools (e.g., (Dahse and Holz, 2014)).
1 $u = $_POST[ user ];
2 $u1 = subs t r ($ u , 0 , 4) ;
3 $q = " S E L ECT * from u s e r s WHERE user = ".$u1 ." ";
4 $r = m y sql i _ que r y ( $con , $q );
Listing 6: PHP example without a SQLi vulnerability.
1 type Pr o t ocol = ? U n safe ; ! Safe ; C l o s e
2
3 serve r : Pr o t ocol - > ()
4 serve r prot o c ol =
5 let (u , p r o tocol ) = rece i v e pr o t ocol in
6 let u1 = s a n itise u1 i n
7 let q = q u ery u1 in
8 let pro t o col = send q p rotoc o l in
9 close p rotoc o l
Listing 7: The resulting FreeST code of Listing 6.
Our approach does not classify this code as an FP
because the substr function is considered a saniti-
sation function if it extracts less than five characters.
The translated FreeST code for this example is in List-
ing 7. The compiler does not detect any inconsistency
because the sanitise function converts the Unsafe
variable u in a Safe variable u1.
5 RELATED WORK
SQL injections are well-researched and documented
threats, and over the years, various techniques and
tools have been developed to prevent or detect
them (Gould et al., 2004; Halfond and Orso, 2005;
Medeiros et al., 2016b; Medeiros et al., 2016a). These
auxiliary tools aim to help developers reduce the num-
ber of vulnerabilities or accelerate the code review
process. However, their adoption in real-world ap-
plications can vary since developers tend to prioritize
tools that seamlessly integrate into their systems and
offer minimal false-positive rates (Oyetoyan et al.,
2018). In this section, we provide an overview of sev-
eral such techniques and tools.
JDBC-Checker (Gould et al., 2004) is a static code
checker that statically verifies the accuracy of types in
dynamically created SQL queries. It was one of the
earliest static analysis tools, and even if its primary
purpose is not to prevent general SQLi attacks, it can
aid in blocking attacks that exploit type discrepancies
within dynamically generated query strings. It can
help address one of the main issues leading to SQLi
vulnerabilities: the lack of proper validation of input
types. Yet, it is worth noting that JDBC-Checker is
ineffective against more general SQLi, which involve
syntactically and type-correct queries.
Another tool that appeared at an early stage was
AMNESIA (Halfond and Orso, 2005). It uses a model
that combines static analysis with runtime monitoring
to enhance security against SQLi attacks. The process
involves a static phase that constructs models of legiti-
mate query types for each database access point using
static analysis. Subsequently, in the dynamic phase,
queries undergo interception pre-database execution
and verification against the built models. Queries con-
flicting with these models are flagged as SQLi and
prevented from execution. The effectiveness of this
tool is tied to the accuracy of the built query models
and can influence both FP and FN.
More recently, some tools have focused on detect-
ing various vulnerabilities, not only SQLi, by leverag-
ing some machine learning techniques. One of these
tools is WAP (Medeiros et al., 2016b). The tool uses
taint analysis to detect vulnerabilities and machine
learning to ensure that identified vulnerabilities are
genuine and not FP. The tool breaks down the code
into smaller parts using a Lexer and organises it into
Towards a SQL Injection Vulnerability Detector Based on Session Types
717
an abstract syntax tree (AST). During the taint anal-
ysis process, WAP looks for vulnerable entry points
and verifies if they compromise any sensitive sink,
considering possible entry point sanitisation through-
out the code. The tool corrects the code automatically.
Unlike WAP, DEKANT (Medeiros et al., 2016a)
differs in how it detects vulnerabilities in PHP code.
Instead of using an AST, it employs taint analysis and
machine learning, breaking the code into tokens and
converting them into an intermediate language (simi-
lar to our approach). DEKANT learns to spot vulner-
abilities by observing Hidden Markov Model (HMM)
sequence models, enabling it to autonomously detect
flaws without relying on explicit detection methods.
The main drawback of employing machine learn-
ing on DEKANT and WAP is their reliance on the
quality of their training datasets. If a poor set is used,
it can lead to large numbers of FP and FN. As a re-
sult, they cannot offer guarantees regarding their pre-
diction or detection abilities, something we want to
reach with our approach.
6 CONCLUSION
In this paper, we propose a novel approach that anal-
yses PHP code’s behaviour to identify vulnerabilities
for SQLi. Looking at the program from the appli-
cation’s point of view, we use session types to iden-
tify communication protocols between the applica-
tion, the user, and the database. Our approach uses
the FreeST programming language in the backend
and capitalises on its compiler and expressive types
to identify type mismatches and infer the existence of
vulnerabilities for SQL injections. Also, we present
FREESQLI, a proof of concept of our approach.
In the near future, we will extend the fragment of
PHP that we are considering to include the if prim-
itive and we will prove results of correctness of our
translations. In addition, we will adapt our FreeST
implementation so that Safe is considered a subtype
of Unsafe, which will enable us to identify second-
order SQLi. We will also evaluate the performance of
our tool. To do this, we will use NIST SARD
7
, which
offers small synthetic applications. The test suite will
include different scenarios, such as instances where
traditional static analyzers produce false positives and
negatives and patches for vulnerable cases.
7
https://samate.nist.gov/SARD/
ACKNOWLEDGMENTS
This work was supported by FCT through the project
SafeSessions PTDC/CCI-COM/6453/2020 (http:
//doi.org/10.54499/PTDC/CCI-COM/6453/2020)
and LASIGE Research Unit UIDB/00408/2020
(https://doi.org/10.54499/UIDB/00408/2020) and
UIDP/00408/2020 (https://doi.org/10.54499/UIDP/
00408/2020)
REFERENCES
Achour, M. et al. (2023). PHP Manual. https://www.php.
net/manual/en/.
Almeida, B., Mordido, A., and Vasconcelos, V. T. (2019).
FreeST: Context-free session types in a functional lan-
guage. arXiv preprint arXiv:1904.01284.
Cardelli, L. (1996). Type systems. ACM Computing Surveys
(CSUR), 28(1):263–264.
Chess, B. and West, J. (2007). Secure programming with
static analysis. Pearson Education.
Dahse, J. and Holz, T. (2014). Simulation of Built-in PHP
Features for Precise Static Code Analysis. In NDSS,
volume 14, pages 23–26.
Dardha, O., Giachino, E., and Sangiorgi, D. (2012). Session
types revisited. In PPDP, pages 139–150.
Gould, C., Su, Z., and Devanbu, P. (2004). JDBC checker:
a static analysis tool for SQL/JDBC applications. In
Proceedings. 26th ICSE, pages 697–698.
Halfond, W. G. J. and Orso, A. (2005). AMNESIA: Anal-
ysis and Monitoring for NEutralizing SQL-Injection
Attacks. In IEEE/ACM ASE, page 174–183.
Halfond, W. G. J., Viegas, J., and Orso, A. (2006). A Clas-
sification of SQL-Injection Attacks and Countermea-
sures. In IEEE ISSSE.
Medeiros, I., Neves, N., and Correia, M. (2016a).
DEKANT: A Static Analysis Tool That Learns to De-
tect Web Application Vulnerabilities. In Proceedings
of the 25th ISSTA, page 1–11.
Medeiros, I., Neves, N., and Correia, M. (2016b). Detect-
ing and removing web application vulnerabilities with
static analysis and data mining. IEEE TR.
Oyetoyan, T. D., Milosheska, B., Grini, M., and
Soares Cruzes, D. (2018). Myths and facts about static
application security testing tools: an action research at
Telenor digital. In XP 2018, pages 86–103.
Pierce, B. C. (2002). Types and programming languages.
MIT press.
Shankar, U., Talwar, K., Foster, J. S., and Wagner, D.
(2001). Detecting format string vulnerabilities with
type qualifiers. In 10th USENIX Security Symposium.
Thiemann, P. and Vasconcelos, V. T. (2016). Context-free
session types. In ICFP, pages 462–475. ACM.
Vasconcelos, V. T. (2012). Fundamentals of session types.
Information and Computation, 217:52–70.
ENASE 2024 - 19th International Conference on Evaluation of Novel Approaches to Software Engineering
718