
artificial intelligence fields and cybersecurity measures
against SQLi attacks (Alghawazi et al., 2022).
SQL injection attacks fall into seven distinct categories:
tautologies, illegal or logically incorrect queries,
union queries, piggy-backed queries, stored procedures,
inference, and alternate encodings. In such attacks,
malicious SQL code is inserted into a weakly secured web
application through an entry point and is then relayed
to the back-end database (Farooq, 2021).
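As a minimal illustration of the tautology category, the sketch
below uses Python's built-in sqlite3 module. The table, the
credentials, and the function names are hypothetical and are not
taken from the cited works. It contrasts a query built by string
concatenation with a parameterized one.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "s3cret"), ("bob", "hunter2")],
    )
    return conn


def login_vulnerable(conn, name, password):
    # Unsafe: user input is concatenated directly into the SQL text,
    # so a crafted password can change the structure of the query.
    query = (
        "SELECT name FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return [row[0] for row in conn.execute(query)]


def login_parameterized(conn, name, password):
    # Safer: placeholders keep the input as data, never as SQL syntax.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return [row[0] for row in conn.execute(query, (name, password))]


if __name__ == "__main__":
    conn = build_db()
    payload = "' OR '1'='1"  # tautology: the WHERE clause becomes always true
    print(login_vulnerable(conn, "alice", payload))
    print(login_parameterized(conn, "alice", payload))
```

With the payload shown, the concatenated query ends in
OR '1'='1', which holds for every row, so the vulnerable
function returns all users, while the parameterized one returns
none.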
Attackers target vulnerabilities in management APIs,
which, if exploited, can lead to successful attacks and
compromise an organization’s assets. Subsequently,
attackers may use the compromised cloud to launch
additional attacks on other cloud users. Exploiting
vulnerabilities in the systems, software, or applications
that facilitate multi-tenancy in cloud infrastructure can
disrupt the separation between tenants. This disruption
allows an attacker to access one organization’s resources
and potentially reach another user’s or organization’s
data. The nature of multi-tenancy expands the potential
attack surface, raising the likelihood of data leakage if
separation controls are inadequate (Tripathy et al., 2020).
Most existing countermeasures against SQL injection rely
on syntax-based detection methods or a set of pre-defined
rules to identify such attacks. While these solutions may
be effective against basic forms of SQLi, they are less
effective against more advanced and sophisticated attacks.
This weakness arises because conventional detection
mechanisms focus primarily on analyzing SQL syntax, and
attackers who understand how they operate can devise new
strategies to bypass them (Abdulmalik, 2021).
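The limitation of pre-defined rules can be made concrete with a
small sketch. The regular expression below is a hypothetical
signature written for this illustration, not a rule from any
cited system. It flags the basic numeric tautology but misses
the same attack once it is rewritten with inline comments or
URL encoding.

```python
import re
from urllib.parse import unquote

# Hypothetical signature: a quote, then OR, then a numeric equality
# such as 1=1 or '1'='1'.
TAUTOLOGY_RULE = re.compile(r"'\s*or\s+'?\d+'?\s*=\s*'?\d+", re.IGNORECASE)


def rule_based_detect(raw_input: str) -> bool:
    """Return True when the input matches the fixed signature."""
    return TAUTOLOGY_RULE.search(raw_input) is not None


if __name__ == "__main__":
    samples = [
        "' OR 1=1 --",          # basic form: matches the rule
        "'/**/OR/**/1=1 --",    # inline comments replace whitespace
        "%27%20OR%201%3D1",     # URL-encoded form of the basic payload
        "' OR 'a'='a",          # string tautology instead of numeric
    ]
    for sample in samples:
        print(f"{sample!r:28} flagged={rule_based_detect(sample)}")
    # Decoding first restores the match for the encoded variant only.
    print(rule_based_detect(unquote("%27%20OR%201%3D1")))
```

Each evasion can be patched with another rule or a
normalization step, but only after the variant is known, which
is the gap that learning-based detectors aim to close.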
Considerable research has addressed semantic
learning-based detection models. One study introduced
synBERT, a semantic learning-based model for SQLi attack
detection. This model embeds sentence-level semantic
information from SQL statements into embedding vectors,
which can be mapped to SQL syntax tree structures (Lu et
al., 2023a). The research showed that synBERT outperforms
previous models, with over 90% accuracy in detecting SQLi
on a wide range of datasets (Lu et al., 2023b).
Deep learning technologies have been explored to address
the challenges that traditional SQLi detection methods
face. One framework involves offline training and online
testing stages, processing samples through encoding,
generalization, and tokenization before training a
classifier that can efficiently identify SQLi attacks
(Sun et al., 2023).
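A preprocessing pipeline of this kind might look like the
following sketch. It is an illustration under stated
assumptions, not the cited framework's implementation: the
first stage is read here as undoing URL encoding, the
generalization step replaces literals with the placeholder
tokens NUM and HEX, and all function names are hypothetical.

```python
import re
from urllib.parse import unquote_plus


def decode(sample: str) -> str:
    """Undo URL encoding and lowercase the text (assumed first stage)."""
    return unquote_plus(sample).lower()


def generalize(text: str) -> str:
    """Replace concrete literals with placeholder tokens."""
    text = re.sub(r"\b0x[0-9a-f]+\b", " HEX ", text)  # hex literals first
    text = re.sub(r"\b\d+\b", " NUM ", text)          # then plain integers
    return text


def tokenize(text: str) -> list[str]:
    """Split into words and single punctuation characters."""
    return re.findall(r"[A-Za-z_]+|[^\sA-Za-z_]", text)


def preprocess(sample: str) -> list[str]:
    return tokenize(generalize(decode(sample)))


if __name__ == "__main__":
    print(preprocess("id=1%27+OR+%271%27%3D%271"))
```

Generalizing literals means that payloads differing only in
their constants map to the same token sequence, which reduces
the vocabulary a downstream classifier has to learn.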
Other research has applied Probabilistic Neural Networks
(PNN) to SQLi detection. For instance, Fawaz Khaled Alarfaj
and Nayeem Ahmad Khan proposed using a PNN optimized by
the BAT algorithm for detecting SQLi attacks (Alarfaj and
Khan, 2023). By extracting features from SQL queries and
employing Chi-Square testing for feature selection, their
PNN model achieved an accuracy of 99.19%, demonstrating
the effectiveness of deep learning and optimization
algorithms in SQLi detection.
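To show what Chi-Square feature selection measures, the sketch
below scores binary features against the class label using the
standard statistic for a 2x2 contingency table. The toy queries
and the three keyword features are hypothetical and chosen only
for illustration; they are not the feature set of the cited
study.

```python
def chi_square_2x2(feature, labels):
    """Chi-square statistic for a binary feature against a binary label."""
    a = sum(1 for f, y in zip(feature, labels) if f and y)          # present, malicious
    b = sum(1 for f, y in zip(feature, labels) if f and not y)      # present, benign
    c = sum(1 for f, y in zip(feature, labels) if not f and y)      # absent, malicious
    d = sum(1 for f, y in zip(feature, labels) if not f and not y)  # absent, benign
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # feature or label is constant: no information
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical labeled samples: 1 = malicious, 0 = benign.
SAMPLES = [
    ("select name from users where id = 5", 0),
    ("select * from orders where total > 100", 0),
    ("update users set name = 'bob' where id = 2", 0),
    ("select * from users where id = 1 or 1=1", 1),
    ("select * from users where name = '' or 'a'='a'", 1),
    ("select id from users union select password from admins", 1),
]

# Hypothetical keyword-presence features.
FEATURES = {
    "has_or": lambda q: " or " in q,
    "has_select": lambda q: "select" in q,
    "has_union": lambda q: "union" in q,
}


def rank_features(samples):
    """Score every feature and sort from most to least informative."""
    labels = [y for _, y in samples]
    scores = {
        name: chi_square_2x2([test(q) for q, _ in samples], labels)
        for name, test in FEATURES.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_features(SAMPLES):
        print(f"{name:12} {score:.2f}")
```

Features with a higher score depend more strongly on the label
and are kept; low-scoring features are discarded before the
classifier is trained.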
A deep neural network-based model, SQLNN, has been
designed to detect SQL injection statements effectively.
This model uses TF-IDF for data processing, highlighting
the importance of down-weighting common words to focus on
significant terms for SQLi detection (Zhang et al., 2022).
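The effect of TF-IDF weighting can be seen in a short sketch.
It uses the textbook unsmoothed form, tf(t, d) * log(N / df(t)),
with whitespace tokenization; the exact variant and tokenizer
used by SQLNN are not stated here, so both are assumptions, and
the three queries are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    queries = [
        "select name from users where id = 5",
        "select price from items where id = 7",
        "select name from users where id = 1 or 1 = 1",
    ]
    for term, weight in sorted(tf_idf(queries)[2].items()):
        print(f"{term:8} {weight:.3f}")
```

Terms that occur in every query, such as select and where,
receive a weight of zero, while the rarer or in the third query
keeps a positive weight.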
Building on these insights, our work introduces a hybrid
approach that combines the strengths of Naive Bayes, LSTM,
and Random Forest algorithms. This combination seeks to
address the individual limitations of each method by
leveraging Naive Bayes for its efficiency with large
datasets, LSTM for its deep learning capabilities in
recognizing complex patterns, and Random Forest for its
robustness and accuracy in classification tasks. To the
best of our knowledge, this is the first study to explore
such an integrated approach for SQLi detection, promising
enhanced accuracy and greater adaptability to the evolving
landscape of SQLi threats.
3 SYSTEM DESIGN
Various datasets are used to train an algorithm for
detecting SQLi attacks. These datasets typically consist
of a mix of normal and malicious SQL queries, allowing
the algorithm to learn patterns associated with SQL
injection attacks.
1. Annotated Data: The queries are usually labeled
as normal or malicious. This annotation is crucial
for supervised learning methods, where the model
learns from labeled examples.
2. Diversity of SQL Queries: The dataset includes
a wide range of SQL queries, both legitimate and
malicious. This variety helps the model to
differentiate between normal operations and SQLi
attacks.
3. Malicious SQL Samples: These include typical
SQL injection patterns like tautologies,
illegal/logically incorrect queries, union queries,
piggy-backed queries, and stored procedures
exploitation.
4. Normal SQL Samples: These are regular,
non-malicious SQL queries that an application would
ICISSP 2025 - 11th International Conference on Information Systems Security and Privacy