models. For example, Alrowais et al. (2024)
developed an upgraded RoBERTa model, called
RoBERTaNET, which uses GloVe word embeddings
to detect cyberbullying tweets with 95% accuracy.
While it shows high performance, it demands
significant computing power, making widespread
adoption challenging, especially in developing
countries. Similarly, Ogunleye and Dharmaraj (2023)
introduced a new dataset named D2 to enhance
RoBERTa’s detection capability. This approach
provides better accuracy and is robust to class
imbalance, but it requires a large dataset, which
limits its use in data-scarce environments.
Teng and Varathan (2023) used transfer learning
with DistilBERT to enhance detection, incorporating
psycholinguistic factors, but achieved an F-measure
of only 64.8% with logistic regression, indicating that
more work is needed for diverse social media content. The
XP-CB model by Yi and Zubiaga (2022) uses
adversarial learning to improve cross-platform
detection, but it has high processing requirements,
limiting its scalability. Sen et al. (2024) combined
BERT with CNN and MLP, achieving 87.2% to
92.3% accuracy, outperforming other machine
learning methods. However, its complexity makes
real-time deployment challenging. Ejaz et al. (2024)
developed a dataset that covers multiple aspects of
cyberbullying, such as violence, repetition, and
peer-to-peer interaction. This makes the dataset more
flexible for researchers, but the study lacks detailed
performance metrics. In
a separate study, Chow et al. (2023) found that BERT
achieved the highest accuracy (96%), slightly
outperforming Bi-LSTM (95%) and Bi-GRU (94%)
in detecting cyberbullying on tweets.
El Koshiry et al. (2024) used a CNN-BiLSTM
model with Focal Loss and GloVe embeddings,
achieving a 99% accuracy rate. However, the model
struggled with recall, indicating a need for further
improvements in capturing all instances of
cyberbullying. Lastly, Kaur and Saini (2023)
conducted a scientometric analysis of AI applications
for cyberbullying detection, highlighting trends,
contributions, and future research directions in this
field, but without evaluating specific model
performances. Ontology-based approaches (e.g.,
Gencoglu, 2020) provide structured domain
knowledge, enabling better categorization and
semantic understanding but often lack the ability to
process implicit language patterns effectively.
Transformer-based models, such as BERT and its
variants (Chen et al., 2023; Yi & Zubiaga, 2022),
excel in capturing linguistic nuances but struggle with
representing complex relationships between
concepts. Despite the strengths of these approaches,
they are generally applied independently, leaving a
gap in combining these techniques for tasks like
cyberbullying detection.
To the best of our knowledge, no existing research
combines graph-based ontologies with BERT or
similar transformer models for cyberbullying
detection. Our work addresses this gap, providing an
opportunity to develop a dual-layered approach that
integrates the contextual understanding of BERT with
the hierarchical structuring capabilities of ontologies
to enhance both detection accuracy and adaptability
to evolving abusive behaviors.
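The dual-layered idea described above can be sketched in a few lines: a transformer-style classifier score is fused with evidence from a structured ontology of abuse concepts. Everything below (the toy ontology, the fusion weight, the decision threshold) is an illustrative assumption, not the final design proposed in this paper.

```python
# Illustrative sketch of a dual-layered detector: a (stand-in) BERT
# probability is fused with matches against a small, hand-built ontology
# of abuse concepts. Ontology entries, the weight alpha, and the 0.5
# threshold are hypothetical placeholders for exposition only.

# Toy ontology: concept -> surface forms (a real ontology would be a graph
# with hierarchical relations between concepts).
ONTOLOGY = {
    "threat": {"hurt you", "watch your back"},
    "insult": {"loser", "idiot"},
    "exclusion": {"nobody likes you"},
}

def ontology_score(text: str) -> float:
    """Fraction of ontology concepts with at least one surface form in text."""
    text = text.lower()
    hits = sum(
        any(form in text for form in forms) for forms in ONTOLOGY.values()
    )
    return hits / len(ONTOLOGY)

def fuse(bert_prob: float, text: str, alpha: float = 0.7) -> bool:
    """Weighted fusion of a contextual-model probability and ontology evidence."""
    score = alpha * bert_prob + (1 - alpha) * ontology_score(text)
    return score >= 0.5  # illustrative decision threshold

# A moderate classifier score is pushed over the threshold by ontology hits.
print(fuse(0.55, "you are such a loser, nobody likes you"))  # → True
print(fuse(0.55, "great game last night!"))                  # → False
```

The point of the sketch is the division of labor: the contextual model supplies a soft probability, while the ontology layer contributes structured, interpretable evidence that can be extended as new abusive behaviors emerge.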
3 METHODOLOGY
3.1 Data Description
The dataset used in this study consists of messages
labeled as either ‘cyberbullying’ or ‘not
cyberbullying.’ Data was collected from three main
sources: real-time input from university students
through surveys, direct interactions, and virtual
interviews; publicly available datasets from online
platforms; and web-scraped data from public forums
to capture diverse language patterns. The dataset
contains two primary columns: ‘message_text’,
which includes user-generated content from social
media, and ‘cyberbullying_type’, indicating whether
the content qualifies as cyberbullying. A majority of
the messages are labeled ‘cyberbullying’, while a
smaller portion is labeled ‘not cyberbullying’.
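The two-column layout described above can be inspected with a few lines of standard Python; the in-memory sample and its rows are stand-ins for the actual file, whose name is not specified here.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-in for the dataset file; in practice this would be
# an open() call on the real CSV (file name assumed, not given in the paper).
sample = io.StringIO(
    "message_text,cyberbullying_type\n"
    "you are worthless,cyberbullying\n"
    "see you at practice,not cyberbullying\n"
    "nobody wants you here,cyberbullying\n"
)

rows = list(csv.DictReader(sample))
counts = Counter(row["cyberbullying_type"] for row in rows)

# Consistent with the class balance described above, 'cyberbullying'
# is the majority label in this toy sample.
print(counts)  # → Counter({'cyberbullying': 2, 'not cyberbullying': 1})
```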
3.2 Data Collection
Participants for this study were voluntarily recruited
from the university campus. After signing a consent
form, they completed a survey on Qualtrics where
they shared their experiences with cyberbullying,
detailing its impact on their mental health and
academic performance. Eligible participants were 18
years or older, currently enrolled in either bachelor's
or master’s programs, and had experienced or
witnessed cyberbullying within the past year. Those
interested in further participation engaged in 30-
minute virtual interviews via MS Teams to provide
deeper insights into their personal experiences.
Additionally, we used existing datasets such as the
Cyberbullying Detection Dataset on Twitter (2023),
Instagram Cyberbullying Dataset (2022), and the
OLID Dataset (2020). These datasets helped refine
the ontology model and enhance the accuracy of AI-
based detection algorithms.
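Merging these heterogeneous sources requires mapping each dataset's native labels onto the binary scheme used in this study. The sketch below illustrates one way to do this; the per-dataset label vocabularies are assumptions (OLID's subtask-A labels are OFF/NOT, while the Twitter and Instagram label names here are guesses), not a description of the actual preprocessing pipeline.

```python
# Sketch of harmonizing labels from heterogeneous sources into the binary
# scheme used in this study. Label vocabularies below are assumptions,
# except OLID subtask A, which uses OFF/NOT.
LABEL_MAPS = {
    "olid": {"OFF": "cyberbullying", "NOT": "not cyberbullying"},
    "twitter": {"bullying": "cyberbullying", "none": "not cyberbullying"},
    "instagram": {"1": "cyberbullying", "0": "not cyberbullying"},
}

def harmonize(source: str, raw_label: str) -> str:
    """Map a source-specific label onto the study's binary scheme."""
    try:
        return LABEL_MAPS[source][raw_label]
    except KeyError:
        # Surface unexpected labels early rather than silently mislabeling.
        raise ValueError(f"unknown label {raw_label!r} for source {source!r}")

print(harmonize("olid", "OFF"))      # → cyberbullying
print(harmonize("twitter", "none"))  # → not cyberbullying
```

Keeping the mapping explicit in one table makes it easy to audit how each external dataset's annotation scheme was folded into the study's two classes.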