On Improving the Efficiency of AI-Generated Text Detection
Bogdan Ichim¹,² and Andrei-Cristian Năstase¹
¹Faculty of Mathematics and Computer Science, University of Bucharest, Str. Academiei 14, Bucharest, Romania
²Simion Stoilow Institute of Mathematics of the Romanian Academy, Str. Calea Grivitei 21, Bucharest, Romania
Keywords: AI-Generated Text Detection, AI-Generated Content, Large-Scale AI Detection, AI-Generated Text
Classification, Dimensionality Reduction, PCA (Principal Component Analysis), TSVD (Truncated
Singular Value Decomposition), FastICA (Fast Independent Component Analysis), BERT Embeddings,
SimCSE Embeddings, TF-IDF (Term Frequency-Inverse Document Frequency).
Abstract: This paper proposes methods for making AI-generated text detectors more computationally efficient without paying a high price in prediction accuracy. Most AI detectors use transformer-based architectures with high-dimensional text embedding vectors involved in their pipelines. Applying dimension reduction algorithms to these vectors is a simple idea for making the whole process more efficient. Our experimental results reveal that this may lead to improvements of 5 up to 500 times in training and inference times, with only marginal performance degradation. These findings suggest that integrating such methods in large-scale systems could be an excellent way to enhance processing speed (and also reduce electric energy consumption). In particular, real-time applications might benefit from such enhancements.
1 INTRODUCTION
In parallel with the recent advances in artificial
intelligence, texts generated by computers have
become increasingly indistinguishable from human
writing (see Casal & Kessler, 2023; Jakesch et al.,
2023). Even AI-generated reviews are challenging to
distinguish (Ignat et al., 2024). Large Language
Models (LLMs) can write extremely coherent (and
sometimes relevant) texts, and that, to some, might
signal serious future problems (Bender et al., 2021).
While AI-generated texts have many uses in fields such as marketing (generating content, thumbnails, video edits, and so on) or customer service, LLMs are, as with anything, a double-edged sword: potential misuses of such tools include impersonation, public opinion manipulation, and disinformation in general.
Therefore, it is becoming increasingly important to be able to recognize AI-generated text in order to preserve confidence in digital information and communication. Robust and efficient detection systems are essential for ensuring this.
Methods for detecting AI-generated texts have
been proposed by several authors as can be seen in
Section 2 of this paper. All these techniques are computationally expensive, since the models take a long time to train and the sizes of the datasets can grow out of control. This may result in significant costs for hardware and electric energy, and might even impact the environment. There is interest in making such computations more efficient (Patel et al., 2024).
By adding dimension reduction algorithms to AI-detection pipelines we may improve training and inference times while significantly lowering the computational costs. High-dimensional text embeddings capture nuanced semantic features, but dimensionality reduction algorithms retain the essential features with minimal information loss, making the models smaller, faster, and more efficient.
Therefore, the contributions in this paper are the
following:
We have prepared a new dataset, which may also be useful for other experiments;
We study how dimensionality reduction techniques such as Principal Component Analysis (PCA), Truncated Singular Value Decomposition (TSVD) and FastICA can be used to reduce the size of a host of different text embedding vectors generated by SBERT, SimCSE and the more traditional TF-IDF, with a minimal reduction in prediction accuracy.
The newly prepared dataset, together with the implementation used for the experiments, is available upon request from the authors.
2 RELATED WORK
Researchers have recognized the valuable role of
dimensionality reduction in the detection of AI-
generated text. For example, as shown in (Rojas-
Simón et al., 2024), feature reduction on ASCII-
based lexical representations improves classification
accuracy while reducing overheads. Our
experiments complement this type of work. While
embedding-based methods are capable of capturing
more semantic information, character-based
vectorization may serve as a viable and less
computationally expensive alternative. Hybrid
methods seem to be a very promising idea.
In (Singh et al., 2022) the authors proposed a
semantic feature selection process based on GloVe
word embeddings for eliminating redundant
features. Their method is called redundant feature
removal (rRF) and it provided a considerable
improvement in classification accuracy, beyond
reducing dimensionality. Besides that, they
introduced a novel performance metric (NPM) for
balancing between evaluation of feature-reduction
effectiveness and classification accuracy, which
could serve as a benchmark for improvements in
efficiency in AI text detection.
PCA and Autoencoders have proven to be good dimensionality reduction techniques for increasing the classification accuracy and computational efficiency of classic machine learning models (Khan et al., 2022). This work showed that PCA-based preprocessing yielded significant improvements in training and inference times, while also showing that AutoEncoders (AE) could provide a viable alternative for unsupervised data compression.
Indeed, their findings reveal that preprocessing is
essential to speeding up model convergence,
improving feature interpretability, and ensuring
robustness on various text datasets. The application
of AEs alongside techniques like PCA, TSVD or
FastICA could achieve greater efficiency gains,
particularly in real-time AI text classification tasks.
In order to identify the optimal number of
features to use, (Moulik et al., 2023) performed a
systematic analysis of feature dimensionality
reduction through PCA and decision-tree-based
classifiers (AdaBoost). Their results show that
reducing the feature space to 3-6 dimensions can
greatly speed up training time without sacrificing
classification performance - an insight directly
applicable to AI-generated text detection. Also, the use of Kernel PCA (KPCA) with an RBF kernel provides a valuable alternative to regular PCA, particularly in domains where modelling non-linear relationships is crucial; in our case, preserving semantic integrity is of utmost importance.
For the task of detecting AI-generated texts, supervised learning is the main technique used by
many authors. In a nutshell, various machine
learning classifiers are trained on labelled datasets
(comprising both human-written and AI-produced
texts) and used for predicting the class of unlabelled
data. For example, CamemBERT, CamemBERTa
and XLM-R are used in (Antoun et al., 2023);
ChatGPT was put up to this task in (Bhattacharjee & Liu, 2024); a logistic regression model and two deep
classifiers based on RoBERTa are tested by (Guo et
al., 2023); Bag-of-Words with a logistic regression
classifier and a fine-tuned BERT model are used in
(Ippolito et al., 2020). Building up in complexity,
the RADAR framework is proposed by (Hu et al.,
2023), then in a recent concurrent work the
OUTFOX framework is introduced by (Koike et al.,
2024). Also, human-assisted detection methods were
proposed in (Dou et al., 2022).
Commonly, such classifiers use sentence
embeddings for the role of (text-extracted) features.
Those multi-dimensional vectors supposedly
describe the text's semantic content and can easily be
fed into the classifiers.
Fine-tuned transformers, as for example
RoBERTa (Liu et al., 2019), turned out to be very
successful at detection when trained on appropriate
datasets. Another possible approach is using stylometric features for detecting AI-generated text: the authors of (Kumarage et al., 2023) use differences and inconsistencies in the writing style as markers of generated text.
Finally, we note that two recent surveys on the detection of AI-generated texts are also available, namely (Jawahar et al., 2020) and (Tang et al., 2024).
3 METHODOLOGY
3.1 Problem Statement
Assume we are presented with a two-column dataset: the first column contains a “text” and the second contains a “label”, which is 0 or 1 (in our case 0 is the label for human-generated text and 1 is the label for AI-generated text). For the associated binary classification problem, the task is to find an efficient algorithm; that is, we would like to trade a little classification performance in order to obtain a significant reduction in the computational effort involved. To investigate this problem we experiment with three text embedding algorithms, four machine learning binary classification algorithms, and three dimensionality reduction algorithms.
3.2 Dataset
We have created a new dataset by merging several
publicly available labelled sets (from GitHub,
HuggingFace, and Kaggle). We ensured consistent
labelling schemes using Pandas. In our approach to
merging we also took care to remove the duplicate
entries. Next, we manually scanned samples that were flagged by first-order heuristics: repeated words, odd punctuation patterns, or really awkward semantic structures. Simple examples of "unnatural" samples include placeholder-like tokens (such as "XXXXXXXX"), repeated nonsensical phrases, or misalignment between texts and labels.
Although subjective, this split was consistent in its
rule: if the text was obviously incomplete, silly, or
contained placeholder tokens, it was removed. Then,
spelling errors were rectified across the corpus with
the use of specialized libraries (SymSpell and
Neuspell). We found this step justified because typos can introduce a large set of near-duplicate terms in the TF-IDF vectors, diluting their discriminative power. Also, LLMs rarely generate typos, so leaving such mistakes in our dataset could introduce an (exploitable) classifier bias. Typographical variance should not serve as a feature in this case.
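For illustration, a minimal sketch of this merging and filtering step (assuming hypothetical file names and a deliberately simple placeholder heuristic, not our exact preprocessing code) could look as follows:

import pandas as pd

# Hypothetical source files; the actual dataset was merged from several
# publicly available labelled sets (GitHub, HuggingFace, Kaggle).
sources = ["set_a.csv", "set_b.csv", "set_c.csv"]

frames = []
for path in sources:
    df = pd.read_csv(path)
    # Normalize every source to the same two-column schema:
    # "text" and "label" (0 = human-written, 1 = AI-generated).
    df = df.rename(columns=str.lower)[["text", "label"]]
    df["label"] = df["label"].astype(int)
    frames.append(df)

merged = pd.concat(frames, ignore_index=True)
# Remove the duplicate entries introduced by overlapping sources.
merged = merged.drop_duplicates(subset="text").reset_index(drop=True)

# First-order heuristic flag for manual review: placeholder-like tokens.
flagged = merged[merged["text"].str.contains("XXXX", regex=False)]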
Figure 1: The balance between the human and AI-generated text classes.
As can be seen in Figure 1, the resulting dataset is quite balanced between the human and AI-generated text classes. It contains 598441 samples, which were further split into an 80% training set and a 20% test set.
3.3 Technical Implementation
We have used three approaches in order to generate
text embeddings:
TF-IDF representations (TfidfVectorizer);
SBERT embeddings;
SimCSE embeddings.
As output we get matrices of sizes (598441,
1000) for TF-IDF and (598441, 768) for SBERT and
SimCSE. (In fact this CUDA-accelerated process
yielded gigabytes of data.) The reason we included
TF-IDF representations was that we wanted to see
how relatively easily calculable embeddings can
stack up against transformer-based embeddings. We
note that TF-IDF representations took significantly
less time to generate: they were finished in a few minutes, compared to a few hours for SBERT and SimCSE. And, to our surprise, in several experiments they delivered quite similar performance.
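A minimal sketch of this step follows; the merged frame is reused from the sketch in Section 3.2, and the two model checkpoints ('all-mpnet-base-v2' for SBERT and 'princeton-nlp/sup-simcse-bert-base-uncased' for SimCSE) are plausible 768-dimensional choices, not necessarily the exact ones we used:

from sklearn.feature_extraction.text import TfidfVectorizer
from sentence_transformers import SentenceTransformer

texts = merged["text"].tolist()  # 598441 samples

# TF-IDF: a sparse (598441, 1000) matrix, computed in minutes on CPU.
tfidf = TfidfVectorizer(max_features=1000)
X_tfidf = tfidf.fit_transform(texts)

# Transformer embeddings: dense (598441, 768) matrices, CUDA-accelerated.
sbert = SentenceTransformer("all-mpnet-base-v2", device="cuda")
X_sbert = sbert.encode(texts, batch_size=256, show_progress_bar=True)

simcse = SentenceTransformer("princeton-nlp/sup-simcse-bert-base-uncased",
                             device="cuda")
X_simcse = simcse.encode(texts, batch_size=256, show_progress_bar=True)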
Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical method that is frequently used in natural language processing tasks for representing sentences. It was introduced as a response to the traditional Bag-of-Words approaches in order to account for the importance of a word in a document relative to a corpus. As a refresher, the TF-IDF value is the product of two values: the term frequency and the inverse document frequency. Term frequency reflects the number of times a word appears in a document from a collection (which can be interpreted as the relevance of the word within the document itself), while inverse document frequency expresses how common or rare a word is across all documents in the collection (the assumption here is that less frequent words are of higher importance). By combining these two metrics through multiplication, the TF-IDF value effectively highlights words that are significant in a specific document. This is similar to a Bag-of-Words approach, but the key difference lies in downplaying common words that appear across many documents.
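In formulas, a common textbook variant reads as follows (scikit-learn's TfidfVectorizer additionally applies smoothing and L2 normalization):

$$\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d) \cdot \log\frac{N}{\mathrm{df}(t)}$$

where tf(t, d) is the number of occurrences of term t in document d, df(t) is the number of documents containing t, and N is the total number of documents in the collection.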
Sentence-BERT (SBERT) embeddings were
derived from BERT embeddings and are a way of
extracting text features at the sentence-level,
essentially creating embeddings for whole sentences.
While BERT alone excels at producing
contextualized word embeddings, it cannot create
fixed-sized vectors for entire sentences, which is a
crucial requirement for tasks like semantic similarity
and clustering. SBERT addresses this issue by
fine-tuning BERT on sentence pairs (via a siamese
network). During inference, SBERT encodes each
sentence independently into a fixed-dimensional
vector (768 in this case), preserving the semantic
relationships between sentences. As a result, we can
use these vectors to do things like semantic search
and question answering with a significantly reduced
computational cost when compared to traditional
models.
Our last type of text embeddings, Simple
Contrastive Sentence Embeddings (SimCSE), is a
recent and more sophisticated approach to
generating sentence embeddings. It enhances the
semantic content of the sentences through
contrastive learning. The main point is training the
model to distinguish between similar and dissimilar
pairs of sentences. During training, the model is
provided with sentences and their slightly perturbed
versions (with the help of data augmentation
techniques) as positive pairs, and unrelated
sentences as negative pairs. Of course, the model
builds upon pre-trained transformers like BERT or
RoBERTa and is fine-tuned to bring the embeddings
of positive pairs closer together while pushing apart
the embeddings of negative pairs in the vector space.
This supposedly helps capture fine-grained semantic
nuances. There are many possible variants of loss functions for this task: for example, TripletLoss, MultipleNegativesSymmetricalLoss, and Softmax, to name a few; all of them perform well for various use cases.
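As an illustration of the training objective, the following is a minimal sketch of an in-batch contrastive (InfoNCE-style) loss of the kind SimCSE builds on; it is a generic formulation, not the exact loss used by any particular SimCSE checkpoint:

import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.05):
    # z1, z2: (batch, dim) embeddings of the same sentences under two
    # augmentation/dropout passes; z1[i] and z2[i] form a positive pair
    # and all other in-batch combinations act as negatives.
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    # Cosine similarity matrix scaled by temperature: (batch, batch).
    sim = z1 @ z2.t() / temperature
    # The diagonal entries correspond to the positive pairs.
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)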
For the binary classification task, the following
well-known machine learning algorithms were used:
Logistic Regression;
K-Nearest-Neighbors (KNN);
Naïve Bayes;
Neural Network.
These classification algorithms were chosen almost arbitrarily for our experiments; therefore, we do not further elaborate on how each algorithm works.
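A minimal training sketch with the scikit-learn counterparts follows; here MLPClassifier stands in for our PyTorch neural network, and the embedding matrix and split reuse names from the earlier sketches:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# 80/20 split, as in Section 3.2.
X_train, X_test, y_train, y_test = train_test_split(
    X_sbert, merged["label"], test_size=0.2, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(256, 64)),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))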
The dimensionality-reduction algorithms chosen
were
PCA;
TSVD;
FastICA.
All these algorithms are available in Python from
scikit-learn (Pedregosa et al., 2011).
Principal Component Analysis (PCA) is a well-
known linear dimensionality reduction technique. In
fact, the principal components are just the
eigenvectors of the input data covariance matrix
ordered by the size of the associated eigenvalues. By
selecting the top k principal components we can
effectively reduce the data dimension while
simultaneously maintaining most of the data
variance. The implicit assumption is that the
variance captures the most significant information in
the data.
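A small sketch of this selection (reusing the dense X_train/X_test from the previous sketch; the 95% variance threshold is purely illustrative):

import numpy as np
from sklearn.decomposition import PCA

# Rank the components by explained variance and keep the smallest k
# that retains 95% of the total variance.
pca = PCA().fit(X_train)
cumvar = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumvar, 0.95)) + 1

pca_k = PCA(n_components=k).fit(X_train)
X_train_red = pca_k.transform(X_train)
X_test_red = pca_k.transform(X_test)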
Truncated Singular Value Decomposition
(TSVD) is a variant of Singular Value
Decomposition (SVD) and another popular
technique. It is most commonly used when we want
to reduce the dimensionality of sparse matrices (as
for example CSR matrices).
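This matters for the TF-IDF pipeline: TruncatedSVD consumes the sparse CSR output of TfidfVectorizer directly, so the (598441, 1000) matrix never needs to be densified. A short sketch (the target dimension of 50 is a hypothetical choice):

from sklearn.decomposition import TruncatedSVD

tsvd = TruncatedSVD(n_components=50)
X_tfidf_reduced = tsvd.fit_transform(X_tfidf)  # X_tfidf stays sparse
print(tsvd.explained_variance_ratio_.sum())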
Finally, Fast Independent Component Analysis
(FastICA) is a faster way of doing Independent
Component Analysis (ICA). In short, it is a
computational way of separating multivariate signals
into additive, independent components. In other
words, its scope is to transform multivariate data
points into statistically independent components.
The first step in FastICA is to center the data
(subtract the mean) and whiten it, that is alter the
data points to become uncorrelated and have unitary
variance. Whitening is typically achieved through
Principal Component Analysis or Singular Value
Decomposition – in a sense, up to this point this technique is a combination of the previous two
methods. After this pre-processing step, the
algorithm iteratively adjusts a set of weights in order
to maximize the statistical independence of the
rotated components. This is a very strong condition
requiring infinite data to check; therefore a proxy
called “maximal non-Gaussianity” is used. The
algorithm stops when certain convergence criteria
are met (for example, the difference between the updated weights and the previous ones is smaller than an epsilon).
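A short sketch with scikit-learn's implementation (the parameter values are illustrative; note that FastICA requires dense input):

from sklearn.decomposition import FastICA

# whiten="unit-variance" performs the centering and PCA/SVD-like
# decorrelation step; the iterative stage then rotates the whitened
# components toward maximal non-Gaussianity.
ica = FastICA(n_components=50, whiten="unit-variance",
              max_iter=500, tol=1e-4)
X_train_ica = ica.fit_transform(X_train)
X_test_ica = ica.transform(X_test)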
4 EXPERIMENTS
The experiments were performed using Jupyter
Notebook (The Jupyter Development Team, 2015),
scikit-learn (Pedregosa et al., 2011) and PyTorch
(Paszke et al., 2019).
4.1 Performance Metrics
Since our dataset is balanced and we are interested in a binary classification task, we considered (test) accuracy to be a good base metric for evaluating performance.
$$\mathit{accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}}$$
Accuracy takes values between 0 and 1; the bigger the accuracy, the better the model performs at the classification task.
4.2 Tuning Model Parameters
For hyperparameter tuning (see Table 1), we have
chosen to use Optuna (Akiba et al., 2019). This is a
Bayesian hyperparameter optimizer that is very
popular in machine learning competitions (for
example the Kaggle competitions). Optuna basically
is an open-source framework for hyperparameter
optimization designed to automate and streamline
the process of finding the best hyperparameters for
various machine learning models. In a nutshell, the
user needs to define the hyperparameter space and
then Optuna repeats the process of suggesting
hyperparameters by evaluating model performance
and learning from the results. It dynamically adjusts
its search strategy based on past trials to find better
configurations more efficiently. Its algorithms intelligently navigate the hyperparameter space, reducing the time and resources required for manual tuning. It also helps that it is framework-agnostic, meaning it works with any possible algorithm: the user just needs to specify the value to be maximized (which in our case is accuracy).
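For concreteness, a minimal sketch of one such run (the search space loosely mirrors the logistic regression row of Table 1; the exact spaces and data names are assumptions carried over from the earlier sketches):

import optuna
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def objective(trial):
    # Not every scikit-learn solver supports every penalty, so this
    # sketch restricts itself to two compatible solvers.
    params = {
        "C": trial.suggest_float("C", 1e-4, 1e2, log=True),
        "penalty": trial.suggest_categorical("penalty", ["l1", "l2"]),
        "solver": trial.suggest_categorical("solver",
                                            ["liblinear", "saga"]),
    }
    clf = LogisticRegression(max_iter=1000, **params)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=15)  # 15 trials per run (Section 4.4)
print(study.best_params, study.best_value)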
4.3 Performance Results
The results obtained in our experiments for the
performance of the machine learning models are
contained in Table 2, in which we use the abbreviations TF for TF-IDF, SB for SBERT and SC for SimCSE. It is easy to see that the neural network classifier delivers the best accuracy with all embedding types and dimensionality-reduction algorithms. This came as expected. It is worth mentioning that TF-IDF representations performed even better than the transformers in some cases on some reduced datasets. Naïve Bayes was clearly the worst performing by far: its best accuracy was only 79% and its worst 67%. In all experiments, KNN showed an edge over Logistic Regression.
Regarding the performance of the text embeddings, while SimCSE embeddings clearly deliver the best overall performance, we note that TF-IDF and SimCSE deliver quite similar performance after the dimensionality-reduction algorithms are applied to the datasets: the performance difference between them is not significant. This is a quite surprising result, and great news, since TF-IDF embeddings are computationally much easier to compute. SBERT embeddings were systematically the worst-performing embeddings.
Table 1: Hyperparameter Tuning Summary.

Model                 Hyperparameters tuned with Optuna
Logistic Regression   C (inverse of regularization strength); penalty ('l1', 'l2', 'elasticnet'); solver ('newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga')
K-Nearest Neighbors   number of neighbors; weights ('uniform', 'distance'); algorithm ('ball_tree', 'kd_tree', 'brute'); leaf_size
Naïve Bayes           var_smoothing (the portion of the largest variance of all features that is added to variances for calculation stability)
Neural Network        number of hidden layers; number of neurons per layer; activation functions ('relu', 'sigmoid', 'tanh'); optimizer ('adam', 'sgd'); learning rate; batch size; dropout rate
Table 2: Experimental Results – Test Accuracies (% of correct predictions on test).

Dimension     Logistic Regression    K-Nearest Neighbors    Naïve Bayes           Neural Network
Reduction     TF    SB    SC         TF    SB    SC         TF    SB    SC        TF    SB    SC
None          89%   86%   91%        92%   91%   93%        79%   67%   76%       95%   93%   97%
PCA           84%   74%   82%        92%   90%   93%        75%   68%   72%       94%   91%   94%
TSVD          83%   74%   82%        91%   90%   93%        74%   67%   72%       93%   91%   94%
FastICA       84%   74%   82%        91%   90%   93%        74%   70%   74%       93%   92%   95%
Table 3: Computational costs – Time to finish training (minutes).

Dimension     Logistic Regression    K-Nearest Neighbors    Naïve Bayes           Deep Neural Network
Reduction     TF    SB    SC         TF    SB    SC         TF    SB    SC        TF    SB    SC
None          6.75  100   10         0.16  0.12  0.1        4.75  1     1.25      197   210   110
PCA           0.21  0.25  0.12       0.01  <.01  0.01       0.06  0.06  0.06      62    27    20
TSVD          1     0.41  0.25       0.01  <.01  <.01       0.06  0.07  0.06      20    73    19
FastICA       0.06  0.06  0.19       0.01  <.01  0.01       0.06  0.06  0.06      48    50    19
Table 4: Computational costs – Inference on test set (seconds).

Dimension     Logistic Regression    K-Nearest Neighbors    Naïve Bayes           Deep Neural Network
Reduction     TF    SB    SC         TF    SB    SC         TF    SB    SC        TF    SB    SC
None          0.92  0.25  0.11       <.01  0.01  0.03       5.54  5     4.74      5.47  6.78  5.81
PCA           0.01  0.01  0.01       <.01  <.01  <.01       0.30  0.26  0.28      4.90  0.85  0.36
TSVD          0.01  <.01  0.01       <.01  <.01  <.01       0.28  0.32  0.28      1.29  0.33  2.42
FastICA       0.01  <.01  <.01       <.01  <.01  <.01       0.29  0.30  0.29      1.85  1.26  2.52
Finally, we note that applying the dimension reduction techniques in order to significantly decrease the computational effort seems to lead only to a relatively minor decrease in the performance of the classification task.
4.4 Computational Effort
The text encodings and the training of the machine learning models were accelerated with CUDA; we used an NVIDIA RTX 4060 Mobile graphics chip for this. Generating the full-text encodings took around 4 hours in total. Performing the dimensionality reduction then took around 20 minutes for each of the three algorithms. FastICA was (ironically) the slowest to finish; this makes sense, since its preprocessing step includes the previous two methods. All processes, even the CPU-bound ones (like, for example, PCA and KNN), benefited from parallelization. The maximum amount of RAM needed was 10 GB. We completed a total of 720 training cycles (that is, 48 Optuna runs with 15 trials each).
The training and inference times are presented in
Tables 3 and 4, in which we use the abbreviations
TF for TF-IDF, SB for SBERT and SC for SimCSE.
The reader should note that training the models after a dimensionality reduction algorithm was applied is 5 to 500 times faster than on the full dataset. In Table 4 we can see a similar trend for the inference time as well. (The times presented in Table 4 are for inference on the test dataset.)
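A minimal sketch of how such a speedup can be measured (reusing the hypothetical names from the earlier sketches):

import time
from sklearn.linear_model import LogisticRegression

def timed_fit(clf, X, y):
    # Wall-clock training time; Table 3 reports these in minutes.
    start = time.perf_counter()
    clf.fit(X, y)
    return time.perf_counter() - start

t_full = timed_fit(LogisticRegression(max_iter=1000), X_train, y_train)
t_reduced = timed_fit(LogisticRegression(max_iter=1000),
                      X_train_red, y_train)
print(f"speedup: {t_full / t_reduced:.1f}x")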
4.5 Threats to Validity
Internal Validity: The preparation procedure for the
dataset relies on subjective decisions about what
counts as "unnatural" texts. While we attempted to
systematically codify guidelines, there may remain
some bias regarding which texts were filtered out.
Although hyperparameter tuning with Optuna is
robust, it can still overfit to the training split if not
carefully cross-validated.
External Validity: The data were sourced from
publicly available sources (GitHub, HuggingFace,
Kaggle), which harbour potential biases with respect
to domain coverage. Therefore, caution must be
exercised in generalizing to highly professional or
domain-specific texts, for example, in the case of
legal or medical texts. The dimension-reduction
methods could perform differently when analysing
extremely large corpora or new languages.
Construct Validity: The metric chosen for performance evaluation was classification accuracy. While it covers much ground in terms of statistical performance, other metrics such as F1 score, precision, and recall could prove useful for a more complete analysis. Future studies could measure robustness to adversarial attacks, interpretability, or real-time adaptiveness, but this is beyond the scope of this work.
Robustness: There was no systematic analysis of how dimension-reduced embeddings stand up to adversarial examples specifically engineered to fool AI detectors. Preliminary findings imply that detection accuracy drops under domain shift or the introduction of synthetic perturbations. Therefore, future work should explore these vulnerabilities in detail.
5 CONCLUSIONS
In this study we tackled the idea of applying dimensionality reduction techniques (PCA, TSVD, and FastICA) to multidimensional text embedding vectors generated via different approaches. Our experiments show that, across the board, irrespective of the type of text embeddings or dimension reduction technique, applying such methods to text embedding vectors may lead to an important reduction in the computational effort without a corresponding significant reduction in performance.
In particular, applying dimensionality reduction techniques can drastically improve the running times and overall efficiency of AI text detectors. Throughout all experiments, such techniques not only accelerated training and inference times but also preserved performance. By using dimension reduction techniques one can successfully manage the trade-off between reducing model complexity and maintaining the accuracy of the predictions made. This makes them perfectly suitable for deployment, especially in resource-constrained environments.
Regarding the text embeddings, we also note that SimCSE consistently delivered the best accuracy, with the lightweight TF-IDF proving to be a fierce competitor, beating SBERT in multiple trials.
We think that incorporating these methods provides a viable pathway towards optimizing large-scale AI detectors, contributing to both their scalability and efficiency. Other important practical implications might be speeding up real-time systems like social-media and content monitoring, which is an increasingly important problem. Also, applying dimensionality reduction in resource-constrained environments can reduce the memory and energy requirements enough to make detection significantly cheaper, and also feasible on smaller devices. There are also potential industrial applications: dimension reduction enables the handling of high-throughput demands, from massive-scale news verification to customer-service chat filtering, without necessarily losing serious performance.
Future research could look into the integration of these techniques in advanced language models while investigating their capacity to stand against potential attacks, which is a key point in this field. Having the capacity to detect generated text in a context in which a user has employed adversarial tactics in order to fool potential detectors (e.g. including subtle changes and aiming for the text to sound more human) is a very important aspect that should be further investigated. Another future direction may examine adversarial robustness, checking whether reduced embeddings are easier or harder to fool. Finally, hybrid methods integrating statistical dimensionality reduction techniques with semantic feature pruning could prove useful in order to further improve the results.
REFERENCES
Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.
(2019). Optuna: A Next-generation Hyperparameter
Optimization Framework. In KDD 2019, Proceedings
of the 25th ACM International Conference on
Knowledge Discovery & Data Mining, pages 2623 – 2631. https://doi.org/10.1145/3292500.3330701
Antoun, W., Mouilleron, V., Sagot, B., Seddah, D. (2023).
Towards a Robust Detection of Language Model
Generated Text: Is ChatGPT that Easy to Detect? In
Actes de CORIA-TALN 2023, pages 14 – 27.
Bender, E. M., Gebru, T., McMillan-Major, A.,
Shmitchell, S. (2021). On the Dangers of Stochastic
Parrots: Can Language Models Be Too Big? In FAccT
2021, Proceedings of the 2021 ACM Conference on
Fairness, Accountability, and Transparency, pages
610 – 623. https://doi.org/10.1145/3442188.3445922
Bhattacharjee, A., Liu, H. (2024). Fighting Fire with Fire:
Can ChatGPT Detect AI-generated Text? ACM
SIGKDD Explorations Newsletter 25, pages 14 – 21.
https://doi.org/10.1145/3655103.3655106
Casal, J. E., Kessler, M. (2023). Can linguists distinguish
between ChatGPT/AI and human writing? A study of
research ethics and academic publishing. Research
Methods in Applied Linguistics 2, 100068.
https://doi.org/10.1016/j.rmal.2023.100068
Dou, Y., Forbes, M., Koncel-Kedziorski, R., Smith, N.,
Choi, Y. (2022). Is GPT-3 Text Indistinguishable from
Human Text? Scarecrow: A Framework for
Scrutinizing Machine Text. In Proceedings of the 60th
Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), pages 7250 – 7274. https://doi.org/10.18653/v1/2022.acl-long.501
Guo, B., Zhang, X., Wang, Z., Jiang, M., Nie, J., Ding, Y.,
Yue, J., Wu, Y. (2023). How Close is ChatGPT to
Human Experts? Comparison Corpus, Evaluation, and
Detection. Preprint arXiv:2301.07597.
Hu, X., Chen, P.-Y., Ho, T.-Y. (2023). RADAR: Robust
AI-Text Detection via Adversarial Learning. Preprint
arXiv:2307.03838.
Ignat, O., Xu, X., Mihalcea, R. (2024). MAiDE-up:
Multilingual Deception Detection of GPT-generated
Hotel Reviews. Preprint arXiv:2404.12938.
Ippolito, D., Duckworth, D., Callison-Burch, C., Eck, D.
(2020). Automatic Detection of Generated Text is
Easiest when Humans are Fooled. In Proceedings of
the 58th Annual Meeting of the Association for
Computational Linguistics, pages 1808 – 1822.
https://doi.org/10.18653/v1/2020.acl-main.164
Jakesch, M., Hancock, J. T., Naaman, M. (2023). Human
heuristics for AI-generated language are flawed.
Proceedings of the National Academy of Sciences 120,
e2208839120. https://doi.org/10.1073/pnas.2208839120
Jawahar, G., Abdul-Mageed, M., Lakshmanan, L. V. S.
(2020). Automatic Detection of Machine Generated
Text: A Critical Survey. In Proceedings of the 28th
International Conference on Computational
Linguistics, pages 2296 – 2309.
Khan, W., Turab, M., Ahmad, W., Ahmad, S. H., Kumar,
K., Luo, B. (2022). Data Dimension Reduction makes
ML Algorithms efficient. In Proceedings of the 2022
International Conference on Emerging Technologies
in Electronics, Computing and Communication
(ICETECC), pages 1 – 7. https://doi.org/10.1109/ICETECC56662.2022.10069527
Koike, R., Kaneko, M., Okazaki, N. (2024). OUTFOX:
LLM-Generated Essay Detection Through In-Context
Learning with Adversarially Generated Examples. In
AAAI 2024, Proceedings of 38th AAAI Conference on
Artificial Intelligence, pages 21259 – 21266.
https://doi.org/10.1609/aaai.v38i19.30120
Kumarage, T., Garland, J., Bhattacharjee, A.,
Trapeznikov, K., Ruston, S., Liu, H. (2023).
Stylometric Detection of AI-Generated Text in Twitter
Timelines. Preprint arXiv:2303.03697.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D.,
Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.
(2019). RoBERTa: A Robustly Optimized BERT
Pretraining Approach. Preprint arXiv:1907.11692.
Moulik, R., Phutela, A., Sheoran, S., & Bhattacharya, S.
(2023). Accelerated Neural Network Training through
Dimensionality Reduction for High-Throughput
Screening of Topological Materials. Preprint arXiv:2308.12722.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Kopf, A., Yang, E.,
DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., Chintala, S. (2019). PyTorch: An
Imperative Style, High-Performance Deep Learning
Library. Advances in Neural Information Processing
Systems 32, pages 8024 – 8035.
Patel, P., Choukse, E., Zhang, C., Shah, A., Goiri, I.,
Maleki, S., Bianchini, R. (2024). Splitwise: Efficient
Generative LLM Inference Using Phase Splitting. In
ISCA 2024, Proceedings of the 51st ACM/IEEE
Annual International Symposium on Computer
Architecture, pages 118 – 132.
https://doi.org/10.1109/ISCA59077.2024.00019
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P.,
Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M., &
Duchesnay, É. (2011). Scikit-learn: Machine Learning
in Python. Journal of Machine Learning Research 12,
2825 – 2830.
Rojas-Simón, J., Ledeneva, Y., García-Hernández, R. A.
(2024). A Dimensionality Reduction Approach for
Text Vectorization in Detecting Human and Machine-
generated Texts. Computación y Sistemas 28, pages
1919 – 1929.
https://doi.org/10.13053/cys-28-4-5214
Singh, K. N., Devi, S. D., Devi, H. M., Mahanta, A. K.
(2022). A novel approach for dimension reduction
using word embedding: An enhanced text
classification approach. International Journal of
Information Management Data Insights 2, 100061.
https://doi.org/10.1016/J.JJIMEI.2022.100061
Tang, R., Chuang, Y.-N., Hu, X. (2024). The Science of
Detecting LLM-Generated Texts. Communications of
the ACM 67, pages 50 – 59.
The Jupyter Development Team. (2015). Project Jupyter.
Jupyter Notebook. Available at https://jupyter.org/.