A Comparative Study of Log-Based Anomaly Detection Methods in

Real-World System Logs

Nadira Anjum Nipa, Nizar Bouguila and Zachary Patterson

Concordia Institute for Information and Systems Engineering, Concordia University, Montreal, Quebec, Canada

Keywords:

Anomaly Detection, Log Analysis, Machine Learning, Deep Learning, Log Parser.

Abstract:

The reliability and security of today’s smart and autonomous systems increasingly rely on effective anomaly

detection capabilities. Logs generated by intelligent devices during runtime offer valuable insights for mon-

itoring and troubleshooting. Nonetheless, the enormous quantity and complexity of logs produced by con-

temporary systems render manual anomaly inspection impractical, error-prone, and laborious. In response to

this, a variety of automated methods for log-based anomaly detection have been developed. However, many

current methods are evaluated in controlled environments with set assumptions and frequently depend on pub-

licly available datasets. In contrast, real-world system logs present greater complexity, lack of labels, and

noise, creating substantial challenges when applying these methods directly in industrial settings. This work

explores and adapts existing machine learning and deep learning techniques for anomaly detection to function

on real-world system logs produced by an intelligent autonomous display device. We conduct a comparative

analysis of these methods, evaluating their effectiveness in detecting anomalies through various metrics and

efﬁciency measures. Our ﬁndings emphasize the most efﬁcient approach for detecting anomalies within this

speciﬁc system, enabling proactive maintenance and enhancing overall system reliability. Our work provides

valuable insights and directions for adopting log-based anomaly detection models in future research, particu-

larly in industrial applications.

1 INTRODUCTION

In the current technological environment, the Internet

of Things (IoT) has become essential to various facets

of everyday life, providing an extensive range of ser-

vices. One example of an IoT device is SCiNe (Smart

City Network), a smart, autonomous display created

by Buspas (Bus, 2024), tailored speciﬁcally for the

transportation sector. SCiNe operates autonomously, powered by a lithium battery and a solar panel, and delivers real-time transit information at bus stops. This includes accurate bus wait times, occupancy information, and customer traffic insights to optimize vehicle assignments based on demand (Bus, 2024). For

uninterrupted service and to guarantee customer sat-

isfaction, this IoT device must operate continuously,

around the clock. Even small service disruptions will

affect user experience, making dependable and con-

tinuous operation essential for such a large-scale and

intricate system.

Anomaly detection is essential for promptly iden-

tifying unusual system behavior, which is vital for re-

ducing system downtime and maintaining smooth op-

erations. Anomaly detection offers early warnings of

potential issues, enabling operators to swiftly address

and resolve problems, thereby ensuring uninterrupted

service. System logs serve as one of the most valuable

sources of data for detecting anomalies, as they docu-

ment real-time events and activities occurring within

a system. These logs provide important insights for

identifying anomalies, positioning log-based anomaly

detection as a signiﬁcant ﬁeld of study.

Historically, anomaly detection in logs has relied

on manual inspection. Nonetheless, the vast quantity

and intricacy of log events produced each second in

contemporary systems make manual analysis imprac-

tical, prompting the development of automated log

analysis methods.

Many statistical and traditional machine learning

algorithms, such as Decision Tree (Chen et al., 2004),

Principal Component Analysis (Xu et al., 2009), and

Log Clustering (Lin et al., 2016), have been used to

automate the identiﬁcation of signiﬁcant incidents or

anomalies in log data. Although these conventional

methods have made notable contributions, they are

hampered by drawbacks such as limited interpretability, inflexibility, and the requirement for manual feature engineering (Le and Zhang, 2022; Zhang et al., 2019).

Nipa, N. A., Bouguila, N. and Patterson, Z.

A Comparative Study of Log-Based Anomaly Detection Methods in Real-World System Logs.

DOI: 10.5220/0013367000003944

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 10th International Conference on Internet of Things, Big Data and Security (IoTBDS 2025), pages 141-152

ISBN: 978-989-758-750-4; ISSN: 2184-4976

To address these challenges, deep learning techniques such as DeepLog (Du et al., 2017) and LogRobust (Zhang et al., 2019) have been developed, demonstrating encouraging outcomes. Although advances

have been achieved in the literature, a signiﬁcant gap

remains in the use of these techniques for industrial

datasets.

The majority of research has focused on pub-

lic datasets that come with predeﬁned conditions,

where data is already labeled, organized, and struc-

tured effectively. Conversely, real-world data intro-

duces further difﬁculties, including noise, variability,

and unstructured formats, which signiﬁcantly com-

plicate log-based anomaly detection in industrial sys-

tems. The following points will highlight the speciﬁc

challenges posed by real-world data.

1. Log Collection: Without a centralized log ag-

gregation mechanism, log gathering can be labori-

ous and time-consuming. In the absence of auto-

mated systems, logs need to be collected manually

from multiple sources, leading to signiﬁcant time con-

sumption and a heightened risk of human error. The

arduous and ineffective manual approach of collect-

ing and identifying logs hinders data preparation.

2. Log Structure: The structure of logs in indus-

trial systems exhibits a high degree of heterogeneity

and variability. In contrast to public datasets that usu-

ally adhere to a uniform log format, industrial log

data can differ greatly among various systems, appli-

cations, and components. The log ﬁle may include

messages that have varying structures, which compli-

cates the application of standard parsing or analysis

methods.

3. Log Quality: The quality of data in industrial

log systems can be notably compromised by noise and

extraneous information. Logs are produced at a rapid

pace by various applications, resulting in an over-

whelming amount of data, much of which is repet-

itive or lacking in useful information. Furthermore,

logs frequently include extraneous tokens or super-

ﬂuous metadata that do not aid in signiﬁcant analysis

yet still require processing. Various applications and

services within the same system might employ incon-

sistent logging standards, resulting in the presence of

unnecessary tokens that can obscure valuable infor-

mation and hinder the identiﬁcation of anomalies or

issues.

Figure 1: Anomaly Detection Framework (Le and Zhang, 2022).

The supervised ML methods consist of Logistic Regression (Bodik et al., 2010), Support Vector Machine (Liang et al., 2007) and Decision Tree (Chen et al., 2004), whereas the unsupervised methods include Principal Component Analysis (Xu et al.,

2009), Isolation Forest (Liu et al., 2008), and Log

Clustering (Lin et al., 2016). We employed DeepLog

(Du et al., 2017), an unsupervised method, and

LogRobust (Zhang et al., 2019), a supervised method,

for deep learning. All methods were adapted using existing open-source toolkits (He et al., 2016; Chen et al., 2021; Le and Zhang, 2022), minimizing the need for reimplementation. A comprehensive analysis was

performed, assessing the accuracy and efﬁciency of

the methods. We believe that our ﬁndings will pro-

vide important insights for researchers and develop-

ers, aiding in the identiﬁcation of the challenges and

intricacies associated with working with real-world

logs. In summary, this work makes several important

contributions as follows:

1. We modiﬁed various established ML and DL

log anomaly detection methods for application to a

practical industrial dataset.

2. A thorough comparative analysis was carried

out to evaluate the performance of these methods

across different experimental conditions.

3. We offer actionable insights and guidelines to

enhance industrial log-based anomaly detection de-

rived from our research.

2 COMMON FRAMEWORK

The procedure for detecting log anomalies generally

involves four essential steps: log parsing, log group-

ing, log representation, and anomaly detection (Le

and Zhang, 2022). This framework is illustrated in Figure 1.

2.1 Log Parsing

The first step after collecting logs is log parsing, which transforms raw log messages into a structured format. This entails the automatic separation


of the ﬁxed, constant element (Log Key) of a log

message from its variable parts. The consistent ele-

ment stays unchanged throughout various log entries,

whereas the variable component changes. The ob-

jective of parsing is to derive log templates by rec-

ognizing patterns and substituting variable segments

with placeholders. For instance, in Figure 1, the * in EventTemplate signifies variable components.

A range of automated log parsing techniques has been developed, leveraging methods such as clustering (Shima, 2016; Hamooni et al., 2016), heuristics (Makanju et al., 2009; He et al., 2017), and longest common subsequence (Du and Li, 2016). More recently, NuLog (Nedelkoski et al., 2021) was introduced; it employs a self-supervised learning model and showed enhanced accuracy and efficiency relative to other log parsing methods.
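To make the idea of a log template concrete, the following is a minimal sketch of placeholder substitution. It is not one of the parsers cited above: the masking rules, the `<*>` placeholder, and the sample messages are assumptions chosen for illustration, and real parsers infer the variable parts rather than relying on a fixed list of patterns.

```python
import re

# Hypothetical masking rules: each regex marks one kind of variable token.
# Order matters: more specific patterns run before the generic integer rule.
MASKS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b0x[0-9a-fA-F]+\b"),           # hexadecimal values
    re.compile(r"\b\d+\b"),                      # plain integers
]


def to_template(message: str) -> str:
    """Replace variable tokens in a raw log message with the placeholder <*>."""
    for pattern in MASKS:
        message = pattern.sub("<*>", message)
    return message


if __name__ == "__main__":
    # Two messages that differ only in their variable parts share one template.
    print(to_template("Connection from 10.0.0.5 port 5022 closed"))
    print(to_template("Connection from 10.0.0.7 port 41 closed"))
```

Both sample messages collapse to the same template, which is what allows later steps to count or index events instead of raw strings.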

2.2 Log Grouping

Following the parsing process, the next step involves transforming the textual logs into numerical features suitable for use in anomaly detection methods. Before this conversion, it is essential to segment the log

data into distinct groups or sequences through vari-

ous techniques. Every group signiﬁes a series of log

events, and from these sequences, feature vectors (or

event count vectors) are generated to construct a fea-

ture matrix. This matrix acts as the input for mod-

els designed to detect anomalies. Logs can be or-

ganized into groups through three main windowing

techniques:

Fixed Window: In this approach, log events are

categorized according to a speciﬁed time frame. The

window size can ﬂuctuate from seconds to minutes

or even hours, depending on the speciﬁc issue being

addressed. Logs that occur within the same window

are considered a single sequence, ensuring there is no

overlap between consecutive windows.

Sliding Window: In this approach, the logs are

organized in a manner akin to the ﬁxed window, but it

incorporates an extra parameter—step size. The step

size, typically less than the window size, results in

overlap between successive windows, producing additional sequences. For instance, a one-hour window with a step size of five minutes yields consecutive windows that overlap by 55 minutes.

Session Window: In contrast to the earlier two

methods, session windows categorize logs by us-

ing unique identiﬁers that monitor various execution

paths, facilitating a more organized grouping of related events. For example, certain public datasets use identifiers such as a node ID or a block ID to identify and group related logs.
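The fixed and sliding window schemes can be sketched with a single helper. This is an illustrative sketch only: the function name, the use of timestamps in seconds, and the choice to drop empty windows are assumptions, not details taken from the toolkits used in this study.

```python
def window_indices(timestamps, window, step=None):
    """Group event indices into time windows.

    timestamps: sorted event times in seconds (assumed input format).
    window:     window length in seconds.
    step:       stride in seconds; when omitted it equals the window
                length, which gives non-overlapping fixed windows.
    """
    if not timestamps:
        return []
    step = window if step is None else step
    groups = []
    start, end = timestamps[0], timestamps[-1]
    while start <= end:
        members = [i for i, t in enumerate(timestamps)
                   if start <= t < start + window]
        if members:  # assumption: windows without events are skipped
            groups.append(members)
        start += step
    return groups


if __name__ == "__main__":
    times = [0, 30, 70, 100, 130]
    print(window_indices(times, window=60))           # fixed windows
    print(window_indices(times, window=60, step=30))  # sliding windows
```

With a step smaller than the window, the same events appear in several groups, which is why the sliding scheme produces more sequences than the fixed one.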

2.3 Log Representation

After logs are organized into sequences, they are

transformed into feature vectors for additional anal-

ysis. There are three main types of feature represen-

tations:

Quantitative Vector: This is referred to as the log

count vector, which records the frequency of each log

event within a sequence. For instance, in the sequence

[E1 E2 E3 E2 E1 E2], the resulting vector would be

[2 3 1], with each number indicating the frequency of

each event. This depiction is frequently used in ML

methods.

Sequential Vector: This vector represents the se-

quence of events as they unfold. For instance, the se-

quence [E1 E2 E3 E2 E1 E2] would yield the vector

[1 2 3 2 1 2]. DL methods such as DeepLog (Du et al.,

2017) utilize this method to understand event patterns

according to the sequence of their occurrences.

Semantic Vector: In contrast to quantitative

and sequential vectors, semantic vectors capture the

meaning or context of log events through the use of

language models. This method emphasizes the fun-

damental meaning of log messages instead of their

frequency or sequence. For example, in the sequence of log events [E1: "Module Not Found", E2: "No Override File Found", E3: "Error Bad parameters", E2: "No Override File Found", E1: "Module Not Found"], the semantic vector for each event could look like this:

E1 ("Module Not Found"): [0.57, 0.35, 0.86, ...]

E2 ("No Override File Found"): [0.79, 0.63, 0.45, ...]

E3 ("Error Bad parameters"): [0.91, 0.37, 0.27, ...]
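The quantitative and sequential representations can be reproduced in a few lines. The sketch below uses the example sequence from the text; the function names and the fixed event vocabulary are assumptions made for illustration.

```python
from collections import Counter


def quantitative_vector(sequence, vocabulary):
    """Count how often each known event occurs in the sequence."""
    counts = Counter(sequence)
    return [counts[event] for event in vocabulary]


def sequential_vector(sequence, vocabulary):
    """Replace each event by its 1-based index in the vocabulary."""
    index = {event: i + 1 for i, event in enumerate(vocabulary)}
    return [index[event] for event in sequence]


if __name__ == "__main__":
    vocabulary = ["E1", "E2", "E3"]
    sequence = ["E1", "E2", "E3", "E2", "E1", "E2"]
    print(quantitative_vector(sequence, vocabulary))  # [2, 3, 1]
    print(sequential_vector(sequence, vocabulary))    # [1, 2, 3, 2, 1, 2]
```

The count vector discards ordering, while the index vector preserves it, which is the distinction the ML and DL methods below rely on.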

2.4 Anomaly Detection

After extracting the feature vectors, they are input

into ML and DL methods for the purpose of detect-

ing anomalies. ML methods generally detect unusual

log sequences by analyzing log event count vectors.

Conversely, DL methods concentrate on identifying

normal patterns within sequential logs and highlight-

ing anomalies that diverge from these established pat-

terns. While ML methods excel at detecting anoma-

lies in static datasets, DL methods are more adept at

recognizing intricate temporal patterns in logs. By in-

tegrating these techniques, we can efﬁciently identify

anomalies in extensive and evolving systems.

3 EXISTING METHODS

A range of ML and DL methods have been employed

to identify anomalies in system logs, leveraging both


supervised and unsupervised learning methods. In su-

pervised learning, models are trained using labeled

datasets, whereas unsupervised learning focuses on

training with unlabeled data, with the goal of iden-

tifying anomalies based on patterns without any pre-

deﬁned labels. In this study, we have used both types

of approaches. Here, we present a summary of the

methods applied:

3.1 Supervised ML Methods

We used three supervised methods for anomaly detec-

tion: Logistic Regression (LR), Support Vector Ma-

chine (SVM), and Decision Tree (DT). The effective-

ness of supervised methods is greatly affected by the

quality and availability of the labeled dataset, as they

rely on labeled data for training purposes. Increasing

the amount of labeled data boosts the models’ abil-

ity to learn both typical and atypical patterns, which

in turn enhances their accuracy in identifying anoma-

lies.

Logistic Regression: Logistic Regression is a

commonly used classiﬁcation algorithm, particularly

effective for binary tasks, especially in anomaly de-

tection. Using a sigmoid function, it determines the

likelihood of an instance being classiﬁed into a partic-

ular class. In the process of assessing new instances,

if the probability exceeds a speciﬁed threshold (com-

monly set at 0.5), the instance is classiﬁed as anoma-

lous; otherwise, it is considered normal.
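The decision rule described above can be sketched as follows. The weights and bias are hypothetical values picked for the example; in practice they are learned from labeled event count vectors.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_anomaly(count_vector, weights, bias, threshold=0.5):
    """Return (probability, is_anomalous) for one event count vector."""
    z = bias + sum(w * x for w, x in zip(weights, count_vector))
    probability = sigmoid(z)
    return probability, probability > threshold


if __name__ == "__main__":
    weights, bias = [0.0, 0.0, 2.0], -1.0  # hypothetical trained parameters
    print(predict_anomaly([2, 3, 0], weights, bias))  # probability below 0.5
    print(predict_anomaly([2, 3, 1], weights, bias))  # probability above 0.5
```

Raising the threshold above 0.5 trades recall for precision, which can matter when anomalies are rare.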

Support Vector Machine: Support Vector Ma-

chine (SVM) is a supervised classiﬁcation method

that aims to create an optimal hyperplane to separate

classes in a high-dimensional space. In anomaly detection, the training data comprises event count vectors along with their corresponding labels. If a new instance lies below the hyperplane, it is classified as normal; if it lies above, it is classified as anomalous.

Decision Tree: A decision tree predicts results by

using a sequence of nodes that divide data according

to the most significant attribute, often using a metric

such as information gain (Han et al., 2022). Begin-

ning with the root node, the data is partitioned until

a stopping criterion is reached, like having uniform

class instances. To classify a new instance, one nav-

igates the decision tree from the root to a leaf node,

which indicates the predicted class for that instance.
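As an illustration of the splitting criterion, the sketch below computes the information gain of a single threshold split on one event count feature. The feature values, labels, and thresholds are made up for the example, and a binary threshold split is assumed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels, threshold):
    """Entropy reduction from splitting on feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * entropy(part)
                   for part in (left, right) if part)
    return entropy(labels) - weighted


if __name__ == "__main__":
    # Hypothetical counts of one event type and the sequence labels (1 = anomaly).
    counts = [0, 0, 1, 2]
    labels = [0, 0, 1, 1]
    print(information_gain(counts, labels, threshold=0))  # perfect split
    print(information_gain(counts, labels, threshold=1))  # weaker split
```

A tree builder would evaluate such candidate splits for every feature and keep the one with the largest gain at each node.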

3.2 Unsupervised ML Methods

As previously mentioned, unsupervised methods are

ideal for real-world settings where labeling is frequently impractical. In this study, we used Principal Component Analysis (PCA), Isolation Forest (IF),

and Log Clustering (LC) to detect anomalies without

pre-labeled data, which allows greater scalability and

ﬂexibility in anomaly detection.

PCA: Principal Component Analysis (PCA) is

a technique for reducing dimensionality that selects

key principal components to capture primary vari-

ations, reducing data to a lower-dimensional space.

Early research (Xu et al., 2009) on PCA for log-based

anomaly detection used event count vectors to iden-

tify patterns. The data was divided into a normal

space (Sn) with leading components and an anomaly

space (Sa) with the remaining components. If the projection
length of a sequence onto the anomaly space surpasses a

speciﬁed threshold, the log sequence is marked as an

anomaly.
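The thresholding step can be written compactly. The following is a sketch of the usual subspace formulation; the symbols $y$, $P$, $k$, and $Q_\alpha$ are notation introduced here for illustration and do not appear in the text above. Let $y$ be the event count vector of a log sequence and let $P = [v_1, \dots, v_k]$ hold the $k$ leading principal components that span the normal space $S_n$. The projection of $y$ onto the anomaly space $S_a$ and its squared length are

$$
y_a = \left(I - P P^{\top}\right) y, \qquad \mathrm{SPE} = \lVert y_a \rVert^{2},
$$

and the sequence is flagged as anomalous when

$$
\mathrm{SPE} > Q_\alpha ,
$$

where $Q_\alpha$ denotes the chosen threshold.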

Isolation Forest: Isolation Forest (IF) identi-

ﬁes anomalies by leveraging their rarity, making

them easier to isolate through random partitioning.

This approach constructs a collection of Isolation

Trees (iTrees) where anomalies are identiﬁed by their

shorter average path lengths (Liu et al., 2008). In log-

based anomaly detection, each Isolation Forest tree

randomly selects an event count feature and value to

split the data, isolating unique patterns. Instances

with rare patterns show shorter average path lengths

and are isolated faster. To identify anomalies, the iso-

lation score of each instance is evaluated against a set

threshold: instances with lower scores are marked as

anomalies, while those with higher scores are consid-

ered normal.
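For reference, the score defined in the original Isolation Forest paper (Liu et al., 2008) for an instance $x$ in a sample of size $n$ can be written as

$$
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad c(n) = 2H(n-1) - \frac{2(n-1)}{n},
$$

where $h(x)$ is the path length of $x$ in a single tree, $E[h(x)]$ is its average over all trees, and $H(i) \approx \ln(i) + 0.5772$ is the harmonic number. Under this definition, short average paths give values of $s$ close to 1. Sign conventions differ between implementations: some libraries, for example scikit-learn's `score_samples`, return the negated score, so that lower values indicate anomalies, which matches the thresholding described above.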

Log Clustering: LogCluster organizes logs for anomaly detection in two phases. First, it converts log sequences into event count vectors and clusters normal and abnormal vectors separately, with each cluster represented by a centroid vector stored in a knowledge base. In

the second phase, new vectors are compared to these

centroids. If the nearest centroid is within a thresh-

old distance, the vector joins that cluster; otherwise,

a new cluster is created. Anomalies are identiﬁed by

assessing the distance between a new log sequence

and the corresponding vectors stored in the knowl-

edge base (Lin et al., 2016). If the closest distance

surpasses the threshold, the log sequence is catego-

rized as an anomaly.
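The detection step can be sketched as a nearest-centroid check. This is an illustrative simplification: Euclidean distance, the centroid values, and the threshold are assumptions made here, and the published method may use a different distance measure and weighting of events.

```python
import math


def nearest_centroid_distance(vector, centroids):
    """Distance from a vector to its closest stored centroid."""
    return min(math.dist(vector, centroid) for centroid in centroids)


def is_anomalous(vector, normal_centroids, threshold):
    """Flag a sequence whose closest normal centroid is too far away."""
    return nearest_centroid_distance(vector, normal_centroids) > threshold


if __name__ == "__main__":
    # Hypothetical knowledge base of centroids built from normal sequences.
    knowledge_base = [[2, 3, 1], [0, 5, 0]]
    print(is_anomalous([2, 3, 2], knowledge_base, threshold=1.5))  # close to a centroid
    print(is_anomalous([9, 0, 0], knowledge_base, threshold=1.5))  # far from all centroids
```

The threshold controls the trade-off between missed anomalies and false alarms, so it usually needs tuning per window configuration.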

3.3 Deep Learning Methods

To take advantage of neural networks for log anomaly detection, various deep learning techniques have been used, including Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and Transformers. In this research, we

employed two methods: DeepLog and LogRobust.

DeepLog functions as an unsupervised model, identi-


fying patterns in log data to uncover anomalies with-

out requiring labeled inputs, whereas LogRobust is

a supervised model that employs labeled data to en-

hance anomaly detection.

DeepLog: DeepLog is a complex deep learning

model that detects log anomalies using LSTM net-

works and density clustering. The model captures

sequential dependencies between log events by rep-

resenting log messages by their log event indexes.

It functions through a predictive approach, acquiring

knowledge of the typical patterns found in log se-

quences. When a deviation from the established nor-

mal pattern takes place, it signals the occurrence as a

possible anomaly. This method successfully identiﬁes

anomalies by forecasting and recognizing deviations

from anticipated log behaviors.
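The predictive check can be illustrated without a neural network. In the sketch below, a simple table of next-event counts stands in for the LSTM (an assumption made purely to keep the example self-contained); the part that mirrors the approach described above is the rule that an observed event is suspicious when it is not among the top-k predicted next events. The function names and the value of `top_k` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_next_event_counts(normal_sequences):
    """Count which event follows which in normal sequences.

    Stand-in for a trained sequence model; not an LSTM.
    """
    counts = defaultdict(Counter)
    for seq in normal_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def is_sequence_anomalous(seq, counts, top_k=2):
    """Flag the sequence if any event is outside the top-k predictions."""
    for prev, nxt in zip(seq, seq[1:]):
        predicted = [e for e, _ in counts.get(prev, Counter()).most_common(top_k)]
        if nxt not in predicted:
            return True
    return False


if __name__ == "__main__":
    normal = [[1, 2, 3, 2, 1, 2], [1, 2, 3, 2]]
    model = fit_next_event_counts(normal)
    print(is_sequence_anomalous([1, 2, 3, 2], model))  # follows learned transitions
    print(is_sequence_anomalous([1, 3, 2], model))     # 1 -> 3 was never observed
```

A larger `top_k` tolerates more variation in normal behavior at the cost of missing subtle deviations.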

LogRobust: LogRobust is a supervised classiﬁ-

cation deep neural network model designed to address

the challenges posed by the instability of logs result-

ing from noisy processing and logging systems. Un-

like other models that primarily rely on log counting

vectors for features, LogRobust converts log events

into semantic vectors. In this method, pre-trained FastText word vectors (Joulin, 2016) are combined with TF-IDF weights to generate representation vectors for log templates. The semantic vectors are subsequently input into an Attention-based

Bi-LSTM classiﬁcation model designed to identify

anomalies. This approach has shown promising re-

sults in successfully addressing log instability.
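The construction of a template vector from word vectors and TF-IDF weights can be sketched as a weighted sum. The two-dimensional embeddings, the tokenized templates, and the exact weighting formula below are assumptions for illustration; a real setup would load pre-trained word vectors with hundreds of dimensions.

```python
import math
from collections import Counter


def tfidf_weights(template_tokens, all_templates):
    """TF-IDF weight of each word of one template within the template set."""
    tf = Counter(template_tokens)
    n_templates = len(all_templates)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for tokens in all_templates if word in tokens)
        weights[word] = (count / len(template_tokens)) * math.log(n_templates / df)
    return weights


def template_vector(template_tokens, all_templates, embeddings):
    """TF-IDF-weighted sum of the word vectors of a template."""
    weights = tfidf_weights(template_tokens, all_templates)
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    for word, weight in weights.items():
        for i, value in enumerate(embeddings[word]):
            vector[i] += weight * value
    return vector


if __name__ == "__main__":
    templates = [["module", "not", "found"],
                 ["no", "override", "file", "found"],
                 ["error", "bad", "parameters"]]
    # Toy 2-dimensional embeddings (hypothetical values).
    toy = {w: [1.0, 0.0] for t in templates for w in t}
    toy.update({"not": [0.0, 1.0], "found": [1.0, 1.0]})
    print(template_vector(templates[0], templates, toy))
```

Words that occur in many templates, such as "found" here, receive a smaller weight and therefore contribute less to the template vector.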

4 EVALUATION STUDY

In this section, we discuss the dataset used, the experiment setup, and the evaluation results of the machine learning and deep learning methods. We also

present a comparison to a public dataset and assess

each model’s efﬁciency, highlighting their compara-

tive performance.

4.1 Experiment Design

4.1.1 Log Dataset

The dataset we used in this experiment is composed of

system logs that have been manually extracted from

a SCiNe device. The logs document system activ-

ities, encompassing boot messages, kernel updates,

and hardware events. A total of 30,730 log messages

were gathered during a 14-hour timeframe, represent-

ing the complex pattern of real-world system logs.

Every log entry contains details like the date, time,

device name, and the content of the message.

Table 1: Log Parser Performance.

| Log Parser Name | Technique | Time Taken (sec) | # of Event Templates |
|---|---|---|---|
| LenMa | Clustering | 40.881 | 15646 |
| Drain | Log Structured Heuristics | 3.445 | 253 |
| AEL | Log Structured Heuristics | 4.151 | 252 |
| Spell | Longest Common Subsequence | 4.263 | 347 |

Subsequently, the logs were manually labeled as

either normal or anomalous in collaboration with do-

main experts. The manual labeling ensured that the

labels accurately reﬂected the operational behavior of

the system. The dataset, however, showed a notable

class imbalance, containing merely 184 anomalous

messages (less than 1% of the dataset), which presents

a fundamental challenge for anomaly detection mod-

els.

This work was speciﬁcally designed for the con-

text of the BusPas system, providing a detailed per-

spective that is frequently lacking in large-scale stud-

ies. While based on a limited dataset, this study pro-

vides valuable insights into log-based anomaly detec-

tion methods, highlighting their applicability to more

extensive datasets. Furthermore, the manual labeling

process establishes a solid basis for handling propri-

etary and domain-speciﬁc logs, effectively addressing

gaps often present in current large-scale studies.

4.1.2 Experiment Setup

In our experiment, we preprocessed the log data and performed anomaly detection as follows:

Log Parsing: We made use of various log pars-

ing techniques to transform the unstructured logs

into structured log templates. We used four parsers:

LenMa, Drain, AEL, and Spell, from the toolkit Log-

Parser (Zhu et al., 2019). Among these, Drain demon-

strated the highest levels of accuracy and efﬁciency.

The performance of each parser is reported in Table 1.

Log Grouping and Feature Extraction: We em-

ployed ﬁxed and sliding window techniques for log

grouping in our dataset, as the lack of identiﬁers ex-

cluded the session window approach. A log sequence,

in this context, denotes a set of log templates that exist

within a deﬁned time frame. The window size varied

from 10 minutes to 1 minute, with step sizes ranging

from 5 to 0.5 minutes, based on the particular experi-

ment.

After grouping the logs, we converted the sequences into numerical feature vectors. For each machine learning model, we generated quantitative vectors (event count vectors), marking a log sequence as anomalous if any log messages were classified as abnormal. For DeepLog, log sequences were transformed into sequential vectors by indexing each log event, while for LogRobust we used the established method to convert log sequences into semantic vectors.

Table 2: Log Sequence Summary.

Fixed Window

| Window Size | Total Instances | Anomaly | Normal |
|---|---|---|---|
| 10 min | 84 | 53 | 31 |
| 7 min | 119 | 63 | 56 |
| 5 min | 167 | 68 | 99 |
| 3 min | 278 | 70 | 208 |
| 1 min | 833 | 89 | 744 |

Sliding Window

| Window Size | Step Size | Total Instances | Anomaly | Normal |
|---|---|---|---|---|
| 10 min | 5 min | 165 | 68 | 97 |
| 7 min | 3 min | 276 | 71 | 205 |
| 5 min | 2 min | 414 | 77 | 337 |
| 3 min | 1 min | 830 | 91 | 739 |
| 1 min | 0.5 min | 1663 | 111 | 1552 |
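The sequence labeling rule stated above (a sequence is anomalous if any of its messages is abnormal) can be sketched as follows. The representation of groups as lists of message indices and the 0/1 label encoding are assumptions made for the example.

```python
def label_sequences(groups, message_labels):
    """Label each group 1 if it contains at least one abnormal message.

    groups:         list of lists of message indices (one list per window).
    message_labels: per-message labels, 1 = abnormal, 0 = normal.
    """
    return [int(any(message_labels[i] for i in group)) for group in groups]


if __name__ == "__main__":
    groups = [[0, 1], [2, 3], [4]]
    message_labels = [0, 0, 0, 1, 0]
    print(label_sequences(groups, message_labels))  # [0, 1, 0]
```

Because one abnormal message marks a whole window, larger windows and overlapping windows raise the share of anomalous sequences, which is consistent with the counts in Table 2.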

Table 2 presents the number of sequences pro-

duced for each combination of window and step sizes.

In the ﬁxed window setting with a window size of 5

minutes, a total of 167 log sequences were produced,

with 68 identiﬁed as anomalous and 99 classiﬁed as

normal. In contrast, using the sliding window ap-

proach with a 5-minute window size and a step size

of 2 minutes, a total of 414 sequences were produced,

which included 77 anomalous sequences and 337 nor-

mal sequences. The increased number of sequences

in the sliding window approach is due to the overlap

between consecutive windows, leading to more com-

prehensive groupings.

Anomaly Detection: During this phase, various

machine learning and deep learning techniques were

trained using the features obtained in the prior step,

each following its speciﬁc methodology. The dataset

was split into 80% for training purposes and 20% for

testing purposes. In the case of unsupervised meth-

ods, labels were omitted from the training data, since

these methods do not need labeled data for the learn-

ing process.

All experiments were carried out on a machine

featuring an 11th Gen Intel(R) Core(TM) i7-1185G7

Processor @ 3.00GHz and 16 GB of RAM. The pa-

rameters for each method were meticulously adjusted

to guarantee peak performance. Each model under-

went several iterations, with the most favorable out-

comes being documented.

4.1.3 Evaluation Metrics

We evaluated the accuracy of the methods through Precision, Recall, Specificity, and F1-score, given that

log-based anomaly detection is a binary classiﬁcation

task. Precision measures the proportion of correctly

identiﬁed anomalies compared to the total instances

that the model categorizes as anomalies. Recall evalu-

ates how accurately true anomalies are identiﬁed from

the overall count of actual anomalies. Speciﬁcity de-

notes the proportion of correctly recognized normal

sequences compared to the overall count of genuine

normal sequences. The F1-score serves as the har-

monic mean of precision and recall, providing a well-

rounded evaluation of model performance.

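$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$$

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$

$$\text{Specificity} = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$$

$$\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$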

True Positive (TP) denotes the count of anomalies that the model accurately detects, whereas True Negative (TN) denotes the count of normal log sequences that are accurately recognized as normal. A False Positive (FP) occurs when the model mistakenly identifies a normal log sequence as an anomaly, while a False Negative (FN) is an actual anomaly that the model fails to detect.
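The four metrics follow directly from the confusion counts. The sketch below is a minimal helper; the counts in the usage example are hypothetical and were chosen only because they yield a recall of 0.636 with no false positives.

```python
def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, specificity and F1 from confusion counts.

    Ratios with an empty denominator are reported as 0.0 (an assumption).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


if __name__ == "__main__":
    # Hypothetical test split: 11 anomalous and 6 normal sequences.
    scores = evaluate(tp=7, fp=0, tn=6, fn=4)
    print({name: round(value, 3) for name, value in scores.items()})
```

With no false positives, precision and specificity are both 1 regardless of how many anomalies are missed, so recall and F1 are the more informative figures in that situation.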

4.2 Performance of Anomaly Detection

Methods

In this section, we discuss the performance of ma-

chine learning (ML) and deep learning (DL) models

with respect to their accuracy. The ﬁndings are de-

tailed for three supervised machine learning models,

three unsupervised machine learning models, and two

deep learning models, examined across different win-

dow conﬁgurations. Every set of models is examined

thoroughly to emphasize their strengths, limitations,

and adaptability to various experimental conditions.

In conclusion, we present a brief overview of the per-

formance trends noted for each category of models.


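Table 3: Accuracy of Supervised Methods.

Columns give the window size (WS); for sliding windows, the window size is followed by the step size. P: Precision, R: Recall, S: Specificity, F1: F1-score.

Fixed Window

| Model | Metric | 10 min | 7 min | 5 min | 3 min | 1 min |
|---|---|---|---|---|---|---|
| LR | P | 1 | 1 | 1 | 1 | 1 |
| LR | R | 0.636 | 0.923 | 0.929 | 0.857 | 0.889 |
| LR | S | 1 | 1 | 1 | 1 | 1 |
| LR | F1 | 0.778 | 0.96 | 0.963 | 0.923 | 0.941 |
| SVM | P | 1 | 1 | 1 | 1 | 1 |
| SVM | R | 0.909 | 0.923 | 0.929 | 1 | 0.944 |
| SVM | S | 1 | 1 | 1 | 1 | 1 |
| SVM | F1 | 0.952 | 0.96 | 0.963 | 1 | 0.971 |
| DT | P | 1 | 1 | 1 | 1 | 1 |
| DT | R | 0.909 | 0.923 | 0.929 | 1 | 0.889 |
| DT | S | 1 | 1 | 1 | 1 | 1 |
| DT | F1 | 0.952 | 0.96 | 0.963 | 1 | 0.941 |

Sliding Window

| Model | Metric | 10 min, 5 min | 7 min, 3 min | 5 min, 2 min | 3 min, 1 min | 1 min, 0.5 min |
|---|---|---|---|---|---|---|
| LR | P | 1 | 1 | 1 | 1 | 1 |
| LR | R | 0.929 | 0.867 | 0.813 | 0.895 | 0.957 |
| LR | S | 1 | 1 | 1 | 1 | 1 |
| LR | F1 | 0.963 | 0.929 | 0.897 | 0.944 | 0.978 |
| SVM | P | 1 | 1 | 1 | 1 | 1 |
| SVM | R | 0.929 | 0.8 | 0.938 | 0.947 | 0.957 |
| SVM | S | 1 | 1 | 1 | 1 | 1 |
| SVM | F1 | 0.963 | 0.889 | 0.968 | 0.973 | 0.978 |
| DT | P | 1 | 1 | 1 | 1 | 1 |
| DT | R | 0.929 | 0.8 | 0.875 | 0.947 | 0.913 |
| DT | S | 1 | 1 | 1 | 1 | 1 |
| DT | F1 | 0.963 | 0.889 | 0.933 | 0.973 | 0.955 |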

4.2.1 Accuracy of Supervised ML Methods

In supervised methods, while splitting the dataset

we balanced positive and negative samples to mini-

mize bias and enhance the models’ ability to iden-

tify both normal and anomalous logs. All three

methods—Logistic Regression, SVM, and Deci-

sion Tree—exhibited perfect precision and speciﬁcity

across both ﬁxed and sliding windows, with no false

positives in any conﬁguration. However, recall varied

based on the type and size of the window. Results are

shown in Table 3.

In ﬁxed windows, Logistic Regression (LR)

shows improved recall as the window size decreases,

indicating that smaller windows capture more anoma-

lies. SVM performs reliably but experiences a slight

decline in recall with larger windows (e.g., 10 and

7 minutes). However, it improves with smaller windows, enhancing both recall and F1 scores. The Decision Tree (DT) behaves similarly to SVM, matching its recall at the 10-, 7-, 5-, and 3-minute windows but falling slightly lower at the 1-minute window (0.889 versus 0.944). Like SVM, it excels at a 3-minute window, achieving perfect test results.

Sliding windows increase the number of in-

stances by overlapping consecutive windows, enhanc-

ing models’ ability to detect patterns and anomalies.

However, the step size is critical for accurate detec-

tion. LR and SVM consistently perform well, par-

ticularly with smaller windows and step sizes in de-

tailed and imbalanced data. In contrast, Decision

Tree shows variability in recall and F1 scores, fac-

ing more challenges in generalization compared to

LR and SVM. As window and step sizes decrease,

all models improve, but LR and SVM exhibit greater

robustness in handling imbalanced datasets and iden-

tifying anomalies.

In summary, LR and SVM demonstrated impres-

sive adaptability across different experimental con-

ditions, successfully handling imbalanced data and

variations in window and step sizes with signiﬁcant

consistency. Although there are challenges with gen-

eralization, DT’s remarkable performance in limited

anomaly detection situations highlights its potential

for targeted applications. The trends suggest that se-

lecting the optimal windowing strategy and step size

is crucial for improving the performance of super-

vised machine learning models in log-based anomaly

detection.

4.2.2 Accuracy of Unsupervised ML Methods

For unsupervised methods, fixed and sliding window techniques were used with the same parameters as those applied in supervised methods. Although unsupervised methods do not require labels for training, we employed our labeled dataset during testing to assess their performance.

PCA achieves perfect precision and speciﬁcity

with fixed windows, avoiding false positives, but misses many true anomalies, as reflected in its low recall. Isolation Forest (IF) maintains high precision and specificity across most window sizes but struggles with

smaller windows, leading to more false positives and

reduced recall. In contrast, LogClustering (LC) offers

a better balance between precision and recall, partic-

ularly excelling with smaller window sizes. Although

its precision and speciﬁcity are slightly lower than

PCA and Isolation Forest, LogClustering achieves the

highest F1 scores with shorter windows, demonstrat-

ing better adaptability in a detailed ﬁxed window con-

text.

In sliding windows, PCA shows improved recall
with smaller window and step sizes, achieving a recall
of 0.909 with a 5-minute window and 2-minute
step, though precision drops to 0.161. Isolation Forest
maintains high precision, particularly in larger windows,
but its recall is limited, and its performance declines in
smaller settings. LogClustering displays more variability:
it achieves high recall in some configurations
(e.g., 0.727 with a 7-minute window and 3-minute step)
but with lower precision. This suggests LogClustering
is more effective at detecting anomalies, but with
increased false positives in detailed configurations.
Table 4 presents the results.

In summary, PCA demonstrates strong precision

and speciﬁcity, while it faces challenges in identify-


Table 4: Accuracy of Unsupervised Methods.

Fixed Window (columns: window size, WS)

Model  Metric  10 min  7 min  5 min  3 min  1 min
PCA    P       1       1      1      1      1
       R       0.286   0.25   0.222  0.222  0.25
       S       1       1      1      1      1
       F1      0.444   0.4    0.364  0.364  0.4
IF     P       1       1      1      1      0.25
       R       0.286   0.25   0.222  0.222  0.083
       S       1       1      1      1      0.981
       F1      0.444   0.4    0.364  0.364  0.125
LC     P       0.333   0.364  0.333  0.25   0.083
       R       0.571   0.5    0.667  0.778  0.583
       S       0.2     0.563  0.52   0.553  0.503
       F1      0.421   0.421  0.444  0.378  0.145

Sliding Window (columns: window size, step size)

Model  Metric  10 min, 5 min  7 min, 3 min  5 min, 2 min  3 min, 1 min  1 min, 0.5 min
PCA    P       1              1             0.161         1             0.333
       R       0.222          0.182         0.909         0.25          0.308
       S       1              1             0.278         1             0.975
       F1      0.364          0.308         0.274         0.4           0.32
IF     P       1              1             1             0.333         0.2
       R       0.222          0.182         0.182         0.083         0.23
       S       1              1             1             0.987         0.966
       F1      0.364          0.308         0.308         0.133         0.222
LC     P       0.375          0.32          0.2           0.071         0.073
       R       0.667          0.727         0.636         0.5           0.462
       S       0.583          0.622         0.625         0.494         0.763
       F1      0.48           0.444         0.311         0.125         0.126

ing true anomalies. Isolation Forest excels with larger

windows but struggles with intricate conﬁgurations.

LogClustering strikes an impressive balance, demon-

strating excellence in recall and F1 scores, although

it requires meticulous tuning to reduce false positives.

These trends highlight the importance of choosing the

right method according to the particular needs of the

anomaly detection task, including an appropriate bal-

ance between precision and recall.
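The precision-recall trade-off discussed here follows directly from the confusion-matrix definitions of the four reported metrics. The sketch below is illustrative only: the paper does not report raw counts, so the values used (2 detected anomalies out of 7, no false alarms, 10 normal windows) are assumed counts chosen to be consistent with the PCA 10-minute fixed-window column of Table 4.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (precision, recall, specificity, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, specificity, f1


# Assumed counts (not reported in the paper): 2 of 7 anomalous windows
# detected, no false positives, 10 normal windows correctly passed.
p, r, s, f1 = detection_metrics(tp=2, fp=0, tn=10, fn=5)
print(round(p, 3), round(r, 3), round(s, 3), round(f1, 3))  # 1.0 0.286 1.0 0.444
```

A detector that never raises a false alarm scores perfectly on precision and specificity, yet its F1 remains low because F1 is the harmonic mean of precision and recall and is therefore dominated by the weaker of the two.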

4.2.3 Accuracy of DL Methods

DeepLog shows reliable recall and F1-scores in larger

ﬁxed windows (10-minute and 7-minute), empha-

sizing its ability to capture long-term dependencies.

Nonetheless, its performance diminishes in smaller

windows, revealing constraints in managing intricate

patterns, although it attains greater speciﬁcity in these

instances. This indicates that DeepLog is more appro-

priate for situations that demand a wider contextual

comprehension instead of detailed anomaly detection.

In comparison, LogRobust consistently surpasses

DeepLog in all ﬁxed window settings, attaining ﬂaw-

less recall, precision, and speciﬁcity at the 5-minute

window. This emphasizes LogRobust’s ﬂexibility

with different window sizes and its capability to man-

age imbalanced datasets efﬁciently.

In sliding windows, DeepLog improves specificity
and F1-score with a 7-minute window and a 3-minute

Table 5: Accuracy of Deep Learning Methods.

Fixed Window (columns: window size, WS)

Model      Metric  10 min  7 min  5 min  3 min
DeepLog    P       0.75    0.727  0.6    0.692
           R       0.857   1      0.75   1
           S       0.8     0.812  0.84   0.913
           F1      0.8     0.842  0.667  0.818
LogRobust  P       0.778   0.889  1      0.75
           R       1       1      1      1
           S       0.8     0.938  1      0.935
           F1      0.875   0.941  1      0.857

Sliding Window (columns: window size, step size)

Model      Metric  10 min, 5 min  7 min, 3 min  5 min, 2 min  3 min, 1 min
DeepLog    P       0.667          0.75          0.75          0.355
           R       0.923          1             0.273         0.393
           S       0.684          0.833         0.966         0.851
           F1      0.774          0.857         0.4           0.373
LogRobust  P       0.867          0.9           1             1
           R       1              1             1             0.964
           S       0.895          0.944         1             1
           F1      0.927          0.947         1             0.982

step size, yet its effectiveness declines with smaller

window conﬁgurations, highlighting difﬁculties in de-

tecting overlapping anomalies. Conversely, LogRo-

bust demonstrates outstanding performance in sliding

window conﬁgurations, exceeding its ﬁxed window

capabilities and exhibiting remarkable recall and pre-

cision across various step sizes. The adaptability of

LogRobust establishes it as a dependable approach for

identifying anomalies in both ﬁxed and sliding conﬁg-

urations, especially in thorough analyses.

In summary, although DeepLog demonstrates ad-

vantages in extensive windows and wider contexts,

LogRobust stands out as the more resilient and ﬂex-

ible model, achieving better outcomes across diverse

experimental scenarios. The results highlight the sig-

niﬁcance of choosing deep learning methods tailored

to the particular needs of anomaly detection tasks.

The results are presented in Table 5.

4.3 Comparison with Public Dataset

We evaluated the accuracy of each method used

on our small-scale dataset against their performance

on the publicly available HDFS dataset (Xu et al.,

2009). The HDFS dataset is composed of 575,061

log blocks, with 16,838 blocks (2.9%) identiﬁed as

anomalous (Du et al., 2017). The benchmark re-

sults for these methods were obtained using the HDFS

dataset, employing a session windowing approach for

log grouping. To facilitate a meaningful comparison,
we report the best results of each method on
our small dataset, together with the window type and
size that produced them, as shown in Table 6. This
comparison uncovers several important insights:

IoTBDS 2025 - 10th International Conference on Internet of Things, Big Data and Security


Table 6: Comparison with Public Dataset.

                  HDFS Dataset             Private Dataset
Model             Precision  Recall  F1    Precision  Recall  F1    Window Type
LR                0.95       1       0.98  1          0.96    0.98  Sliding (WS: 1 min, SS: 0.5 min)
Decision Tree     1          0.99    1     1          0.95    0.97  Sliding (WS: 3 min, SS: 1 min)
SVM               0.95       1       0.98  1          1       1     Fixed (WS: 3 min)
PCA               0.98       0.67    0.79  1          0.29    0.44  Fixed (WS: 10 min)
Isolation Forest  0.83       0.78    0.8   1          0.29    0.44  Fixed (WS: 10 min)
LogClustering     0.87       0.74    0.8   0.32       0.73    0.44  Sliding (WS: 7 min, SS: 3 min)
DeepLog           0.95       0.96    0.96  0.9        1       0.95  Sliding (WS: 7 min, SS: 3 min)
LogRobust         0.98       1       0.99  1          1       1     Fixed (WS: 5 min)

The outcomes of supervised methods applied to

our limited dataset are remarkably close to, and in

some cases better than, the benchmark results on the

HDFS dataset. This emphasizes the capability of

a properly labeled dataset, regardless of its size, to

achieve similar accuracy.

In contrast, the performance of unsupervised mod-

els was notably lower when compared to the HDFS

dataset. This indicates that unsupervised methods

need a more extensive dataset to successfully catego-

rize data and detect anomalies, as their effectiveness

is signiﬁcantly dependent on the presence of varied

patterns and an adequate amount of data.

Both deep learning methods excelled on our small

dataset, showcasing impressive accuracy even with

the restricted data size. This suggests that deep

learning models, especially those utilizing sequential

and semantic patterns, can adjust well to small-scale

datasets while maintaining performance levels.

This study further conﬁrms that log-based

anomaly detection methods can be adapted to datasets

of different sizes, opening up possibilities for their use

in both small-scale and large-scale systems.
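For readers unfamiliar with the session windowing used for the HDFS benchmark results, the idea can be sketched as follows: instead of cutting the log by time, lines are grouped by a session identifier, which for HDFS is the block ID. The regular expression, the function name `session_windows`, and the simplified sample lines are assumptions made for illustration and are not the benchmark's actual preprocessing code.

```python
import re
from collections import defaultdict

# HDFS block identifiers look like blk_<signed integer>.
BLOCK_ID = re.compile(r"blk_-?\d+")


def session_windows(lines):
    """Group log lines by the block ID(s) they mention."""
    sessions = defaultdict(list)
    for line in lines:
        # A line may mention several blocks; add it to each session once.
        for block in set(BLOCK_ID.findall(line)):
            sessions[block].append(line)
    return dict(sessions)


# Simplified, hypothetical log lines.
lines = [
    "Receiving block blk_-1608999687919862906 src: /10.0.0.1:5000",
    "Receiving block blk_7503483334202473044 src: /10.0.0.2:5001",
    "PacketResponder 1 for block blk_-1608999687919862906 terminating",
]
sessions = session_windows(lines)
print({block: len(group) for block, group in sessions.items()})
```

Each resulting session is then labeled as a whole, which is why the HDFS statistics above are reported in blocks rather than in time windows.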

4.4 Efﬁciency of Anomaly Detection

Methods

Efﬁciency measures how quickly a model can per-

form anomaly detection. We assess this efﬁciency by

tracking the running time required for the anomaly

detector during both the training and testing phases.
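A minimal sketch of how such running times might be collected is shown below. The helper `timed` and the `ThresholdDetector` class are hypothetical stand-ins introduced for illustration; they are not the detectors evaluated in this study.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


class ThresholdDetector:
    """Hypothetical detector: flags windows with unusually many events."""

    def fit(self, counts):
        mean = sum(counts) / len(counts)
        var = sum((c - mean) ** 2 for c in counts) / len(counts)
        # Anything beyond mean + 3 standard deviations is treated as anomalous.
        self.threshold = mean + 3 * var ** 0.5
        return self

    def predict(self, counts):
        return [c > self.threshold for c in counts]


train = [10, 12, 9, 11, 10, 13, 12]   # events per window (made-up values)
test = [11, 40, 10]

model = ThresholdDetector()
_, train_seconds = timed(model.fit, train)
preds, test_seconds = timed(model.predict, test)
print(preds, f"train={train_seconds:.6f}s test={test_seconds:.6f}s")
```

Measuring the two phases separately matters because, as the results below show, a model can be slow to train yet fast at inference, or the reverse.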

Figure 2 shows that supervised machine learning

methods maintain low processing times across differ-

ent window sizes, with Decision Tree (DT) being the

fastest. This indicates that supervised methods are ef-

ﬁcient and stable despite changes in window size. In

contrast, most unsupervised methods have longer pro-

cessing times, although PCA performs comparably

to the supervised methods. Isolation Forest has the

longest processing time overall, while LogClustering

slows down signiﬁcantly with smaller window sizes,

highlighting the greater computational demands of
these methods, especially with larger log sequences.

Figure 2: Running time of ML methods with varying window size.

Figure 3: Efficiency of Deep Learning methods.

In deep learning, DeepLog demonstrates impres-

sive training times, achieving the shortest duration in

the 5-minute window and maintaining efﬁcient test-

ing capabilities. In contrast, LogRobust has sig-

niﬁcantly longer training times, especially in the 5-

minute window, but outperforms DeepLog in testing

speed. While DeepLog is more efﬁcient in training,

LogRobust may offer advantages for speciﬁc perfor-

mance needs, particularly for faster inference during

testing. Figure 3 illustrates these results.

5 DISCUSSION

This study provides a comparative evaluation of su-

pervised, unsupervised, and deep learning techniques


for log-based anomaly detection using real-world

data. Although the ﬁndings offer valuable insights,

certain aspects require further discussion, particularly

regarding dataset limitations, comparative evaluation,

and model efﬁciency. This section discusses the lim-

itations of our study and suggests possible directions

for future research.

Addressing Dataset Constraints: One primary

limitation of this study is the relatively small dataset,

which could affect the generalizability of the results

to larger systems. However, our comparison with the
publicly available HDFS dataset shows that the performance
trends of supervised and deep learning methods on our
dataset are consistent with those found on a larger
dataset, although unsupervised methods performed notably
worse. This indicates that our findings, despite being
derived from a restricted dataset, continue to be
relevant and insightful. Moreover, our dataset captures
real-world limitations such as the lack of labeled

data and unstructured log formats, making it relevant

for practical applications. In the future, we will focus

on broadening the scope of our study by integrating

more extensive datasets that feature a larger volume

and diversity of log messages.

Comparative Evaluation of Methods: Our anal-

ysis shows the key trade-offs between different

anomaly detection methods. Supervised methods

such as logistic regression, SVM, and Decision Tree

demonstrated impressive accuracy when there was ac-

cess to labeled data; however, they tend to be less ef-

fective in real-world scenarios where labeled anoma-

lies are limited. Unsupervised methods such as PCA

and Isolation Forest identiﬁed anomalies without the

need for labeled data, yet they exhibited variability in

both precision and recall, especially when employing

various windowing techniques. Deep learning mod-

els (LogRobust) achieved the optimal precision-recall

balance, though they demanded signiﬁcant computa-

tional resources and extended training time.

The ﬁndings indicate that the selection of method

is contingent upon the particular needs of the system.

For scenarios that require real-time detection, conven-

tional machine learning models, such as SVM and

Decision Tree, provide quick and reliable options. In

situations where there are limited labeled data, unsu-

pervised methods such as LogClustering can be em-

ployed, although they necessitate careful parameter

tuning to minimize false positives. Deep learning

models like LogRobust provide exceptional perfor-

mance, yet they might be better suited for batch pro-

cessing or environments with ample computational

resources.

Timeliness and Efﬁciency of Approaches: Ef-

ﬁciency plays a vital role in anomaly detection, es-

pecially in the context of real-time applications. Our

evaluation indicates that Decision Tree and SVM de-

mand considerably less processing time, making them

appropriate for real-time anomaly detection. Conversely,
deep learning models, especially LogRobust, require
extended training periods yet provide enhanced accuracy.
The findings indicate that in industrial environments,
lightweight ML models might

be more suitable for real-time monitoring, while deep

learning techniques offer enhanced accuracy for de-

tecting anomalies in historical log analysis.

Future Work: Future research will aim to

broaden the dataset and integrate semi-supervised

learning methods to lessen the reliance on manual la-

beling, thus enhancing efﬁciency. Furthermore, the

integration of various feature representations, such as

the combination of sequential and semantic vectors,

could enhance detection performance even more by

capturing more complex data patterns. We also plan

to investigate additional techniques like invariant min-

ing (Lou et al., 2010), LogAnomaly (Meng et al.,

2019), and CNN (Lu et al., 2018) to assess model ef-

fectiveness across a broader range of methods. Addi-

tionally, we aim to deploy the most effective method

for real-world anomaly detection.

In summary, this study provides a foundation for

understanding the trade-offs among various anomaly

detection methods and offers valuable insights into

their relevance for real-world log analysis. The ﬁnd-

ings highlight the signiﬁcance of choosing models

that align with system constraints, computational ef-

ﬁciency, and the availability of data.

6 RELATED WORK

Log-based anomaly detection (Nandi et al., 2016, Bao

et al., 2018, He et al., 2018, Nedelkoski et al., 2020,

Wang et al., 2020) has experienced notable progress

in multiple areas, such as distributed systems and cloud
environments. The exploration within this

domain has encompassed conventional log mining,

machine learning, and, more recently, deep learning

methodologies. The main approaches for log-based

anomaly detection can be classiﬁed into supervised

learning, unsupervised learning, and deep learning

techniques.

Supervised learning techniques have frequently
been used to detect log anomalies. For instance,
decision trees (Chen et al., 2004) have been

used to pinpoint anomalies in extensive internet-based

settings, while SVM (Liang et al., 2007) classiﬁers

have been employed to uncover failures within event

logs. Regression-based methods have been investi-

gated for analyzing cloud resource logs and identify-

ing abnormalities (He et al., 2018).


Unsupervised techniques do not rely on labeled

data, which enhances their ﬂexibility in handling real-

world datasets. For example, the Isolation Forest (Liu

et al., 2008) utilizes isolation principles to identify

outliers, whereas PCA (Xu et al., 2009) was one of the

initial techniques developed for extracting system is-

sues from console logs. Invariant Mining (Lou et al.,

2010) is a signiﬁcant approach that identiﬁes linear

associations among log events derived from log event

count vectors, facilitating effective anomaly detection.
Control flow graphs represent a system's
typical execution paths, with anomalies being

ﬂagged when transition probabilities or sequences di-

verge from these learned models (Nandi et al., 2016).
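To make the notion of a linear association among event counts concrete, the sketch below mines the simplest possible kind of invariant: pairs of events whose counts are equal in every normal session (for example, every "open" is matched by a "close"). This is a deliberately reduced illustration, not the algorithm of Lou et al. (2010), which searches for general linear relations; the function names and sample sessions are hypothetical.

```python
from itertools import combinations


def mine_equal_count_invariants(sessions, events):
    """Return event pairs whose counts are equal in every training session."""
    return [
        (a, b)
        for a, b in combinations(events, 2)
        if all(s.get(a, 0) == s.get(b, 0) for s in sessions)
    ]


def violates(session, invariants):
    """A session is anomalous if it breaks any mined invariant."""
    return any(session.get(a, 0) != session.get(b, 0) for a, b in invariants)


events = ["open", "close", "write"]
# Hypothetical event-count vectors from normal sessions.
normal = [
    {"open": 2, "close": 2, "write": 5},
    {"open": 1, "close": 1, "write": 0},
    {"open": 3, "close": 3, "write": 7},
]

invariants = mine_equal_count_invariants(normal, events)
print(invariants)                                                  # [('open', 'close')]
print(violates({"open": 2, "close": 1, "write": 3}, invariants))   # True
```

A session with an unmatched "open" breaks the mined relation and is flagged, without any labeled anomalies having been used during mining.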

The emergence of deep learning has brought sev-

eral novel methods for detecting anomalies based on

log data. DeepLog (Du et al., 2017) uses an LSTM
network, trained in an unsupervised manner on normal
log sequences, to predict the next log event and flags
sequences that deviate from the prediction.
LogAnomaly (Meng et al., 2019) improves robustness
to changing log templates by combining sequential and
quantitative patterns with semantic information. A
probabilistic label estimation technique, integrated with
an attention-based GRU neural network, was developed to
address label scarcity (Yang et al., 2021). Additionally,

other approaches, such as CNN-based anomaly de-

tection (Lu et al., 2018), Transformer-based architec-

tures (Nedelkoski et al., 2020), LSTM-based GANs

(Xia et al., 2021), and BERT (Guo et al., 2021), have

also been explored for their effectiveness in anomaly

detection.
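The next-event prediction idea behind DeepLog can be illustrated without a neural network. The sketch below replaces the LSTM with simple follower counts (a bigram model), which is a swapped-in simplification and not the DeepLog architecture; what it keeps is the decision rule, where an observed event is accepted only if it is among the top candidates predicted from normal data. The names `train_next_event_model`, `is_anomalous`, and `top_g`, as well as the sample sequences, are hypothetical.

```python
from collections import Counter, defaultdict


def train_next_event_model(sequences):
    """Count which event follows each event in normal sequences."""
    followers = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            followers[prev][nxt] += 1
    return followers


def is_anomalous(seq, model, top_g=2):
    """Flag a sequence if any event is outside the top-g predicted followers."""
    for prev, nxt in zip(seq, seq[1:]):
        ranked = model.get(prev, Counter()).most_common(top_g)
        candidates = [event for event, _ in ranked]
        if nxt not in candidates:
            return True
    return False


# Hypothetical normal sequences of event ids.
normal = [["E1", "E2", "E3"], ["E1", "E2", "E4"], ["E1", "E2", "E3"]]
model = train_next_event_model(normal)

print(is_anomalous(["E1", "E2", "E3"], model))            # False
print(is_anomalous(["E1", "E3", "E2"], model))            # True
print(is_anomalous(["E1", "E2", "E4"], model, top_g=1))   # True
```

The last call shows the role of the candidate cutoff: a rarer but legitimate transition is accepted with `top_g=2` and rejected with `top_g=1`, which is the kind of precision-recall tuning discussed in the experimental sections.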

7 CONCLUSIONS

Logs are crucial for ensuring the reliability and secu-

rity of modern devices, but their volume and complex-

ity pose signiﬁcant challenges for anomaly detection.

Most existing research relies on public datasets and

static environments, limiting insights into real-world

effectiveness. Our paper provides a comparative anal-

ysis of machine learning and deep learning techniques

for log-based anomaly detection using a real-world

dataset. We assess the performance of these methods

through various metrics and analyze computational

efﬁciency. Our ﬁndings reveal that performance is

heavily influenced by window settings, with SVM
among the supervised methods and LogRobust among the
deep learning methods achieving the highest accuracy.
Given its considerably lower processing time, SVM is
the most efficient overall.

ACKNOWLEDGEMENTS

This research was conducted in collaboration with

BusPas Inc. and the Mitacs Accelerate Program, and we

sincerely appreciate their support in providing access

to data and resources. We would also like to thank

Wissem Maazoun and his team for their valuable in-

sights and assistance throughout this study.

REFERENCES

(Accessed: 28 October 2024). Real-world applications for
remarkable innovations [Use Cases]. https://buspas.com/.

Bao, L., Li, Q., Lu, P., Lu, J., Ruan, T., and Zhang, K.

(2018). Execution anomaly detection in large-scale

systems through console log analysis. Journal of Sys-

tems and Software, 143:172–186.

Bodik, P., Goldszmidt, M., Fox, A., Woodard, D. B., and

Andersen, H. (2010). Fingerprinting the datacenter:

automated classiﬁcation of performance crises. In

Proceedings of the 5th European conference on Com-

puter systems, pages 111–124.

Chen, M., Zheng, A. X., Lloyd, J., Jordan, M. I., and

Brewer, E. (2004). Failure diagnosis using deci-

sion trees. In International Conference on Autonomic

Computing, 2004. Proceedings., pages 36–43. IEEE.

Chen, Z., Liu, J., Gu, W., Su, Y., and Lyu, M. R.

(2021). Experience report: Deep learning-based sys-

tem log analysis for anomaly detection. arXiv preprint

arXiv:2107.05908.

Du, M. and Li, F. (2016). Spell: Streaming parsing of

system event logs. In 2016 IEEE 16th International

Conference on Data Mining (ICDM), pages 859–864.

IEEE.

Du, M., Li, F., Zheng, G., and Srikumar, V. (2017).
DeepLog: Anomaly detection and diagnosis from system
logs through deep learning. In Proceedings of the
2017 ACM SIGSAC conference on computer and communications
security, pages 1285–1298.

Guo, H., Yuan, S., and Wu, X. (2021). LogBERT: Log
anomaly detection via BERT. In 2021 international joint
conference on neural networks (IJCNN), pages 1–8.
IEEE.

Hamooni, H., Debnath, B., Xu, J., Zhang, H., Jiang, G.,

and Mueen, A. (2016). Logmine: Fast pattern recog-

nition for log analytics. In Proceedings of the 25th

ACM international on conference on information and

knowledge management, pages 1573–1582.

Han, J., Pei, J., and Tong, H. (2022). Data mining: concepts
and techniques. Morgan Kaufmann.

He, P., Zhu, J., Zheng, Z., and Lyu, M. R. (2017). Drain: An

online log parsing approach with ﬁxed depth tree. In

2017 IEEE international conference on web services

(ICWS), pages 33–40. IEEE.

He, S., Lin, Q., Lou, J.-G., Zhang, H., Lyu, M. R., and

Zhang, D. (2018). Identifying impactful service sys-

tem problems via log analysis. In Proceedings of the


2018 26th ACM joint meeting on European software

engineering conference and symposium on the foun-

dations of software engineering, pages 60–70.

He, S., Zhu, J., He, P., and Lyu, M. R. (2016). Experi-

ence report: System log analysis for anomaly detec-

tion. In 2016 IEEE 27th international symposium on

software reliability engineering (ISSRE), pages 207–

218. IEEE.

Joulin, A. (2016). FastText.zip: Compressing text
classification models. arXiv preprint arXiv:1612.03651.

Le, V.-H. and Zhang, H. (2022). Log-based anomaly detec-

tion with deep learning: How far are we? In Proceed-

ings of the 44th international conference on software

engineering, pages 1356–1367.

Liang, Y., Zhang, Y., Xiong, H., and Sahoo, R. (2007).
Failure prediction in IBM BlueGene/L event logs. In
Seventh IEEE International Conference on Data Mining
(ICDM 2007), pages 583–588. IEEE.

Lin, Q., Zhang, H., Lou, J.-G., Zhang, Y., and Chen, X.

(2016). Log clustering based problem identiﬁcation

for online service systems. In Proceedings of the

38th International Conference on Software Engineer-

ing Companion, pages 102–111.

Liu, F. T., Ting, K. M., and Zhou, Z.-H. (2008). Isolation
forest. In 2008 Eighth IEEE International Conference
on Data Mining, pages 413–422. IEEE.

Lou, J.-G., Fu, Q., Yang, S., Xu, Y., and Li, J. (2010). Min-

ing invariants from console logs for system problem

detection. In 2010 USENIX Annual Technical Confer-

ence (USENIX ATC 10).

Lu, S., Wei, X., Li, Y., and Wang, L. (2018). Detect-

ing anomaly in big data system logs using convolu-

tional neural network. In 2018 IEEE 16th Intl Conf

on Dependable, Autonomic and Secure Computing,

16th Intl Conf on Pervasive Intelligence and Comput-

ing, 4th Intl Conf on Big Data Intelligence and Com-

puting and Cyber Science and Technology Congress

(DASC/PiCom/DataCom/CyberSciTech), pages 151–

158. IEEE.

Makanju, A. A., Zincir-Heywood, A. N., and Milios, E. E.

(2009). Clustering event logs using iterative partition-

ing. In Proceedings of the 15th ACM SIGKDD inter-

national conference on Knowledge discovery and data

mining, pages 1255–1264.

Meng, W., Liu, Y., Zhu, Y., Zhang, S., Pei, D., Liu, Y.,
Chen, Y., Zhang, R., Tao, S., Sun, P., et al. (2019).
LogAnomaly: Unsupervised detection of sequential
and quantitative anomalies in unstructured logs. In
IJCAI, volume 19, pages 4739–4745.

Nandi, A., Mandal, A., Atreja, S., Dasgupta, G. B., and

Bhattacharya, S. (2016). Anomaly detection using

program control ﬂow graph mining from execution

logs. In Proceedings of the 22nd ACM SIGKDD inter-

national conference on knowledge discovery and data

mining, pages 215–224.

Nedelkoski, S., Bogatinovski, J., Acker, A., Cardoso, J.,

and Kao, O. (2020). Self-attentive classiﬁcation-based

anomaly detection in unstructured logs. In 2020 IEEE

International Conference on Data Mining (ICDM),

pages 1196–1201. IEEE.

Nedelkoski, S., Bogatinovski, J., Acker, A., Cardoso, J.,

and Kao, O. (2021). Self-supervised log parsing.

In Machine Learning and Knowledge Discovery in

Databases: Applied Data Science Track: European

Conference, ECML PKDD 2020, Ghent, Belgium,

September 14–18, 2020, Proceedings, Part IV, pages

122–138. Springer.

Shima, K. (2016). Length matters: Clustering system

log messages using length of words. arXiv preprint

arXiv:1611.03213.

Wang, J., Tang, Y., He, S., Zhao, C., Sharma, P. K., Alfarraj,
O., and Tolba, A. (2020). LogEvent2vec: LogEvent-to-vector
based anomaly detection for large-scale logs in
Internet of Things. Sensors, 20(9):2451.

Xia, B., Bai, Y., Yin, J., Li, Y., and Xu, J. (2021). LogGAN: a
log-level generative adversarial network for anomaly
detection using permutation event modeling. Information
Systems Frontiers, 23:285–298.

Xu, W., Huang, L., Fox, A., Patterson, D., and Jordan,

M. I. (2009). Detecting large-scale system problems

by mining console logs. In Proceedings of the ACM

SIGOPS 22nd symposium on Operating systems prin-

ciples, pages 117–132.

Yang, L., Chen, J., Wang, Z., Wang, W., Jiang, J., Dong,

X., and Zhang, W. (2021). Semi-supervised log-based

anomaly detection via probabilistic label estimation.

In 2021 IEEE/ACM 43rd International Conference

on Software Engineering (ICSE), pages 1448–1460.

IEEE.

Zhang, X., Xu, Y., Lin, Q., Qiao, B., Zhang, H., Dang, Y.,

Xie, C., Yang, X., Cheng, Q., Li, Z., et al. (2019).

Robust log-based anomaly detection on unstable log

data. In Proceedings of the 2019 27th ACM joint meet-

ing on European software engineering conference and

symposium on the foundations of software engineer-

ing, pages 807–817.

Zhu, J., He, S., Liu, J., He, P., Xie, Q., Zheng, Z., and Lyu,

M. R. (2019). Tools and benchmarks for automated

log parsing. In 2019 IEEE/ACM 41st International

Conference on Software Engineering: Software En-

gineering in Practice (ICSE-SEIP), pages 121–130.

IEEE.
