USB-IDS-TC: A Flow-Based Intrusion Detection Dataset of DoS Attacks in Different Network Scenarios

Marta Catillo, Antonio Pecchia and Umberto Villano

Università degli Studi del Sannio, Benevento, Italy

Keywords:

Dataset, Intrusion Detection, Denial of Service, Network Flows, Trafﬁc Control, Network Emulation.

Abstract:

Network intrusion detection systems (NIDS) play a key role in cybersecurity. Most of the time, NIDS

are built on machine learning/deep learning (ML/DL) models that are trained and tested on public intrusion

detection datasets. This paper presents the novel USB-IDS-TC dataset, conceived to explore the dependence

of ML/DL-based NIDS on the network used to collect the training trafﬁc data. In this new publicly-available

dataset, DoS attacks have been conducted in different network scenarios, in the belief that the network has a

non-negligible effect on the detection capability of the NIDS as indicated by our initial analysis. Differently

from existing datasets that collect the data in a single scenario, USB-IDS-TC allows studying the dependence

of the attacks, trafﬁc features and ML/DL models on the network, in order to strive for generalizable and

widely-applicable NIDS.

1 INTRODUCTION

In the desperate and presumably endless struggle

against network hackers and misusers, network intru-

sion detection systems (NIDS) currently play a key

role. The detection of potentially dangerous network

activity is canonically carried out by means of pol-

icy rules and signatures based on known attacks. Un-

fortunately, this requires frequent signature updates

and is mostly ineffective against never-seen-before at-

tacks (0-day attacks). This is the primary reason for

the blossoming of an ever-increasing body of research

on machine learning/deep learning (ML/DL) detec-

tors, which aim to infer the class of network trafﬁc

or to detect anomalies by comparing the network traf-

ﬁc to a legitimate baseline. Whenever a signiﬁcant

difference is detected, an alert is raised. The hope of

the scientiﬁc community working on this topic is that

anomaly-based detectors will also be able to detect 0-day attacks, as these should deviate from normal network traffic.

Most of the time, ML/DL NIDS are trained and

tested on public intrusion detection datasets. As a

matter of fact, public datasets, such as KDD-CUP’99

(Özgür and Erdem, 2016), UNSW-NB15 (Moustafa

https://orcid.org/0000-0002-5025-7969

https://orcid.org/0000-0003-2869-8423

https://orcid.org/0000-0001-5382-4650

and Slay, 2015), NDSec-1 2016 (Beer and Buehler,

2017), CICIDS2017 (Sharafaldin et al., 2018), and

many others (Ring et al., 2019) have become the

de facto standard benchmarks for evaluating novel

NIDS techniques. The wide availability of intru-

sion datasets, together with the rapid advancement

of deep learning frameworks, has led to the emergence

of numerous attack detection methods in the litera-

ture. Notably, some of these detectors achieve highly

promising detection rates, approaching 100% on pub-

lic datasets. It has been argued elsewhere that many

studies leveraging public datasets for NIDS research

tend to “blindly” trust the data without considering

the representativeness of the network trafﬁc and its

potential cybersecurity implications, such as the ac-

tual impact on service continuity and performance of

the targeted applications (Catillo et al., 2021b). For

this reason, in the past our research group, based at

the University of Sannio in Benevento (USB), Italy,

has released a dataset (USB-IDS-1) where the network data collected was complemented with (i) performance measurements of the victim under attack to

make it clear if the attack was actually successful in

disrupting the victim service and (ii) the actual conﬁg-

uration of the victim server (capacity, multithreading

capability and potential defense mechanism enabled,

if any).

https://idsdata.ding.unisannio.it/usbids1.html


Catillo, M., Pecchia, A. and Villano, U.

USB-IDS-TC: A Flow-Based Intrusion Detection Dataset of DoS Attacks in Different Network Scenarios.

DOI: 10.5220/0013248600003899

In Proceedings of the 11th International Conference on Information Systems Security and Privacy (ICISSP 2025) - Volume 1, pages 302-309

ISBN: 978-989-758-735-1; ISSN: 2184-4356

Following up on our past work on these topics, this

paper presents the novel USB-IDS-TC dataset, which

addresses a different – and strongly overlooked by

the literature – issue: the dependence of the ML/DL-

based NIDS on the network scenario used to collect

the training trafﬁc data. USB-IDS-TC is motivated

by the fact that the majority of the ML/DL NIDS

do not examine individual network packets, but rely

on bidirectional trafﬁc ﬂows. Flows are obtained by

hardware (e.g., routers) or by software from the pack-

ets exchanged in the two directions pertaining to the

same connection. Each ﬂow consists of a record of

features suited for ML/DL purposes. The extracted

features include packet and payload length, number

of packets, transmitted bytes and mean length of packets, along with statistical measurements of the timing of the communication. ML/DL NIDS rely heavily on time-related features, such as the interarrival

times of forward and backward packets making up

the flow. For example, the ubiquitous flow extractor CICFlowMeter, originally named ISCXFlowMeter (Draper-Gil et al., 2016), in its most recent version generates bidirectional flow records made of 93 features: it must be noted that around 30% of features

are timing-related. In consequence, it is almost natu-

ral to wonder if different network scenarios with dif-

ferent bandwidth and latency – which surely impact

the timing-related features – would affect the detec-

tion capability of ML/DL NIDS.
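To make the notion of a bidirectional flow concrete, the following minimal sketch groups the packets of a connection regardless of their direction and derives a few features, including one timing-related feature (the mean interarrival time of forward packets). It is an illustration only: the `Packet` fields, the `flow_key` helper and the feature names are assumptions introduced here, and the code does not reproduce the logic of CICFlowMeter, which also handles flow timeouts and TCP flags.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    ts: float      # capture timestamp, in seconds
    src: str       # source IP address
    sport: int     # source port
    dst: str       # destination IP address
    dport: int     # destination port
    proto: str     # transport protocol, e.g. "tcp"
    length: int    # packet length, in bytes


def flow_key(p):
    """Direction-independent key: both directions of a connection share it."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (min(a, b), max(a, b), p.proto)


def build_flows(packets):
    """Group packets into bidirectional flows and compute simple features."""
    groups = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        groups[flow_key(p)].append(p)

    flows = {}
    for key, pkts in groups.items():
        first = pkts[0]  # the first packet seen defines the forward direction
        fwd = [p for p in pkts if (p.src, p.sport) == (first.src, first.sport)]
        bwd = [p for p in pkts if (p.src, p.sport) != (first.src, first.sport)]
        # Interarrival times between consecutive forward packets.
        fwd_iat = [b.ts - a.ts for a, b in zip(fwd, fwd[1:])]
        flows[key] = {
            "packets": len(pkts),
            "bytes": sum(p.length for p in pkts),
            "fwd_packets": len(fwd),
            "bwd_packets": len(bwd),
            "duration": pkts[-1].ts - pkts[0].ts,
            "fwd_iat_mean": mean(fwd_iat) if fwd_iat else 0.0,
        }
    return flows
```

Since `duration` and `fwd_iat_mean` depend directly on packet timestamps, any additional delay or jitter introduced by the network changes their values even when the application-level behavior is identical.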

In response to the challenge presented, USB-IDS-

TC provides network ﬂow data – both normal traf-

ﬁc and Denial of Service (DoS) attacks – obtained in

different network scenarios. The dataset is publicly available on our web site. To the best of our knowledge, it is the first time that the flow data relative to

the same normal trafﬁc and DoS attacks are collected

in different network scenarios. Our hope is that the

dataset could be beneﬁcial to the NIDS community,

making clear the possible hidden effect of network

characteristics. USB-IDS-TC allows studying the de-

pendence of the attacks, trafﬁc features and ML/DL

models on the network scenario, in order to strive for

generalizable and widely-applicable NIDS.

The paper is organized as follows. Section 2 dis-

cusses related work in the area and the original con-

tribution of USB-IDS-TC. Section 3 describes the ex-

perimental testbed and its emulation capabilities, the

attacks performed and the collection procedure. Sec-

tion 4 discusses the network scenarios and the dataset

organization. The key insights learned from our data

and possible future research directions are presented

in Section 5. In Section 6 we draw our conclusions.

https://idsdata.ding.unisannio.it/usbidstc.html

2 RELATED WORK

Public intrusion datasets have boosted the academic

research on NIDS. Typically, these datasets are ac-

cessible in a raw format, such as PCAP packet data

ﬁles, or in a more “reﬁned” format, such as network

ﬂows organized in comma-separated values (CSV)

ﬁles. These CSV ﬁles are ideally suited for ML ap-

plications, which is the reason for the extensive use

of public datasets. A number of these datasets have

achieved notable popularity within the literature.

For example, KDD-CUP’99 can be considered the “pioneer” of ML-based intrusion detection. It

was collected in 1999 and consists of two weeks of in-

stances free from attacks and ﬁve weeks of instances

containing attacks. It is important to note that, de-

spite its continued popularity (Kushwaha et al., 2017)

and status as a foundational contribution to the ﬁeld

of intrusion detection, KDD-CUP’99 has several doc-

umented drawbacks (McHugh, 2000). Additionally,

after approximately two decades, it is no longer an

accurate representation of present-day network traf-

ﬁc. This also applies to the more recent NSL-KDD

dataset (Tavallaee et al., 2009), a version of KDD-

CUP’99 with reduced size and duplicate entries re-

moved. In recent years, there has been a growing

trend towards critical analysis of security datasets.

For example, (Silva et al., 2020) identiﬁed statistical

ﬂaws within the KDD-CUP’99 dataset that could in-

troduce bias during training of IDS models.

Among the latest publicly available intrusion de-

tection datasets, CICIDS2017 is undoubtedly the one

that has gained the greatest popularity. Released by

the Canadian Institute for Cybersecurity (CIC), its

reference paper (Sharafaldin et al., 2018) is currently

(November 2024) cited almost 4000 times on Google

Scholar, which places it among the most frequently

used datasets. A testbed framework was implemented

by the authors to generate benign and attack data sys-

tematically using different proﬁles. The dataset of-

fers both ready-to-use labeled ﬂows and raw PCAP

ﬁles. Furthermore, the authors have developed CI-

CFlowMeter (Lashkari et al., 2017), a tool to gener-

ate network ﬂows from raw PCAP ﬁles, which has

quickly become very popular. Regrettably, it was not

until after a long period of “blind” utilization of the

CICIDS2017 dataset by NIDS researchers that a num-

ber of studies identiﬁed major bugs and errors affect-

ing CICFlowMeter, resulting in incorrect ﬂow records

from both CICIDS2017 and the more recent CSE-CIC-IDS2018 (Engelen et al., 2021; Rosay et al., 2022;

Liu et al., 2022; Lanvin et al., 2023).

https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

https://registry.opendata.aws/cse-cic-ids2018/


Another widely known public intrusion detection

dataset is UNSW-NB15 (Moustafa and Slay, 2015).

Created by the Australian Centre for Cyber Security

(ACCS), it contains both real normal activities and

synthetic attack behaviors. The dataset is available

in both CSV and raw PCAP formats.

More recently, there has been a shift in focus away

from general purpose networks and towards attacking

networks designed for speciﬁc applications. For ex-

ample, the aforementioned CIC proposed datasets for

IoT, IoMT and IoV environments (Neto et al., 2023;

Dadkhah et al., 2024; Neto et al., 2024), and for elec-

tric vehicle (EV) charging infrastructures (Kim et al.,

2023; Buedi et al., 2024).

Our Contribution. In our previous work (Catillo

et al., 2021a), we introduced a novel dataset that

took into account both performance metrics and

application-level facets, including “accessory” pa-

rameters of the experimental testbed such as com-

modity defense mechanisms. Here instead we present

a dataset that focuses on the inﬂuence of the network

scenarios on the trafﬁc data. USB-IDS-TC captures

well known DoS attacks under diverse network sce-

narios, enabling researchers to investigate the impact

of network characteristics on intrusion detection per-

formance of ML/DL-based NIDS. To the best of our

knowledge, this is the ﬁrst dataset of its kind, prepar-

ing the way for a deeper understanding of the trans-

ferability of intrusion detection methods.

3 TESTBED

It is not realistically feasible to set up a testbed with

enough hardware to provide the wide range of per-

formances characterizing current communication net-

works. In order to perform the packet capture for

our dataset, we decided to resort to an extensively-

conﬁgurable testbed environment, which makes it

possible to reproduce the behavior exhibited by most

real-life networks. The environment is based on

Docker containers, canonically connected through a

user-deﬁned internal network that employs the bridge

network driver. It is crucial to note that the bridge

Docker network is characterized by several key attributes: very high bandwidth, minimal latency, and absence of transmission errors (after all, it is not a “real” network). Consequently, to introduce network conditions that are representative of real-world scenarios, it

is only necessary to exploit techniques such as band-

width shaping, injection of variable latency, and pos-

sibly inclusion of transmission errors.

The iperf3 utility reports 23.6 Gbits/s on the workstation used to collect the dataset.

Figure 1: Experimental testbed.

The Internet Engineering Task Force (IETF) Request for Comments (RFC) 2475 defines traffic shaping as the act of delaying packets within a traffic stream. This delay is introduced to ensure that the

stream conforms to a predeﬁned trafﬁc proﬁle. A traf-

ﬁc proﬁle is a speciﬁcation that details the temporal

properties of a trafﬁc stream, such as its rate (data

transmission speed) and burst size (maximum amount

of data transmitted in a short period).

Our implementation relies on the Linux tc tool, which is included in the iproute2 suite, a collection of user-space utilities designed for controlling the networking functionality of the Linux kernel. As the Linux manual page for tc makes clear, shaping goes beyond simply adjusting bandwidth: tc can shape outgoing network traffic by modifying the output rate and by introducing packet delay (which can be fixed, or incorporate a suitably-distributed jitter). Additionally, tc allows for the introduction of packet loss, duplication and corruption.

3.1 Components of the Testbed

The detailed structure of the experimental testbed is

shown in Fig. 1. Each container within the testbed

executes a Debian 11 bullseye Linux instance. These

containers are assigned specialized functions as out-

lined below:

• webserver: This container runs an Apache2 web server (version 2.4.57, default configuration). The web server receives normal traffic originating from the tester node and Denial-of-Service (DoS) traffic sent by the attacker. The container also runs the utility tcpdump, which captures all network traffic received by the webserver and saves it in PCAP format in permanent storage.

https://datatracker.ietf.org/doc/html/rfc2475

https://git.kernel.org/pub/scm/network/iproute2/iproute2.git

https://man7.org/linux/man-pages/man8/tc.8.html

Table 1: Parameters of the network scenarios.

Network                        rate        delay   jitter  loss  corruption  duplication
1 - Enterprise Wi-Fi           1 Gbit/s    5 ms    1 ms    -     -           -
2 - Enterprise Branch Office   1 Gbit/s    20 ms   8 ms    -     -           -
3 - Site-to-Site VPN           1 Gbit/s    20 ms   15 ms   -     -           -
4 - Remote User VPN            100 Mbit/s  20 ms   15 ms   -     -           -
5 - Degraded                   100 Mbit/s  20 ms   25 ms   5 %   5 %         2 %

• tester: This container runs a load generator built on top of the well-known httperf utility. It serves two purposes: first, it issues the randomized web requests that make up the normal traffic; second, it collects response statistics to check the availability/disruption of the Apache server functionality during DoS attacks.

• attacker: This container generates DoS attacks against the webserver by means of well-known, publicly available tools, which are described below.

As mentioned above, the tc tool makes it possible

to control only outgoing network trafﬁc. Hence the

set-up of the testbed for the emulation of any given

network requires launching tc on the outgoing links

of the three containers with the same settings, as re-

ported in Fig. 1. The network scenarios reproduced

in USB-IDS-TC are presented in Section 4.

3.2 Normal and DoS Trafﬁc

Normal Trafﬁc. The scripted load generator exe-

cuted at the tester node exploits httperf to issue

requests to the Apache web server for assorted con-

tents (small, medium and large HTML ﬁles, images,

PDF documents, . . . ). The normal trafﬁc collection

has a duration of 20 minutes. As for the DoS trafﬁc,

the attacker node launches four different DoS attacks

(HTTP ﬂood, two kinds of slowloris, slow POST)

against the webserver, using the following tools:

• hulk: it generates an HTTP flood, spawning a large volume of obfuscated and unique requests to prevent the recognition of a pattern that could allow the filtering of the anomalous traffic;

• slowloris: a tool producing DoS traffic based on slow HTTP requests against the victim server, which exploits a weakness of the HTTP protocol related to the management of TCP fragmentation;

https://github.com/httperf/httperf

https://github.com/grafov/hulk

https://github.com/gkbrk/slowloris

• slowhttptest: this tool can anomalously extend the duration of HTTP connections in different ways. To produce our dataset, we use slowhttptest both in the (i) “slowloris” mode, which sends incomplete HTTP requests to the victim server, and (ii) “slow POST” mode, which sends message bodies at very slow speed.

The DoS tools, one at a time, are launched and kept

running for 180 seconds, enough to disrupt the web-

server service and to collect a signiﬁcant sample of

attack trafﬁc.

4 DATASET COLLECTION AND ORGANIZATION

Normal and DoS trafﬁc is collected in four represen-

tative network scenarios from (Fulkerson, 2017): En-

terprise Wi-Fi, Enterprise Branch Ofﬁce, Site-to-Site

VPN, Remote User VPN. In addition, we constructed

a ﬁfth scenario representing a severely degraded net-

work. This was deliberately selected to highlight the

discrepancies from any network with satisfactory per-

formance, and to examine the inﬂuence on the accu-

mulated trafﬁc ﬂows. The conﬁguration speciﬁcs of

tc for the ﬁve networks are shown in Table 1.
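As an illustration of how the parameters in Table 1 could be turned into tc invocations, the sketch below builds one command per scenario. It rests on assumptions that the text does not confirm: the use of a single netem queueing discipline attached to the root of the interface, the interface name `eth0`, and the helper names `SCENARIOS` and `netem_command`. The exact commands used to build the dataset may differ.

```python
# Parameters taken from Table 1; the dictionary layout is an assumption.
SCENARIOS = {
    1: {"name": "Enterprise Wi-Fi", "rate": "1gbit",
        "delay": "5ms", "jitter": "1ms"},
    2: {"name": "Enterprise Branch Office", "rate": "1gbit",
        "delay": "20ms", "jitter": "8ms"},
    3: {"name": "Site-to-Site VPN", "rate": "1gbit",
        "delay": "20ms", "jitter": "15ms"},
    4: {"name": "Remote User VPN", "rate": "100mbit",
        "delay": "20ms", "jitter": "15ms"},
    5: {"name": "Degraded", "rate": "100mbit",
        "delay": "20ms", "jitter": "25ms",
        "loss": "5%", "corrupt": "5%", "duplicate": "2%"},
}


def netem_command(scenario, dev="eth0"):
    """Return the tc argument list that emulates the given scenario.

    The interface name `dev` is hypothetical; tc only affects the traffic
    leaving that interface.
    """
    s = SCENARIOS[scenario]
    cmd = ["tc", "qdisc", "replace", "dev", dev, "root", "netem",
           "rate", s["rate"], "delay", s["delay"], s["jitter"]]
    # Loss, corruption and duplication are only set in the degraded scenario.
    for option in ("loss", "corrupt", "duplicate"):
        if option in s:
            cmd += [option, s[option]]
    return cmd


if __name__ == "__main__":
    for number in SCENARIOS:
        print(" ".join(netem_command(number)))
```

Because tc shapes outgoing traffic only, a command of this kind would have to be issued with the same settings inside each of the three containers, for instance by passing the returned list to `subprocess.run`; modifying queueing disciplines inside a container normally requires the NET_ADMIN capability.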

In order to avoid any form of mislabeling, a problem affecting many datasets currently in use, the traffic capture for each network scenario is performed with five independent experiments: (1) the normal traffic, whose capture is named NOR, and (2, 3, 4, 5) the four individual DoS attacks, leading to four additional captures named HLK (hulk), GSL (slowloris), HSL (slowhttptest in slowloris mode) and HSP (slowhttptest in slow POST mode).

As mentioned before, during the DoS attacks, the

tester runs httperf, which continuously sends HTTP

requests to the webserver and records information on

the service availability. This probing HTTP trafﬁc

(easily recognizable by its source node address) is not

recorded by tcpdump. In consequence, the traffic collected in (2, 3, 4, 5) consists of “pure” attack packets, not interleaved with normal traffic (present during the attack, but not recorded at all). The complete separation of normal and attack traffic is useful to avoid any possibility of mislabeling the generated flows.

https://tools.kali.org/stress-testing/slowhttptest

Table 2: Number of flows by network scenario and capture.

Network  NOR    HLK    GSL   HSL   HSP
1        1658   79399  4072  5211  5244
2        1725   86281  3279  5107  5100
3        1677   82200  2795  5112  5126
4        1671   88525  2861  5102  5125
5        3285   98695  686   5494  5438

Overall, the five captures, conducted in each of the five network scenarios assessed, lead to a total of 25 PCAP

ﬁles. The PCAP ﬁles obtained have been processed

with the CICFlowMeter tool (see Fig. 1), which

is used to obtain the corresponding ﬂow records.

It is noteworthy that the original CICFlowMeter,

which was utilized to generate the initial CICIDS2017

dataset, a widely used resource in the machine learning community, was affected by a number of bugs that had a significant impact on the consistency of the resulting flow records. We have adopted a revised version of the tool resulting from independent studies (Liu et al., 2022). The number of flow records for

each network scenario and each class of trafﬁc is pre-

sented in Table 2. Although the duration of all the

DoS trafﬁc captures is identical, it is evident that hulk

– producing a ﬂood attack – generates a considerably

higher number of ﬂows than the slow counterparts.

Furthermore, the introduction of packet errors and du-

plications during the capture of the network number 5

increases the number of ﬂows of normal trafﬁc and

results in a delayed impact (only 686 GSL ﬂows) of

the slowloris script.

USB-IDS-TC is released in the form of five CSV files, where each file provides the normal and DoS flow records of one network scenario in Table 1. Each CSV file provides ready-to-use labeled network flows, obtained by appending the five previously-labeled flow collections relative to the same network scenario. The

labels are the abbreviations already used in Table 2

(NOR, HLK, GSL, HSL, HSP).
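A quick sanity check after downloading the dataset is to count the flow records per label in each CSV file and compare the result with Table 2. The sketch below does this with the standard library only. The column name `Label` and the file name in the usage note are assumptions (the actual header and file names should be checked against the released files); the expected counts are those reported in Table 2 for network 1.

```python
import csv
from collections import Counter

# Flow counts reported in Table 2 for network 1.
TABLE2_NETWORK_1 = {"NOR": 1658, "HLK": 79399, "GSL": 4072,
                    "HSL": 5211, "HSP": 5244}


def count_labels(lines, label_column="Label"):
    """Count flow records per label.

    `lines` is any iterable of CSV text lines (for example an open file);
    `label_column` is the assumed name of the column holding the label.
    """
    reader = csv.DictReader(lines)
    return Counter(row[label_column].strip() for row in reader)


def matches_table(counts, expected):
    """True if every expected label has exactly the expected count."""
    return all(counts.get(label, 0) == n for label, n in expected.items())
```

With a hypothetical file name, the check would read `with open("network1.csv", newline="") as f: print(matches_table(count_labels(f), TABLE2_NETWORK_1))`.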

General Observations and Use Cases of USB-

IDS-TC. The dataset is not meant to be the “ulti-

mate” solution for NIDS testing, but a ﬁrst step to

promote an in-depth understanding of the possible ef-

fect of the network scenarios on the performance of

ML/DL NIDS, an issue commonly neglected in the state

of the art. It is worth pointing out that the choice

of using CICFlowMeter to obtain the trafﬁc ﬂows

makes USB-IDS-TC immediately interoperable with the high number of NIDS proposals based on the use of CICIDS2017 and other major datasets of the CIC collection. Moreover, any captured PCAP file processed by CICFlowMeter can be used in conjunction with USB-IDS-TC, paving the way for a study of the transferability of NIDS models over different network scenarios. As an aside, the Docker/tc testbed presented here can also be used to generate realistic problem-space adversarial attacks by altering the timing of the packets sent by an attacker node (Catillo et al., 2024).

https://github.com/GintsEngelen/CICFlowMeter

Figure 2: Recall, precision and false positive rate of an IDS model learned from network 1 and applied to network 1, 2, 3, and 5.

5 KEY INSIGHTS

5.1 Intrusion Detection Implications

Our critical argument is that the speciﬁc network

scenario inﬂuences the trafﬁc data and – in turn –

the value of the features extracted (especially those

timing-related). In consequence, an attempt to learn

a NIDS with the ﬂow records obtained in a given

network may return a detection model that does not

transfer to the normal and DoS trafﬁc of the other net-

works. Let us supplement this argument by a concrete

example with USB-IDS-TC. At ﬁrst, we learn a NIDS

model with the ﬂow records obtained in network 1. As

for any ML/DL experiment, we remove non-relevant

and biasing features (i.e., id, timestamp and protocol

of the ﬂow records, source and destination IP address

and port) and split the records obtained in network 1

into the typical training, validation and test set (60, 20

and 20% of the records, respectively).

(a) hulk (b) GSL slowloris

Figure 3: Throughput measured during the progression of the attacks for some network scenarios.

Regarding the technique to infer the NIDS, we use a decision tree;

however, any other classiﬁer, e.g., Bayesian network,

oneR or DNN (to mention a few), would have ﬁt the

scope of this example. The decision tree is trained

with the training set; parameterization and overﬁtting

issues are checked with the validation set.
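The 60/20/20 partition of the records of network 1 described above can be sketched as follows. The function name `split_records`, the fixed seed and the plain random shuffle are assumptions made for illustration; the text does not state whether the original split was shuffled or stratified by class.

```python
import random


def split_records(records, train=0.6, validation=0.2, seed=0):
    """Shuffle the records and split them into training, validation and test sets.

    Whatever is left after the training and validation shares forms the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

In the experiment described here, only the records of network 1 would be split in this way, while the records of the other networks would be used in their entirety for testing.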

The NIDS model obtained is tested with the test

set of network 1 (held-out from training and valida-

tion) and the entirety of records of network 2, 3 and

5 (network 4 is not reported because the results are

close to network 2). We compute the metrics of re-

call (R), precision (P) and false positive rate (FPR)

to measure the capability of the NIDS at recogniz-

ing the classes of trafﬁc among NOR, HLK, GSL,

HSL, HSP. Fig. 2 shows the values of the metrics. As

expected, the model – when trained and tested with

the ﬂow records collected in the same network sce-

nario (as typically done in most of the NIDS papers) –

achieves more than satisfactory results. The leftmost

set of five bars of Fig. 2 (pertaining to network 1) indicates that four classes are detected with both R and P

≥0.97; the FPR is almost 0. The metrics obtained for

network 2, 3 and 5 are shown by the remaining bars

in Fig. 2. Compared to network 1, the drop is signiﬁ-

cant in many cases. For example, the recall of HSP is

≤0.14 in both network 2 and 3; as for network 5, three

classes are far below 0.9 recall. Similar ﬁndings can

be noted for P, which drops in many cases, e.g., HSL

(network 2 and 5), HSP (network 2, 3 and 5), NOR

(network 5). FPR reaches the extremely high values

of 0.55 and 0.29 for HLK (network 2 and 3).
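For reference, per-class recall, precision and false positive rate in a multi-class setting are commonly computed in a one-vs-rest fashion, as in the sketch below. This is the standard definition and is assumed, not confirmed, to be the one behind Fig. 2; the function name `per_class_metrics` is introduced here for illustration.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Compute one-vs-rest recall, precision and FPR for each class label."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = sum(t != label and p != label for t, p in pairs)
        metrics[label] = {
            # Fraction of records of this class that are recognized as such.
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            # Fraction of records assigned to this class that belong to it.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            # Fraction of records of other classes wrongly assigned to it.
            "fpr": fp / (fp + tn) if fp + tn else 0.0,
        }
    return metrics
```

Under this definition, a high FPR for HLK means that many records of the other classes (including normal traffic) are classified as hulk when the model is moved to a different network.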

Suggested Research Directions. The lack of

transferability of ML/DL-based NIDS models is over-

looked by the NIDS literature. This is exacerbated

in the context of USB-IDS-TC because both normal

and DoS trafﬁc are indeed the same, although exe-

cuted in different network scenarios. Our data can

help researchers to strive for more generalizable and

widely-applicable detectors. In this respect, future usages of USB-IDS-TC may include (but are not limited to)

the analysis of: the networks leading to more gen-

eralizable detectors, the attacks being most affected

by the network parameters, the robustness of existing

ML/DL techniques – and learning paradigms – with

respect to the network, the features depending on the

network or the construction/selection of more general

features suited for detection.

5.2 Effectiveness of the Attacks

The reader may argue that the efﬁcacy of the attacks

might be affected by the speciﬁc network scenario;

however, this is not the case for USB-IDS-TC. Unlike many existing datasets (which do not disclose

any speciﬁc victim-side service availability measure-

ments), we monitored the performance of the vic-

tim server during the progression of the attacks. As

said above, in response to the probing HTTP load,

httperf generates several service metrics. Here we

provide some insights into the effectiveness of the

attacks in USB-IDS-TC by discussing the throughput loss. The throughput loss (TL) is computed as $TL = \frac{T^{*} - T}{T^{*}} \cdot 100$ (with $T^{*} \geq T$ and $T^{*} > 0$), where (i) $T^{*}$, a constant, is the throughput (i.e., successful 2xx

HTTP requests accomplished within the time unit) ex-

pected in attack-free conditions for a given network

scenario and (ii) T is the actual throughput measured

during a DoS attack. TL varies within [0, 100]%,

where 0 denotes “no loss” with respect to the attack-

free condition. Any point where TL>0% indicates

instead the presence of a DoS attack, because in our

testbed the only source of legitimate requests is the

tester node. Fig. 3 shows TL observed during the at-

tacks; for each attack, we show two networks because

all the cases produce similar results. Overall, the attacks cause a variety of responses. For example, TL rises from 0% to 90-95% in hulk (Fig. 3a), which means the attack leaves almost no room to serve the legitimate requests; on the other hand, GSL slowloris

induces major ﬂuctuations of TL, as shown in Fig. 3b.

HSL slowloris (Fig. 3c) and slow POST (Fig. 3d)

present an on-off behavior, where the throughput goes

from 0 to 100% in almost no time. The attacks are ef-

fective for all the network scenarios assessed.
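The throughput loss, i.e., the difference between the attack-free throughput T* and the measured throughput T, divided by T* and expressed as a percentage, can be computed with a few lines of code. The function and parameter names below are illustrative, and the sample values in the usage note are hypothetical rather than taken from the dataset.

```python
def throughput_loss(t_expected, t_measured):
    """Return the throughput loss TL in percent.

    t_expected: throughput T* expected in attack-free conditions (must be > 0).
    t_measured: throughput T measured during the attack (0 <= T <= T*).
    """
    if t_expected <= 0:
        raise ValueError("the attack-free throughput must be positive")
    if not 0 <= t_measured <= t_expected:
        raise ValueError("the measured throughput must lie in [0, T*]")
    return (t_expected - t_measured) / t_expected * 100.0
```

For instance, with a hypothetical attack-free throughput of 200 requests per second, a measured throughput of 10 requests per second corresponds to a loss of about 95%, in line with the range observed for hulk.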


(a) network 1 (b) network 2

Figure 4: PCA-based visualization of the ﬂow records.

5.3 Visual Inspection

One more interesting ﬁnding on the trafﬁc in USB-

IDS-TC can be inferred through a visual inspection

of the ﬂow records. We conduct a Principal Com-

ponent Analysis (PCA) to visualize the ﬂow records

in the feature space. PCA is a dimensionality reduc-

tion technique whose objective is to ﬁnd the directions

along which a set of high-dimensional points line up

best. Flow records are regarded here as points of the Euclidean space $\mathbb{R}^{86}$, where 86 is the number of features

after removal of the label and non-relevant/biasing

features. We retain the top 2 principal components

(PC) explaining almost 40% of the total variance: this

is highly satisfactory for 2D visualization purposes.

Fig. 4a and 4b show the records pertaining to three

classes of trafﬁc (i.e., NOR, HLK and GSL) for net-

work 1 and network 2, respectively. It can be noted

that the classes are “well” separated, which means it

is possible to infer a successful NIDS model on the

top of an individual network scenario; however, the

model obtained will not transfer – as shown in Sec-

tion 5.1 – to a different network scenario. This as-

pect remains surely intriguing because the classes of

trafﬁc preserve their feature-space distribution across

the network scenario. For example, Fig. 4c shows the

normal ﬂow records obtained in network 1, 2, and 3,

each marked by , ⊠ and ×, respectively. The nor-

mal points obtained in the different networks are dis-

tributed over the same area: the different networks

induce a “light” shift of the points. Similar considerations can be made for the hulk records in Fig. 4d,

shown for network 3, 4, and 5.
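The 2D projection and the explained-variance figure quoted above can be written out explicitly. As a sketch under stated assumptions (the features are centered on their mean $\mu$, and possibly standardized beforehand, which the text does not specify), let $\lambda_1 \geq \dots \geq \lambda_{86}$ be the eigenvalues of the covariance matrix of the flow records and $u_1$, $u_2$ the eigenvectors associated with the two largest ones:

```latex
z_i = \begin{pmatrix} u_1^{\top}(x_i - \mu) \\ u_2^{\top}(x_i - \mu) \end{pmatrix},
\qquad
\frac{\lambda_1 + \lambda_2}{\sum_{j=1}^{86} \lambda_j} \approx 0.40
```

Each flow record $x_i \in \mathbb{R}^{86}$ is thus plotted as the 2D point $z_i$, while the remaining share of the variance (about 60%) lies along directions that the plots of Fig. 4 do not show.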

We believe that the availability of different vari-

ants of normal and DoS trafﬁc across different net-

work scenarios, such as those in USB-IDS-TC, is

strongly beneﬁcial to the research community to learn

more ﬂexible detection models or to test the transfer-

ability of a given NIDS proposal.

6 CONCLUSION

Intrusion detection is a hot topic, and the research

on NIDS should be fed with consistent and up-to-

date datasets. The scientiﬁc community tends to rely

on rather obsolete datasets, possibly obtained by col-

lecting trafﬁc relative to attacks that are not actu-

ally harmful against targeted services. Mislabeling

is a further issue. In light of the above, new public

datasets providing effective attacks are required.

Compared to existing dataset proposals, our work

has gone in a novel direction. We have built a new

dataset where the same DoS attacks, well known and rather customary, have been conducted over different network scenarios, in the belief that the network

has a non-negligible effect on trafﬁc features and the

detection capability of the NIDS. Furthermore, we

have tested the effectiveness of all the attacks. Our

initial analysis shows that the network scenario can

affect the capability of ML/DL detection.

We believe that the USB-IDS-TC dataset can

stimulate the research on the transferability of intru-

sion detection methods. Furthermore, the emulation

environment presented here lends itself to the study of

realistic problem-space attacks performed by altering

the timing of the attack packets sent. These topics will

be explored in our future work, which intends also to

provide the community with additional datasets rela-

tive to non-DoS attacks and problem-space adversar-

ial examples over multiple network scenarios.

ACKNOWLEDGMENT

This work has been partially funded by the Euro-

pean Union – Next-GenerationEU – National Recov-

ery and Resilience Plan (NRRP) – MISSION 4 COM-

PONENT 2, INVESTMENT N. 1.1, CALL PRIN

2022 PNRR D.D. 1409 14-09-2022 – (Threat-driven

security testing and proactive defense identiﬁcation

for edge-cloud systems) CUP N. F53D23009270001

(CUP Master N. E53D23016380001).


