Guidelines and a Framework to Improve the Delivery of Network Intrusion Detection Datasets

Brian Lewandowski (1, 2)

(1) Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, U.S.A.

(2) Raytheon Technologies, 1001 Boston Post Road E, Marlborough, U.S.A.

Keywords:

Network Intrusion Detection, Datasets, Machine Learning, Deep Learning.

Abstract:

Applying deep learning techniques to perform network intrusion detection has expanded significantly in recent years. One of the main factors contributing to this expansion is the availability of improved network intrusion detection datasets. Despite recent improvements to these datasets, researchers have found it difficult to effectively compare methodologies across a wide variety of datasets due to the unique features generated as part of the delivered datasets. In addition, it is often difficult to generate new features using a dataset due to the lack of source data or inadequate ground truth labeling information for a given dataset. In this work, we look at network intrusion detection dataset development with a focus on improving the delivery of datasets from a dataset researcher to other downstream researchers. Specifically, we focus on making dataset features reproducible, providing clear labeling criteria, and allowing a clear path for researchers to generate new features. We outline a set of guidelines for achieving these improvements along with providing a publicly available implementation framework that demonstrates the guidelines using an existing network intrusion detection dataset.

1 INTRODUCTION

Network intrusion detection (NID) is a methodology for protecting computer networks by analyzing network traffic in order to identify malicious activity (Chou and Jiang, 2022). Researchers have begun to leverage machine learning and deep learning techniques to combat the increasingly complex and evolving attacks taking place on networks today (Yang et al., 2022). To research and verify the applicability of these data-intensive techniques to network intrusion detection systems (NIDS), one must use datasets consisting of network scenarios that involve both benign and malicious activity. To support these efforts, a growing number of datasets have been developed and analyzed (Chou and Jiang, 2022; Ring et al., 2019; Yang et al., 2022). Despite the great strides made in NID dataset development, researchers have identified limitations that make it challenging to benchmark methods and perform feature engineering (Chou and Jiang, 2022; Ferriyan et al., 2021; Sarhan et al., 2021b; Sarhan et al., 2021c; Sarhan et al., 2020; Wolsing et al., 2021).

This document does not contain technology or Technical Data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.

In this work we seek to reduce the impact of limitations that occur as a result of the handoff of NID datasets from a dataset developer to downstream NID researchers. We propose a set of guidelines to help dataset developers overcome handoff limitations and extend the positive impact these datasets can have on downstream researchers. Our focus on the handoff of NID datasets between researchers has not been well explored in current NID dataset research. While many of the guidelines are generic in nature, we provide details on how to specifically implement them for NID datasets. In addition to the guidelines, we provide an open source containerized environment and framework to support implementation of the guidelines.

Figure 1 shows the dataset development process, adapted from descriptions in recent research, to show where this work logically fits (Sarhan et al., 2021b; Komisarek et al., 2021). As highlighted in the figure, we focus on improvements to NID dataset feature and label generation, which lead to additional improvements in the final delivery of the dataset. The provided guidelines can be used so that the end delivery of a NID dataset includes the original source network data as well as concrete scripts for generating each feature and label. With both of these items in hand, researchers will be able to reliably recreate a dataset from source data, ensure the same features are available across multiple datasets, and perform additional feature engineering.

[Figure 1 stages: Network Traffic Generation; Collect Source Data; Feature Generation; Apply Labels; Evaluation and Benchmarking; Publish Dataset.]

Figure 1: An overview of the NID dataset development process. The areas that the guidelines seek to improve are depicted in blue with a gray background.

Framework repository: https://github.com/WickedElm/niddff

Lewandowski, B. Guidelines and a Framework to Improve the Delivery of Network Intrusion Detection Datasets. DOI: 10.5220/0012052300003555. In Proceedings of the 20th International Conference on Security and Cryptography (SECRYPT 2023), pages 649-658. ISBN: 978-989-758-666-8; ISSN: 2184-7711. Copyright © 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0).

The main contributions of this work are as follows:

• To the best of the authors’ knowledge, this is the first work to focus specifically on the handoff of NID datasets.

• We identify the limitations that affect the handoff of NID datasets between researchers.

• Guidelines are provided for overcoming limitations currently present in the NID dataset handoff process between researchers.

• An open source containerized environment is provided which contains a standard toolset and a feature engineering framework to support implementation of the guidelines.

The remainder of this work is organized as follows. Section 2 explores related works that look to improve the NID dataset development process. Section 3 outlines the limitations related to NID datasets that this work seeks to address. In Section 4 we discuss the details of our proposed guidelines along with the developed containerized environment and framework. We conclude and outline future work in the final section.

2 RELATED WORK

Publicly available NID datasets that have been devel-

oped by researchers are well explored and analyzed

in the literature through a number of surveys (Chou

and Jiang, 2022; Ring et al., 2019; Yang et al., 2022).

These surveys break down the various datasets by different criteria such as data format, real versus synthetic data, availability, and statistical data regarding the datasets. For this reason, we focus this section on other NID dataset development tools and dedicate Section 3 to discussing work related to NID dataset limitations.

One of the earliest tools concerned with NID

datasets was FLAME (Brauckhoff et al., 2008). The

main goal of the FLAME tool was to take existing

netﬂow data and augment it by injecting new anoma-

lies into the existing ﬂows. In doing this, it would

allow researchers to capture live network trafﬁc and

then augment it later with anomalies resulting in a

dataset usable for developing NID methodologies.

With a goal similar to FLAME, the ID2T tool

also focused on augmenting network data with attacks

(Cordero et al., 2015; Cordero et al., 2021). The ID2T

tool, however, approached this from the packet level,

ingesting packet capture (PCAP) ﬁles as opposed to

netﬂow data. In addition, the ID2T tool is capable of

providing reports regarding the attacks injected such

that they can be used to facilitate data labeling. These

qualities allowed ID2T to be capable of injecting a

larger variety of network attacks making it useful for

the generation of new NID datasets that combine live

network trafﬁc with synthetic attacks.

The INSecS-DCS tool (Rajasinghe et al., 2018) is

another NID dataset creation tool which has an ex-

panded scope compared to both FLAME and ID2T.

Rather than focus on injecting attacks, INSecS-DCS

focuses on processing packet data, live or from a

PCAP ﬁle, such that one can customize the features

to include in a ﬁnal processed dataset. These features

can be captured as packet level statistics or based on

time windows.

Another related work introduces NDCT (Acosta

et al., 2021), which provides a toolkit for the collec-

tion and annotation of cybersecurity datasets. NDCT

is presented as a system that is primarily used during

a cybersecurity scenario exercise. During the scenario

execution, users are provided with dialogues to anno-

tate speciﬁc packets for tasks such as labeling. These

annotations can also be used to generate rules such

that similar packets receive the same labeling, mak-

ing the labeling task more efﬁcient.

Our work is distinguished from these related works in several ways. First, our guidelines and framework do not explicitly focus on injecting attacks into existing source data. Our implementation, however, complements both FLAME and ID2T such that one could use both tools as part of a pipeline for NID dataset creation within the framework. Similarly, one could incorporate the INSecS-DCS tool for feature creation within our framework. We currently incorporate both Zeek and Argus in our container environment; however, the intention is to grow the environment so that tools such as these can be incorporated. While NDCT focuses on supporting the live annotation of network data during scenario execution, our framework would support downstream feature engineering on the resulting source data. The other main differentiator of our work compared to those discussed here is that we focus on being able to reproduce a dataset from source files and on facilitating its exchange between researchers. The previous works in this space do not generally have this focus, as they seek to improve the actual NID data itself as opposed to the process of creating it.

3 NID DATASET LIMITATIONS

3.1 Reproducibility

The main focus of the guidelines and framework pre-

sented in our work is to increase the ability of re-

searchers to reproduce datasets from source ﬁles. To

be clear, we are not concerned with reproducing and

re-executing a NID scenario. Rather, we would like

to take the resulting PCAP and netﬂow ﬁles from

such a scenario and be able to reliably reproduce the

dataset’s original features and then perform further

feature engineering.

The need for this type of reproducibility was highlighted by a recent analysis of how downstream researchers use publicly available datasets (Chou and Jiang, 2022). In most instances, the original datasets are augmented in some way; however, most of the work related to these augmentations could not be duplicated due to insufficient code, documentation, or both (Chou and Jiang, 2022). The framework delivered in our work seeks to improve this situation by providing a standard way to document and code dataset augmentations from source files.

Other work has proposed a set of content and process requirements for generating a reproducible dataset (Ferriyan et al., 2021). The content requirements outlined include providing full PCAP files with their data payload, anonymization of network traffic, providing ground truth data, using up-to-date network traffic, labeling the data, and providing information regarding encryption. The process requirements pertain to information that should be provided in order to make generating the dataset reproducible. Our guidelines support these requirements, and we look to extend them with our framework through inclusion of the scripts used to generate features and perform labeling, along with full PCAP files. This removes the ambiguity of prose descriptions of how to regenerate a dataset.

Zeek: https://zeek.org/

Argus: https://openargus.org/

In addition to these works, there is no shortage of

NID literature that discusses the need to have repro-

ducible datasets (Cordero et al., 2021; Lavinia et al.,

2020; Sharafaldin et al., 2018a; Kenyon et al., 2020).

3.2 Unclear Labeling Criteria

One of the major challenges that researchers face

when working with NID datasets is the lack of

datasets with complete and accurate labeling (Lavinia

et al., 2020; Moustafa et al., 2019). Many of these

labeling issues arise as the task is often performed

by human analysis, making it both time-consuming

and error-prone (Lavinia et al., 2020). In other cases, the labeling criteria used are either incomplete or negatively influenced by previous errors in the dataset generation process (Lanvin et al., 2022).

For these reasons, accurate labels along with ground truth data are generally considered a major component of a useful NID dataset (Cordero et al., 2021; Rajasinghe et al., 2018; Komisarek et al., 2021). While ground truth on its own is useful, it can be misleading depending on the granularity at which it is provided. For instance, given only IP addresses and timestamps, one would have to assume that any traffic related to that IP address is malicious; however, it is typically the case that a mixture of both benign and attack traffic would be present. For this reason, our proposed guidelines and framework call for the inclusion of labeling scripts, which may make use of ground truth data if necessary. We note that in the case of manually labeled data, this scripting could simply be index-based, using the ordering of the data being processed.
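The granularity issue can be illustrated with a minimal Python sketch. The record layouts below (GroundTruth and Flow, with an attacker IP, destination port, and time window) are hypothetical and are not taken from any particular dataset or from the framework; they only contrast a coarse IP-only labeling rule with a finer rule expressed as a script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruth:
    """Hypothetical ground truth entry describing one attack."""
    src_ip: str
    dst_port: int
    start: float  # epoch seconds
    end: float    # epoch seconds


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record produced by a network analysis tool."""
    src_ip: str
    dst_port: int
    start: float  # epoch seconds


def label_by_ip(flow, truth):
    """Coarse rule: any flow from an attacker IP is labeled malicious (1)."""
    return int(any(flow.src_ip == t.src_ip for t in truth))


def label_by_ip_port_time(flow, truth):
    """Finer rule: IP, destination port, and time window must all match."""
    return int(any(
        flow.src_ip == t.src_ip
        and flow.dst_port == t.dst_port
        and t.start <= flow.start <= t.end
        for t in truth
    ))


truth = [GroundTruth("10.0.0.5", 445, 100.0, 200.0)]
flows = [
    Flow("10.0.0.5", 445, 150.0),  # attack traffic inside the window
    Flow("10.0.0.5", 53, 150.0),   # benign DNS lookup from the same host
    Flow("10.0.0.9", 445, 150.0),  # unrelated host
]

coarse_labels = [label_by_ip(f, truth) for f in flows]
fine_labels = [label_by_ip_port_time(f, truth) for f in flows]
```

In this toy example the coarse rule marks the benign DNS flow as malicious because it shares an IP address with the attacker, while the finer rule does not. Shipping the rule as a script makes that difference visible to downstream researchers.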

3.3 No Standard Feature Set

In a recent series of papers, Sarhan et al. explore limitations of current datasets and the impact these limitations have on evaluating methods across multiple networks and transitioning research into practical applications (Sarhan et al., 2020; Sarhan et al., 2021a; Sarhan et al., 2021b; Sarhan et al., 2021c). The main limitation explored in these works is that, because the features included with delivered datasets vary so widely, one cannot reliably compare a methodology across multiple networks to test for generalizability. This hinders the transition from research to practical applications.

Our work looks to extend the ideas expressed by Sarhan et al. in order to enable researchers to overcome these identified limitations. We aim to make it easier for researchers to provide a dataset that is reproducible from source data and easily expanded or adjusted. In this way, one could easily use a standard feature set and also investigate augmentations that improve on such a feature set.

While it is not their main focus, both Komisarek et al. (2021) and Layeghy et al. (2021) discuss and tackle the need to use a common feature set for comparison of their methods as opposed to using the proprietary features delivered with most NID datasets. Both works provide informative descriptions of the features used in their research. In addition, Layeghy et al. (2021) provide the actual calculations for the features used in their work. We believe this is a step in the right direction for the level of detail necessary to reproduce datasets from source data. We seek to naturally extend this information into scripts that are provided along with source network captures to make reproducing and extending the dataset more accessible and leave less opportunity for error.

4 NID DATASET DELIVERY GUIDELINES

4.1 The Intrinsic Value in NID Datasets

We believe it is worthwhile to provide a brief discus-

sion regarding the intrinsic value provided by NID

datasets as related to their development and subse-

quent distribution. Namely, the intrinsic value of a

NID dataset is created during the scenario develop-

ment, execution, and source data collection and not

by the ﬁnal delivered features. To be clear, the ﬁ-

nal features are valuable, but they are representative

of a separate feature engineering activity that takes

place after the intrinsic value of a network scenario

has been captured in source data. In other words,

the value provided by the NID dataset is derived from

the actual network intrusion scenario and its collected

source data. A researcher could provide any number of derived features with varying degrees of value for attack detection; however, the intrinsic value of the source data remains constant, as it is derived from the scenario that was captured.

One goal of our framework is to highlight these two separate activities by advocating for the delivery of both the source data and separate scripts that generate the features produced during any subsequent feature engineering. Providing both items delivers the value of both activities to downstream researchers.

4.2 Guidelines in Detail

The main ideas behind the proposed guidelines are simple to state but often overlooked in practice. Considering specifically the handoff of datasets from one researcher to another, the guidelines focus on ease of access, reproducibility from source data, verification, and extension. The guidelines are meant to provide general guidance for making the delivery of NID datasets meet these four areas of focus and to reduce the impact of the limitations discussed in Section 3. We note that our framework allows the specific implementation of the guidelines to vary depending on the particular methods employed by researchers. While some common tools are provided in our framework environment, we expect it to expand to meet researchers’ needs, as discussed further in Section 5. In addition, it is important to make the distinction that when we reference reproducibility of a dataset, we refer to reproducing the dataset’s final features from the original source data as opposed to recreating and re-executing the dataset’s NID scenario.

The ten guidelines are outlined and described in

Table 1 along with their justiﬁcation and details re-

garding how NID researchers can implement each

guideline, with a focus on our companion frame-

work. Guidelines one through four pertain to provid-

ing downstream researchers with the resources neces-

sary to actively reproduce and enhance the provided

dataset. Guidelines ﬁve through nine outline steps

that can be taken to ensure that all the dataset features

and labels can be regenerated from source data, and

that the steps for this generation of features can be

veriﬁed and understood by downstream researchers.

Finally, guideline ten is specifically included to emphasize that delivered datasets can be considered active projects and adjusted over time to correct any errors found after their initial presentation to researchers. This aims to help avoid situations such as those with the KDD Cup ’99 (kdd, 1999) and CICIDS2017 (Sharafaldin et al., 2018b) datasets, where researchers have found issues with the original datasets, resulting in multiple variants of the datasets being available with specific corrections (Tavallaee et al., 2009; Lanvin et al., 2022; Engelen et al., 2021).


Table 1: Guidelines for improving the handoff of NID datasets from dataset researchers to downstream researchers.

Guideline (1): Provide direct access to all data and scripts for dataset.
Justification: The main purpose of this guideline is to prevent barriers to obtaining datasets. (Ring et al., 2019; Cordero et al., 2021)
Implementation Details: This can be achieved through a simple download script. The implemented framework provides a mechanism such that dataset developers can provide metadata consisting of a download URL and destination file name to meet this guideline.

Guideline (2): Include complete source data to the most detailed extent possible.
Justification: Full source data is necessary to adequately reproduce and/or augment a dataset. (Ring et al., 2019; Cordero et al., 2021; Ferriyan et al., 2021)
Implementation Details: This should generally be a standard format such as PCAP or netflow. Full PCAP files are more favorable than partial PCAP files with no payload. If only netflow data is available, a full collection of attributes is better than a partial collection.

Guideline (3): If possible, provide access to all tools needed to generate dataset.
Justification: Differences in tools, environments, and their versions can limit the ability of downstream researchers to obtain the same results as intended by the original dataset authors. Without this, extending the dataset with feature engineering may not be successful. (Chou and Jiang, 2022; Sarhan et al., 2021c; Cermak et al., 2018)
Implementation Details: The implemented framework meets this guideline by providing a containerized environment with specific versions of tools such as Zeek and Argus. This ensures that users of the framework can use the same baseline of tools and environment as was used by the original dataset developers.

Guideline (4): Provide documentation indicating how to reproduce a dataset from source data.
Justification: Clear documentation reduces ambiguity provided in general descriptions of dataset creation. Differences in commands used to generate a dataset from source can produce different results than the original dataset. (Chou and Jiang, 2022; Ferriyan et al., 2021)
Implementation Details: Versions of tools and specific commands used to execute them should be documented. The provided framework is self-documenting as researchers can review YAML files for each dataset to view the commands used to generate them as discussed in Section 4.3.

Guideline (5): Include source code needed to reproduce dataset features.
Justification: Providing feature generation source code ensures downstream researchers can duplicate a dataset, verify feature correctness, and understand details of the feature calculation. (Ferriyan et al., 2021; Ring et al., 2019)
Implementation Details: One should avoid making code too specific to a particular user environment. The implemented framework supports this guideline with a containerized environment, specific directories for feature generation scripts, and infrastructure to support features generated with network analysis tools.

Guideline (6): The source code for each feature should be easily identifiable.
Justification: This guideline is recommended to make analysis of the features of a dataset more accessible for downstream researchers. (Lanvin et al., 2022; Chou and Jiang, 2022)
Implementation Details: This can be implemented through naming conventions for scripts that match the final feature name and techniques such as using a separate script or function for each feature. The implemented framework supports this by enforcing these conventions in its interfaces with network analysis tools.

Guideline (7): The generation of each feature should be independent from others.
Justification: This guideline is recommended to avoid execution dependencies between features and it facilitates the ability to remove or add new features by downstream researchers. This also makes the code for each feature more understandable and reviewable. (Lanvin et al., 2022; Chou and Jiang, 2022)
Implementation Details: The implemented framework supports this guideline in the way it interfaces with network analysis tools to generate features in an independent manner where possible. In addition, the built-in framework encourages this by providing standard configuration files that can be used to identify each feature script to run.

Guideline (8): Apply guidelines outlined for features to labels as well.
Justification: While labels are significant for model training, during dataset generation time, they can be considered a special case of features. In this way, we want to apply guidelines (4), (5), and (6) to labels as well. (Ferriyan et al., 2021; Lavinia et al., 2020; Cordero et al., 2021; Rajasinghe et al., 2018; Komisarek et al., 2021)
Implementation Details: The implemented framework supports this goal by providing the same infrastructure available for feature development to label development.

Guideline (9): Make source code for labeling distinct from other features.
Justification: This guideline is recommended to make the labeling criteria used for a dataset clear for collaborating researchers. Because the label features/procedure can inform machine learning model design decisions it is helpful to have it distinctly identifiable. For example, if the labeling criteria is based on a single IP address, it is likely that the IP address features should not be provided to a model. (Lanvin et al., 2022)
Implementation Details: The implemented framework supports this guideline by having a separate step of processing for label scripts and by having them contained in a separate directory for a given dataset.

Guideline (10): Provide a mechanism to receive and implement feedback from researchers to correct issues and improve dataset.
Justification: This guideline encourages collaboration between NID researchers, allows a dataset to remain current, and provides a feedback loop to dataset researchers to correct any issues found by the research community. (Ring et al., 2019; Lanvin et al., 2022)
Implementation Details: The implemented framework supports this guideline through its use of scripting and metadata to describe a dataset such that each dataset can be maintained in an independent source code repository or as part of the default environment.
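Guidelines (6), (7), and (9) can be made concrete with a small Python sketch. It is illustrative only and is not the framework's code: the flow field names, the feature names, and the ATTACKER_IPS set are hypothetical. Each feature is a separate function named after the feature it produces, each is computed from raw fields only, and the label function is kept apart from the features.

```python
# Hypothetical set of attacker addresses used only by the label function.
ATTACKER_IPS = {"10.0.0.5"}


def duration(flow):
    """Feature 'duration': flow length in seconds."""
    return flow["end"] - flow["start"]


def total_bytes(flow):
    """Feature 'total_bytes': bytes sent in both directions."""
    return flow["src_bytes"] + flow["dst_bytes"]


def bytes_per_second(flow):
    """Feature 'bytes_per_second': computed from raw fields only, so it
    does not depend on the other feature functions having been run."""
    elapsed = flow["end"] - flow["start"]
    if elapsed <= 0:
        return 0.0
    return (flow["src_bytes"] + flow["dst_bytes"]) / elapsed


# Guideline (6): the registry key matches the function and feature name.
FEATURES = {
    "duration": duration,
    "total_bytes": total_bytes,
    "bytes_per_second": bytes_per_second,
}


def label(flow):
    """Guideline (9): labeling lives apart from the feature functions."""
    return int(flow["src_ip"] in ATTACKER_IPS)


def build_row(flow, feature_names):
    """Guideline (7): any subset of features can be generated on its own."""
    row = {name: FEATURES[name](flow) for name in feature_names}
    row["label"] = label(flow)
    return row


example_flow = {
    "src_ip": "10.0.0.5",
    "start": 1.0,
    "end": 3.0,
    "src_bytes": 300,
    "dst_bytes": 100,
}
example_row = build_row(example_flow, ["duration", "bytes_per_second"])
```

Because the label here is derived from the source IP address alone, a reader of the label function can see at a glance that IP address features should probably be withheld from a model, which is the scenario described in the justification for guideline (9).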

4.3 Framework Details

In this section we cover the main ideas of our containerized environment and implementation of the guidelines. As an example of the implementation, we developed a demo dataset that takes a single PCAP file from the UNSW-NB15 dataset (Moustafa and Slay, 2015), duplicates most of the original dataset’s features, and extends them with new features. For brevity, many specifics regarding the framework’s usage have been omitted. For additional details we recommend consulting the framework repository.

4.3.1 Container Environment

We provide a containerized environment to support our implementation in order to improve reproducibility and eliminate the need to install multiple tools used by other researchers. Currently, this minimal environment includes the Zeek and Argus network analysis tools as well as Python and a set of default Python libraries as described in the tool’s repository. We expect this environment to grow in the future; however, we consider it an adequate starting point to demonstrate its usefulness.

The intention of our framework is that the tool and our container be used together; however, the container environment can also be used on its own simply to ensure that specific versions of tools are easily accessible. Running the container without specifying a command to execute places the user at a shell prompt with access to the installed tools. The intended method of executing the environment, however, is to map the container’s /niddff directory to the directory of the user’s local repository of our tool infrastructure. This allows for the development of a dataset using the framework and container in a variety of ways.
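As a rough illustration of the mapping described above, the following Python sketch assembles a container invocation without executing it. It assumes Docker is the container runtime and uses a placeholder image name; neither detail is stated in the text, and the actual image name and launch procedure should be taken from the framework repository.

```python
from pathlib import Path
import shlex

# Mount point inside the container, as named in the text.
CONTAINER_MOUNT = "/niddff"


def docker_run_command(local_repo, image, command=None):
    """Build an argument list that bind-mounts the local repository.

    `image` is a placeholder supplied by the caller; `command` is an
    optional list of arguments to run instead of the default shell.
    """
    host_path = Path(local_repo).resolve()
    args = [
        "docker", "run", "--rm", "-it",
        "-v", f"{host_path}:{CONTAINER_MOUNT}",
        image,
    ]
    if command:
        args.extend(command)
    return args


# Hypothetical image name, for illustration only.
interactive = docker_run_command(".", "example/niddff-env")
printable = shlex.join(interactive)
```

With no command given, the argument list ends at the image name, which corresponds to the interactive shell case described above. Passing a command list would instead run that command inside the container with the repository mounted.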

https://www.python.org/


4.3.2 Framework Implementation

Our implementation provides a standard format for

deﬁning and delivering NID datasets using conﬁgura-

tion ﬁles, naming conventions, and a standard direc-

tory structure. At the core of the implementation we

read in a YAML conﬁguration ﬁle customized for a

dataset and use that information to fully process the

dataset from source. The high level algorithm fol-

lowed by the tool can be seen in Algorithm 1.

Algorithm 1: General processing used to generate a NID dataset based on an input configuration file. The input file is processed in a top-down manner with a loop for processing multiple source files prior to combining them together at the end.

Input: config, YAML configuration file
Output: dataset, NID dataset suitable for ML

Read in config
Store documentation information from config
Process setup options

Read in metadata for source data
if download source == TRUE then
    Download all source PCAP and Netflow files
end if

for each source file do
    Execute feature processing commands
    Execute label processing commands
    Execute post-processing commands
    Save intermediary dataset file
end for

Execute final dataset processing commands
Combine intermediary dataset files

return dataset
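The control flow of Algorithm 1 can be sketched in Python as follows. This is an illustrative outline and not the framework's implementation: the configuration is represented as an already-parsed dictionary, all key names are assumptions, and the download, command execution, and combine operations are passed in as callables so the sketch stays free of side effects.

```python
# Per-source steps, in the order given by Algorithm 1. The key names are
# assumptions made for this sketch.
PER_SOURCE_STEPS = ("feature_processing", "label_processing", "post_processing")


def generate_dataset(config, run, download, combine):
    """Process a parsed configuration in the order given by Algorithm 1."""
    documentation = config.get("documentation", {})
    options = config.get("setup_options", {})
    sources = config["sources"]  # each entry: {"url": ..., "file": ...}

    # Optional acquisition of source files.
    if options.get("download_source", False):
        for source in sources:
            download(source["url"], source["file"])

    # Each source file is processed independently.
    intermediates = []
    for source in sources:
        for step in PER_SOURCE_STEPS:
            run(config.get(step, []), source["file"])
        intermediates.append(source["file"] + ".intermediate.csv")

    # Final commands run once, then intermediates are combined.
    run(config.get("final_dataset_processing", []), None)
    return {"documentation": documentation, "dataset": combine(intermediates)}


# Usage with recording stand-ins instead of real tools.
log = []
example_config = {
    "setup_options": {"download_source": True},
    "sources": [{"url": "https://example.org/a.pcap", "file": "a.pcap"}],
    "feature_processing": ["run_zeek"],      # spelling assumed
    "label_processing": ["run_python"],      # hypothetical command name
}
result = generate_dataset(
    example_config,
    run=lambda commands, source: log.append((list(commands), source)),
    download=lambda url, file: log.append(("download", file)),
    combine=lambda files: files,
)
```

Injecting the operations as callables is a choice made for this sketch so that the ordering can be inspected without running any network analysis tools.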

4.3.3 Dataset Directory Structure

Each NID dataset has its conﬁguration and generation

scripts contained in a dedicated directory. This allows

it to be maintained by the original dataset developers

and then plugged into the framework by consumers

of the dataset. The general structure of a dataset di-

rectory is shown in Figure 2 where one can see the

YAML conﬁguration ﬁle, directories for source meta-

data, ground truth metadata, output ﬁles, and each

processing step’s ﬁles.

dataset/
    config.yaml
    source/
        pcaps.meta
    ground truth/
        gt.meta
    output/
    step acquire source data/
        load .argus
        load .python
        load .zeek
    step feature processing/
        load .argus
        load .python
        load .zeek
    step label processing/
        load .argus
        load .python
        load .zeek
    step post processing/
        load .argus
        load .python
        load .zeek
    step final dataset processing/
        load .argus
        load .python
        load .zeek

Figure 2: A default directory structure for a dataset within the proposed framework. Each processing step has its own directory intended to contain loading scripts for supported tools as well as any other scripts used in a given step. It should be noted that these directories are only needed if they are used for a given dataset. For instance, the framework takes care of default processing for several stages but the user has the option of customizing each stage with their own scripts.

For the source and ground truth data, the directory contains metadata files in a comma-separated format, where each line contains a download URL and the destination file name; the framework reads these files when acquiring source data. In addition, each processing step can contain simple files with the naming convention load .<tool>, where <tool> is one of the framework’s supported tools, such as Zeek or Argus. While the particulars of how each tool behaves vary, each line of these files denotes a single feature or process to run for a given tool. If applicable, an associated script with the same name as the feature it generates is contained in the same directory. In other words, users can easily identify the features being generated by reviewing the load .<tool> files and the scripts that they reference. This promotes having easy-to-identify source code for each feature, as indicated in guideline six, as well as having self-contained features, as indicated in guideline seven.
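The two file formats described above are simple enough to parse in a few lines. The Python sketch below works from the prose description only (one URL and destination name per comma-separated line, and one feature or process name per line); details such as comment syntax or quoting are not specified in the text and are not handled, and the sample contents are hypothetical.

```python
import csv
import io


def read_metadata(text):
    """Return (download_url, destination_name) pairs from metadata text."""
    pairs = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank or incomplete lines rather than guessing their meaning.
        if len(row) < 2 or not row[0].strip():
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def read_load_file(text):
    """Return the feature or process names listed one per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical file contents for illustration.
meta_text = (
    "https://example.org/captures/a.pcap,a.pcap\n"
    "\n"
    "https://example.org/captures/b.pcap, b.pcap\n"
)
load_text = "duration\n\ntotal_bytes\n"

sources = read_metadata(meta_text)
features = read_load_file(load_text)
```

Each name returned by read_load_file would then correspond to a script of the same name in the step's directory, which is what lets a reader trace a feature in the final dataset back to its source code.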


4.3.4 Dataset Conﬁguration File

Each dataset has a YAML configuration file that drives its creation. As seen in Listing 1, it contains documentation and options, and can contain a mix of built-in framework commands as well as custom commands to execute. For example, the framework takes information from the setup_options section and determines what source files to download during the step_acquire_source_data step. Other built-in commands such as run_zeek have default behavior requiring little setup on the user’s part in the configuration file. In general, these commands look into the current step’s directory and read an associated load.<tool> file. This file is then used by the framework to generate features, generate labels, or perform some other intermediary processing. Aside from commands supported by the framework, users can also specify any custom commands or scripting to execute, and they will be processed in the order they appear in the file. For these commands, users have access to a number of built-in variables that direct particulars such as the paths of source files to read and where to place output. The main benefit of this single configuration file is that it fully describes how the dataset is created and provides the information needed for users to access the code used to generate features and perform labeling.
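The ordering and variable-substitution behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the framework's implementation: the handler functions, the variable names source_dir and output_dir, the `$name` substitution syntax, and the custom script my_feature.py are all hypothetical. The sketch only plans actions and executes nothing.

```python
from string import Template


# Hypothetical stand-ins for built-in commands. Each one only reports the
# load.<tool> file it would read from the current step's directory.
def run_zeek(ctx):
    return f"built-in: read {ctx['step_dir']}/load.zeek"


def run_argus(ctx):
    return f"built-in: read {ctx['step_dir']}/load.argus"


BUILT_INS = {"run_zeek": run_zeek, "run_argus": run_argus}


def plan_step(step_name, commands, variables):
    """Resolve one step's commands into actions, preserving file order."""
    # Assumption: the step's directory is named after the step itself.
    ctx = dict(variables, step_dir=step_name)
    actions = []
    for command in commands:
        if command in BUILT_INS:
            actions.append(BUILT_INS[command](ctx))
        else:
            # Custom command: expand $variables such as input/output paths.
            actions.append("custom: " + Template(command).substitute(ctx))
    return actions


# Hypothetical variable names and paths, used only for illustration.
VARIABLES = {"source_dir": "/data/source", "output_dir": "/data/output"}
STEP = [
    "run_zeek",
    "python3 my_feature.py --in $source_dir --out $output_dir",
    "run_argus",
]

if __name__ == "__main__":
    for action in plan_step("step_feature_processing", STEP, VARIABLES):
        print(action)
```

Keeping built-in and custom commands in a single ordered list is what lets one file describe the whole pipeline: the order of execution is simply the order of the entries.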

4.3.5 Beneﬁts for NID Dataset Developers

The framework implementation provides several benefits for NID dataset developers. First, it is flexible enough to allow varying degrees of buy-in. For instance, suppose a NID dataset researcher only provides source files and ground truth data, or has a previously generated dataset that they would like to incorporate into the framework with little effort. This can be achieved by generating the source file metadata files and ground truth metadata files. While this requires minimal effort from the NID dataset researcher, it makes the files more accessible to downstream consumers. On the other end of the spectrum, the container environment provides tools for analyzing source data which NID dataset researchers can take advantage of. Using the container also allows downstream researchers to use the same versions of the software when working with the dataset.

Another beneﬁt for NID dataset researchers is

that the framework implementation provides an or-

ganized structure to follow and self-documents how

the dataset features and labels were generated from

source data. When updating the dataset or expanding

it, the change history of the conﬁguration ﬁles within

the framework can be inspected to track the changes

provided there are no updates to the source data. Ad-

ditionally, any improvements or feedback can be pro-

vided from end users back to the NID dataset re-

searcher by lightweight updates to these conﬁguration

ﬁles.

The framework is intended to impose no significant additional work on NID dataset developers, since all the steps it encapsulates must already be performed to generate a given dataset. The framework and guidelines simply organize these steps in a standardized manner.

    documentation:
      niddff: niddff/niddff:0.1
    setup_options:
      dataset_name: demo_dataset
      source_data: unsw-nb15
      ground_truth_data: unsw-nb15
      clean_output_directory: True
      expected_outputs:
        - unsw_nb15_dataset.csv
      argus:
        clean: True
        arguments: -S 60 -m
        execute_ra: True
    step_acquire_source_data:
      download: True
    step_feature_processing:
      - run_zeek
      - run_argus
      - run_python_scripts
    step_label_processing:
      - run_python_scripts
    step_post_processing:
      - run_combine_features
    step_final_dataset_processing:
      - run_combine_data

Listing 1: A sample input ﬁle consumed by our framework

specifying where to obtain source data and how to process

it to produce a ﬁnal dataset. Options can be overridden on

the command line if necessary.

4.3.6 Beneﬁts for NID Dataset Consumers

This framework also provides benefits for downstream researchers using NID datasets. For researchers looking to simply use the original dataset as provided, there are generally no changes in workflow imposed by the framework, though they would

SECRYPT 2023 - 20th International Conference on Security and Cryptography


Figure 3: A diff comparison of extracted Argus features

from the ﬁrst PCAP of the UNSW-NB15 dataset. On the

top, the left hand side of the diff shows a portion of the

original Argus features from the original dataset while the

right shows the same section of the output but generated by

running Argus with no command line options on the source

PCAP. On the bottom, the left hand side of the diff shows

the same portion of the original Argus features from the

original dataset while the right now shows the same section

of the output generated by running Argus with the -S 60

option. The differences on the top demonstrate the necessity

of having the exact command line options used to generate

dataset features in order to make a dataset reproducible.

be able to easily obtain the dataset using the download metadata. For researchers seeking to analyze a dataset, the container environment and configuration files approach provides a way for them to reproduce the dataset reliably, since all the tools and the command line options used to run them are contained within the scripts. As an example of this benefit, we look at the implementation of our demo dataset, which uses a single PCAP from the UNSW-NB15 dataset (Moustafa and Slay, 2015). As depicted in Figure 3, without using a particular set of options for Argus, one would receive results with an additional 2,529 rows compared to what the original dataset authors intended. We found this experimentally during our research, and it shows the value of removing ambiguity by making the full commands readily available to researchers. Similar benefits are gained by having the full labeling criteria laid out in the dataset configuration files.
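A reproducibility check of this kind can be sketched in a few lines of Python. The sketch below compares an original feature file against a regenerated one and reports the row-count difference along with the rows that appear only in the regenerated output. The function names, the column names, and the toy data are assumptions for illustration; they are not actual Argus output from UNSW-NB15.

```python
import csv
import io


def data_rows(csv_text):
    """Return the data rows of CSV text as tuples, skipping the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # discard the header row
    return [tuple(row) for row in reader if row]


def compare_outputs(original, regenerated):
    """Report the row-count delta and rows found only in the regenerated output."""
    orig_rows = data_rows(original)
    new_rows = data_rows(regenerated)
    seen = set(orig_rows)
    return {
        "row_delta": len(new_rows) - len(orig_rows),
        "only_in_regenerated": [r for r in new_rows if r not in seen],
    }


# Toy data with hypothetical column names; not actual UNSW-NB15 output.
ORIGINAL = "srcip,dstip,dur\n10.0.0.1,10.0.0.2,0.5\n10.0.0.3,10.0.0.4,1.0\n"
REGENERATED = (
    "srcip,dstip,dur\n"
    "10.0.0.1,10.0.0.2,0.5\n"
    "10.0.0.3,10.0.0.4,0.6\n"
    "10.0.0.3,10.0.0.4,0.4\n"
)

if __name__ == "__main__":
    print(compare_outputs(ORIGINAL, REGENERATED))
```

In the toy data, one flow from the original output appears as two shorter records in the regenerated output, which is the kind of discrepancy a different tool option can introduce and a non-zero row delta would surface.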

An additional beneﬁt comes in the form of being

able to generate a standard feature set from any source

data. If some standard feature set is not included

by the original dataset authors, a researcher can eas-

ily adapt the original dataset with a standard feature

set in order to facilitate comparisons across multiple

datasets. By following the guidelines and using the

framework, the scripts to produce such a feature set

become plug-n-play for any dataset that uses the same

source format.

A benefit similar to this plug-n-play nature of scripts can be realized for individual features of a dataset when using the containerized environment and framework. As an example, consider the situation where two researchers are using the same container version and source dataset but perform different feature engineering. The use of the container and framework allows them to exchange their feature scripts, or just the resulting data for individual features, and simply merge the results into their work. As outlined in Section 5, this ability provides additional benefits if the environment is expanded to include a server-based component.
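The exchange of individual features can be sketched as a simple keyed merge in Python. This assumes that both researchers' outputs carry a shared record identifier, which holds only if they start from the same source data and tool versions. The column name flow_id, the feature names, and the function merge_features are hypothetical and used only for illustration.

```python
def merge_features(base, extra, key="flow_id"):
    """Left-join the feature columns of `extra` onto `base` by a shared key."""
    extra_by_key = {row[key]: row for row in extra}
    merged = []
    for row in base:
        combined = dict(row)
        # Rows with no match in `extra` are kept unchanged.
        for column, value in extra_by_key.get(row[key], {}).items():
            if column == key:
                continue
            if column in combined:
                # Refuse to silently overwrite an existing feature.
                raise ValueError(f"duplicate feature column: {column}")
            combined[column] = value
        merged.append(combined)
    return merged


# Hypothetical features produced independently by two researchers.
RESEARCHER_A = [
    {"flow_id": "1", "dur": "0.5"},
    {"flow_id": "2", "dur": "1.2"},
]
RESEARCHER_B = [
    {"flow_id": "2", "pkt_ratio": "0.8"},
    {"flow_id": "1", "pkt_ratio": "0.3"},
]

if __name__ == "__main__":
    for row in merge_features(RESEARCHER_A, RESEARCHER_B):
        print(row)
```

Raising an error on a duplicate column name is a deliberate choice in this sketch: two features with the same name but different generating scripts would otherwise be merged without anyone noticing.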

5 CONCLUSION AND FUTURE WORK

In this work we propose a set of ten guidelines that

will improve the handoff of NID datasets between re-

searchers. The focus of these guidelines is to improve

ease of access, reproducibility from source data, veri-

ﬁcation, and the extension of datasets. We believe that

considering these areas while generating new datasets

will beneﬁt both dataset developers and downstream

researchers using the datasets. The provided frame-

work demonstrates these goals and their associated

beneﬁts.

While these guidelines are a step forward in this area of research, they do not eliminate all the complexities faced by researchers who want to extend NID datasets. In future work we aim to remove many of these additional complexities by including server-based methods to facilitate these guidelines. With the availability of a server environment, researchers could either use the container environment locally or interact with the server to perform scripting while leveraging the same container environment in both contexts. In this approach, the server could store source data locally, eliminating the need to download anything but a final feature set. Additionally, if other researchers had already created a feature on the server, the scripting and data could be re-used without the need to regenerate anything. This future work would be able to leverage the framework developed here, making it a significant step towards even more efficiency gains.

ACKNOWLEDGMENTS

We would like to acknowledge Professor Randy Paf-

fenroth from Worcester Polytechnic Institute for his

valuable insights and guidance which helped shape

this work.


REFERENCES

(1999). Kdd cup 99. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed: 2022-08-08.

Acosta, J. C., Medina, S., Ellis, J., Clarke, L., Rivas, V., and

Newcomb, A. (2021). Network data curation toolkit:

Cybersecurity data collection, aided-labeling, and rule

generation. In MILCOM 2021 - 2021 IEEE Military

Communications Conference (MILCOM), pages 849–

854.

Brauckhoff, D., Wagner, A., and May, M. (2008). Flame: A

ﬂow-level anomaly modeling engine. In CSET.

Cermak, M., Jirsik, T., Velan, P., Komarkova, J., Spacek, S.,

Drasar, M., and Plesnik, T. (2018). Towards provable

network trafﬁc measurement and analysis via semi-

labeled trace datasets. In 2018 Network Trafﬁc Mea-

surement and Analysis Conference (TMA), pages 1–8.

Chou, D. and Jiang, M. (2022). A survey on data-driven net-

work intrusion detection. ACM Computing Surveys,

54(9):1–36.

Cordero, C. G., Vasilomanolakis, E., Milanov, N., Koch, C., Hausheer, D., and Mühlhäuser, M. (2015). Id2t: A diy dataset creation toolkit for intrusion detection systems. In 2015 IEEE Conference on Communications and Network Security (CNS), pages 739–740.

Cordero, C. G., Vasilomanolakis, E., Wainakh, A., Mühlhäuser, M., and Nadjm-Tehrani, S. (2021). On generating network traffic datasets with synthetic attacks for intrusion detection. ACM Trans. Priv. Secur., 24(2).

Engelen, G., Rimmer, V., and Joosen, W. (2021). Trou-

bleshooting an intrusion detection dataset: the ci-

cids2017 case study. In 2021 IEEE Security and Pri-

vacy Workshops (SPW), pages 7–12.

Ferriyan, A., Thamrin, A. H., Takeda, K., and Murai,

J. (2021). Generating network intrusion detection

dataset based on real and encrypted synthetic attack

trafﬁc. Applied Sciences, 11(17).

Kenyon, A., Deka, L., and Elizondo, D. (2020). Are pub-

lic intrusion datasets ﬁt for purpose characterising the

state of the art in intrusion event datasets. Computers

& Security, 99:102022.

Komisarek, M., Pawlicki, M., Kozik, R., Hołubowicz, W., and Choraś, M. (2021). How to effectively collect and process network data for intrusion detection? Entropy, 23(11).

Lanvin, M., Gimenez, P.-F., Han, Y., Majorczyk, F., Mé, L., and Totel, E. (2022). Errors in the CICIDS2017 dataset and the significant differences in detection performances it makes. In CRiSIS 2022 - International Conference on Risks and Security of Internet and Systems, pages 1–16, Sousse, Tunisia.

Lavinia, Y., Durairajan, R., Rejaie, R., and Willinger, W.

(2020). Challenges in using ml for networking re-

search: How to label if you must. In Proceedings

of the Workshop on Network Meets AI & ML, NetAI

’20, page 21–27, New York, NY, USA. Association

for Computing Machinery.

Layeghy, S., Gallagher, M., and Portmann, M. (2021).

Benchmarking the benchmark – analysis of synthetic

nids datasets.

Moustafa, N., Hu, J., and Slay, J. (2019). A holistic re-

view of network anomaly detection systems: A com-

prehensive survey. Journal of Network and Computer

Applications, 128:33–55.

Moustafa, N. and Slay, J. (2015). Unsw-nb15: a compre-

hensive data set for network intrusion detection sys-

tems (unsw-nb15 network data set). In 2015 Mili-

tary Communications and Information Systems Con-

ference (MilCIS), pages 1–6.

Rajasinghe, N., Samarabandu, J., and Wang, X. (2018).

Insecs-dcs: A highly customizable network intrusion

dataset creation framework. In 2018 IEEE Canadian

Conference on Electrical & Computer Engineering

(CCECE), pages 1–4.

Ring, M., Wunderlich, S., Scheuring, D., Landes, D., and

Hotho, A. (2019). A survey of network-based in-

trusion detection data sets. Computers & Security,

86:147–167.

Sarhan, M., Layeghy, S., Moustafa, N., and Portmann,

M. (2020). Netﬂow datasets for machine learning-

based network intrusion detection systems. In Big

Data Technologies and Applications, pages 117–135.

Springer.

Sarhan, M., Layeghy, S., Moustafa, N., and Portmann, M.

(2021a). A cyber threat intelligence sharing scheme

based on federated learning for network intrusion de-

tection.

Sarhan, M., Layeghy, S., and Portmann, M. (2021b). Eval-

uating standard feature sets towards increased gener-

alisability and explainability of ml-based network in-

trusion detection.

Sarhan, M., Layeghy, S., and Portmann, M. (2021c). To-

wards a standard feature set for network intrusion de-

tection system datasets. Mobile Networks and Appli-

cations, 27(1):357–370.

Sharafaldin, I., Gharib, A., Lashkari, A. H., and Ghor-

bani, A. A. (2018a). Towards a reliable intrusion

detection benchmark dataset. Software Networking,

2018(1):177–200.

Sharafaldin, I., Lashkari, A. H., and Ghorbani, A. A.

(2018b). Toward generating a new intrusion detection

dataset and intrusion trafﬁc characterization. ICISSp,

1:108–116.

Tavallaee, M., Bagheri, E., Lu, W., and Ghorbani, A. A.

(2009). A detailed analysis of the kdd cup 99 data

set. In 2009 IEEE symposium on computational intel-

ligence for security and defense applications, pages

1–6. Ieee.

Wolsing, K., Wagner, E., Saillard, A., and Henze, M.

(2021). Ipal: Breaking up silos of protocol-dependent

and domain-speciﬁc industrial intrusion detection sys-

tems.

Yang, Z., Liu, X., Li, T., Wu, D., Wang, J., Zhao, Y.,

and Han, H. (2022). A systematic literature review

of methods and datasets for anomaly-based network

intrusion detection. Computers & Security, page

102675.
