2 DATASET CHARACTERIZATION
As the tool used to evaluate the performance of attack detection techniques, datasets must include a wide variety of data. This data should comprise both normal and anomalous samples, so that the evaluation can provide meaningful metrics showing how the algorithm performs in different environments. A good anomaly detection method should not only detect most of the malicious behaviour (i.e. achieve a high True Positive Rate (TPR)) but should also avoid confusing normal behaviour with malicious behaviour (i.e. keep a low False Positive Rate (FPR)). The composition of the dataset can help to reveal poor performance from both points of view, but it is crucial to evaluate the content of a dataset before trusting the results it can yield.
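The two metrics above can be computed directly from the confusion matrix of an evaluation run. The sketch below is an illustrative helper (not from any specific tool), assuming labels where 1 marks an attack and 0 marks normal traffic:

```python
# Illustrative sketch: TPR and FPR from true vs. predicted labels
# (1 = attack, 0 = normal). Names are hypothetical, not from a library.
def rates(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # TPR: fraction of attacks that were detected.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # FPR: fraction of normal samples raised as false alarms.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

A dataset with too few normal samples would make the FPR estimate unreliable, which is one reason dataset composition matters for evaluation.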
The KDD99 dataset is the most used dataset in the academic field, as it has become a standard evaluation benchmark. Despite this, the dataset is highly outdated and does not represent the current threats that a critical infrastructure can face. Moreover, several works (Brown et al., 2009) (McHugh, 2000) have highlighted deficiencies in this dataset that can bias the results of the algorithms applied to it. This raises the need to find another dataset that meets the requirements of our scenario and that can be used as a benchmark for anomaly detection algorithms.
Creating a dataset that provides a realistic scenario, while offering as much data as possible and avoiding biased information, poses a significant challenge (Shiravi et al., 2012). In this section we briefly describe the characteristics that we have found most important during our work with different kinds of datasets, and the impact they can have on the evaluation of algorithm performance.
• Generation Method: Dataset generation can be either synthetic or based on real captures. Real capture datasets are built from real traffic collected at a real institution such as a university, a research facility or a private organization. Synthetic datasets, on the other hand, are manually created by injecting malicious traffic into normal traffic samples; these normal samples can themselves be synthetically generated or be part of a real traffic capture. Real capture datasets are inherently better, as they model real network behaviour and therefore offer the most realistic information about the actual characteristics of an attack.
Moreover, the normal part of such a dataset shows the real use of the network, with no need to model it via any kind of traffic generation pattern. Despite all these advantages, datasets of this kind are very hard to find, due to the complexity of capturing real attacks and the privacy issues that can arise from publicly sharing the network traffic of an organization. The method employed to generate a dataset must therefore be taken into account when translating performance metrics into actual conclusions.
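The synthetic injection process described above can be sketched in a few lines. This is a simplified illustration under assumed field names, not the procedure of any particular dataset: labelled attack samples are mixed into a background of normal samples, and the result is shuffled so that ordering does not leak the injection point.

```python
import random

# Hedged sketch of synthetic dataset assembly: attack flows are
# injected into a (real or generated) background of normal flows.
# Field names and labels are illustrative assumptions.
def build_dataset(normal_flows, attack_flows, seed=0):
    samples = [dict(f, label="normal") for f in normal_flows]
    samples += [dict(f, label="attack") for f in attack_flows]
    # Shuffle so injected traffic is interleaved, not appended at the end.
    random.Random(seed).shuffle(samples)
    return samples
```

Note that even with shuffling, artifacts of the injection (e.g. unrealistic timing or address ranges in the attack traffic) can still bias an evaluation, which is the core weakness of synthetic generation.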
• Network Data Format: The format in which a dataset is presented determines the amount of information it offers. As the data represent network traffic, the formats are mainly based on different standardized network traffic representations, and the traffic can be offered raw or after some level of aggregation. For sharing raw network traffic, PCAP is the most widely used format. It is a standardized format that contains a direct copy of the traffic travelling through a network, and therefore avoids losing any information when sharing network traffic data. Its main disadvantages are the size of the data (it takes up the same space as the actual traffic collected from the network) and, being a lossless format, the privacy issues of sharing a raw copy of the data. As a consequence, real raw captures from a critical infrastructure are extremely hard to find, as releasing them would represent a huge threat to the organization itself.
As opposed to PCAP, NetFlow-like formats offer a summarized view of the traffic collected in the dataset. Their information unit is the traffic flow, that is, a sequence of messages exchanged between two network nodes. Each of these flows may comprise many traffic packets, but it is summarized as a single record characterized by its duration, size, number of packets, etc. These formats solve some of the privacy and size issues while retaining most of the core behaviour of the network, and as a consequence are widely employed for dataset generation.
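The flow summarization just described can be illustrated with a minimal sketch. Field names (5-tuple key, timestamps, sizes) are assumptions for illustration, not the actual NetFlow record layout: packets sharing the same source, destination, ports and protocol are collapsed into one record carrying only aggregate features.

```python
from collections import defaultdict

# Minimal NetFlow-style aggregation sketch (assumed field names):
# packets with the same 5-tuple become a single flow record with
# duration, total bytes and packet count; payloads are discarded.
def aggregate(packets):
    flows = defaultdict(list)
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flows[key].append(p)
    records = []
    for key, pkts in flows.items():
        ts = [p["ts"] for p in pkts]
        records.append({
            "flow": key,
            "duration": max(ts) - min(ts),
            "bytes": sum(p["size"] for p in pkts),
            "packets": len(pkts),
        })
    return records
```

As the sketch makes visible, payloads never reach the flow record, which is why this representation relieves both the size and the privacy concerns of raw PCAP sharing.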
• Anonymization Level: To mitigate the privacy issues mentioned above, different techniques are employed to reduce the amount of private information included in a dataset. These techniques try to preserve most of the actual behaviour of the network, so that attacks can still be detected and distinguished from normal traffic. The most basic anonymization method is the aggregation offered by the format itself: as noted for the previous characteristic, if the dataset is offered in a flow-summarized format, the payload of the packets is removed. There also exist datasets in raw format that offer PCAP files without the data payload of the packets. Both techniques prevent leaking
DCCI 2016 - SPECIAL SESSION ON DATA COMMUNICATION FOR CRITICAL INFRASTRUCTURES