tic has an affordable licencing policy, and does not
severely restrict the number of dashboards, as Splunk does
in its evaluation mode. These restrictions made Elastic the
better alternative for our project. Comparing Splunk and
Elastic as SIEM solutions under similar conditions would be
an interesting extension in a larger follow-up study.
Another possibility would be to use Graylog2¹⁰, which has
features similar to Elastic, is open source, and is also
based on Elasticsearch for log data mining. Graylog2 excels
at postmortem debugging, security and activity analysis.
Our main reason for choosing Elastic over Graylog2 is that
Elastic has more flexible graphing and dashboard handling.
Such functionality would be useful for extending the
framework into a more general situational awareness
platform supporting both security monitoring and crisis
management in the future. Kibana is user friendly in that
it allows data scientists to add functionality to the
system fairly easily in the form of new dashboards. It is
also possible to add support for more advanced time series
analytics by integrating Grafana¹¹. A limitation of Kibana
is that it lacks advanced user management. It is, however,
possible to set up simple access control using HTTP basic
authentication. More advanced solutions require purchasing
X-Pack, which provides encryption, authentication and
authorisation.
The solution proposed here is based on virtual machines.
This plug-in architecture for IDS engines makes the
framework easily extensible to other IDS technologies.
Other similar architectures have been suggested based on
Docker containers, which are more lightweight than virtual
machines and allow independence between cloud applications
and infrastructure¹². Our solution will need to use at
least some virtual machines to be able to test real
operating system instances. Docker could be considered as
an option in the future.
Elasticsearch can be used for building complex search
requests, but the main challenge in future research will be
not only to convey the IDS alerts but also to perform more
complex data analysis, alert correlation and data mining in
order to identify the relevant information for security
operators and other stakeholders and to reduce the number
of false alarms. Clustering, behaviour analysis and machine
learning techniques used in anomaly detection would be
natural extensions of this research to improve the overall
attack detection capabilities of the system in the future.
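To make the notion of a complex search request concrete, the following is a minimal sketch of an Elasticsearch query DSL body built as a plain Python dictionary. It filters alerts by time window and severity and then counts distinct signatures per source address, which is one simple starting point for alert correlation. The field names (`@timestamp`, `severity`, `src_ip`, `signature`) and the function name are assumptions made for illustration; they are not taken from the framework's actual index mapping.

```python
import json
from typing import Any, Dict


def build_alert_query(min_severity: int, window: str = "now-15m") -> Dict[str, Any]:
    """Build a query DSL body as a plain dict.

    All field names are hypothetical and must be adapted to the
    mapping actually produced by the Logstash pipeline.
    """
    return {
        # Return aggregation results only, no individual hits.
        "size": 0,
        "query": {
            "bool": {
                # Filters restrict the result set without scoring.
                "filter": [
                    {"range": {"@timestamp": {"gte": window}}},
                    {"range": {"severity": {"gte": min_severity}}},
                ]
            }
        },
        "aggs": {
            # Group alerts by source address (top 10 sources).
            "by_source": {
                "terms": {"field": "src_ip", "size": 10},
                "aggs": {
                    # Count how many different signatures each source triggered.
                    "distinct_signatures": {
                        "cardinality": {"field": "signature"}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_alert_query(min_severity=2), indent=2))
```

The resulting JSON could be sent to the `_search` endpoint of an alert index. A source that triggers many distinct signatures within a short window is a candidate for closer inspection, whereas the more advanced correlation and machine learning discussed above would need processing beyond a single query.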
Possible missions for the target framework include testing
the performance in terms of accuracy and speed of
detection, validating new IDS solutions or validating
different rule sets on IDSs. Another possibility is to
extend the framework into a comprehensive hybrid SIEM/IDS
solution that uses several different tools such as log
analysis, network and host-based intrusion detection
systems.
¹⁰ Graylog2: https://github.com/Graylog2
¹¹ Grafana: https://grafana.com/
¹² Docker: https://elk-docker.readthedocs.io/
Future challenges that can be investigated using this
framework include research on how to model new protocols
and how to simulate really big data scenarios where a large
cluster of sensors as well as Elastic shards need to
collaborate on the data mining. Autoconfiguration of the
framework is another challenge, which could use techniques
such as Network Function Virtualisation and Software
Defined Networking. Alert normalisation is to some extent
handled by Logstash. Future research and standardisation
are, however, required to define a common ontology that
ensures semantic interoperability between different types
of alerts (Krauß and Thomalla, 2016). The platform will
also act as a research vehicle for visualising and
analysing results in order to prove specified simulation
scenarios, as well as for improving situational awareness
during such scenarios.
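The idea behind alert normalisation can be illustrated with a small sketch that maps alerts from two differently structured sources onto one common record. In the framework this role is partly played by Logstash; the Python below is only an illustration of the principle, not the Logstash configuration used. The two input formats (`ids_a`, `ids_b`), all field names and the three-level severity scale are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class NormalisedAlert:
    """Common alert record; the schema is an assumption for illustration."""
    sensor: str
    timestamp: str
    src_ip: str
    dst_ip: str
    signature: str
    severity: int  # 1 (low) to 3 (high)


def from_ids_a(raw: Dict[str, Any]) -> NormalisedAlert:
    # Hypothetical format A: numeric priority where 1 is the most severe.
    return NormalisedAlert(
        sensor="ids_a",
        timestamp=raw["time"],
        src_ip=raw["src"],
        dst_ip=raw["dst"],
        signature=raw["msg"],
        severity=4 - int(raw["priority"]),  # invert so that 3 is the most severe
    )


def from_ids_b(raw: Dict[str, Any]) -> NormalisedAlert:
    # Hypothetical format B: textual severity level.
    levels = {"low": 1, "medium": 2, "high": 3}
    return NormalisedAlert(
        sensor="ids_b",
        timestamp=raw["ts"],
        src_ip=raw["source_ip"],
        dst_ip=raw["dest_ip"],
        signature=raw["rule_name"],
        severity=levels[raw["level"]],
    )


# Registry of per-sensor normalisers; a new IDS is added with one more entry.
NORMALISERS: Dict[str, Callable[[Dict[str, Any]], NormalisedAlert]] = {
    "ids_a": from_ids_a,
    "ids_b": from_ids_b,
}


def normalise(sensor_type: str, raw: Dict[str, Any]) -> NormalisedAlert:
    """Convert a raw alert into the common record, or fail loudly."""
    try:
        converter = NORMALISERS[sensor_type]
    except KeyError:
        raise ValueError(f"no normaliser registered for {sensor_type!r}")
    return converter(raw)
```

Once both sources produce the same record type, downstream correlation only has to deal with one schema. A shared ontology, as called for above, would go further by also fixing the meaning of fields such as `signature` across vendors.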
The framework is extensible and scalable. The IDS side can
be extended with additional IDS solutions by adding new
virtual machines to the framework that support a given IDS
technology. The Logstash configuration can also be adapted
to categorise different kinds of IDS solutions. Logstash is
horizontally scalable, meaning that the performance can be
scaled up by adding more hardware nodes. It can furthermore
form groups of nodes running the same information pipeline.
Adaptive buffering capabilities in the Elastic stack
provide smooth data streaming even with variable throughput
loads. If Logstash becomes a bottleneck, then more nodes
(cloud instances/virtual machines) can be added.
Elasticsearch is also horizontally scalable, allowing the
performance to be scaled up using more hardware nodes. The
nodes in a cluster form a full mesh topology, which means
that each node maintains a connection to each of the other
nodes. The cluster has a single master node, which is
chosen automatically by the cluster and which can be
replaced if the current master node fails. An index is a
logical namespace which points to primary and replica
shards, which are instances managed automatically by
Elasticsearch¹³. Each document is stored in a single
primary shard. When a document is being indexed, it is
indexed first on the primary shard, then on all replicas of
the primary shard. A replica is a copy of the primary
shard, used to improve failover and performance. The
number of primary and replica
¹³ https://www.elastic.co
Intrusion Detection System Test Framework for SCADA Systems