Data Repository for Security Information and Event Management in

Service Infrastructures

Igor Kotenko, Olga Polubelova and Igor Saenko

Laboratory of Computer Security Problems, St. Petersburg Institute for Informatics and Automation (SPIIRAS),

39, 14th Liniya, Saint-Petersburg, Russia

Keywords: Security Repository, Security Information and Event Management, Security Ontology, Data Model, Data

Representation, Logical Inference, Service Infrastructure.

Abstract: Design and implementation of the repository is a critical problem in advanced security information and

event management (SIEM) systems, which are SIEM systems of service infrastructures. The paper discusses

several innovations which are realized to address this challenge. These include the application of an

ontological approach for repository data modeling and a hybrid approach to its development, meaning the

combined use of relational databases, XML databases and storage of triplets.

1 INTRODUCTION

At present, one of the most important research

directions in the area of computer network security

is a technology of security information and events

management (SIEM). The essence of this technology

is to ensure coherent boot in a centralized repository

of security log records from a variety of sources –

“security events”, their long- and/or short-term

storage, modeling and analysis to detect attacks,

generating efficient countermeasures. SIEM

technology can make effective safety decisions

based on event correlation, data mining, logical

inference and data visualization. Using SIEM

systems is extremely important to ensure

information security of large distributed computer

networks, management and financial services of

companies as well as for critical infrastructures, such

as dams, power plants, etc. (Miller et al., 2011).

Advances of SIEM systems in computer network

infrastructures give rise to use such systems in the

broader class infrastructures that can be defined as

service infrastructures. In these infrastructures, in

addition to computer networks there are the

infrastructures of various types of services

(financial, physical, etc.). SIEM systems which can

be used in service infrastructures are considered in

the paper as new generation SIEM systems.

MASSIF (MAnagement of Security information

and events in Service InFrastructures) is one of the

EU projects, which aims to develop solutions to

build a new generation of SIEM systems (MASSIF,

2011). SIEM systems elaborated in MASSIF must

have the following new features: removing most of

restrictions on the functions imposed by

infrastructure; coherent interpretation of the

incidents and events at various levels; high degree of

reliability and durability in capturing event data;

high scalability.

One of the most important components of the

SIEM systems used in service infrastructures is a

security repository, which is a data warehouse that

enables to store security information and event data

in an internal format and extracts it at the request of

other components for identifying security threats and

attacks and generating countermeasures.

In SIEM systems of service infrastructures,

security event data arrive from a variety of different

sources and can be presented in various input

formats. A SIEM system produces the normalization

of those data and they are converted into an internal

format. Then the security data are exposed to

correlation analysis (Stevens, 2005). In the SIEM

system of new generation it is possible to use the

advanced modeling and simulation modules, which

also use data stored in the repository to build on

their basis the attack and countermeasure graphs

(Ingols et al., 2009); (Kotenko et al., 2006).

For these reasons, the main objectives of the

repository development are as follows: to design a

unified repository, languages and tools for effective

management of security information, events and

308

Kotenko I., Polubelova O. and Saenko I..

Data Repository for Security Information and Event Management in Service Infrastructures .

DOI: 10.5220/0004075303080313

In Proceedings of the International Conference on Security and Cryptography (SECRYPT-2012), pages 308-313

ISBN: 978-989-8565-24-2

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

policies, and logical inference about security; to

implement software applications for storing,

manipulating, visualizing and validating security

information, events and policies based on the unified

repository. The paper examines the main issues of

data model design and repository development for

new generation SIEM systems. We could note the

following innovations that were used to solve this

problem. First, for data modeling the ontological

approach is proposed and implemented. It provides

the necessary flexibility of internal data

representation in the repository and the possibility of

more accurate and high-quality results of queering.

Secondly, the hybrid approach to implement the

repository is suggested. It integrates relational

databases, XML databases and stores of triplets.

Finally, we propose the advanced repository

architecture implemented and tested with the data

used for attack modeling in SIEM systems.

The rest of the paper is organized as follows.

Section 2 reviews related work in the field of SIEM

data processing, representation and storage. Here we

consider standards in security event representation,

advanced SIEM systems, languages for data

representation, and approaches to implement the

repository. Section 3 considers the ontological

vulnerability model used in the repository for attack

modeling. Section 4 discusses the issues of the

repository implementation and testing based on the

data of the SIEM attack modeling module. Section 5

concludes our results and outlines further research.

2 RELATED WORK

During the analysis of state-of-the-art, we

considered perspective and widely used approaches

and standards for data representation in area of

security information and events management.

Information and event management standards

provide the most common rules for representation of

security events and incidents. Currently, there are

many different standards of security data

representation (IDMEF, IODEF, CEE, SCAP, CBE,

CEF, XDAS, CIM, etc.). The most popular of them

are Common Event Expression (CEE), SCAP

(SCAP, 2011), Common Base Event (CBE) (Ogle et

al., 2004) and Common Information Model (CIM)

(CIM, 2011). For example, CEE realizes a

comprehensive approach to handling the input

stream of information to log management systems,

including recommendations to vendors of hardware

and software systems that generate the input stream.

SCAP enables to compile a list of system platforms

and applications, set their secure configurations,

identify the most critical vulnerabilities, etc.

The main repository solutions in advanced SIEM

systems (AlienVault OSSIM, AccelOps, QRadar,

Prelude, ArcSight, IBM Tivoli, and Novel Sentinel)

are based on relational databases. The storage in

OSSIM includes a user-defined, searchable

knowledge base of incident solutions (AlienVault,

2011). AccelOps SIEM is designed to collect logs

generated by Cisco network and security devices,

and all the major network vendors’ devices. The

repository is implemented as online PostgreSQL

storage applied for log analysis in real time and for

historical log analysis (AccelOps, 2011). Qradar

stores the entire input event stream to enable

detailed forensics and compliance reporting (Miller

et al., 2011). Prelude (Prelude, 2011) supports three

databases: MySQL, PostgreSQL, and SQLlite.

ArcSight Logger 4 collects data in structured and

unstructured formats (Shenk, 2009). The system

implements role-based access and access through

a web-based interface, and intelligent and intuitive

search mechanisms with a visual query designer.

IBM Tivoli SIEM (Buecker et al., 2010) can provide

a long-lasting and compact storage of information

security events. The collected events are stored in a

database as text objects containing information

about incidents, management actions, correlation

rules, etc. Novell Sentinel Log Manager stores all

data in a compressed format (Novell, 2010). The

components of data storage use a file-based storage

and an indexing system. PostgreSQL is used for data

management.

One of the alternative solutions on data

representation in systems with complex data

structures is the ontological approach (OWL, 2009).

Using description logic, this approach can express

much easier the complex relationships between

entities. To represent an ontological meta-data we

suggest using RDFS (RDF Schema) (RDF, 2004).

RDF data model is a directed graph, which is based

on elementary statements (triples). A triple is a short

formal statement in the form of “subject-predicate-

object”. A triple store is a purpose-built database to

store and retrieve RDF metadata (Triplestore, 2010).

In addition to RDF and XML, OWL (Web Ontology

Language) can be chosen to represent data. OWL is

a language of Semantic Web, created to represent

ontologies (OWL, 2009). OWL presents ontology in

the form of documents, which can be stored and

transmitted in a global network. SWRL (Semantic

Web Rule Language) (SWRL, 2004) can be used to

specify rules. SWRL is a proposal for a Semantic

Web rule language, based on a combination of OWL

DataRepositoryforSecurityInformationandEventManagementinServiceInfrastructures

309

sublanguages with RuleML sublanguages (Parsia,

2005). SPARQL Protocol and RDF Query Language

(SPARQL) is a query language to the data presented

on the RDF model as well as the protocol for these

requests and responses (SPARQL, 2008).

We also considered systems and approaches for

logical reasoning, including Event Calculus (EC)

based on first-order logic (through an example of the

prototype we developed using CIFF and SICStus

Prolog) (Kowalski et al., 1986); (Kakas et al., 2003)

and Model checking based on linear temporal logic

using SPIN (SPIN, 2012).

3 ONTOLOGICAL DATA MODEL

The use of ontology is necessary approach that

enables to create a general model that can be flexibly

and quickly all the necessary concepts in SIEM

system in a particular area. Loose coupling of

domain ontologies and a modular approach to

development makes it easy to add, delete, and

support individual ontologies. In addition, the

components of ontologies may be dynamically

combined according to requirements during a

performance to meet specific application

requirements. In addition, changes in the ontological

data model require much less effort than the

relational model. Therefore, it is particularly

relevant in areas where it is needed to store different

types of information that can quickly change. These

include cyber security, as a whole, and SIEM

systems in particular.

Teymourian et al. (Teymourian et al., 2009)

consider the corporate semantic web technologies as

a very promising direction to improve the efficiency

of interaction between vendor services and their

applications, due to the possibility of efficient usage

of comprehensive enterprise-relevant knowledge. Li

et al. (Li et al., 2010) propose an ontology-driven

event processing framework as part of the

middleware for smart spaces. The authors develop

two key models that underlie their approach. As the

basis to build the SIEM ontological model the data

representation standards considered above are

valuable to use.

There is a number of works which suggest the

usage of these standards in security ontologies. For

example, (Guo et al., 2009); (Parmelee, 2010);

(Elahi et al., 2009) are devoted to building

ontologies for SCAP protocol standards.

(Heimbigner; 2011); (López de Vergara et al., 2004)

consider the translation of CIM into the ontology

representation. Within the task of designing the data

models and the repository of SIEM systems for

service infrastructures, we have developed an

ontology to represent the data model for Attack

Modeling and Security Evaluation Component

(AMSEC). The SCAP protocol was taken as the

basis to construct this model.

Figure 1 shows the ontology that describes the

concepts (for vulnerabilities, software/hardware

manufacturers and other concepts) and the hierarchy

and relationships between these concepts. The

ontology is made up of the data schema, which is

called the TBox (Terminology box), and the data

itself – ABox (Assertion box).

Description of the vulnerability is a certain

sequence of hardware components connected by

logical operators (AND, OR, NOT, AND, NOT

OR). In the ontology such relationships are

expressed as a set of axioms that allow bringing into

the data model the possibility of logical reasoning.

To specify the vulnerability in the relational data

model is hard task. It is stored as a string which

includes the entire list of vulnerabilities being parsed

programmatically. This process takes a considerable

amount of time and greatly increases the traffic to

communicate with the repository. Applying the

ontological approach allows solving the task of

submitting such data much more efficient, reducing

the sample size and speeding up the AMSEC

functioning. We expanded the ontological model for

vulnerabilities depicted in Figure 1 to specify the

risk assessment, the countermeasures, and other

concepts based on SCAP.

4 IMPLEMENTATION AND

TESTING

Our proposals for implementation of the repository

are, firstly, the recommendations on the choice of

DBMS. Of course, traditional and popular relational

DBMS (such as MySQL and PostgreSQL) together

with XML-based DBMS can be used, but for the

realization of an advanced ontology-based SIEM,

which includes possibilities for logical reasoning

and triplet stores are preferable.

In order to choose DBMS for repository, we

investigated a set of relational, XML-based, and

triplet stores. Regarding the choice of a relational

DBMS, the most popular at the moment are Oracle,

Microsoft SQL Server, Sybase, MySQL,

PostgreSQL. A number of popular SIEM systems

support these databases.

SECRYPT2012-InternationalConferenceonSecurityandCryptography

310

Figure 1: Ontological model for vulnerabilities.

Examples of XML-based data bases are Apache

XIndice, BaseX, Sedna, Gemfire Enterprise,

DOMSafeXML, eXist, MarkLogic Server,

MonetDB/XQuery, OZONE, Xpriori XMS, etc.).

Storage of triplets can be divided into two basic

groups: implemented as standalone solutions

(AllegroGraph, BigOWLIM and PelletDb), and parts

of complex enterprise semantic system stores

(Virtuoso, OpenAnzo and Semantics.Server).

According to the experience and the results of

load tests, as the best practical solution for storage at

the moment we proposed to use a hybrid approach

that combines storage of triplets, relational and

XML databases. This approach provides a balance in

flexibility of data manipulation, the effective use of

metadata and the acceptable processing speed.

The analysis of references shows that one of the

best solution is Virtuoso (Virtuoso, 2012) by

OpenLink Software Company. It combines the

support of all three types of storages, has an open-

source version that implements all necessary

languages and protocols for data access and also

supports a variety of necessary drivers.

SIEM modules access data through Web

services, described by the WSDL standard. In its

implementation, we have developed Web services

for data access functions to CRUD (create, read,

update, and delete). The WSDL is the link between

the clients and the repository; it allows to correctly

generate queries for accessing the repository. On the

other hand, it is a specification for development of

server-side Web service.

Implementation of the Web services was made in

Java. All Web services are implemented as stateless,

i.e. services do not share among themselves any

variables and objects. This allows you to run the

request from the client in a single thread on the

application server. Thus, a single service can handle

multiple threads of the same instances of classes.

This allows speeding up the processing of requests

and, if necessary, you can increase the number of

servers. Any server can handle any request. To

control this process use LoadBalancer which

monitors the status of servers, receives requests from

the client and forwards the request to the least-

loaded server.

The repository testing was made according the

integration repository with AMSEC of SIEM

system. Data repository updater downloads the open

databases of vulnerabilities, attacks, configuration,

DataRepositoryforSecurityInformationandEventManagementinServiceInfrastructures

311

weaknesses, platforms, and countermeasures from

the external environment. Specification generator

converts the information about network events,

configuration and security policy, from other SIEM

components or from users, into an internal

representation. Attack graph generator builds attack

graphs (or trees) by modeling sequences of

malefactor’s attack actions in the analyzed computer

network using information about available attack

actions of different types, services dependencies,

network configuration and used security policy.

Security evaluator generates combined objects of the

attack graphs (routes, threats) and service

dependencies, calculates the metrics of combined

objects, evaluates the common security level,

compares obtained results with requirements, finds

“weak” places, generates recommendations on

strengthening the security level. It performs

stochastic imitation of multi-step attacks against the

analyzed computer networks and determining the

consequences with regard to various

countermeasures and criteria defined by the decision

maker (for example, security measures/tools

effectiveness and efficiency against attacks,

maintainability, reliability, operational costs, etc.).

Security evaluator allows to select the solutions

(validated events and alerts, possible future security

events, countermeasures) needed for other MASSIF

SIEM components. Reports generator shows

vulnerabilities detected by AMSEC, represents

“weak” places, generates recommendations on

strengthening the security level and depicts other

relevant security information.

5 CONCLUSIONS

The paper has examined the main issues of data

model design and repository development for new

generation SIEM systems. The main innovations we

suggest are as follows: (1) ontological approach to

provide the necessary flexibility of data

representation in the repository and the possibility of

more accurate and high-quality results of queering;

(2) hybrid approach to implement the repository

which allows to integrate relational databases, XML

databases and stores of triplets; (3) advanced

repository architecture implemented and tested with

the data used for attack modeling in SIEM systems.

For this purpose, we analyzed the standards for

processing information about events, the most

common practical implementations of SIEM

repositories, and ontology related research papers.

We conducted a brief overview of various languages

that can be used for data representation and

manipulation. In addition, a comparative analysis of

logical reasoning languages was fulfilled. This

analysis allowed us to conclude that these languages

can be used to implement the ontological approach.

The ontological vulnerability model was

suggested. It is used in the repository for attack

modeling and security evaluation. To implement the

basic architecture of the repository, an approach

based on Service-Oriented Architecture was chosen.

We proposed to use a hybrid approach to storage

repository that provides a balance in flexibility of

data manipulation, the effective use of metadata and

the acceptable processing speed. Our proposals for

implementation of the repository are, firstly, the

recommendations on the choice of DBMS. Of

course, traditional and popular relational DBMS

(such as MySQL and PostgreSQL) together with

XML-based DBMS can be used, but for the

realization of an advanced ontology-based SIEM,

which includes possibilities of developed logical

reasoning, the triplet stores are preferable.

Therefore, for full support of different

information models being developed in the

MASSIF, we suggest to use in the repository

a hybrid approach. This approach combines the

possibilities of relational, XML-based and triplet

stores. As a practical solution, we propose to use the

Universal Server Virtuoso. It combines the support

of all three types of storages, has an open-source

version that implements all necessary languages and

protocols for data access and also supports a variety

of necessary drivers.

The paper also considers the task of integrating

the repository with other SIEM components through

an example of developing and implementing Attack

Modeling and Security Evaluation Component.

Proposed ontological approach allows making

AMSEC data more accurate, requires no further

software processing, and thus improves the

repository performance.

In further research we are planning to expand the

proposed ontology, as well as to add to the

repository different services that provide data

security, verification of security properties and

policies, etc. In addition, we are going to explore the

issues of logical reasoning based on ontology

repository, as well as the development of

mechanisms for data visualization.

ACKNOWLEDGEMENTS

This research is being supported by grants of the

SECRYPT2012-InternationalConferenceonSecurityandCryptography

312

Russian Foundation of Basic Research (projects

#10-01-00826 and #11-07-00435), the Program of

fundamental research of the Department for

Nanotechnologies and Informational Technologies

of the Russian Academy of Sciences, the State

contract #11.519.11.4008 and by the EU as part of

the SecFutur and MASSIF projects.

REFERENCES

AccelOps, 2011. AccelOps Security Information & Event

Management (SIEM). http://www.accelops.com/

product/siem.php.

AlienVault, 2011. AlienVault Unified SIEM System

description. AlienVault, Campbell, CA. 36 p.

Buecker, A., Amado, J., Druker, D., Lorenz C.,

Muehlenbrock, F., Tan, R., 2010. IT Security

Compliance Management Design Guide with IBM

Tivoli Security Information and Event Manager. IBM

Redbooks.

CIM, 2011. Common Information Model (CIM), DMTF.

Website. http://dmtf.org/standards/ cim.

Elahi, G., Yu, E., Zannone, N., 2009. A Modeling

Ontology for Integrating Vulnerabilities into Security

Requirements Conceptual Foundations. In ER'09 Proc.

28th International Conference on Conceptual

Modeling. Springer-Verlag Berlin, Heidelberg.

Guo, M, Wang, J, 2009. An Ontology-based Approach to

Model Common Vulnerabilities and Exposures in

Information Security. In ASEE Southeast Section

Conference.

Heimbigner, 2011. D. DMTF - CIM to OWL: A Case

Study in Ontology Conversion. http://

www.docstoc.com/docs/23281194/DMTF---CIM-to-

OWL-A-Case-Study-in-Ontology-Conversion.

Ingols, K., Chu, M., Lippmann, R., Webster, S., Boyer, S.,

2009. Modeling modern network attacks and

countermeasures using attack graphs. In Proceedings

of the 2009 Annual Computer Security Applications

Conference (ACSAC ’09), Washington, D.C., USA,

IEEE Computer Society.

Kakas, A., Kowalski, R., Toni, F., 2003. Abductive Logic

Programming. In Journal of Logic and Computation,

V.2, No.6.

Kotenko, I., Stepashkin, M., 2006. Attack Graph based

Evaluation of Network Security. In Lecture Notes in

Computer Science, Vol. 4237, 2006.

Kowalski, R., Sergot, M., 1986. A logic-based calculus of

events. New Generation Computing, V.4.

Li, Z., Chu, C.-H., Yao, W., Behr, R. A., 2010. Ontology-

Driven Event Detection and Indexing in Smart Spaces.

In The 4th IEEE International Conference on

Semantic Computing, September 22-24, Carnegie

Mellon University, Pittsburgh, PA, USA.

López de Vergara, J., Villagrá, V., Berrocal, J., 2004.

Applying the Web Ontology Language to management

information definitions. In IEEE Communications

Magazine. Vol.42, pp.58-74.

Marco, D., Jennings, M., 2004. Universal Meta Data

Models. Wiley.

MASSIF, 2011. Website. http://www.massif-project.eu.

Miller, D., Harris, S., Harper, A., VanDyke, S., Blask, C.,

2011. Security information and event management

(SIEM) implementation. McGraw-Hill Companies.

Novell, 2010. Novell Sentinel Log Manager 1.0.0.5.

Installation Guide.

Ogle, D., Kreger, H., Salahshour, A., Cornpropst, J.,

Labadie, E., Chessell, M., Horn, B., Gerken, J.,

Schoech, J., Wamboldt, M., 2004.

Canonical Situation

Data Format: The Common Base Event V1.0.1.

International Business Machines Corporation.

OWL, 2009. OWL 2 Web Ontology Language Document

Overview. W3C Recommendation 27 October 2009.

http://www.w3.org/TR/owl2-overview .

Parmelee, M, 2010. Toward an Ontology Architecture for

Cyber-Security Standards. The MITRE Corporation.

Parsia, B., 2005. Cautiously Approaching SWRL.

http://en.wikipedia.org/wiki/PDF.

Prelude, 2011. Prelude Pro 1.0. http://www.prelude-

technologies.com/en/welcome/index.html

RDF, 2004. RDF Vocabulary Description Language 1.0:

RDF Schema. W3C Recommendation 10 February

2004. http://www.w3.org/TR/rdf-schema.

SCAP, 2011. The Security Content Automation Protocol

(SCAP). Website. http://scap.nist.gov.

Shenk, J., 2009. ArcSight Logger 4. Combat Cybercrime,

Demonstrate Compliance and Streamline IT

Operations. A SANS Whitepaper. January 2009.

http://www.arcsight.com/collateral/whitepapers/ArcSi

ght_Combat_Cyber_Crime_with_Logger.pdf .

SPARQL, 2008. SPARQL Query Language for RDF.

W3C Recommendation, 15 January 2008.

http://www.w3.org/TR/rdf-sparql-query

SPIN, 2012. ON-THE-FLY, LTL MODEL CHECKING

with SPIN. http://spinroot.com/spin/whatispin.html

Stevens, M, 2005. Security Information and Event

Management (SIEM). In The NEbraskaCERT

Conference, August 9-11, 2005. http://www.certconf.

org/presentations/2005/files/WC4.pdf.

SWRL, 2004. SWRL: A Semantic Web Rule Language

Combining OWL and RuleML. W3C Member

Submission 21 May 2004.

http://www.w3.org/Submission/SWRL/

Teymourian, K., Paschke, A., 2009. Towards Semantic

Event Processing. In Proceedings of the Third ACM

International Conference on Distributed Event-Based

Systems (DEBS '09). ACM. New York.

Triplestore, 2010. Triple Store Evaluation Analysis

Report. Revelytix, Inc.

Vernooy-Gerritsen, M., 2009. Emerging Standards for

Enhanced Publications and Repository Technology.

Amsterdam University Press.

Virtuoso, 2012. http://virtuoso.openlinksw.com

DataRepositoryforSecurityInformationandEventManagementinServiceInfrastructures

313