ATTACK GRAPH GENERATION WITH INFUSED FUZZY CLUSTERING
Sudip Misra¹, Mohammad S. Obaidat², Atig Bagchi³, Ravindara Bhatt⁴, Soumalya Ghosh¹
¹School of Information Technology, Indian Institute of Technology, Kharagpur, India
²Department of Computer Science, Monmouth University, New Jersey, U.S.A.
³Department of Computer Sc. & Eng., Indian Institute of Technology, Kharagpur, India
⁴Dept. of Electronics and Elect. Comm. Eng., Indian Institute of Technology, Kharagpur, India
Keywords: Attack Graph, Connectivity Matrix, Privilege Matrix, Fuzzy Logic Clustering, Computer Network Security.
Abstract: Modern networks have been growing rapidly in size and complexity, making manual vulnerability assessment and mitigation impractical. Automation of these tasks is therefore desirable (Obaidat and Boudriga, 2007; Bhattacharya et al., 2008). Existing network security tools follow one of two approaches: proactive (such as vulnerability scanning and the use of firewalls) and reactive (such as intrusion detection systems). Proactive approaches have an edge over reactive ones because they obtain threat information before an attack takes place. Within this class, the generation and analysis of attack graphs has gained popularity. In this paper, we present an algorithm that automatically generates attack graphs based on the prevailing network conditions. The nodes of the graph generated by our algorithm are grouped according to the logical attack graph paradigm, which helps in visualizing the dependencies among the initial and generated network configurations that lead to the attacker's goal. In addition, fuzzy logic based clustering is applied to the data corresponding to each such group. This form of clustering is beneficial because, in the real world, the boundaries between clusters are indistinct, and it leads to better visualization of the attack graph.
Our goal is to design and develop an efficient approach for automatic attack graph generation and visualization. The approach uses an attack graph generation algorithm that takes the initial network conditions as input. Fuzzy logic based clustering, namely Fuzzy C-Means (FCM) (Bezdek, 1981), is applied to the output of the attack graph generation algorithm to improve visualization. Our approach helps network administrators visualize attack graphs efficiently, which reduces their burden to a large extent.
1 INTRODUCTION
In recent years, the volume of network traffic has increased steadily. The proliferation of the Internet has made organizations vulnerable to cyber attacks. At the same time, contemporary cyber attacks have grown in sophistication, severity and anonymity.
Present-day security technology can broadly be classified as proactive (such as vulnerability scanning and the use of firewalls) or reactive (such as intrusion detection systems). Proactive technology aims at identifying the vulnerabilities that a malicious attacker can exploit and mitigating the associated risk before these vulnerabilities are exploited. Reactive technology, on the other hand, is based on the analysis and mitigation of network attacks after they are detected. Proactive methods are favored because they overcome the passive nature of the reactive approach to risk management.
Misra S., Obaidat M., Bagchi A., Bhatt R. and Ghosh S. (2009). ATTACK GRAPH GENERATION WITH INFUSED FUZZY CLUSTERING. In Proceedings of the International Conference on Security and Cryptography, pages 92-98. DOI: 10.5220/0002277000920098. Copyright © SciTePress

Proactive methods are extensively used in enterprise networks spanning several hosts and subnets. Such networks typically span multiple platforms and software packages and employ several modes of connectivity (Sheynar, 2004). Furthermore, organizational perimeters have been expanding rapidly as a consequence of globalization. Such diversified configurations present a multitude of vulnerabilities that attackers can exploit. Existing vulnerability scanners detect such vulnerabilities in isolation, i.e., per service per host. An attacker, however, typically breaks into a network by exploiting a sequence of vulnerabilities, where the post-condition of each exploit satisfies the precondition of subsequent exploits, forming a causal relationship among them (Sheynar, 2004).
The task of vulnerability detection is very challenging for an administrator, who needs to consider the interactions of isolated local vulnerabilities and find the global security holes that arise from their correlation (Sheynar, 2004). Such a logical sequence of exploits is known as an attack path, and the combination of all possible attack paths over a given network forms an attack graph. Attack graphs, which are widely used by present-day system administrators for network monitoring, determine whether designated goals can be reached by an attacker starting from an initial state (Lippmann and Ingols, 2005). Moreover, current work focuses on combining automatically generated attack graphs with the network's intrusion detection systems (IDS) to perform real-time analysis of attacks, so that the generated attack graph can also serve network monitoring and alarming. However, attack graphs of real-life networks can have millions of edges, which makes them incomprehensible to an administrator.
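To make the notion of an attack path concrete, the sketch below treats an attack graph as a plain directed graph and uses breadth-first search to check whether a goal node is reachable from the attacker's starting node, returning one shortest attack path. This is a simplification under stated assumptions: each edge is taken to be an independent attacker step (conjunctive preconditions are not modelled), and the function name and node names are hypothetical.

```python
from collections import defaultdict, deque


def shortest_attack_path(edges, start, goal):
    """Breadth-first search over a directed attack graph.

    Simplifying assumption: every edge is an independent attacker step,
    so conjunctive (AND) preconditions are not modelled.
    Returns the node list of one shortest path, or None if goal is unreachable.
    """
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
    parent = {start: None}            # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical network: attacker -> H0 -> {H1, H2}, H2 -> H3
edges = [("attacker", "H0"), ("H0", "H1"), ("H0", "H2"), ("H2", "H3")]
print(shortest_attack_path(edges, "attacker", "H3"))  # ['attacker', 'H0', 'H2', 'H3']
```

Enumerating every attack path rather than one shortest path is what makes full attack graphs grow so large on real networks.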
Since the attack graphs generated for organizational networks are very large and complex, the relevant information needs to be extracted before it is presented to the administrator. The process of extracting previously unknown information from a large collection of data is known as Data Mining (Han and Kamber, 2001). Data Mining can be applied to increase the readability of an attack graph while preserving the information it carries. Clustering, a data mining technique, is concerned with grouping similar data points. The fuzzy clustering technique FCM was first introduced by Dunn (Dunn, 1974) and later extended by Bezdek (Bezdek, 1981). Fuzzy clustering has an advantage over traditional (hard) clustering techniques because, in real applications, there are no sharp boundaries between clusters. FCM can therefore be applied to attack graphs to increase their readability and to help uncover patterns latent in the attack graph data.
In this paper, we propose an attack graph generation algorithm that takes the initial network conditions as input and generates the attack graph. To increase the readability of the attack graph, we apply FCM to the output of the graph generation algorithm. The rest of the paper is organized as follows. Section 2 describes related work. Section 3 presents the proposed algorithm, and Section 4 discusses the results obtained. Section 5 concludes the paper and outlines future work.
2 RELATED WORK
An attack graph of a network is a representation of all possible attack paths on the network, given an initial set of attacker capabilities. It can be used as a tool for the qualitative and quantitative analysis of security attributes and vulnerabilities. One of the earliest works on attack graphs was done by Moskowithz and Kang (Moskowithz and Kang, 1997), who used a probabilistic graph-based technique to identify possible loopholes in a network and to represent insecurity flow. Their algorithm runs in exponential time. Phillips and Swiler (Phillips and Swiler, 1998) provided a formal definition of attack graphs. They designed a graph-based network vulnerability analysis tool that identifies the attack paths with a high probability of success for an attacker. Their approach represents attack states and the transitions between them; however, they did not generate attack graphs of realistic size. Swiler et al. (Swiler et al., 2001) also described an attack graph generation tool for assessing security attributes and vulnerabilities in computer networks. The input to the tool includes pre- and post-conditions, network information and attacker capabilities. The tool was used to build shortest paths to the specified goals. It also provided grouping of hosts with similar network conditions (e.g., hosts on the same LAN) and handling of unknown values (default values are used when some values are missing). The tool has drawbacks such as poor scalability and the need for manual input (Lippmann and Ingols, 2005).
Ou et al. (Ou et al., 2006) presented a logical attack graph algorithm using formal methods. The nodes in the graph can be classified as fact nodes and derivation nodes. The fact nodes can be further divided into primitive fact nodes and derived fact nodes. Each fact node is labeled with a logical statement that represents a network configuration, such as running services, privileges, or connectivity. A derivation node takes as input one or more fact nodes that together satisfy the pre-conditions of the rule represented by the derivation node. This node serves as a link between the set of conjunctive pre-conditions and the post-conditions that result from exploiting the vulnerability corresponding to that rule. The node corresponding to the post-condition is a derived fact node. The algorithm has an asymptotic CPU time between $O(n^2)$ and $O(n^3)$, where $n$ is the number of hosts in the network. However, their algorithm requires network conditions to be expressed as propositional formulas (Sheyner et al., 2002).
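The conjunctive semantics of derivation nodes can be illustrated with a short sketch: a derivation node fires only when all of its precondition facts are known, and the fact it derives may in turn enable further derivations, until a fixpoint is reached. This is our simplified illustration, not the implementation of Ou et al.; the class names, rule names and fact labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    label: str        # logical statement, e.g. a service, privilege or connectivity fact
    primitive: bool   # True for primitive (input) facts, False for derived facts


@dataclass(frozen=True)
class Derivation:
    rule: str                 # name of the rule this derivation node represents
    preconditions: frozenset  # facts that must ALL hold (conjunction)
    postcondition: Fact       # derived fact produced when the rule fires


def saturate(derivations, primitive_facts):
    """Fire derivation nodes until no new fact can be derived (a fixpoint)."""
    known = set(primitive_facts)
    changed = True
    while changed:
        changed = False
        for d in derivations:
            if d.preconditions <= known and d.postcondition not in known:
                known.add(d.postcondition)
                changed = True
    return known


# Hypothetical labels: a remote exploit needs BOTH the service and network access.
svc = Fact("webService(H0)", True)
acc = Fact("netAccess(attacker, H0)", True)
user = Fact("execCode(H0, user)", False)
root = Fact("execCode(H0, root)", False)
rules = [
    Derivation("localEscalation", frozenset({user}), root),
    Derivation("remoteExploit", frozenset({svc, acc}), user),
]
print(len(saturate(rules, {svc, acc})))  # 4
```

With only `svc` known, neither rule fires, which is exactly the AND behaviour a plain reachability search would miss.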
Visualization plays an important role in attack graph readability and analysis. The readability of an attack graph can be increased by employing data mining approaches (such as traditional clustering or FCM (Bezdek, 1981)) while preserving the information it carries. In FCM, mentioned in Section 1, each data point belongs to every cluster with a membership grade between 0 and 1 (both inclusive), and the grades of each data point sum to 1 across clusters. FCM partitions a collection of n data points into c fuzzy clusters (where c < n) while simultaneously seeking the best locations for the cluster centers. For example, 200 data points can be partitioned into 4 clusters. The number of clusters is user defined, and the usual FCM algorithm uses the Euclidean distance as its distance measure. FCM can thus help uncover patterns in attack graphs.
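For reference, the standard FCM formulation (Bezdek, 1981) minimizes a membership-weighted sum of squared Euclidean distances. In the notation below (ours), $u_{ik}$ is the membership of data point $x_k$ in cluster $i$, $v_i$ is the center of cluster $i$, and $q > 1$ is the fuzzifier that controls how soft the cluster boundaries are; alternating the two update equations until the objective stops improving gives the usual iterative algorithm.

```latex
J_q(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{q} \, \lVert x_k - v_i \rVert^2,
\qquad u_{ik} \in [0, 1], \quad \sum_{i=1}^{c} u_{ik} = 1 \ \text{for every } k,

v_i = \frac{\sum_{k=1}^{n} u_{ik}^{q} \, x_k}{\sum_{k=1}^{n} u_{ik}^{q}},
\qquad
u_{ik} = \left( \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(q-1)} \right)^{-1}.
```

As $q$ approaches 1 the memberships become crisp (0 or 1), recovering hard c-means; larger $q$ spreads each point's membership more evenly across clusters.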
We propose an attack graph generation algorithm that can be used on large networks. The output of our attack graph generation algorithm is clustered using the FCM algorithm. Moreover, our algorithm runs in $O(n^3)$ time.
3 PROPOSED ALGORITHM
In this section, we propose an algorithm that automatically generates an attack graph when the initial network conditions are provided. Our algorithm works with three types of nodes: base fact nodes, derived fact nodes, and rule nodes (Ou et al., 2006).
The proposed algorithm, described in Section 3.1, requires the following inputs: the Privilege-matrix (the attacker's privilege level over each machine) and the Connection-matrix (machine connectivity for each service). In addition, the following data structures are used: Ruleset (pre-condition and post-condition privilege levels for each service) and Label (labels identifying the nodes in the graph). Nodes are of three types: base fact nodes (initial network conditions), derived fact nodes, and rule nodes. Derived fact nodes and rule nodes are created and populated dynamically as the algorithm advances.
The graph generation module relies on several matrices that are maintained throughout the run and updated dynamically as new nodes of the types mentioned above are generated.
Once an attack graph has been generated, various matrix operations can be performed on the final graph's adjacency matrix for IDS integration, as mentioned in Section 1. In addition, as mentioned earlier, the readability of the generated graph needs to be improved, since such graphs can be extremely large and complex. We have focused on increasing the readability of the output graph by employing clustering. For this purpose we use FCM (Bezdek, 1981) (Section 3.3), a fuzzy logic based clustering method, to group the graph data into clusters. Clustering can greatly reduce overhead by reducing the amount of data to be visualized. The output of the graph generation algorithm is fed as input to FCM, which clusters the generated attack graph data set into a user-defined number of clusters.
3.1 Graph Generation Algorithm
Initialization:
Number of hosts in the network: n
Number of attackers in the network: 1
1) Initialize the Privilege-matrix (Priv), a row vector with (n+1) fields. Fill it with 0 for no privilege, 1 for user privilege and 2 for root privilege. There are (n+1) fields because one machine is the attacker itself. Initially, the attacker has root privilege only over its own machine.
2) Fill in the Connection-matrix (Conn), which consists of one n×n binary matrix for each of the s services. An entry of an n×n matrix is 1 if the corresponding machines are connected over that service, and 0 otherwise.
3) Create a Ruleset (Ruleset) and fill it with the precondition and post-condition privilege levels for each service. We assume that each service is vulnerable.
4) Create an empty set Label (Label) to identify the nodes of the graph.
5) Initialize three node sets. The base fact nodes contain the initial network conditions: one label for every service running on every machine, plus the attacker's privilege. Derived fact nodes and rule nodes are created dynamically while the algorithm runs.
6) An empty set of edges is also fed to the algorithm as input.
SECRYPT 2009 - International Conference on Security and Cryptography
Input:
1) Priv[n+1][n+1] ← values 0 (none), 1 (user), 2 (root). Init: Priv[i][i] ← 2, rest 0.
2) Conn[i][j][k] ← 1 if machine j can connect to machine k via service i.
3) Ruleset[s] ← rules for each service:
   Ruleset[i] ← struct with two integer fields, pre and post, giving the privilege levels for service i.
4) Label ← label to identify a node
5) Node sets:
   a) Base ← contains labels of base fact nodes
   b) Derived ← contains labels of derived fact nodes
   c) Rule ← contains exploit labels
6) Edges ← pairs (i, j) for an edge from node i to node j
Output:
1) Graph nodes and edges
Algorithm:
1) Loop over each service s:
   a) Check each machine i:
      If the attacker has the precondition privilege level (or above) on machine i, and machine i can connect to service s running on some machine j, then do the following:
      i.) Priv(j) ← max(current value of Priv(j), post-condition privilege level as per Ruleset[s]).
      ii.) Put the label of the new privilege in the derived fact node set.
      iii.) Make edges from the service node corresponding to s and from the old privilege node to the rule node (for service s on machine j from machine i).
      iv.) Make an edge from that rule node to the new privilege node.
2) Go to step 1 if any new node was added.
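A minimal Python sketch of the loop above follows, under stated assumptions: the attacker's privileges are tracked as a row vector (as in Initialization step 1), Conn is indexed as `conn[s][i][j]`, node labels are tuples of our own choosing, and the attacker needs at least some privilege on a source machine before it can attack from there. The function and class names are hypothetical; this illustrates the fixpoint structure and is not the authors' implementation.

```python
from dataclasses import dataclass

NONE, USER, ROOT = 0, 1, 2  # privilege levels, as in the Privilege-matrix


@dataclass(frozen=True)
class Rule:
    pre: int   # privilege the attacker needs on the source machine
    post: int  # privilege the attacker gains on the target machine


def generate_attack_graph(conn, ruleset, attacker):
    """Sketch of the Section 3.1 loop.

    conn[s][i][j] == 1 if machine i can reach service s on machine j;
    ruleset[s] is the Rule for service s; attacker is the attacker's index.
    Returns (base, derived, rule_nodes, edges).
    """
    n_machines = len(conn[0])
    priv = [NONE] * n_machines          # attacker's privilege on each machine
    priv[attacker] = ROOT               # initially root on its own machine only
    base = {("priv", attacker, ROOT)}
    base |= {("service", s, j)          # one base fact per reachable service
             for s in range(len(conn))
             for i in range(n_machines)
             for j in range(n_machines) if conn[s][i][j]}
    derived, rule_nodes, edges = set(), set(), set()

    added = True
    while added:                        # step 2: repeat while nodes are added
        added = False
        for s, rule in enumerate(ruleset):  # step 1: loop over services
            for i in range(n_machines):
                # Assumption: the attacker needs a foothold on i and the precondition.
                if priv[i] == NONE or priv[i] < rule.pre:
                    continue
                for j in range(n_machines):
                    exploit = ("exploit", s, i, j)
                    if not conn[s][i][j] or exploit in rule_nodes:
                        continue
                    old = ("priv", i, priv[i])
                    new = ("priv", j, rule.post)
                    priv[j] = max(priv[j], rule.post)        # step i
                    if new not in base:
                        derived.add(new)                     # step ii
                    rule_nodes.add(exploit)
                    edges |= {(("service", s, j), exploit),  # step iii
                              (old, exploit),
                              (exploit, new)}                # step iv
                    added = True
    return base, derived, rule_nodes, edges


conn = [
    [[0, 0, 0], [0, 0, 0], [1, 0, 0]],  # service 0: attacker (2) -> machine 0
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],  # service 1: machine 0 -> machine 1
]
ruleset = [Rule(pre=USER, post=USER), Rule(pre=USER, post=ROOT)]
base, derived, rule_nodes, edges = generate_attack_graph(conn, ruleset, attacker=2)
print(sorted(derived))  # [('priv', 0, 1), ('priv', 1, 2)]
```

Because each (service, source, target) exploit is added at most once, the outer loop terminates after at most one pass per newly added rule node, plus a final pass that adds nothing.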
3.2 Flow Chart of the Graph Generation Algorithm
The control flow of the graph generation algorithm is elucidated in Figure 1.
3.3 Fuzzy Clustering Algorithm
The output of the graph generation algorithm (Section 3.1) is used as input to the FCM algorithm (Bhattacharya et al., 2008). The output is arranged into a matrix whose columns are attributes of the attack graph: the service running on a particular machine, the source machine identification (ID), the privilege on the source machine, the target machine identification, and the privilege on the target machine. The data points are grouped into a user-defined number of clusters. These clusters have "fuzzy" boundaries, in the sense that each data point belongs to every cluster to some degree.
Figure 1: Graph generation algorithm.
FCM Algorithm (adapted from (Bhattacharya et al., 2008)):
Input (derived from the graph nodes and edges):
1) Let $x_k$ be the $k$-th data point (possibly an $m$-dimensional vector), $k = 1, 2, \ldots, n$. In our case, $n = 22$ and $m = 5$ (refer to Table 2).
2) Membership matrix M.
3) Number of clusters c.
Output:
1) The n data points clustered into c fuzzy clusters (c < n).
Algorithm:
1) Initialization phase: the number of clusters is set and the membership matrix M is initialized with random values between 0 and 1.
2) Iteration phase: the cluster centers and memberships are recomputed according to the objective function until the objective function reaches a specified (user-defined) threshold.
3) Termination phase: the algorithm has reached a stable state.
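The three phases above correspond to the standard FCM iteration. A compact NumPy sketch follows; the fuzzifier `q = 2`, the tolerance, the random seed and the function name are our assumptions, since the text does not fix them. The loop stops when the objective changes by less than `tol`, one common reading of the user-defined threshold. For Table 2, `X` would be the n × 5 matrix of data points.

```python
import numpy as np


def fuzzy_c_means(X, c, q=2.0, tol=1e-6, max_iter=300, seed=0):
    """Standard FCM (Bezdek, 1981) with Euclidean distance.

    X: (n, m) array of n data points; c: number of clusters;
    q > 1: fuzzifier. Returns (centers, U) with U of shape (c, n),
    where U[i, k] is the membership of point k in cluster i.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))              # initialization: random memberships
    U /= U.sum(axis=0)                           # memberships of each point sum to 1
    prev_obj = np.inf
    for _ in range(max_iter):                    # iteration phase
        W = U ** q
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, np.finfo(float).eps)  # avoid division by zero
        obj = float(np.sum(W * dist ** 2))       # objective J for current U and centers
        inv = dist ** (-2.0 / (q - 1.0))
        U = inv / inv.sum(axis=0)                # membership update
        if abs(prev_obj - obj) < tol:            # termination: stable objective
            break
        prev_obj = obj
    return centers, U


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=0)   # cluster numbering depends on the random initialization
print(labels)
```

Since FCM only finds a local minimum, results can depend on the initialization; running it from several seeds and keeping the lowest objective is a common precaution.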
4 RESULTS
4.1 Test Network
The network shown in Figure 2 is a simulated network consisting of one attacker, one host and a screened subnet with three hosts. It has been adapted from the network considered by Sheynar (Sheynar, 2004). The network consists of four hosts: Host 0 (H0), Host 1 (H1), Host 2 (H2), and Host 3 (H3). The connectivity between the machines is given in Table 1 (Connectivity Matrix).
4.2 Connectivity Matrix
The Connectivity Matrix shown in Table 1 is the input to the graph generation algorithm.
Table 1: Connectivity Matrix.
From \ To | Attacker | H0              | H1       | H2   | H3
Attacker  | -        | IIS_Web_Service | None     | None | None
H0        | -        | -               | ftp, ssh | net  | Squid
H1        | -        | IIS_Web_Service | -        | net  | Squid
H2        | -        | IIS_Web_Service | ftp, ssh | -    | Squid
H3        | -        | IIS_Web_Service | ftp, ssh | -    | -
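Before Table 1 can drive a generator like the one in Section 3.1, its cells have to be turned into per-service binary Conn matrices. The sketch below does this under stated assumptions: machine indices and service codes follow the conventions listed with Table 2 (the attacker is machine 4, netbios is service 4, and so on), the attacker is included as a fifth machine, "net" is read as netbios, and the unreadable H3 to H2 cell is treated as no connectivity. All names are hypothetical.

```python
SERVICES = {"IIS_Web_Service": 0, "ftp": 1, "ssh": 2, "squid": 3, "netbios": 4}
MACHINES = {"H0": 0, "H1": 1, "H2": 2, "H3": 3, "Attacker": 4}

# Non-empty cells of Table 1, keyed by (from, to).
# Assumptions: "net" means netbios, "Squid" means squid, and the
# unreadable H3 -> H2 cell means no connectivity.
TABLE_1 = {
    ("Attacker", "H0"): ["IIS_Web_Service"],
    ("H0", "H1"): ["ftp", "ssh"], ("H0", "H2"): ["netbios"], ("H0", "H3"): ["squid"],
    ("H1", "H0"): ["IIS_Web_Service"], ("H1", "H2"): ["netbios"], ("H1", "H3"): ["squid"],
    ("H2", "H0"): ["IIS_Web_Service"], ("H2", "H1"): ["ftp", "ssh"], ("H2", "H3"): ["squid"],
    ("H3", "H0"): ["IIS_Web_Service"], ("H3", "H1"): ["ftp", "ssh"],
}


def build_conn(table, n_services=len(SERVICES), n_machines=len(MACHINES)):
    """Return conn[s][i][j] == 1 if machine i can reach service s on machine j."""
    conn = [[[0] * n_machines for _ in range(n_machines)] for _ in range(n_services)]
    for (src, dst), services in table.items():
        for name in services:
            conn[SERVICES[name]][MACHINES[src]][MACHINES[dst]] = 1
    return conn


conn = build_conn(TABLE_1)
print(conn[SERVICES["IIS_Web_Service"]][MACHINES["Attacker"]][MACHINES["H0"]])  # 1
```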
4.3 Results of Graph Generation Algorithm
The result of applying the graph generation algorithm (Section 3.1) to Table 1 (Connectivity Matrix) is shown in Table 2. Each row in the table represents one run of the graph generation algorithm. The results in Table 2 are also used to generate Figure 3 (Attack Graph).
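Table 2: Data points for clustering.
Sl.No. | Service Number | Source ID | Source Privilege | Target Number | Target Privilege
1      | 0              | 4         | 2                | 0             | 2
2      | 1              | 0         | 2                | 1             | 1
3      | 2              | 0         | 2                | 1             | 2
4      | 3              | 0         | 2                | 3             | 2
5      | 3              | 1         | 2                | 3             | 2
6      | 4              | 0         | 2                | 2             | 1
7      | 4              | 1         | 2                | 2             | 1
8      | 0              | 1         | 2                | 0             | 2
9      | 0              | 2         | 1                | 0             | 2
10     | 0              | 3         | 2                | 0             | 3
11     | 0              | 4         | 2                | 0             | 2
12     | 1              | 0         | 2                | 1             | 2
13     | 1              | 2         | 1                | 1             | 2
14     | 1              | 3         | 2                | 1             | 2
15     | 2              | 0         | 2                | 1             | 2
16     | 2              | 2         | 1                | 1             | 2
17     | 2              | 3         | 2                | 1             | 2
18     | 3              | 0         | 2                | 3             | 2
19     | 3              | 1         | 2                | 3             | 2
20     | 3              | 2         | 1                | 3             | 2
21     | 4              | 0         | 2                | 2             | 1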
The attributes in Table 2 are the service running on a particular machine (Service Number), the identification of the source machine (Source ID), the privilege on the source machine (Source Privilege), the identification of the target machine (Target Number), and the privilege on the target machine (Target Privilege). The attributes are encoded as numbers according to the following conventions:
1) The Service Number takes one of the following five values:
a) IIS_Web_Service: 0
b) ftp: 1
c) ssh: 2
d) squid: 3
e) netbios: 4
2) The Source ID and Target Number take one of the following five values:
a) Host 0: 0
b) Host 1: 1
c) Host 2: 2
d) Host 3: 3
e) Attacker: 4
3) The Source Privilege takes one of the following three values:
a) No Privilege: 0
b) User Privilege: 1
c) Root Privilege: 2
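The conventions above can be expressed as lookup tables that turn one attack-graph step into a five-attribute data point for clustering. The sketch below assumes that the Target Privilege uses the same 0/1/2 coding as the Source Privilege; the dictionary and function names are hypothetical.

```python
SERVICE_CODES = {"IIS_Web_Service": 0, "ftp": 1, "ssh": 2, "squid": 3, "netbios": 4}
MACHINE_CODES = {"Host 0": 0, "Host 1": 1, "Host 2": 2, "Host 3": 3, "Attacker": 4}
# Assumption: target privilege uses the same coding as source privilege.
PRIVILEGE_CODES = {"none": 0, "user": 1, "root": 2}


def encode_row(service, source, source_priv, target, target_priv):
    """Map one attack-graph step to the 5-attribute data point used in Table 2."""
    return [
        SERVICE_CODES[service],
        MACHINE_CODES[source],
        PRIVILEGE_CODES[source_priv],
        MACHINE_CODES[target],
        PRIVILEGE_CODES[target_priv],
    ]


print(encode_row("ftp", "Host 0", "root", "Host 1", "user"))  # [1, 0, 2, 1, 1]
```

The printed vector matches row 2 of Table 2. Note that these integer codes are labels rather than magnitudes, so Euclidean distances between data points partly reflect the chosen numbering.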
Figure 2: Test network.
Figure 3: Attack Graph Generation.
4.4 Results of FCM Algorithm
Applying the FCM algorithm to the output of the graph generation algorithm (Table 2, Data points for clustering) results in the clusters shown in Figure 4 and Figure 5.
As can be inferred from the clusters formed, if the number of clusters equals the number of machines in the network (the four hosts and the attacker, i.e., five clusters), each cluster approximately aggregates the data points of one machine. This is shown in Figure 5. On the other hand, if a different grouping is used (three partitions), the clusters aggregate approximately according to the level of the attack path they belong to: entry level, mid level and exit level. This is shown in Figure 4.
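Figures 4 and 5 plot a single cluster number per data point, whereas FCM produces graded memberships. One common way to obtain such a plot, which we assume here, is to assign each point to the cluster in which it has the highest membership. The sketch below does this for a membership matrix `U` of shape (c, n); the function names are hypothetical.

```python
import numpy as np


def hard_assignments(U):
    """Return, for each data point, the 0-based index of its highest-membership cluster.

    U has shape (c, n): U[i, k] is the membership of point k in cluster i.
    """
    return np.asarray(U).argmax(axis=0)


def cluster_members(U):
    """Group data point numbers (1-based, like Sl.No. in Table 2) by hard cluster."""
    labels = hard_assignments(U)
    return {int(i): [int(k) + 1 for k in np.flatnonzero(labels == i)]
            for i in np.unique(labels)}


U = np.array([[0.9, 0.2, 0.6],
              [0.1, 0.8, 0.4]])
print(cluster_members(U))  # {0: [1, 3], 1: [2]}
```

Hardening discards the graded information, so points with nearly equal memberships (such as point 3 above, at 0.6 versus 0.4) are worth inspecting separately.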
Figure 4: Data Points Clustering using Cluster Size = 3 (cluster number plotted against data point).
Figure 5: Data Points Clustering using Cluster Size = 5 (cluster number plotted against data point).
5 CONCLUSIONS AND FUTURE WORK
In this paper, we have proposed an attack graph generation algorithm that runs in $O(n^3)$ time. The algorithm currently does not prevent cycles. Clustering the generated data improves the visualization of the attack graph. Clustering can also greatly reduce the overhead of IDS operations by reducing the amount of data to be processed, since each cluster acts as a super node.
We are currently working on improving the computational efficiency of the algorithm using matrix multiplication methods, so that graphs for large networks can be built in less time. Furthermore, we intend to use the output data set of our proposed algorithm to perform "false threat" and "missed threat" detection in the context of IDS alarms.
REFERENCES
M. S. Obaidat and N. Boudriga, "Security of e-Systems and Computer Networks," Cambridge University Press, 2007.
S. Bhattacharya, S. Malhotra, and S. K. Ghosh, "A Scalable Representation towards Attack Graph Generation," Proceedings of the 2008 1st International Conference on Information Technology (IT 2008), Gdansk, Poland, 19-21 May 2008.
J. C. Bezdek, "Pattern Recognition with Fuzzy Objective Function Algorithms," Plenum Press, New York, 1981.
O. M. Sheynar, "Scenario Graphs and Attack Graphs," PhD Thesis, Carnegie Mellon University, USA, April 2004.
R. P. Lippmann and K. W. Ingols, "An Annotated Review of Past Papers on Attack Graphs," Project Report IA-1, Lincoln Laboratory, MIT, 31 March 2005.
J. Han and M. Kamber, "Data Mining: Concepts and Techniques," Morgan Kaufmann Publishers, 2001.
J. C. Dunn, "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters," Journal of Cybernetics, Vol. 3, pp. 32-57, 1974.
I. S. Moskowithz and M. H. Kang, "An Insecurity Flow Model," Proceedings of the 6th New Security Paradigms Workshop, Langdale, UK, pp. 61-74, 1997.
C. Phillips and L. P. Swiler, "A Graph-Based System for Network-Vulnerability Analysis," Proceedings of the Workshop on New Security Paradigms (NSPW), pp. 71-79, 22-26 September 1998.
L. P. Swiler, C. Phillips, D. Ellis, and S. Chakerian, "Computer-Attack Graph Generation Tool," Proceedings of the Second DARPA Information Survivability Conference and Exposition (DISCEX II), Volume II, pp. 307-321, IEEE Computer Society, 2001.
X. Ou, W. F. Boyer, and M. A. McQueen, "A Scalable Approach to Attack Graph Generation," Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS), Alexandria, Virginia, USA, pp. 336-345, 30 October - 3 November 2006.
O. Sheyner, J. Haines, S. Jha, R. Lippmann, and J. M. Wing, "Automated Generation and Analysis of Attack Graphs," Proceedings of the 2002 IEEE Symposium on Security and Privacy, pp. 254-265, 2002.