in order to perform the task efficiently; this structure
can be computed in low polynomial time (even near-linear,
if one uses the randomized algorithm of (Karger, Panigrahi
2009)), which makes our algorithm especially suitable for
databases consisting of a single large graph. The mincut
structure of a graph also allows us to grow frequent
patterns by more than a single node or edge at a time,
unlike the standard approach.
We also prove the optimality of this algorithm by
showing that every frequent subgraph produced by
our algorithm (even if it is only used as a building
block for a supergraph satisfying edge connectivity
constraints) must also be produced by any competing
algorithm.
This paper is organized as follows. Section 2 contains
the basic definitions and graph theoretic facts required
for our approach. Section 3 describes the algorithm and
contains proofs of the algorithm's correctness. Section 4
contains the proof of the algorithm's optimality.
2 STATEMENT OF THE PROBLEM
2.1 Basic Definitions
In this paper, we deal with undirected labeled graphs.
In a graph $G = (V, E)$, $V$ denotes the node set,
$E \subseteq V \times V$ denotes the edge set, and each node
$v \in V$ has a label $l(v)$. A graph $G' = (V', E')$ is
called a subgraph of $G$, denoted by $G' \subseteq G$, if
$V' \subseteq V$, $E' \subseteq E$ and every edge in $E'$ has
both ends in $V'$. $G'$ is an induced subgraph of $G$ if it
is a subgraph of $G$ and for every pair of nodes
$u, v \in V'$ such that $(u, v)$ is an edge of $G$, $(u, v)$
is also an edge of $G'$.
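The two definitions above can be illustrated with a minimal Python sketch. The representation (a graph as a pair of a node set and a set of `frozenset` edges) and the helper names `edge`, `is_subgraph` and `is_induced_subgraph` are assumptions made for this illustration only; node labels are omitted for brevity.

```python
# Hypothetical representation: a graph is (nodes, edges); each undirected
# edge is a frozenset {u, v}. Labels are left out of this sketch.
def edge(u, v):
    return frozenset((u, v))

def is_subgraph(g_sub, g):
    """V' subset of V, E' subset of E, and every edge of E' has both ends in V'."""
    (v_sub, e_sub), (v, e) = g_sub, g
    ends_inside = all(x <= v_sub for x in e_sub)
    return v_sub <= v and e_sub <= e and ends_inside

def is_induced_subgraph(g_sub, g):
    """A subgraph that keeps every edge of G whose ends both lie in V'."""
    v_sub, e_sub = g_sub
    _, e = g
    required = {x for x in e if x <= v_sub}
    return is_subgraph(g_sub, g) and required <= e_sub

# A triangle 1-2-3 with a pendant node 4.
G = ({1, 2, 3, 4}, {edge(1, 2), edge(2, 3), edge(3, 1), edge(3, 4)})
path = ({1, 2, 3}, {edge(1, 2), edge(2, 3)})
triangle = ({1, 2, 3}, {edge(1, 2), edge(2, 3), edge(3, 1)})

print(is_subgraph(path, G))              # True
print(is_induced_subgraph(path, G))      # False: edge {3, 1} is missing
print(is_induced_subgraph(triangle, G))  # True
```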
A graph $G = (V, E)$ is disconnected if there exists
a partition of $V$ into two nonempty sets $V_1, V_2$ such
that no edge in $E$ has one end in $V_1$ and the other in
$V_2$. If no such partition exists, $G$ is called connected.
$G$ is called $k$-edge-connected, $k \in \mathbb{N}$, if $G$
is connected and one has to remove at least $k$ edges from
$E$ to make $G$ disconnected.
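The definition of $k$-edge-connectivity can be checked directly on small graphs. The following is a brute-force sketch intended only to mirror the definition (it tries every set of fewer than $k$ edges and is exponential in $k$); the function names and the edge-list representation are assumptions of this example, and the graph is assumed to have at least two nodes.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Depth-first search from an arbitrary node; edges are (u, v) pairs."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes

def is_k_edge_connected(nodes, edges, k):
    """True if G is connected and no set of fewer than k edges disconnects it."""
    edges = list(edges)
    if not is_connected(nodes, edges):
        return False
    for r in range(1, k):
        for removed in combinations(range(len(edges)), r):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if not is_connected(nodes, kept):
                return False
    return True

cycle_nodes = [0, 1, 2, 3]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_k_edge_connected(cycle_nodes, cycle_edges, 2))  # True
print(is_k_edge_connected(cycle_nodes, cycle_edges, 3))  # False
```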
A partition of the node set $V$ into a nonempty proper
subset $X \subset V$ and $\bar{X} := V \setminus X$ is called
a cut. Removing all edges having one end in $X$ and the other
in $\bar{X}$ (called $(X, \bar{X})$-edges) from $G$
disconnects the graph. The size of a cut $(X, \bar{X})$ is
the number of $(X, \bar{X})$-edges, denoted
$|(X, \bar{X})|$. The set of $(X, \bar{X})$-edges whose
removal disconnects the graph is often also called a cut. A
cut of minimum size is called a minimum cut or a mincut. The
smallest size of a cut in a graph is the edge connectivity of
the graph. In general, for two disjoint subsets
$X, Y \subset V$ we denote by $|(X, Y)|$ the number of edges
in $G$ with one end in $X$ and the other in $Y$.
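To make the notions of cut size, mincut and edge connectivity concrete, here is a small brute-force sketch. It enumerates every partition of the node set, so it is only meant for tiny examples and is not the method used in this paper; the names `cut_size` and `minimum_cuts` are assumptions of this illustration, and the graph is assumed to have at least two nodes.

```python
from itertools import combinations

def cut_size(edges, X):
    """Number of edges with exactly one end in X."""
    X = set(X)
    return sum(1 for u, v in edges if (u in X) != (v in X))

def minimum_cuts(nodes, edges):
    """Return (edge connectivity, list of sides X of all minimum cuts)."""
    nodes = sorted(nodes)
    anchor, rest = nodes[0], nodes[1:]
    best, cuts = None, []
    # Keep `anchor` in X so that each partition {X, V \ X} is listed once;
    # r < len(rest) guarantees that X is a proper subset of V.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            X = frozenset((anchor,) + extra)
            size = cut_size(edges, X)
            if best is None or size < best:
                best, cuts = size, [X]
            elif size == best:
                cuts.append(X)
    return best, cuts

cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
connectivity, cuts = minimum_cuts(range(4), cycle_edges)
print(connectivity)  # 2
print(len(cuts))     # 6
```

For the cycle on four nodes, every way of removing two edges gives a minimum cut, which is why six cuts are reported.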
We study the problem of graph mining in the following
setting: our database is a single large undirected labeled
graph $G$. We are given a user-supplied support threshold
$S \in \mathbb{N}$ and a connectivity constraint $k$, and we
are looking for all $k$-edge-connected subgraphs of $G$ with
a count of at least $S$ (these subgraphs are called
frequent). The count of a graph in a database is determined
by a function count() that satisfies the downward closure
property: for all subgraphs $g_1, g_2$ of any database graph
$D$ such that $g_1 \subseteq g_2$ we always have
$\mathrm{count}(g_1, D) \geq \mathrm{count}(g_2, D)$.
The main idea of our approach is to employ the special
structure of mincuts in the database graph in order to make
the search for frequent $k$-edge-connected subgraphs faster.
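The practical consequence of the downward closure property, stated here only as a direct corollary of the inequality above, is that an infrequent pattern cannot have a frequent supergraph, so it can be discarded together with all of its extensions:

```latex
g_1 \subseteq g_2 \ \text{and}\ \mathrm{count}(g_1, D) < S
\;\Longrightarrow\;
\mathrm{count}(g_2, D) \le \mathrm{count}(g_1, D) < S ,
\quad\text{so } g_2 \text{ is not frequent.}
```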
2.2 The Cactus Structure of Mincuts
An unweighted undirected multigraph is called a cactus
if each edge is contained in exactly one cycle (i.e., any
pair of cycles has at most one node in common). Dinitz,
Karzanov and Lomonosov showed in (Dinits, Karzanov,
Lomonosov 1976) that all minimum cuts in a given graph with
$n$ vertices can be represented as a cactus of size $O(n)$.
This cactus representation plays an important role in
solving many connectivity problems, and we use it here for
the efficient mining of graphs with connectivity
constraints.
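The cactus condition can be tested literally on small multigraphs: an edge $(u, v)$ lies on exactly one cycle precisely when, after deleting that edge, exactly one simple path from $u$ to $v$ remains. The sketch below follows this observation with a depth-first search that stops once a second path is found. It is a naive check for illustration, not an efficient recognition algorithm; the function names are hypothetical, self-loops are assumed to be absent, and connectivity of the whole graph is not checked.

```python
def count_simple_paths(adj, src, dst, banned, limit=2):
    """Count simple src-dst paths that avoid edge id `banned`, up to `limit`."""
    count = 0

    def dfs(x, visited):
        nonlocal count
        if count >= limit:
            return
        if x == dst:
            count += 1
            return
        for eid, y in adj[x]:
            if eid != banned and y not in visited:
                dfs(y, visited | {y})

    dfs(src, {src})
    return count

def is_cactus(edges):
    """True if every edge of the multigraph lies on exactly one cycle."""
    adj = {}
    for eid, (u, v) in enumerate(edges):
        # Parallel edges get distinct ids, so they count as distinct edges.
        adj.setdefault(u, []).append((eid, v))
        adj.setdefault(v, []).append((eid, u))
    return all(count_simple_paths(adj, u, v, eid) == 1
               for eid, (u, v) in enumerate(edges))

share_node = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
share_edge = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 0)]
print(is_cactus(share_node))  # True: two triangles meeting in node 0
print(is_cactus(share_edge))  # False: edge (0, 1) lies on two cycles
```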
Formally, let $G = (V, E)$ be an undirected multigraph
and let $\{V_1, \ldots, V_n\}$ be a partition of $V$. We
denote the set of all minimum cuts of $G$ by
$\mathrm{Cuts}(G)$. Let $R = (V_R, E_R)$ be a multigraph with
node set $V_R := \{V_1, \ldots, V_n\}$ and edge set
$E_R := \{(V_i, V_j) \mid (v_i, v_j) \in E,\ v_i \in V_i,\ v_j \in V_j\}$.
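The construction of $R$ from a partition of $V$ can be sketched as follows. Two choices here are assumptions of this example rather than statements from the text: edges with both ends in the same block are dropped (they would become self-loops in $R$), and parallel edges between two blocks are kept with their multiplicity, since $R$ is a multigraph. The function name `quotient_multigraph` is likewise hypothetical.

```python
from collections import Counter

def quotient_multigraph(edges, partition):
    """Map each edge (u, v) of G to the pair of blocks containing u and v.

    Returns a Counter from block-index pairs (i, j), i < j, to multiplicity.
    """
    part_of = {v: i for i, block in enumerate(partition) for v in block}
    e_r = Counter()
    for u, v in edges:
        i, j = part_of[u], part_of[v]
        if i != j:  # assumption: edges inside one block are dropped
            e_r[(min(i, j), max(i, j))] += 1
    return e_r

# A 4-cycle 0-1-2-3 with the chord (0, 2); nodes 0 and 1 form one block.
G_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
blocks = [{0, 1}, {2}, {3}]
R_edges = quotient_multigraph(G_edges, blocks)
print(sorted(R_edges.items()))  # [((0, 1), 2), ((0, 2), 1), ((1, 2), 1)]
```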
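Definition 1. $R$ is a cactus representation of
$\mathrm{Cuts}(G)$ if there exists a one-to-one
correspondence $\rho : \mathrm{Cuts}(G) \to \mathrm{Cuts}(R)$
such that for every mincut $(X, \bar{X}) \in \mathrm{Cuts}(G)$,
$\rho((X, \bar{X})) \in \mathrm{Cuts}(R)$, and for every
mincut $(X, \bar{X}) \in \mathrm{Cuts}(R)$,
$\rho^{-1}((X, \bar{X})) \in \mathrm{Cuts}(G)$.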
Dinitz, Karzanov and Lomonosov (Dinits, Karzanov,
Lomonosov 1976) have proved that for any undirected
multigraph, there exists a cactus representation (in fact,
they showed that this is always true for any weighted
multigraph). A dual
graph to any cactus representation, if the cactus
cycles are taken as nodes, is a tree. The size of a
cactus tree is linear in the number of vertices in the
original graph, and any cut can be retrieved from the
cactus representation in time linearly proportional to
the size of the cut. In addition, the cactus displays
explicitly all nesting and intersection relations among
minimum cuts. Note that a graph can have at most
$\binom{n}{2}$ mincuts, where $n$ is the size of the graph's node
KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval