illustration. So, a reachability check needs only
O(1) time.
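Here is a minimal sketch of the O(1) check. It assumes, following the usual chain encoding, that every node carries a label (chain number, position), with positions increasing along a chain. It also assumes that every node stores, for each chain, the smallest position on that chain it can reach. The function name `reaches` and the dictionary layout are hypothetical. A dict lookup is constant time on average, and an array indexed by chain number would make it worst-case O(1).

```python
from typing import Dict, Tuple

# Hypothetical label layout: (chain number, position on that chain).
# Positions grow along a chain, so a node reaches every later position on it.
Label = Tuple[int, int]

def reaches(u_index: Dict[int, int], v_label: Label) -> bool:
    """Test whether u reaches v (u also reaches itself) with one lookup.

    u_index maps a chain number to the smallest position on that chain
    reachable from u, including u's own chain and position. A chain that
    is missing from u_index cannot be reached from u.
    """
    chain, pos = v_label
    first = u_index.get(chain)
    return first is not None and first <= pos

# Tiny example: chain 1 is a -> b, chain 2 is c -> d, plus an edge a -> d.
a_index = {1: 1, 2: 2}  # a = (1, 1); the first node of chain 2 it reaches is d = (2, 2)
c_index = {2: 1}        # c = (2, 1); it reaches nothing on chain 1
print(reaches(a_index, (2, 2)), reaches(a_index, (2, 1)), reaches(c_index, (1, 1)))
# True False False
```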
Figure 2: Graph encoding.
Note that the above method can also be used for
cyclic graphs (graphs containing cycles), since a
cyclic graph can always be transformed into a DAG
by identifying all of its strongly connected
components (SCCs) and collapsing each of them into
a representative node. Clearly, all of the nodes in an
SCC are equivalent to its representative as far as
reachability is concerned (Wang et al., 2006). Using
Tarjan’s algorithm (Tarjan, 1972), all SCCs in G can
be found in O(n + e) time.
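The SCC-collapsing step can be sketched as follows. It is a plain recursive version of Tarjan’s algorithm, followed by a condensation pass that maps every node to a representative and keeps only the edges between different SCCs. The names `tarjan_scc` and `condense`, and the choice of the lexicographically smallest node as representative, are illustrative assumptions rather than anything prescribed by the text. Because of Python’s recursion limit, the sketch suits only modest graph sizes.

```python
from typing import Dict, List, Set, Tuple

def tarjan_scc(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Tarjan's algorithm: return the SCCs of a directed graph in O(n + e) time."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    on_stack: Set[str] = set()
    stack: List[str] = []
    sccs: List[List[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:            # tree edge: recurse, then propagate low-link
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:           # edge back into the current SCC
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of an SCC: pop it off
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

def condense(graph: Dict[str, List[str]]) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Collapse each SCC into one representative node, yielding a DAG."""
    rep: Dict[str, str] = {}
    for comp in tarjan_scc(graph):
        r = min(comp)                     # hypothetical choice of representative
        for v in comp:
            rep[v] = r
    dag: Dict[str, Set[str]] = {r: set() for r in rep.values()}
    for v, succs in graph.items():
        for w in succs:
            if rep[v] != rep[w]:          # drop edges inside an SCC
                dag[rep[v]].add(rep[w])
    return rep, dag
```

For example, in the graph a → b → c → a, c → d, the cycle {a, b, c} collapses into the representative a, which leaves the single DAG edge a → d.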
This idea was first suggested by Jagadish
(Jagadish, 1991). However, his algorithm needs
$O(n^3)$ time to decompose a DAG into a minimal set
of disjoint chains (see page 566 in Jagadish, 1991).
For this reason, Jagadish suggested a heuristic
method that decomposes a DAG into a set of paths and
then stitches some of the paths together to form
chains. In doing so, the number of chains produced is
normally much larger than the minimum number of
chains, which significantly increases both space and
query time.
In (Y. Chen and Y. Chen, 2008), Chen discussed
a new algorithm for this task. It requires only
$O(n^2 + bn\sqrt{b})$ time, where b is the DAG’s width,
defined as the size of a largest subset of pairwise
unreachable nodes. Unfortunately, the chain set
found by Chen’s algorithm is not always minimum.
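For comparison, a minimum set of disjoint chains can be computed with the classical reduction behind Dilworth’s theorem (due to Fulkerson). This is neither Jagadish’s method nor Chen’s. The reduction builds the transitive closure, then finds a maximum matching in the bipartite graph that has an edge (u, v) whenever u reaches v. It yields n minus the matching size chains, and by Dilworth’s theorem that number equals the width b. The sketch below uses simple augmenting paths (Kuhn’s method). Its closure and matching steps take roughly cubic time in n in the worst case, which shows why a direct approach is costly. The function names are hypothetical.

```python
from typing import Dict, List, Set

def transitive_closure(nodes: List[str], graph: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Descendants of every node, via one iterative DFS per node."""
    reach: Dict[str, Set[str]] = {}
    for s in nodes:
        seen: Set[str] = set()
        todo = list(graph.get(s, []))
        while todo:
            v = todo.pop()
            if v not in seen:
                seen.add(v)
                todo.extend(graph.get(v, []))
        reach[s] = seen
    return reach

def min_chain_decomposition(nodes: List[str], graph: Dict[str, List[str]]) -> List[List[str]]:
    """Split a DAG into a minimum number of disjoint chains (u precedes v => u reaches v)."""
    reach = transitive_closure(nodes, graph)
    match_to: Dict[str, str] = {}  # v -> the node u matched as v's chain predecessor

    def augment(u: str, visited: Set[str]) -> bool:
        # Kuhn's augmenting-path search starting from u's "out" copy.
        for v in sorted(reach[u]):  # sorted for deterministic output
            if v in visited:
                continue
            visited.add(v)
            if v not in match_to or augment(match_to[v], visited):
                match_to[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())

    # Each matched pair (u, v) means "v directly follows u in the same chain".
    succ = {u: v for v, u in match_to.items()}
    chains: List[List[str]] = []
    for start in nodes:
        if start in match_to:  # start has a predecessor, so it is not a chain head
            continue
        chain = [start]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        chains.append(chain)
    return chains

print(min_chain_decomposition(["a", "b", "c", "d"],
                              {"a": ["b", "c"], "b": ["d"], "c": ["d"]}))
# [['a', 'b', 'd'], ['c']]
```

In this diamond-shaped DAG, b and c are mutually unreachable, so the width is 2 and two chains are the minimum.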
In this paper, we propose a new algorithm to
decompose a DAG into a minimal set of disjoint
chains. The time complexity of the new algorithm is
still bounded by $O(n^2 + bn\sqrt{b})$.
The rest of the paper is organized as follows. In
Section 2, we present our algorithm in detail. Then,
in Section 3, we analyze the time complexity.
Finally, a short conclusion is set forth in Section 4.
2 ALGORITHM DESCRIPTION
In this section, we present our new algorithm, which is
inspired by Chen’s algorithm. However, to overcome
the problem in Chen’s algorithm, we devise two new
procedures: one for generating chains and one for
resolving virtual nodes.
First, for chain generation, we distinguish
between two kinds of virtual nodes and handle them
in different ways, so that reachability between
nodes can be propagated bottom-up through these
virtual nodes.
Second, for virtual node resolution, a new
data structure, the so-called combined alternating
graph, is constructed so that the number of virtual
nodes resolved at each level is maximized.
In Section 2.1, we first discuss how a DAG can
be decomposed into disjoint chains, which may
contain virtual nodes. Then, in Section 2.2, we show
how the virtual nodes are resolved.
2.1 DAG Stratification and Chain Generation
As with Chen’s algorithm, our algorithm works in
three phases: DAG stratification, chain generation,
and virtual node resolution.
In the first phase, a DAG G(V, E) is stratified
into levels $V_0, \ldots, V_{h-1}$ such that
$V = V_0 \cup \cdots \cup V_{h-1}$ and each node in $V_i$
has its children only in $V_{i-1}, \ldots, V_0$
($i = 1, \ldots, h-1$), where h is the height of G,
i.e., the length of the longest path in G. For each
node v in $V_i$, its level is said to be i, denoted
$l(v) = i$. In addition, $C_j(v)$ ($j < i$) denotes the
set of links from v to those of its children that
appear in $V_j$. Therefore, for each v in $V_i$, there
exist $i_1, \ldots, i_k$ ($i_l < i$, $l = 1, \ldots, k$) such
that the set of v’s children equals
$C_{i_1}(v) \cup \cdots \cup C_{i_k}(v)$. Assume that
$V_i = \{v_1, v_2, \ldots, v_l\}$. We use $C_j^i(v)$ ($j < i$)
to denote $C_j(v_1) \cup \cdots \cup C_j(v_l)$.
This phase is almost the same as Chen’s. However,
for each node v in $V_i$, we also use $B_j(v)$ ($j > i$)
to denote the set of links from v to those of its
parents that appear in $V_j$.
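The stratification phase can be sketched as follows. The sketch assumes that l(v) is the length of the longest path from v down to a node without children, which guarantees that every child of a node in $V_i$ lies in a lower level. Chen’s exact procedure may differ in its details. `C[v][j]` and `B[v][j]` mirror $C_j(v)$ and $B_j(v)$, and `C_level` forms the union over $V_i$. All of these names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

def stratify(nodes: List[str], children: Dict[str, List[str]]):
    """Assign l(v) (0 for childless nodes, else 1 + max child level) and build V_0..V_{h-1}."""
    level: Dict[str, int] = {}

    def lvl(v: str) -> int:
        if v not in level:
            kids = children.get(v, [])
            level[v] = 1 + max(lvl(c) for c in kids) if kids else 0
        return level[v]

    for v in nodes:
        lvl(v)
    h = 1 + max(level.values()) if level else 0  # number of levels
    levels: List[List[str]] = [[] for _ in range(h)]
    for v in nodes:
        levels[level[v]].append(v)

    # C[v][j]: links from v to its children in V_j (j < l(v)).
    # B[v][j]: links from v to its parents in V_j (j > l(v)).
    C: Dict[str, Dict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    B: Dict[str, Dict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for v in nodes:
        for c in children.get(v, []):
            C[v][level[c]].add(c)
            B[c][level[v]].add(v)
    return level, levels, C, B

def C_level(levels: List[List[str]], C, i: int, j: int) -> Set[str]:
    """Union of C_j(v) over all v in V_i."""
    return set().union(*(C[v][j] for v in levels[i]))
```

For a DAG with edges a → b, a → c, a → d, b → d and c → d, this gives $V_0 = \{d\}$, $V_1 = \{b, c\}$ and $V_2 = \{a\}$. Node a then has links in both $C_1(a) = \{b, c\}$ and $C_0(a) = \{d\}$.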
In the second phase, a series of (undirected)
bipartite graphs (Asratian et al., 1998; Hopcroft et al.,
1973) is constructed. In this process, some virtual
nodes may be introduced into the levels $V_i$
($i = 1, \ldots, h-2$). In particular, we distinguish
between two kinds of virtual nodes: those created
for actual nodes and those created for virtual nodes.
The two kinds are handled differently.
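The bipartite graphs of the second phase call for maximum matching, and the cited Hopcroft et al. (1973) give the standard algorithm for it. The sketch below implements Hopcroft-Karp and shows only that matching primitive. For illustration, it assumes that the left side is a level $V_i$ and that the right side consists of the children reached through $C_{i-1}^i(v)$. Neither kind of virtual node is modeled, and the sketch is not the paper’s chain-generation procedure. The function name is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

INF = float("inf")

def hopcroft_karp(left: List[str], adj: Dict[str, List[str]]) -> Dict[str, str]:
    """Maximum bipartite matching (Hopcroft and Karp, 1973).

    left: nodes on one side; adj[u]: u's neighbours on the other side.
    Returns a dict mapping every matched left node to its partner.
    """
    match_l: Dict[str, Optional[str]] = {u: None for u in left}
    match_r: Dict[str, str] = {}
    dist: Dict[str, float] = {}
    dist_free = INF  # length of the shortest augmenting path found by the last BFS

    def bfs() -> bool:
        # Layer the graph from all free left nodes; stop at the shortest augmenting length.
        nonlocal dist_free
        dist_free = INF
        queue = deque()
        for u in left:
            if match_l[u] is None:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = INF
        while queue:
            u = queue.popleft()
            if dist[u] < dist_free:
                for v in adj.get(u, []):
                    w = match_r.get(v)
                    if w is None:
                        if dist_free == INF:
                            dist_free = dist[u] + 1
                    elif dist[w] == INF:
                        dist[w] = dist[u] + 1
                        queue.append(w)
        return dist_free != INF

    def dfs(u: str) -> bool:
        # Follow the BFS layers to find a shortest, vertex-disjoint augmenting path.
        for v in adj.get(u, []):
            w = match_r.get(v)
            if (w is None and dist_free == dist[u] + 1) or (
                w is not None and dist[w] == dist[u] + 1 and dfs(w)
            ):
                match_l[u] = v
                match_r[v] = u
                return True
        dist[u] = INF  # dead end: skip u for the rest of this phase
        return False

    while bfs():
        for u in left:
            if match_l[u] is None:
                dfs(u)
    return {u: v for u, v in match_l.items() if v is not None}
```

For example, if the upper level holds a and b, with a linked to children x and y and b linked only to x, the matching pairs a with y and b with x. Under the assumption above, each matched pair could be read as one link that extends a chain downward.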
[Figure: (a) a DAG over nodes a–i with one chain marked; (b) the nodes’ (chain, position) labels.]
ICEIS 2009 - International Conference on Enterprise Information Systems