new path. In the following, it will be explained how
the level of a vertex is calculated and afterwards how
the path cover is generated.
The array level stores the level of each vertex v,
i.e. level[v] = level(v). The queue Q_l contains all vertices
of level l. Initially l is 0, Q_l = {s} and level[s] = 0
because the source is the only vertex in level 0. Furthermore,
there is an array d_in, which initially stores
the number of incoming edges for each vertex. When
a vertex v at level l is encountered, d_in[w] is decremented
by one for each successor w of v. If d_in[w] = 0,
then all predecessors of w have been visited before.
In this case, level(w) = level(v) + 1 = l + 1 because
v has the highest level among all the predecessors of w.
Lines 23-24 of Algorithm 1 handle this case by setting
level[w] = l + 1 and adding w to the queue Q_{l+1}.
When all vertices of level l have been dealt with, the
algorithm will start the next iteration with the vertices
of the level l + 1 (lines 25-27). With this method, the
vertices are visited level by level. We now explain
how the algorithm covers the vertices with paths.
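The level computation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 1: the function name compute_levels, the adjacency dictionary succ (every vertex is a key, mapped to its list of successors) and the use of plain lists for the queues Q_l are choices made for this example only. The sketch also assumes that s is the only vertex without incoming edges.

```python
def compute_levels(succ, s):
    """Return a dict level with level[v] for every vertex reachable from s.

    succ: dict mapping each vertex of a DAG to a list of its successors
          (hypothetical input format; every vertex must appear as a key).
    s:    the source, assumed to be the only vertex with no incoming edge.
    """
    # d_in[w] initially stores the number of incoming edges of w.
    d_in = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            d_in[w] += 1

    level = {s: 0}
    current = [s]          # plays the role of Q_l
    l = 0
    while current:
        nxt = []           # plays the role of Q_{l+1}
        for v in current:
            for w in succ[v]:
                d_in[w] -= 1
                if d_in[w] == 0:
                    # All predecessors of w have been visited; v is the last one.
                    level[w] = l + 1
                    nxt.append(w)
        current = nxt
        l += 1
    return level
```

For the DAG with edges s→a, s→b, a→b, a→t and b→t, the sketch assigns the levels 0, 1, 2 and 3 to s, a, b and t, respectively.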
The counter p stores the number of paths used so
far (upon termination, p is the size of the path cover)
and for each path identifier i, 1 ≤ i ≤ p, there is a
list of vertices path[i] in the array path. In the first
iteration, the algorithm will add the path path[1] = [s].
This means that the first path consists of the vertex s.
Moreover, the algorithm calculates a set R_v for
each vertex v, which contains the identifiers of all
candidate paths for v, i.e. all paths that can potentially
cover v. A path that covers v must be selected
from this set R_v. In the following, we choose min(R_v)
to be that path. A condition for a path i to be in
R_v is that i was a candidate for a predecessor u of v,
i.e. i ∈ R_u. It follows as a consequence that R_v is a
subset of ⋃_{(u,v)∈E} R_u, but in general the sets are not
equal. For example, a path i ∈ ⋃_{(u,v)∈E} R_u might have
already been used to cover a vertex w ≠ v at level
l = level(v). We use a case distinction to find the candidate
paths. Let i ∈ R_u, where u is a predecessor of
v, and let last be the last element of path[i]. If (a)
level[last] < level[u], then i was a candidate path for
u that was neither chosen to cover u nor one of its suc-
cessors. Hence it is a candidate path for v. If (b) path i
ends at u (i.e. last = u), then i is also a candidate path
for v.
Algorithm 1 calculates R_v by the case distinction
described above (lines 8-11). If R_v is empty after this
step, none of the existing paths can be used to cover
v. In this case, the algorithm adds a new path with
identifier i_new = p + 1 and uses this to cover v (lines
13-15). Otherwise, if R_v is non-empty after this step,
the path with identifier i = min(R_v) is chosen to cover
v and path[i] is updated accordingly (lines 17-19). To
this end, one must find a path P (implemented as a list
of vertices) from the last vertex last of path i to v. This
is done by tracing a path from v back to last. Initially,
the list P just contains v. Then vertices are added at
the front of the list until the vertex last is reached.
A vertex u can extend the path P at the front if (a)
it is a predecessor of the current front, (b) i ∈ R_u, and
(c) level[u] > level[last]. If we repeat this process, we
will ultimately reach last because the condition at line
10 of Algorithm 1 ensures that, among all vertices at
level level(last), only last can propagate the path i.
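The covering procedure described above can be sketched in Python as follows. This is a reconstruction from the prose only, under stated assumptions, and not the pseudocode of Algorithm 1: the function name greedy_path_cover and the input format succ (a dictionary mapping every vertex to its successor list) are hypothetical, path identifiers start at 0 instead of 1, and s is assumed to be the only vertex without incoming edges.

```python
def greedy_path_cover(succ, s):
    """Return (path, level): a list of vertex lists covering all vertices."""
    pred = {v: [] for v in succ}
    for u in succ:
        for w in succ[u]:
            pred[w].append(u)
    d_in = {v: len(pred[v]) for v in succ}

    level = {s: 0}
    path = [[s]]           # path[0] corresponds to path 1 in the text
    R = {s: {0}}           # R[v]: identifiers of candidate paths for v
    current, l = [s], 0
    while current:
        nxt = []
        for v in current:
            if v != s:
                # Case distinction: i in R[u] stays a candidate for v if
                # (a) path i ends below the level of u, or (b) it ends at u.
                R[v] = set()
                for u in pred[v]:
                    for i in R[u]:
                        last = path[i][-1]
                        if level[last] < level[u] or last == u:
                            R[v].add(i)
                if not R[v]:
                    # No existing path can cover v: open a new one.
                    path.append([v])
                    R[v].add(len(path) - 1)
                else:
                    i = min(R[v])
                    last = path[i][-1]
                    # Trace back from v until an edge from last is found.
                    P = [v]
                    while last not in pred[P[0]]:
                        u = next(u for u in pred[P[0]]
                                 if i in R[u] and level[u] > level[last])
                        P.insert(0, u)
                    path[i].extend(P)
            # Level bookkeeping, as in the level computation.
            for w in succ[v]:
                d_in[w] -= 1
                if d_in[w] == 0:
                    level[w] = l + 1
                    nxt.append(w)
        current, l = nxt, l + 1
    return path, level
```

On the DAG with edges s→a, s→b, a→c, b→c, c→d, c→e, d→t and e→t, the sketch produces the two paths [s, a, c, d, t] and [b, c, e]. The second path shows the trace-back step: when it is chosen to cover e, its last vertex is b, so c is inserted in front of e.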
Upon termination of Algorithm 1, the paths in ar-
ray path form a path cover. However, we are solely
interested in paths from s to t. In what follows we
explain Algorithm 2, which extends the paths on both
ends. Let v_1 and v_2 be the first and the last vertex of
path[i], respectively. We have to add a subpath from s
to a predecessor of v_1 and a subpath from a successor
of v_2 to t to obtain a path from s to t. We can find
such subpaths by a depth-first search in linear time, or
we can copy them from other paths. Note that path
1 is a path from s to t already, since 1 is the smallest
path identifier and will therefore be used continuously
from vertex s to t. For i ≠ 1, all predecessors of v_1
are covered by paths < i because v_1 is the first vertex
covered by a path ≥ i. Likewise, all successors of v_2
are covered by paths < i because i was not chosen to
cover one of them. These observations lead to an it-
erative process for finding the subpaths that shall be
copied. When extending path i, all paths < i are paths
from s to t. Among all predecessors u of v_1, we select
the one that is covered by the path with the smallest
identifier j. Since j < i, j is a path from s to t and we
can copy the subpath from s to u. In the same way, we
can find a path from a successor of v_2 to t. Algorithm
2 extends each path to be a path from s to t.
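The copying variant of this extension step can be sketched in Python as follows. This is an illustration under stated assumptions, not the pseudocode of Algorithm 2: the function name extend_paths and the input format are hypothetical, path identifiers start at 0, s and t are assumed to be the only source and sink, and the input paths are assumed to have the properties derived above (every neighbour needed for the extension is covered by a path with a smaller identifier). The covering path of a vertex is taken here to be the path with the smallest identifier that contains it.

```python
def extend_paths(paths, succ, s, t):
    """Extend every path in paths (list of vertex lists) to run from s to t."""
    pred = {v: [] for v in succ}
    for u in succ:
        for w in succ[u]:
            pred[w].append(u)

    # cover[v]: smallest identifier of an input path that contains v.
    cover = {}
    for i, p in enumerate(paths):
        for v in p:
            cover.setdefault(v, i)

    full = []              # full[j] is path j after extension
    for p in paths:
        prefix, suffix = [], []
        if p[0] != s:
            # Predecessor of the first vertex covered by the smallest path j.
            u = min(pred[p[0]], key=cover.__getitem__)
            donor = full[cover[u]]          # already a path from s to t
            prefix = donor[:donor.index(u) + 1]
        if p[-1] != t:
            # Successor of the last vertex covered by the smallest path j.
            w = min(succ[p[-1]], key=cover.__getitem__)
            donor = full[cover[w]]
            suffix = donor[donor.index(w):]
        full.append(prefix + p + suffix)
    return full
```

On the DAG with edges s→a, s→b, a→c, b→c, c→d, c→e, d→t and e→t and the input paths [s, a, c, d, t] and [b, c, e], the first path is left unchanged and the second becomes [s, b, c, e, t], with s and t copied from the first path.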
Algorithms 1 and 2 compute an initial path cover
in O(p m) time, where p is the size of the cover and
m is the number of edges in the DAG. This is because
every edge is considered exactly once in line 7 of Al-
gorithm 1 and the loop in line 8 of Algorithm 1 is
iterated at most p times. Moreover, the overall time
required by line 18 of Algorithm 1 and Algorithm 2
is also O(p m) because for each of the p paths each
edge is considered at most once. (If one wants to cal-
culate the minimum path cover as described above,
one could directly insert flow into the flow network
instead of storing and reporting a path.)
4.2 A Simple Heuristic
In general, Algorithms 1 and 2 do not deliver a mini-
mum path cover; see Fig. 3. This is because the order
in which vertices are covered by paths plays a key