{takes, supervisor, teaches}, $\Gamma = \{t_1, t_2, t_3\}$, and
$\delta(t_1) = (takes :: t_2)^{*} \parallel (supervisor :: t_3)?$,
$\delta(t_2) = \varepsilon$,
$\delta(t_3) = (teaches :: t_2)^{*}$.
In an RBE, $a :: t$ matches an edge $e$ if the label of $e$ is $a$ and the target node of $e$ is of type $t$. Assuming that each node in Fig. 3 is of the type colored in red, the type of each node $v_i$ matches the outgoing edges of $v_i$. Hence $G$ is a valid graph of $S$.
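The validity check described above can be sketched in Python. This is only an illustration under simplifying assumptions: the two atoms of $\delta(t_1)$ are taken to be combined by unordered concatenation, each RBE is encoded as a list of atoms `(label, target_type, min, max)`, and the typing of the nodes is given in advance. The graph below is hypothetical and is not the graph of Fig. 3; the names `SCHEMA`, `node_matches`, and `is_valid` are introduced here for illustration.

```python
from collections import Counter

INF = float("inf")

# Assumed encoding: type -> list of (label, target_type, min_count, max_count).
# It covers only RBEs that are unordered concatenations of atoms a::t
# with a multiplicity such as * (0..inf) or ? (0..1).
SCHEMA = {
    "t1": [("takes", "t2", 0, INF), ("supervisor", "t3", 0, 1)],
    "t2": [],  # delta(t2) = epsilon: no outgoing edges allowed
    "t3": [("teaches", "t2", 0, INF)],
}


def node_matches(node, node_type, edges, typing, schema=SCHEMA):
    """Return True if the outgoing edges of `node` match the RBE of `node_type`."""
    # Count outgoing edges by (label, type of target node).
    counts = Counter(
        (label, typing[dst]) for (src, label, dst) in edges if src == node
    )
    allowed = {(label, t): (lo, hi) for (label, t, lo, hi) in schema[node_type]}
    # Every outgoing edge must be matched by some atom a::t ...
    if any(key not in allowed for key in counts):
        return False
    # ... and every atom's multiplicity bound must hold.
    return all(lo <= counts.get(key, 0) <= hi for key, (lo, hi) in allowed.items())


def is_valid(edges, typing, schema=SCHEMA):
    """Return True if every node matches the RBE of its assigned type."""
    return all(node_matches(n, t, edges, typing, schema) for n, t in typing.items())


# Hypothetical graph: one student, two courses, two professors.
typing = {"s": "t1", "c1": "t2", "c2": "t2", "p": "t3", "p2": "t3"}
edges = [
    ("s", "takes", "c1"),
    ("s", "takes", "c2"),
    ("s", "supervisor", "p"),
    ("p", "teaches", "c1"),
]
print(is_valid(edges, typing))                                # True
print(is_valid(edges + [("s", "supervisor", "p2")], typing))  # False: two supervisors
```

The counting approach works here only because every atom in the example schema has a distinct (label, type) pair; general RBEs would need a more careful matching procedure.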
For queries $q_1, q_2$ and a ShEx schema $S$, $q_2$ contains $q_1$ over $S$ if for any valid graph $G$ of $S$, $Ans(q_1, G) \subseteq Ans(q_2, G)$.
3 ALGORITHM
Our algorithm is essentially based on the node correspondence between $q_1$ and $q_2$. Such a correspondence may already be known in some cases, e.g., when comparing an updated query with its original one. But this is not always the case. Thus, our algorithm first finds a node correspondence between $q_1$ and $q_2$ (Sec. 3.1) if necessary, and then, under the obtained node correspondence, checks the containment of $q_1$ and $q_2$ (Sec. 3.2).
3.1 Finding Node Correspondence
We assume that the output tuples of $q_1$ and $q_2$ have the same size (otherwise $q_1$ and $q_2$ are incomparable), and thus we can identify the correspondence between the output nodes of $q_1$ and $q_2$. Thus, in the following we consider finding a correspondence between their non-output nodes. This is done by the following steps.
1. Let $S$ be a ShEx schema. By using $S$, we identify the type(s) of each node in $q_1$ and $q_2$. This is done by an extension of an algorithm for checking satisfiability of pattern queries (Matsuoka and Suzuki, 2020) (details are omitted because of space limitations).
2. By comparing the type(s) of each node obtained in step (1), we find correspondence(s) between the nodes of $q_1$ and $q_2$. (If more than one type is associated with a node in step (1), then we examine each of them one by one. Thus in each correspondence every node is associated with one type.) The correspondences are found starting from the output node of $q_1$, in the order of connection. Two nodes do not correspond to each other, even if they are of the same type, when there is no correspondence between their adjacent nodes. For example, $u_2$ of $q_1$ corresponds to $v_1$ of $q_2$ in Fig. 4. This is because $x_1$, an adjacent node of $u_2$, corresponds to $y_1$, an adjacent node of $v_1$. On the other hand, $u_2$ of $q_1$ cannot correspond to $v_2$ of $q_2$ in Fig. 4 because there is no correspondence between their adjacent nodes. For each obtained correspondence, we compute a maximum common edge subgraph of $q_1$ and $q_2$ under the correspondence. The problem is NP-hard, but the types of nodes obtained in step (1) can reduce the search space of the problem.
3. Among the correspondences obtained in step (2), output the correspondence that yields the maximum common edge subgraph of $q_1$ and $q_2$.
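Steps (2) and (3) can be approximated by an exhaustive search, sketched below in Python. This is not the authors' procedure: it assumes each node already carries exactly one type (the output of step (1)), it omits the traversal "in the order of connection" and the adjacency-based pruning, and it simply enumerates every injective, type-respecting mapping of the non-output nodes and keeps the one with the most common edges. `None` stands in for the nil node, and the names `common_edges` and `best_correspondence` are hypothetical.

```python
from itertools import product


def common_edges(edges1, edges2, mu):
    """Edges (s, a, d) of q1 whose image (mu[s], a, mu[d]) is an edge of q2."""
    return {
        (s, a, d)
        for (s, a, d) in edges1
        if mu.get(s) is not None
        and mu.get(d) is not None
        and (mu[s], a, mu[d]) in edges2
    }


def best_correspondence(types1, edges1, types2, edges2, output_map):
    """Search all type-respecting injective mappings; return (mu, #common edges)."""
    free1 = [u for u in types1 if u not in output_map]
    taken = set(output_map.values())
    free2 = [v for v in types2 if v not in taken]
    # Each free node of q1 may map to a same-typed free node of q2 or to None.
    candidates = [
        [v for v in free2 if types2[v] == types1[u]] + [None] for u in free1
    ]
    best_mu, best_size = dict(output_map), -1
    for choice in product(*candidates):
        picked = [v for v in choice if v is not None]
        if len(picked) != len(set(picked)):  # two q1 nodes share a q2 node
            continue
        mu = {**output_map, **dict(zip(free1, choice))}
        size = len(common_edges(edges1, edges2, mu))
        if size > best_size:
            best_mu, best_size = mu, size
    return best_mu, best_size


# Tiny hypothetical instance: the output nodes x and y are already matched.
mu, size = best_correspondence(
    {"x": "t1", "u": "t2"},
    {("x", "takes", "u")},
    {"y": "t1", "v": "t2", "w": "t3"},
    {("y", "takes", "v"), ("y", "supervisor", "w")},
    {"x": "y"},
)
print(mu, size)  # {'x': 'y', 'u': 'v'} 1
```

Restricting candidates to same-typed nodes is what shrinks the search space here; without types, every free node of $q_2$ would be a candidate for every free node of $q_1$.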
Node correspondence is expressed by a function $\mu$. Let $u \in V(q_1)$. We write $\mu(u) = v$ (and we also write $\mu(v) = u$) if $u$ corresponds to $v \in V(q_2)$, and $\mu(u) = v_{nil}$ if there is no node corresponding to $u$, where $v_{nil}$ is a new node not in $V(q_2)$. For an edge $e = (v, a, v')$ in $q_2$, $(\mu(v), a, \mu(v'))$ is called the corresponding edge of $e$ in $q_1$.
We explain the above steps (1) to (3) by an example. Consider queries $q_1$ and $q_2$ shown in Fig. 4, and suppose that in step (1) the type of each node is obtained as shown in the figure. Consider step (2). By the assumption, $x_1$ and $x_2$ of $q_1$ correspond to $y_1$ and $y_2$ of $q_2$, respectively, but no correspondence for the other nodes is known. Since $u_2$ is of type $t_2$, $u_2$ may correspond to $v_1$ or $v_2$. However, checking the edges adjacent to $u_2$ shows that $u_2$ cannot correspond to $v_2$, so $u_2$ can correspond only to $v_1$. Since $u_1$ is of type $t_3$ but there is no node of type $t_3$ in $q_2$ except $y_2$, we know that there is no node corresponding to $u_1$. Similarly, there is no node corresponding to $v_2$. Therefore, we have $\mu(x_1) = y_1$, $\mu(x_2) = y_2$, $\mu(u_2) = v_1$, and $\mu(u_1) = v_{nil}$ (and $\mu(v_2) = u_{nil}$). Based on this correspondence, we create adjacency matrices of $q_1$ and $q_2$ (Fig. 5). There are three elements (colored red) appearing in both matrices at the same position, meaning that we have three common edges between $q_1$ and $q_2$ under the correspondence.
In this case we have only one correspondence, but in general there may be more than one correspondence between two queries. In such a case, for each possible correspondence with each node associated with one type, we compute the size of the common edge subgraph under the correspondence, and choose the maximum one among them.
3.2 Checking Containment
Let $G$ be a graph. By $u(G)$ we mean the undirected graph obtained by replacing each directed edge of $G$ with an undirected one. A subgraph $G'$ of $G$ is weakly
WEBIST 2022 - 18th International Conference on Web Information Systems and Technologies