Let q be a query and S be a ShEx schema. Among
the above operations, add_lt(), add_opr(), del_opr(),
change_opr(), and add_type() do not affect q, in that q
remains “valid” against the updated schema of S. On
the other hand, del_lt(), change_lt(), and del_type() may
affect q, i.e., q may become “invalid” under the updated
schema of S in that q may lose some of the answers
that were obtained under S. Thus, our algorithm shown
below transforms q when del_lt(), change_lt(), or
del_type() is applied to S.
We finally note that when a ShEx schema S is updated,
the data under S must also be updated according to
the schema update. Thus we have developed a
method for updating data according to a schema update
(details are omitted).
3.2 Algorithm
Our algorithm consists of Algorithms 1 and 2. Algorithm 1 is the main part of our algorithm. For a given update script $s = op_1.op_2.\cdots.op_n$ on ShEx schema S and start type $t_s$, the algorithm transforms a given query q according to s. Let $G_S$ be the schema graph of S, and let $G_S(q, t_s)$ be the traversal area of q from $t_s$ (lines 1 and 2). First, we take copies $H_S$ and $G'_S$ of $G_S(q, t_s)$ and $G_S$, respectively (line 3). Then, for each operation $op_i$ of s, the algorithm modifies $H_S$ according to $op_i$ (lines 4 to 27), and then converts $H_S$ to the transformed query $q'$ (line 28). The for loop in lines 4 to 27 proceeds as follows. The algorithm does nothing if $op_i$ does not affect the traversal area $H_S$ (lines 5 to 7). Otherwise, $H_S$ (and $G'_S$) is modified according to $op_i$ in lines 8 to 26, as follows.
• Lines 8 to 14 deal with change_lt(t, i, $l' :: t'$). This operation changes the label::type pair $l_i :: t_i$ at position i of δ(t) to $l' :: t'$. Accordingly, we replace edge $(t, l_i, t_i)$ with $(t, l', t')$ in $H_S$ and $G'_S$. If $t_i = t'$, then we are done. Otherwise, since $t_i$ is changed to $t'$, a path from $t_s$ to some accepting node via $t_i$ may be disconnected by this change. To repair this, we find a set P of simple paths from $t'$ to $t_i$ in $G'_S$ by FindPaths and add each path p ∈ P to $H_S$ to connect $t_i$ and $t'$.
Here, for given types t and $t'$, FindPaths (not shown) is a method for finding the set P of simple paths p from t to $t'$ over $G'_S$ with inverse edge traversal allowed. If the length of every simple path p exceeds a given threshold, FindPaths also traverses paths from t to the neighbours of $t'$, and if shorter simple paths are found, FindPaths reports the shorter paths instead of P.
• Lines 15 to 19 deal with del_lt(t, i). This operation deletes the label::type pair $l_i :: t_i$ at position i of δ(t). Accordingly, we delete edge $(t, l_i, t_i)$ from $H_S$ and $G'_S$. By this edge deletion, t and $t_i$ may be disconnected; thus we find paths from t to $t_i$ over $G'_S$ by FindPaths and add the paths to $H_S$.
• Lines 20 to 26 deal with del_type(t). This operation deletes type t from S. Thus t and every edge incident to t are deleted from $H_S$ and $G'_S$. To repair this, we find the set $T_s$ of nodes with an edge into t and the set $T_g$ of nodes with an edge from t, and then find paths from $T_s$ to $T_g$ and add the paths to $H_S$.
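The three repair cases above can be sketched in Python as follows. This is only an illustrative sketch under our own assumptions, not Algorithm 1 itself: graphs are represented as sets of (source, label, target) triples, an inverse step is stored in the traversal area with a hypothetical "^" prefix on its label, the names find_paths, add_paths and max_len are ours, and the fallback of FindPaths to the neighbours of the target is omitted (max_len simply cuts off longer paths). The caller is assumed to have already checked that the operation affects the traversal area.

```python
from collections import defaultdict


def find_paths(edges, src, dst, max_len=4):
    """Enumerate simple paths from src to dst over (s, label, d) edges.

    Edges may be walked in either direction; each step is recorded as
    (node, label, is_inverse, next_node). max_len is a hypothetical cutoff.
    """
    adj = defaultdict(list)
    for s, label, d in edges:
        adj[s].append((label, False, d))  # forward traversal
        adj[d].append((label, True, s))   # inverse traversal
    paths = []

    def dfs(node, visited, steps):
        if node == dst and steps:
            paths.append(list(steps))
            return
        if len(steps) >= max_len:
            return
        for label, inv, nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                steps.append((node, label, inv, nxt))
                dfs(nxt, visited, steps)
                steps.pop()
                visited.remove(nxt)

    dfs(src, {src}, [])
    return paths


def add_paths(h, paths):
    """Add every step of every path to the traversal area h."""
    for path in paths:
        for node, label, inv, nxt in path:
            h.add((node, "^" + label if inv else label, nxt))


def change_lt(h, g, old, new):
    """Replace edge old = (t, l_i, t_i) by new = (t, l', t'), then reconnect t' to t_i."""
    for graph in (h, g):
        graph.discard(old)
        graph.add(new)
    if old[2] != new[2]:
        add_paths(h, find_paths(g, new[2], old[2]))


def del_lt(h, g, edge):
    """Delete edge (t, l_i, t_i), then reconnect t to t_i through the remaining graph."""
    for graph in (h, g):
        graph.discard(edge)
    add_paths(h, find_paths(g, edge[0], edge[2]))


def del_type(h, g, t):
    """Delete type t and its incident edges, then reconnect its predecessors to its successors."""
    preds = {s for s, _, d in g if d == t and s != t}
    succs = {d for s, _, d in g if s == t and d != t}
    for graph in (h, g):
        graph.difference_update({e for e in graph if t in (e[0], e[2])})
    for s in preds:
        for d in succs:
            add_paths(h, find_paths(g, s, d))
```

Here h plays the role of $H_S$ and g the role of $G'_S$; both are mutated in place, mirroring the way each operation is applied to both graphs before the repair paths are searched in the updated schema graph.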
In line 28, ConstructPropertyPath (Algorithm 2) converts $H_S$ to the new query $q'$. This is done by regarding $H_S$ as an NFA M with start state $t_s$ and the set $Ans(G_S(q, t_s))$ of accept states (line 2), constructing a DFA $M'$ equivalent to M (line 3), and then converting $M'$ into a query $q'$ (line 4). The conversion from $M'$ to $q'$ is done by using an extension of the state elimination method for DFA.
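The pipeline of Algorithm 2 can be illustrated by the following Python sketch. It uses the textbook subset construction and the textbook state elimination method, not the extension mentioned above, and it assumes an edge-set representation of $H_S$ without epsilon moves. The function names and the output syntax (/ for sequence, | for alternative, * for repetition, ? for optional) are our own choices for illustration.

```python
def nfa_to_dfa(edges, start, accepts):
    """Subset construction over (src, label, dst) edges (no epsilon moves)."""
    start_set = frozenset([start])
    trans, todo, seen = {}, [start_set], {start_set}
    while todo:
        cur = todo.pop()
        by_label = {}
        for s, label, d in edges:
            if s in cur:
                by_label.setdefault(label, set()).add(d)
        for label, targets in by_label.items():
            nxt = frozenset(targets)
            trans[(cur, label)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    finals = {q for q in seen if q & set(accepts)}
    return trans, start_set, finals


# Expressions are strings; None is the empty language, "" the empty path.
def _alt(a, b):
    if a is None or a == b:
        return b
    if b is None:
        return a
    if a == "":
        return f"({b})?"
    if b == "":
        return f"({a})?"
    return f"{a}|{b}"


def _seq(*parts):
    if any(p is None for p in parts):
        return None
    return "/".join(f"({p})" if "|" in p else p for p in parts if p != "")


def _star(r):
    return "" if not r else f"({r})*"


def dfa_to_path(trans, start, finals):
    """State elimination: remove DFA states one by one, merging edge labels."""
    states = {start} | set(finals) | {s for s, _ in trans} | set(trans.values())
    R = {}
    for (s, label), d in trans.items():
        R[(s, d)] = _alt(R.get((s, d)), label)
    INIT, FIN = "init", "fin"  # fresh start and accept states
    R[(INIT, start)] = ""
    for f in finals:
        R[(f, FIN)] = ""
    for k in states:
        loop = _star(R.pop((k, k), None))
        ins = [(p, r) for (p, q), r in R.items() if q == k]
        outs = [(q, r) for (p, q), r in R.items() if p == k]
        for p, rin in ins:
            for q, rout in outs:
                R[(p, q)] = _alt(R.get((p, q)), _seq(rin, loop, rout))
        for p, _ in ins:
            del R[(p, k)]
        for q, _ in outs:
            del R[(k, q)]
    return R.get((INIT, FIN))  # None if no accept state is reachable
```

For a traversal area with edges A -p-> B and B -q-> C, start state A and accept state C, the sketch yields the path expression p/q. The order in which states are eliminated may change the syntactic form of the result for larger graphs, but not the language it denotes.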
4 PRELIMINARY EXPERIMENT
In this section, we present the results of our preliminary
evaluation experiment. We applied our algorithm
to several queries in order to examine whether the
transformed queries show “good” behaviour in the sense
that the answers of the original queries are maintained
after schema update.
The data used in this experiment is Japanese Textbook
LOD (Egusa and Takaku, 2018a; Egusa and
Takaku, 2018b). Japanese Textbook LOD is
RDF data compiled from a collection of textbooks
that has been organized over the years by NIER Education
Library and Textbook Research Center Library.
The data structure of Japanese Textbook LOD is illustrated
in Fig. 4. Japanese Textbook LOD consists of
233,001 triples in Turtle format. The data size is
12 MB.
In this experiment, we manually created the five
queries and short schema updates shown in Table 1.
We transformed each query by the algorithm (the
data was also transformed according to the schema update),
executed the original and transformed queries
over the original and updated data, respectively, and
calculated the recall, precision and F-measure values.
Let q be a query, $q'$ be the transformed query of q, and Ans(q) be the set of obtained answer nodes of q. The recall of $q'$ w.r.t. q is defined as follows.

$$\mathrm{recall}(q, q') = \frac{|Ans(q) \cap Ans(q')|}{|Ans(q)|}.$$

Similarly, the precision of $q'$ w.r.t. q is defined as follows.

$$\mathrm{precision}(q, q') = \frac{|Ans(q) \cap Ans(q')|}{|Ans(q')|}.$$
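As an illustration, the two measures can be computed from answer sets as in the following Python sketch. The F-measure shown here is the usual harmonic mean of precision and recall, which is our assumption about the definition used; returning 0.0 for an empty denominator is likewise our own convention, and the function names are hypothetical.

```python
def recall(ans_q, ans_q2):
    """|Ans(q) ∩ Ans(q')| / |Ans(q)|; 0.0 by convention if Ans(q) is empty."""
    return len(ans_q & ans_q2) / len(ans_q) if ans_q else 0.0


def precision(ans_q, ans_q2):
    """|Ans(q) ∩ Ans(q')| / |Ans(q')|; 0.0 by convention if Ans(q') is empty."""
    return len(ans_q & ans_q2) / len(ans_q2) if ans_q2 else 0.0


def f_measure(ans_q, ans_q2):
    """Harmonic mean of precision and recall (assumed definition)."""
    p, r = precision(ans_q, ans_q2), recall(ans_q, ans_q2)
    return 2 * p * r / (p + r) if p + r else 0.0
```

For example, if Ans(q) has four nodes and Ans(q') has three nodes, two of which are shared, the recall is 2/4 and the precision is 2/3.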