Transforming Property Path Query According to Shape Expression Schema Update

Goki Akazawa, Naoto Matsubara and Nobutaka Suzuki

University of Tsukuba, Tsukuba, Japan

Keywords: ShEx, Property Path, Schema Update.

Abstract: Suppose that we have a query q under a schema S and that S is then updated. We then have to update q according to the update of S, since otherwise q no longer reports correct answers. However, updating q manually is often a difficult and time-consuming task, since users may not fully understand the schema definition or may not be aware of the details of the schema update. In this paper, we consider transforming queries automatically according to schema updates. We focus on Shape Expression (ShEx) and Property Path as the schema and query languages, respectively, and we take a structural approach to transforming a Property Path query. For a Property Path query q and a schema update op to a ShEx schema S, our algorithm checks how op affects the structure of q under S, and transforms q according to the result.

1 INTRODUCTION

Schemas play an important role in the management of various kinds of data, and this holds for RDF/graph data as well. Since user requirements for RDF data may change over time, a schema tends to be updated continuously to meet the requirements. Suppose that we have a query q written for data under a schema S, that S is then updated, and that q is (re)executed after the update. Such a situation often arises, e.g., (a) q is embedded in program code and the code is executed after a schema update, or (b) q is recorded in a user’s history and she/he tries to use q again. In such cases, we have to update q according to the update of S, since otherwise q no longer reports correct answers. However, updating q manually is often a difficult and time-consuming task, since users may not fully understand the schema definition or may not be aware of the details of the schema update.

To address this problem, in this paper we consider transforming queries automatically according to schema updates. We focus on Shape Expression (ShEx) (Baker and Prud’hommeaux, 2019) and Property Path as the schema and query languages, respectively. ShEx is a novel schema language for RDF and is already used in a number of areas (Thornton et al., 2019). For RDF data, there is another schema language, called SHACL (Knublauch and Kontokostas, 2017), which differs from ShEx in several respects. A SHACL schema description tends to be more complicated due to its strict definition. On the other hand, ShEx has higher readability and is easy to handle, although its vocabulary has some limitations. In addition, recursion is formally supported in the ShEx specification, but not in that of SHACL, where support depends on the implementation. As for the query language, Property Path is a well-known path query language included in SPARQL 1.1. In this paper, we first define update operations on a ShEx schema, and then propose an algorithm for transforming a given query into a new query according to a schema update.

In this paper we take a structural approach to transforming a query. For a query q and update operation(s) op on a ShEx schema S, our algorithm checks how op affects the structure of S, examines how the changes to S affect the structure of q, and then transforms q into a new query $q'$ according to the result. Here, it is desirable that the transformed query $q'$ preserves the behaviour of q as much as possible, i.e., the answer of $q'$ should be as close to that of its original query q as possible. To examine the effectiveness of our structural approach, we conducted a small preliminary experiment. The result suggests that the transformed queries obtained by our algorithm behave rather well in this respect.

Akazawa, G., Matsubara, N. and Suzuki, N.
Transforming Property Path Query According to Shape Expression Schema Update.
DOI: 10.5220/0010173002920298
In Proceedings of the 16th International Conference on Web Information Systems and Technologies (WEBIST 2020), pages 292-298
ISBN: 978-989-758-478-7

1.1 Related Work

For XML documents, a number of studies on schema updates have been made so far. Guerrini et al. proposed update operations that ensure that any updated schema contains its original schema, so that documents valid under the original schema remain valid under the updated schema (Guerrini et al., 2005). Junedi et al. studied query-update independence analysis and showed that the performance of the approach of (Benedikt and Cheney, 2010) can be drastically improved by the use of the µ-calculus (Junedi et al., 2012). Oliveira et al. proposed an algorithm for detecting possible problems affecting XQuery code caused by XML Schema updates (Oliveira et al., 2012). Wu and Suzuki proposed an algorithm for correcting XSLT stylesheets according to DTD updates (Wu and Suzuki, 2016).

For RDF/graph schema updates, Chirkova and Fletcher proposed a model of RDF schema evolution (Chirkova and Fletcher, 2009), but no query transformation was considered. Bonifati et al. discussed the evolution of property graph schemas by using graph rewriting operations (Bonifati et al., 2019).

To the best of our knowledge, however, no studies on transforming Property Path queries according to ShEx schema updates have been made so far.

The rest of this paper is organized as follows. Section 2 gives some preliminary definitions. Section 3 presents operations on the types of a ShEx schema and our algorithm. Section 4 presents the result of our preliminary evaluation experiment. Section 5 concludes the paper.

2 PRELIMINARIES

Let Σ be a set of labels. A labeled directed graph (graph for short) is denoted G = (V, E), where V is a set of nodes and E ⊆ V × Σ × V is a set of edges. Let e ∈ E be an edge labeled by l ∈ Σ from a node v ∈ V to a node $v' \in V$. Then e is denoted $(v, l, v')$, v is called the source, and $v'$ is called the target.

Unlike in XML documents, in RDF/graph data the order among sibling nodes is less significant. Thus ShEx uses regular bag expressions (RBEs) to represent the content model of a type (Staworko et al., 2015). An RBE is defined similarly to a regular expression except that it uses unordered concatenation instead of ordered concatenation. Let Γ be a set of types. Then an RBE over Σ × Γ is recursively defined as follows.

• ε and a :: t ∈ Σ × Γ are RBEs.

• If $r_1, r_2, \cdots, r_k$ are RBEs, then $r_1 \mid \cdots \mid r_k$ is an RBE, where | denotes disjunction.

• If $r_1, r_2, \cdots, r_k$ are RBEs, then $r_1 \parallel r_2 \parallel \cdots \parallel r_k$ is an RBE, where ‖ denotes unordered concatenation.

• If r is an RBE, then $r^{[n,m]}$ is an RBE, where n ≤ m. In particular, $r^{?} = r^{[0,1]}$, $r^{*} = r^{[0,\infty]}$, and $r^{+} = r^{[1,\infty]}$.

For example, let $r = a :: t \parallel (b :: t \mid c :: t)$ be an RBE. Since ‖ is unordered, r matches not only $a :: t\ \ b :: t$ and $a :: t\ \ c :: t$ but also $b :: t\ \ a :: t$ and $c :: t\ \ a :: t$.
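The bag semantics of RBEs can be sketched in a few lines of Python. This is only an illustrative, brute-force matcher under stated assumptions: the tuple encoding of RBEs, the function names `sub_bags` and `matches`, and the type names `t1`, `t2`, `t3` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import product

# RBE syntax trees as tuples (a hypothetical encoding, not from the paper):
#   ("eps",)               the empty RBE
#   ("sym", label, type)   a single label::type pair
#   ("or", r1, ..., rk)    disjunction
#   ("and", r1, ..., rk)   unordered concatenation
#   ("rep", r, n, m)       r^[n,m]; m=None stands for infinity

def sub_bags(bag):
    """Yield every sub-multiset of bag, including the empty bag and bag itself."""
    items = list(bag.items())
    for counts in product(*(range(c + 1) for _, c in items)):
        yield Counter({sym: k for (sym, _), k in zip(items, counts) if k > 0})

def matches(rbe, bag):
    """Return True if the bag of (label, type) pairs is matched by rbe."""
    kind = rbe[0]
    if kind == "eps":
        return not bag
    if kind == "sym":
        return bag == Counter({(rbe[1], rbe[2]): 1})
    if kind == "or":
        return any(matches(r, bag) for r in rbe[1:])
    if kind == "and":
        first, rest = rbe[1], rbe[2:]
        if not rest:
            return matches(first, bag)
        # split the bag in every possible way between the first operand and the rest
        return any(matches(first, b) and matches(("and", *rest), bag - b)
                   for b in sub_bags(bag))
    if kind == "rep":
        r, n, m = rbe[1], rbe[2], rbe[3]
        if not bag:
            return n == 0 or matches(r, bag)
        if m == 0:
            return False
        fewer = ("rep", r, max(n - 1, 0), None if m is None else m - 1)
        # peel off one non-empty chunk matched by r, then match the remainder
        return any(bool(b) and matches(r, b) and matches(fewer, bag - b)
                   for b in sub_bags(bag))
    raise ValueError(kind)

r = ("and", ("sym", "a", "t1"), ("or", ("sym", "b", "t2"), ("sym", "c", "t3")))
```

Because a bag is a `Counter`, the order in which the edges are listed has no influence on the result, which is exactly the point of unordered concatenation. The enumeration of sub-bags is exponential, so the sketch is meant for small examples only.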

A ShEx schema is denoted S = (Σ, Γ, δ), where

Γ is a set of types and δ is a function from Γ to

the set of RBEs over Σ × Γ. For example, let S =

(Σ, Γ, δ) be a ShEx schema, where Σ = {a, b, c}, Γ =

, t

}, and

δ(t

) = a :: t

k b :: t

k (c :: t

)

∗

δ(t

) = b :: t

|c :: t

δ(t

) = c :: t

δ(t

) = ε,

δ(t

) = a :: t

For example, consider the graph G shown in Fig. 1(left). In an RBE, a :: t matches an edge e if the label of e is a and the target node of e is of type t. Thus, assuming that each node $v_i$ is of type $t_i$, $\delta(t_i)$ matches the outgoing edges of $v_i$. Then it is easy to verify that G is a valid graph of S.

The schema graph of a ShEx schema S = (Σ, Γ, δ) is a graph $G_S = (V_S, E_S)$, where $V_S = \Gamma$ and $E_S = \{(t, a, t') \mid \delta(t) \text{ contains } a :: t'\}$. For example, Fig. 1(right) shows the schema graph of S.
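As a minimal sketch of this definition, the schema graph can be collected by walking the RBE of every type. The tuple encoding of RBEs, the helper names `pairs` and `schema_graph`, and the three-type schema below are assumptions made for illustration; the schema is not the schema S of Fig. 1.

```python
def pairs(rbe):
    """Yield every (label, type) pair occurring in an RBE syntax tree."""
    kind = rbe[0]
    if kind == "sym":
        yield (rbe[1], rbe[2])
    elif kind in ("or", "and"):
        for child in rbe[1:]:
            yield from pairs(child)
    elif kind == "rep":
        yield from pairs(rbe[1])
    # ("eps",) contributes no pair

def schema_graph(delta):
    """Return (V, E) with V the set of types and E = {(t, a, t2) | delta[t] contains a::t2}."""
    nodes = set(delta)
    edges = {(t, a, t2) for t, rbe in delta.items() for (a, t2) in pairs(rbe)}
    return nodes, edges

# Hypothetical three-type schema used only for this example.
delta = {
    "t0": ("and", ("sym", "a", "t1"), ("rep", ("sym", "c", "t2"), 0, None)),
    "t1": ("or", ("sym", "b", "t2"), ("sym", "c", "t0")),
    "t2": ("eps",),
}
V, E = schema_graph(delta)
```

Note that operators and cardinalities are dropped on purpose: the schema graph only records which label::type pairs occur in a content model, not how often or in which combination.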

Property Path query (query for short) over Σ is defined as follows.

• ε and any a ∈ Σ are queries. Here, query a matches an edge labeled by a.

• ∗ is a “wildcard” query, which matches any edge.

• For a set of labels $\{a_1, a_2, \cdots, a_k\}$, $!\{a_1, a_2, \cdots, a_k\}$ is a query. Here, ! denotes negation and this query matches an edge whose label is not in $\{a_1, a_2, \cdots, a_k\}$.

• For a label a ∈ Σ, $a^{-1}$ is a query, which matches the inverse of an edge labeled by a.

• For queries $q_1, q_2, \cdots, q_k$, $q_1.q_2.\cdots.q_k$ and $q_1 \mid q_2 \mid \cdots \mid q_k$ are queries. The former matches a path $p = p_1.p_2.\cdots.p_k$ if $q_i$ matches subpath $p_i$ for every 1 ≤ i ≤ k. The latter matches a path p if one of $q_1, q_2, \cdots, q_k$ matches p.

• For a query q, $q^{*}$ is a query. This query matches a path $p = p_1.p_2.\cdots.p_k$ if q matches subpath $p_i$ for every 1 ≤ i ≤ k (k ≥ 0).

In this paper, we focus on single source query traversal. For a graph G, a query q, and a node $v_0$, the answer of q from $v_0$ over G is the set of nodes v such that G contains a path from $v_0$ to v whose sequence of labels is matched by q. For example, if $q = a^{-1}.!\{a, b\}$, then

the answer of q from v

over G in Fig.1 is {v


Figure 1: Valid graph G and schema graph of S.
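The single source semantics of queries defined above can be sketched as a small set-based evaluator. The tuple encoding of queries, the function names `step` and `answer`, and the four-edge graph are hypothetical; in particular, the graph is not the graph G of Fig. 1, and the wildcard and negation are assumed here to match forward edges only.

```python
def step(query, graph, sources):
    """Return the nodes reachable from `sources` by a path matched by `query`.

    graph is a set of (source, label, target) edges. Queries are tuples:
    ("eps",), ("lab", a), ("any",), ("neg", labels), ("inv", a),
    ("seq", q1, ..., qk), ("alt", q1, ..., qk), ("star", q).
    """
    kind = query[0]
    if kind == "eps":
        return set(sources)
    if kind == "lab":
        return {v2 for (v1, l, v2) in graph if v1 in sources and l == query[1]}
    if kind == "any":
        return {v2 for (v1, l, v2) in graph if v1 in sources}
    if kind == "neg":
        return {v2 for (v1, l, v2) in graph if v1 in sources and l not in query[1]}
    if kind == "inv":
        # follow an edge labeled a against its direction
        return {v1 for (v1, l, v2) in graph if v2 in sources and l == query[1]}
    if kind == "seq":
        current = set(sources)
        for sub in query[1:]:
            current = step(sub, graph, current)
        return current
    if kind == "alt":
        return set().union(*(step(sub, graph, sources) for sub in query[1:]))
    if kind == "star":
        reached = set(sources)
        frontier = set(sources)
        while frontier:  # iterate until no new node is found
            frontier = step(query[1], graph, frontier) - reached
            reached |= frontier
        return reached
    raise ValueError(kind)

def answer(query, graph, v0):
    """Answer of `query` from the single source node v0 over `graph`."""
    return step(query, graph, {v0})

# Hypothetical graph used only for this example.
G = {("v0", "a", "v1"), ("v2", "a", "v1"), ("v2", "c", "v3"), ("v2", "b", "v4")}
q = ("seq", ("inv", "a"), ("neg", {"a", "b"}))
result = answer(q, G, "v1")
```

On this graph, the inverse step from `v1` reaches `v0` and `v2`, and the only outgoing edge of those nodes whose label is neither a nor b leads to `v3`.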

Let $G_S = (V_S, E_S)$ be the schema graph of S, q be a query, and t be a type of S. By $G_S(q, t)$ we mean the traversal area of q from t over $G_S$, that is, the subgraph of $G_S$ traversed by q from t. For example, let $G_S$ be the schema graph shown in Fig. 1(right) and $q = b.(c^{-1})^{*}.(a \mid b)$. Then $G_S(q, t)$ is shown in Fig. 2. By $\mathrm{Ans}(G_S(q, t))$, we mean the “answer” types of $G_S(q, t)$, i.e., the types obtained by traversing q from t over $G_S$. For example,

in Fig. 2 Ans(G

(q, t

)) = {t

, t

Figure 2: Traversal area $G_S(q, t)$.

3 QUERY TRANSFORMATION

In this section, we first define operations on the types of a ShEx schema. Then we present an algorithm for transforming a given query according to a schema update.

3.1 Operation on Types

To represent schema updates, we introduce update operations (operations for short) on types. First, we introduce the tree representation of a type. To identify the position of each node, an id based on Dewey ordering is given to each node. For example, let

$\delta(t) = a :: t^{*} \parallel (b :: t \mid c :: t)$.

Then the tree representation of $t$ is shown in Fig. 3. The id associated with each node is the position of the node.

Figure 3: Tree representation of $t$.

We define the following eight operations on the types of a ShEx schema. Let t be a type of a ShEx schema S.

• Changing a label::type pair of a type:

– add_lt(t, i, $l' :: t'$): this operation adds the label::type pair $l' :: t'$ to δ(t) at position i, where i is a Dewey order. This operation corresponds to adding an edge $(t, l', t')$ to the schema graph of S.

– del_lt(t, i): this operation deletes the label::type pair at position i of δ(t). Let $l' :: t'$ be the pair to be deleted. Then this operation corresponds to deleting the edge $(t, l', t')$ from the schema graph of S.

– change_lt(t, i, $l' :: t'$): this operation replaces the label::type pair at position i of δ(t) with $l' :: t'$. Let $l'' :: t''$ be the pair to be replaced. Then this operation corresponds to replacing the edge $(t, l'', t'')$ with an edge $(t, l', t')$ in the schema graph of S.

• Changing an operator (|, ‖, [n, m]) of a type:

– add_opr(t, i, op): this operation adds an operator op to δ(t) at position i.

– del_opr(t, i): this operation deletes the operator at position i from δ(t).

– change_opr(t, i, op): this operation replaces the operator at position i of δ(t) with op.

• Adding/deleting a type of a schema:

– add_type(t): this operation adds a new type t to S. Initially, δ(t) = ε.

– del_type(t): this operation deletes type t from S.

An update script is a sequence $s = op_1 \cdots op_n$ of operations. For example, consider $t$ in Fig. 3 and let

$s = \mathrm{change\_opr}(t, 1.2, \parallel)\ \mathrm{add\_lt}(t, 1.1, d :: t)$

be an update script. By applying s to $t$, we obtain

$\delta(t) = d :: t \parallel a :: t^{*} \parallel (b :: t \parallel c :: t)$.
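The effect of an update script on the tree representation can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the class `Node`, the helper `parent_and_index`, the `render` function, and the type names `t1` to `t4` are hypothetical, and add_lt is assumed to insert the new pair so that it takes position i while later siblings shift to the right. Operations addressed at the root position are not handled.

```python
class Node:
    """A node of the tree representation: an operator with children, or a label::type leaf."""
    def __init__(self, value, children=None):
        self.value = value                  # "‖", "|", "*", or a "label::type" string
        self.children = children or []

def parent_and_index(root, dewey):
    """Resolve a Dewey id such as "1.2" (2nd child of the root "1") to (parent, 0-based index)."""
    parts = [int(p) for p in dewey.split(".")]
    node = root
    for p in parts[1:-1]:
        node = node.children[p - 1]
    return node, parts[-1] - 1

def add_lt(root, dewey, pair):
    parent, idx = parent_and_index(root, dewey)
    parent.children.insert(idx, Node(pair))

def del_lt(root, dewey):
    parent, idx = parent_and_index(root, dewey)
    del parent.children[idx]

def change_lt(root, dewey, pair):
    parent, idx = parent_and_index(root, dewey)
    parent.children[idx] = Node(pair)

def change_opr(root, dewey, op):
    parent, idx = parent_and_index(root, dewey)
    parent.children[idx].value = op

def render(node):
    """Print the tree back as a fully parenthesized RBE string."""
    if not node.children:
        return node.value
    if node.value == "*":
        return "(" + render(node.children[0]) + ")*"
    return "(" + (" " + node.value + " ").join(render(c) for c in node.children) + ")"

# a::t1* ‖ (b::t2 | c::t3), with hypothetical type names
root = Node("‖", [Node("*", [Node("a::t1")]),
                  Node("|", [Node("b::t2"), Node("c::t3")])])
change_opr(root, "1.2", "‖")     # the disjunction at position 1.2 becomes ‖
add_lt(root, "1.1", "d::t4")     # d::t4 is inserted at position 1.1
updated = render(root)
```

The operations are applied from left to right, which matters here: inserting at position 1.1 first would shift the disjunction from 1.2 to 1.3.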

WEBIST 2020 - 16th International Conference on Web Information Systems and Technologies


Let q be a query and S be a ShEx schema. Among the above operations, add_lt(), add_opr(), del_opr(), change_opr(), and add_type() do not affect q, in that q remains “valid” against the updated schema of S. On the other hand, del_lt(), change_lt(), and del_type() may affect q, i.e., q may become “invalid” under the updated schema of S, in that q may lose some of the answers that were obtained under S. Thus, our algorithm shown below transforms q when del_lt(), change_lt(), or del_type() is applied to S.

We finally note that when a ShEx schema S is updated, the data under S must also be updated according to the schema update. We have therefore developed a method for updating data according to a schema update (details are omitted).

3.2 Algorithm

Our algorithm consists of Algorithms 1 and 2. Algorithm 1 is the main part of our algorithm. For a given update script $s = op_1.op_2.\cdots.op_n$ on a ShEx schema S and a start type $t_0$, the algorithm transforms a given query q according to s. Let $G_S$ be the schema graph of S, and let $G_S(q, t_0)$ be the traversal area of q from $t_0$ (lines 1 and 2). First, we take copies $H_S$ and $G'_S$ of $G_S(q, t_0)$ and $G_S$, respectively (line 3). Then, for each operation $op_i$ of s, the algorithm modifies $H_S$ according to $op_i$ (lines 4 to 27), and converts $H_S$ to the transformed query $q'$ (line 28). The for loop in lines 4 to 27 proceeds as follows. The algorithm does nothing if $op_i$ does not affect the traversal area $H_S$ (lines 5 to 7). Otherwise, $H_S$ (and $G'_S$) is modified according to $op_i$ in lines 8 to 26, as follows.

• Lines 8 to 14 deal with change_lt(t, i, $l' :: t'$). This operation changes the label::type pair $l'' :: t''$ of δ(t) at position i to $l' :: t'$. Accordingly, we replace edge $(t, l'', t'')$ with $(t, l', t')$ in $H_S$ and $G'_S$. If $t'' = t'$, then we are done. Otherwise, since $t''$ is changed to $t'$, a path from $t$ to some accepting node via $t''$ may be disconnected by this change. To repair this, we find a set P of simple paths from $t'$ to $t''$ by FindPaths and add each path p ∈ P to $H_S$ to connect $t'$ and $t''$.

Here, for given types $t, t'$, FindPaths (not shown) is a method for finding the set P of simple paths p from $t$ to $t'$ over $G'_S$ with inverse edge traversal allowed. However, if the length of every simple path p exceeds a given threshold, FindPaths also traverses paths from $t$ to the neighbours of $t'$, and if shorter simple paths are found, FindPaths reports the shorter paths instead of P.

• Lines 15 to 19 deal with del_lt(t, i). This operation deletes the label::type pair $l' :: t'$ at position i of δ(t). Accordingly, we delete edge $(t, l', t')$ from $H_S$ and $G'_S$. By this edge deletion $t$ and $t'$ may be disconnected, thus we find paths from $t$ to $t'$ over $G'_S$ by FindPaths and add the paths to $H_S$.

• Lines 20 to 26 deal with del_type(t). This operation deletes type t from S. Thus t and every edge incident to t are deleted from $H_S$ and $G'_S$. To repair this, we find the set $T_{in}$ of nodes outgoing to t and the set $T_{out}$ of nodes incoming from t, and then find paths from $T_{in}$ to $T_{out}$ and add the paths to $H_S$.
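Since FindPaths is not shown, the following is only a plausible sketch of its core: a depth-first enumeration of simple paths in which every edge may also be traversed against its direction. The function name `find_paths`, the `max_len` parameter standing in for the threshold, and the example edges are assumptions; the fallback to the neighbours of the target described above is not implemented.

```python
def find_paths(edges, src, dst, max_len):
    """Enumerate simple paths from src to dst with at most max_len steps.

    edges is a set of (source, label, target) triples. Each step of a returned
    path is (label, inverse); inverse=True means the edge is traversed backwards.
    """
    # adjacency lists that also contain every edge in its inverse direction
    neighbours = {}
    for (u, label, v) in edges:
        neighbours.setdefault(u, []).append((label, False, v))
        neighbours.setdefault(v, []).append((label, True, u))

    paths = []

    def dfs(node, visited, steps):
        if node == dst and steps:
            paths.append(list(steps))
            return
        if len(steps) == max_len:
            return
        for (label, inverse, nxt) in sorted(neighbours.get(node, [])):
            if nxt not in visited:          # keep the path simple
                visited.add(nxt)
                steps.append((label, inverse))
                dfs(nxt, visited, steps)
                steps.pop()
                visited.remove(nxt)

    dfs(src, {src}, [])
    return paths

# Hypothetical schema graph edges used only for this example.
edges = {("A", "x", "B"), ("C", "y", "B"), ("A", "z", "C")}
found = find_paths(edges, "A", "C", max_len=2)
```

In this example the direct edge z is found, and so is the detour that follows x forwards to B and then y backwards to C, which is the kind of path that inverse traversal makes available for repairing a disconnected traversal area.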

In line 28, ConstructPropertyPath (Algorithm 2) converts $H_S$ to a new query $q'$. This is done by regarding $H_S$ as an NFA M with start state $t_0$ and the set $\mathrm{Ans}(G_S(q, t_0))$ of accept states (line 2), constructing a DFA $M'$ equivalent to M (line 3), and then converting $M'$ into a query $q'$ (line 4). The conversion from $M'$ to $q'$ is done by using an extension of the state elimination method for DFA.
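For readers unfamiliar with state elimination, the textbook version of the method can be sketched as below. It is not the extension used by the authors, whose details are not given; as a simplification it is applied directly to the edge-labeled graph without a prior determinization step. The function name `to_property_path` and the helper names are hypothetical, and inverse labels would simply be passed in as label strings.

```python
EPS = (2, "ε")   # expressions are (precedence, text): 0 = alternation, 1 = sequence, 2 = atom

def wrap(e, level):
    """Parenthesize e if it binds more weakly than the surrounding operator."""
    return e[1] if e[0] >= level else "(" + e[1] + ")"

def alt(x, y):
    return x if x == y else (0, wrap(x, 0) + "|" + wrap(y, 0))

def seq(x, y):
    if x == EPS:
        return y
    if y == EPS:
        return x
    return (1, wrap(x, 1) + "." + wrap(y, 1))

def star(x):
    return EPS if x == EPS else (2, wrap(x, 2) + "*")

def to_property_path(edges, start, accepts):
    """Convert an automaton given as (state, label, state) edges into a path expression."""
    S, F = object(), object()            # fresh start and final states
    R = {}                               # R[(p, q)] = expression leading from p to q

    def add(p, q, e):
        R[(p, q)] = alt(R[(p, q)], e) if (p, q) in R else e

    add(S, start, EPS)
    for a in sorted(accepts):
        add(a, F, EPS)
    states = {start} | set(accepts)
    for (p, a, q) in sorted(edges):
        add(p, q, (2, a))
        states |= {p, q}

    for s in sorted(states):             # eliminate the original states one by one
        loop = R.pop((s, s), None)
        ins = {p: e for (p, q), e in R.items() if q == s}
        outs = {q: e for (p, q), e in R.items() if p == s}
        for p in ins:
            del R[(p, s)]
        for q in outs:
            del R[(s, q)]
        for p, e_in in ins.items():
            for q, e_out in outs.items():
                mid = seq(e_in, star(loop)) if loop else e_in
                add(p, q, seq(mid, e_out))

    result = R.get((S, F))
    return result[1] if result else None
```

Each eliminated state s is replaced by direct connections "incoming expression, then the self-loop on s any number of times, then outgoing expression" between its neighbours, so the expression left between the fresh start and final states describes all accepted label sequences.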

4 PRELIMINARY EXPERIMENT

In this section, we present the result of our preliminary evaluation experiment. We applied our algorithm to several queries in order to examine whether the transformed queries show “good” behaviour, in the sense that the answers of the original queries are maintained after a schema update.

The data used in this experiment is Japanese Textbook LOD (Egusa and Takaku, 2018a; Egusa and Takaku, 2018b). Japanese Textbook LOD is RDF data compiled from a collection of textbooks that has been organized over the years by the NIER Education Library and the Textbook Research Center Library. The data structure of Japanese Textbook LOD is illustrated in Fig. 4. Japanese Textbook LOD consists of 233,001 triples in Turtle format, and the data size is 12MB.

In this experiment, we manually created the five queries and short schema updates shown in Table 1. We transformed each query by the algorithm (the data is also transformed according to the schema update), executed the original and transformed queries over the original and updated data, respectively, and calculated the recall, precision, and F-measure values.

Let q be a query, $q'$ be the transformed query of q, and Ans(q) be the set of answer nodes obtained by q. The recall of $q'$ w.r.t. q is defined as follows.

$$\mathrm{recall}(q, q') = \frac{|\mathrm{Ans}(q) \cap \mathrm{Ans}(q')|}{|\mathrm{Ans}(q)|}$$

Similarly, the precision of $q'$ w.r.t. q is defined as follows.

$$\mathrm{precision}(q, q') = \frac{|\mathrm{Ans}(q) \cap \mathrm{Ans}(q')|}{|\mathrm{Ans}(q')|}$$
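The two measures translate directly into set operations. In the sketch below the function name `evaluate` and the node names are hypothetical, and the F-measure is assumed to be the usual harmonic mean of recall and precision, which the text does not define explicitly.

```python
def evaluate(ans_original, ans_transformed):
    """Return (recall, precision, F-measure) of a transformed query w.r.t. its original."""
    common = len(ans_original & ans_transformed)
    recall = common / len(ans_original) if ans_original else 0.0
    precision = common / len(ans_transformed) if ans_transformed else 0.0
    # assumption: F-measure is the harmonic mean of recall and precision
    total = recall + precision
    f_measure = 2 * recall * precision / total if total else 0.0
    return recall, precision, f_measure

# Hypothetical answer sets: two of the four original answers are preserved,
# and one of the three answers of the transformed query is new.
recall, precision, f_measure = evaluate({"n1", "n2", "n3", "n4"}, {"n1", "n2", "n5"})
```

Empty answer sets are mapped to 0.0 here to avoid a division by zero; this is a convention chosen for the sketch.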


Figure 4: Data structure of Japanese Textbook LOD.

Table 1: Original query and update script.

No.  (a) original query and (b) update script
1    (a) catalogue.school
     (b) del_lt(Textbook, 6) add_lt(Textbook, 1, subjectType :: SubjectType)
2    (a) catalogue^{-1}.publisher^{-1}.curriculum.hasSubjectArea.hasSubject
     (b) del_type(Publisher) del_lt(Catalogue, 1)
3    (a) curriculum^{-1}.school
     (b) del_lt(Textbook, 5) del_type(SubjectArea)
4    (a) (catalogue | subjectArea).school
     (b) del_lt(Textbook, 6) add_lt(SubjectType, 3, hasSubject :: Subject)
5    (a) subjectArea^{-1}.curriculum.hasSubjectArea.hasSubject.school
     (b) del_type(Subject) change_lt(CurriculumGuideline, 1, version :: Version)

Table 2 lists the transformed queries for the original queries together with their recall, precision, and F-measure values. The average F-measure of the five queries is 0.87, and thus the transformed queries behaved rather well overall. However, the first, second, and fourth transformed queries missed some correct answers, especially the second one. A reason for this is as follows. The Japanese Textbook LOD schema contains many edges associated with “*” or “?”. Such edges are “optional” and their corresponding edges may not appear in the RDF data. Therefore, if a transformed query contains the label of such an “optional” edge, the answers obtained by the transformed query may not coincide with those of its original query. In the experiment, “hasSubject” in the second query corresponds to such optional edges.

The results show some potential of our approach. However, the queries and schema updates used in the experiment are very limited, and we need to conduct more experiments using more queries and update operations. This is left as future work.

5 CONCLUSION

In this paper, we first defined update operations on ShEx schemas, and then proposed an algorithm for transforming a given query into a new query according to a schema update. We conducted a small preliminary experiment, and the results showed that the queries transformed by our algorithm behave well in that their answers were close to those of the original queries.

However, some work remains to be done. First, the dataset used in our experiment is limited. Thus we need to conduct more experiments with a variety of datasets. Second, each schema update in the experiment consists of only two update operations. We need to examine schema updates consisting of more update operations in order to reflect real schema update situations. Moreover, some ShEx elements are missing in our paper, e.g., negation. Thus we plan to consider a broader class of ShEx schemas.


Table 2: Transformed Query and Recall, Precision, F-measure.

No.  Transformed Query                                                       recall  precision  F-measure
1    publisher.catalogue.school                                              0.88    0.99       0.93
2    catalogue^{-1}.curriculum.hasSubjectArea.hasSubject                     0.70    0.50       0.59
3    curriculum^{-1}.publisher^{*}.catalogue.school                          1.00    1.00       1.00
4    (publisher.catalogue | subjectArea).school                              0.88    0.77       0.82
5    (subjectArea^{-1}.curriculum.hasSubjectArea)^{*}.subjectType^{*}.school 1.00    1.00       1.00
     average of five queries                                                 0.89    0.85       0.87

Algorithm 1: Query Transformation.

Input: ShEx schema S = (Σ, Γ, δ), update script $s = op_1 \cdots op_n$ to S, query q, type $t_0 \in \Gamma$
Output: query $q'$
 1: construct the schema graph $G_S$ of S
 2: construct the traversal area $G_S(q, t_0)$ of q from $t_0$ on $G_S$
 3: $H_S \leftarrow G_S(q, t_0)$; $G'_S \leftarrow G_S$
 4: for i = 1, 2, ··· , n do
 5:     if $op_i$ does not affect $H_S$ then
 6:         continue
 7:     end if
 8:     if $op_i$ = change_lt(t, i, $l' :: t'$) then
 9:         let $l'' :: t''$ be the label::type pair at position i of δ(t)
10:         replace $(t, l'', t'')$ with $(t, l', t')$ in $H_S$ and $G'_S$
11:         if $t'' \neq t'$ then
12:             P ← FindPaths($G'_S$, $t'$, $t''$)
13:             add all p ∈ P to $H_S$
14:         end if
15:     else if $op_i$ = del_lt(t, i) then
16:         let $l' :: t'$ be the label::type pair at position i of δ(t)
17:         delete $(t, l', t')$ from $H_S$ and $G'_S$
18:         P ← FindPaths($G'_S$, $t$, $t'$)
19:         add all p ∈ P to $H_S$
20:     else if $op_i$ = del_type(t) then
21:         $T_{in} \leftarrow \{t' \mid (t', l, t) \text{ is an edge from } t' \text{ to } t \text{ in } G'_S\}$
22:         $T_{out} \leftarrow \{t'' \mid (t, l, t'') \text{ is an edge from } t \text{ to } t'' \text{ in } G'_S\}$
23:         delete t and every edge adjacent to t from $H_S$ and $G'_S$
24:         P ← {p | p ∈ FindPaths($G'_S$, $t'$, $t''$), $t' \in T_{in}$, $t'' \in T_{out}$}
25:         add all p ∈ P to $H_S$
26:     end if
27: end for
28: $q'$ ← ConstructPropertyPath($H_S$, $t_0$, $\mathrm{Ans}(G_S(q, t_0))$)
29: return $q'$

Algorithm 2: ConstructPropertyPath.

Input: traversal area $H_S$, start type $t_0$, set of types Ans
Output: query $q'$
 1: let V and E be the sets of nodes and edges of $H_S$, respectively
 2: construct an NFA $M = (Q, \Sigma, \delta, t_0, \mathrm{Ans})$, where Q = V and δ is a transition function s.t. $t' \in \delta(t, a)$ iff $(t, a, t') \in E$
 3: construct a DFA $M'$ equivalent to M
 4: construct a query $q'$ from $M'$
 5: return $q'$

REFERENCES

Baker, T. and Prud’hommeaux, E. (2019). Shape expressions (ShEx) primer. http://shexspec.github.io/primer/.

Benedikt, M. and Cheney, J. (2010). Destabilizers and independence of XML updates. Proc. VLDB Endow., 3(1-2):906–917.

Bonifati, A., Furniss, P., Green, A., Harmer, R., Oshurko, E., and Voigt, H. (2019). Schema validation and evolution for graph databases. In Conceptual Modeling, pages 448–456.

Chirkova, R. and Fletcher, G. H. (2009). Towards well-behaved schema evolution. In Proc. 12th International Workshop on the Web and Databases (WebDB 2009).

Egusa, Y. and Takaku, M. (2018a). Building and publishing Japanese textbook linked open data. The journal of Information Science and Technology, 68(7):361–367.

Egusa, Y. and Takaku, M. (2018b). Japanese textbook LOD. https://jp-textbook.github.io/en/about.

Guerrini, G., Mesiti, M., and Rossi, D. (2005). Impact of XML schema evolution on valid documents. In Proc. WIDM, pages 39–44.

Junedi, M., Genevès, P., and Layaïda, N. (2012). XML query-update independence analysis revisited. In Proc. ACM DocEng’12, pages 95–98.

Knublauch, H. and Kontokostas, D. (2017). Shapes constraint language (SHACL). https://www.w3.org/TR/shacl/.

Oliveira, R., Genevès, P., and Layaïda, N. (2012). Toward automated schema-directed code revision. In Proceedings of the 2012 ACM Symposium on Document Engineering, DocEng ’12, pages 103–106.

Staworko, S., Boneva, I., Gayo, J. E. L., Hym, S., Prud’hommeaux, E. G., and Solbrig, H. R. (2015). Complexity and expressiveness of ShEx for RDF. In Proceedings of 18th International Conference on Database Theory (ICDT 2015), page 17p.

Thornton, K., Solbrig, H., Stupp, G. S., Labra Gayo, J. E., Mietchen, D., Prud’hommeaux, E., and Waagmeester, A. (2019). Using shape expressions (ShEx) to share RDF data models and to guide curation with rigorous validation. In Hitzler, P., Fernández, M., Janowicz, K., Zaveri, A., Gray, A. J., Lopez, V., Haller, A., and Hammar, K., editors, Proceedings of the European Semantic Web Conference, pages 606–620.

Wu, Y. and Suzuki, N. (2016). An algorithm for correcting XSLT rules according to DTD updates. In Proceedings of the 4th International Workshop on Document Changes: Modeling, Detection, Storage and Visualization, DChanges ’16.
