A RULE-BASED SPARQL ENDPOINT WRAPPER

Mamdouh Farouk and Mitsuru Ishizuka

Creative Informatics Department, The University of Tokyo, Tokyo 1138656, Japan

Keywords: RDF, SPARQL Endpoint, SPARQL Query.

Abstract: The web of data is a web vision in which data are represented into RDF and interlinked to each other. The

fast growth of linked data motivates researchers to exploit this huge amount of well-represented data in

linked data cloud. Moreover, finding the implicit answers is important to enable web agent to understand

deeply web data. This paper proposes a SPARQL endpoint wrapper to enable original endpoint to answer

more queries and facilitate query writing for the client. The proposed wrapper uses inference rules to expand

user query. A prototype for the proposed wrapper is presented to show the effectiveness of the proposed

approach.

1 INTRODUCTION

Linked data is a way for publishing web data into a

machine understandable format. Resource

Description Framework (RDF) is used as standard

language for linked data. RDF is a W3C

recommendation that represents current web data into

machine understandable format. In this way, data will

be more useful. Moreover, since the start of linked

data project more and more people publish their data

in the linked data cloud (Bizer, 2008). Moreover, the

new web should be completely machine-

understandable (Arup, 2011).

Furthermore, the growth of well-represented

linked data cloud motivates researchers to improve

web search functionality and exploit the machine

understandable format.

On the other hand, one of the most important

tools of the Internet is web search. Almost all

Internet users use web search to find their needs.

Consequently, improving searching RDF data is an

urgent task. One challenge of searching web of data

is getting implicit answers. In other words, search

engines should go behind the raw data to understand

web data and get query answers. Moreover, there is

urgent need for smart search techniques that make

the use of the new evolution of linked data.

The main objective of converting web data to

RDF is to enable web agent to understand this data.

Linked data is a web of machine understandable data

(Correndo, 2010). However, web agents, that query

linked data, cannot deeply understand RDF data. For

example, consider a SPARQL query to get

information about authors who are interested in

semantic data representation from the corpus of

semantic web conferences, http://data.semanticweb.

org, which contains information about some

conferences in semantic web filed. Although, the

corpus contains the needed information, the user may

obtain no results. This is because a query engine

cannot get the implicit answers.

There are a lot of woke in searching RDF data.

Many of these research based on SPARQL query

language (Shady, 2010) (Olaf, 2009). SPARQL is a

query language for RDF. SPARQL is quite similar to

SQL (Axel, 2007). Moreover, there are many

SPARQL endpoints attached to linked data sets to

facilitate querying RDF data sets. The user can

submit SPARQL query to these endpoints and get the

results via http.

On the other hand, the dataset admin who

represents the original data into RDF faces a problem

of selecting RDF vocabularies because there are

many equivalent vocabularies available on the

Internet. Moreover, the client, which queries the RDF

dataset is restricted to use the same vocabularies as

RDF dataset.

This paper proposes creating a wrapper for

SPARQL endpoint in which dataset admin adds a set

of rules depending on the meaning of the dataset and

the common asked queries. The proposed wrapper

uses these rules to improve endpoint functionality.

Moreover, a related approach, which tries to

express rules and infer additional RDF data, is SPIN

(Holger, 2011). SPIN is a group of RDF properties

that can be used to express rules. These rules

729

Farouk M. and Ishizuka M..

A RULE-BASED SPARQL ENDPOINT WRAPPER.

DOI: 10.5220/0003941407290732

In Proceedings of the 8th International Conference on Web Information Systems and Technologies (WEBIST-2012), pages 729-732

ISBN: 978-989-8565-08-2

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

attached to a specific ontology class and can be

applied to infer data, or modify the current data.

spin:rule property can be used to defined an

inference rule using SPARQL construct or

insert/delete.

Moreover, SPIN, which submitted to w3c to be

discussed, adds rules to ontology level. However,

our approach separates between rules level and

ontology level. Separation between ontology and

rules levels gives the user flexibility to add rules. In

other words, it is difficult for the user to update the

standard shared ontology to add his rules. Moreover,

there are many users may add rules to infer the same

property depending on their own data. The user

wants to extend his data depending on the semantics

of the data and the expected queries to be asked.

Therefore, the users have different data want to

make many rules even for the same ontology.

Attaching rules to dataset gives flexibility to the

users and avoids rules conflicts on ontology level.

As an example that will be used throughout this

paper, consider that we will start a research project in

semantic web, in which a huge amount of data should

be represented into a semantic format. Some

experiments will be done on this data to study the

future of semantic web. It is decided that Tim

Berner-Lee is the general manager for this project.

The smart web agent, which is responsible for

nominating project members, searches the linked

data could to find qualified researchers. The agent

may ask queries such as: Who are interested in

semantic web, who are interested in semantic web

and know Tim Berner-Lee, and so on.

The remainder of this paper is organized as

follows. Section 2 describes the overall system

architecture. Section 3 explains the idea of adding

user-defined rules. Section 4 describes the proposed

wrapper for SPARQL endpoint that exploits the

defined rules. The experiments and results are

discussed in section 5. Finally, section 6 provides

the conclusion of this research.

2 SYSTEM ARCHITECTURE

The proposed wrapper consists of a set of user-

defined rules and a simple inference engine to

expand SPARQL queries based on the defined rules.

Backward chaining is used to expand the queries.

The wrapper sends SPARQL queries after expansion

to the original endpoint. In addition, the wrapper

collects the results of these queries and sends them

back to the client. Figure 1 shows a general

overview of the proposed wrapper.

Figure 1: System overview.

Moreover, the query expansion process starts

with finding appropriate rules in the defined rule set.

Furthermore, rules index is used to facilitate finding

rules during searching process. A matching between

the retrieved rules and the user query produces

expanded queries after replacing query conditions

with rules premises.

3 ADDING RULES

The data publisher may add extra knowledge (rules),

which considered as an extension for the original

data. Moreover, adding this kind of knowledge

enables the data admin, who selects specific

vocabularies to represent the published data into

RDF, to suggest alternatives vocabularies and

expresses them in rules instead of duplicate RDF

data with different vocabularies. In addition, the

SPARQL endpoint admin should add a set of rules

depending on the meaning of RDF dataset and

common queries that users used to ask.

Moreover, there are two types of rules that can

be added to the proposed wrapper. The first type

focuses on discovering new RDF triples to get the

implicit data, for example, the rules that infers the

relation between topics and authors. This rules

defined as: if a person X is an author for the paper Y

and the topic of the paper Y is T then the person X is

interested in the topic T. The other type of rules

focuses on vocabularies mapping to allow client to

use some different equivalent vocabularies. This will

not restrict the user to use the same vocabulary as

RDF dataset.

The format of user-defined rule is a simple

production rule, “condition Æ action”. The

condition part syntax is the same as SPARQL query

condition syntax. The action part is also represented

into SPARQL syntax. Figure 2 shows an example

for the rule: If a person A is an author to a paper Y,

and a person B is an author to the paper Y, then Æ A

knows B. The first part of the rule format is xml

namespaces for the used vocabularies. The second

part is the condition of the rule. The last part is the

action part.

WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies

730

Figure 2: Example of user-defined rules.

Using these rules during the processing of

original data to infer more data on the fly costs

overhead processing time. However, using inference

rules during searching give better results. Moreover,

the proposed approach avoids data duplication and

keeps the data size.

4 SEARCHING RDF

The proposed wrapper is an extension to the normal

SPARQL endpoint. An inference step is added to the

SPARQL endpoint to make the use of the user-

defined rules. The proposed wrapper can answer

some queries that cannot be answered using normal

endpoint. Using inference, the engine goes behind

the raw data to find implicit query answer.

The proposed wrapper is developed based on

Jena SPARQL API with additional inference step.

The actual usage of the added inference step is not to

infer more data. However, the inference step is used

for query expansion to get detailed queries that can

be answered using the normal query engines.

4.1 Query Expansion

The proposed approach applies backward chaining

to expand user query. The algorithm of SPARQL

query expansion is a recursive algorithm that gets all

possible queries based on a set of predefined rules.

Indexing for the defined rules are established to link

different rules based on rules action. This index for

the defined rules facilitates finding the appropriate

rules to expand a SPARQL query. The basic idea of

this query expansion is to replace query condition

with other conditions based on backward chaining of

the rules.

Query expansion based on backward chining

algorithm is as follows:

Input: SPARQL query, a set of rules

Output: a list of new queries equivalent to the

inputted query

1. Get list of properties (predicates) used in query

conditions.

2. Get related rules that can be used to expand the

inputted query (based on rule index)

3. For each related rule

• Match between rule actions and query

conditions

• Bind matched variables and keep them in a

mapping state

• Replace the matched query conditions with rule

premises

• Recursive call to expand the new query // this

call starts matching the new query and only the

rest of rule set

4. Combine all new queries in one list

The resulted queries are executed using the

normal SPARQL endpoint. The results of these

queries are combined and sent back to the client.

5 EXPERIMENT

In order to show the effectiveness of the proposed

approach, we developed a web based application

using JSP as a wrapper for semantic web

conferences corpus endpoint, data.semanticweb.org

This data set contains information about some

conferences related to semantic web field and some

related information such as papers, authors, and so

on. It contains around 180445 unique RDF triples.

The developed wrapper is available on the Internet:

http://cic012.cic.ci.i.u-tokyo.ac.jp:8080/wrapper/

In this experiment, the developed wrapper uses a

set of five rules. Example of these rules is:

• If a person A is an author to a paper Y, and a

person B is an author to the same paper Y Æ A

knows B.

• If a person A is an author to a paper Y, and the

main topic of Y is T then Æ A is interested in T.

• If a topic X is a keyword for a paper Y Æ topic

X is a subject of Y.

The first two rules infer implicit data. However, the

third rule maps ontology vocabularies to facilitate

finding query answer.

Table 1 shows the list of queries used in this

<text>

if two people are authors for the same paper, then

they know each others.

</text>

<con>

{

?ppr rdf:type iswc:InProceedings.

?person rdf:type foaf:Person.

?person2 rdf:type foaf:Person.

?ppr dc:Creator ?person.

?ppr dc:Creator ?person2.

FILTER (?person != ?person2)

}

</con>

</condition>

{ ?person foaf:knows ?person2. }

</action>

</rule>

ARULE-BASEDSPARQLENDPOINTWRAPPER

731

experiment. A set of queries are run using SPARQL

endpoint of the semantic web conferences corpus,

http://data.semanticweb.org/snorql/

, and using the

proposed wrapper. The results of this experiment are

shown in table 2.

Table 1: List of Queries used in the experiment.

Number Query

Q1 Who is interested in Semantic Web

Q2 Who knows Jure Leskovec

Who knows Jure Leskovec and is interested in

'social networks'

Q4 Get papers in semantic web field

Q5 Get all papers of the author 'Gang Wang'

Q6 Get all papers in Ontology Learning field

The first query in table 1 asks about a person

who is interested in semantic web field. The

SAPRQL syntax for this query is as follow:

PREFIX foaf: http://xmlns.com/foaf/0.1/

select distinct ?ResearcherName where

{

?x foaf:name ?ResearcherName.

?x foaf:topic_interest 'Semantic Web'.

}

The RDF dataset does not contain topic_interest

relation between authors and topics. Therefore the

original endpoint returns no results. However, using

inference rules the proposed wrapper expand the

client query to another query depending on the

defined rules set. Finally, the proposed wrapper can

suggest 284 answers for this query.

In table 2, the third column shows the number of

extra queries that generated by the wrapper. This

number highly affects execution time.

Table 2: Query result using normal technique and the

proposed approach.

Query

number

Original

endpoint

Proposed wrapper

Number of

results

# of generated

queries

Total number

of results

Execution

time

Q1 0 1 284 2.010

Q2 0 1 9 1.132

Q3 0 3 4 2.248

Q4 106 1 108 1.908

Q5 3 1 6 1.116

Q6 1 1 6 1.106

Numbers of the retrieved answers for some

queries, such as Q4 and Q6, are increased because of

using inference to get the equivalent vocabularies.

Finally, using the proposed wrapper improves query

results. It enables web agent not only to get more

answers for queries but also to get answers for some

queries cannot be answered using the original

endpoint.

6 CONCLUSIONS

SPARQL endpoints facilitate querying RDF datasets

over http. However, there are many queries cannot

be answered. This paper proposes adding a new

layer to the existing SPARQL endpoints as a

wrapper. This wrapper facilitates writing queries for

clients and increases the query answering recall. The

proposed wrapper uses inference rules to go behind

the raw data and get query answer based on a set of

predefined rules. The experiments show that the

wrapper can answer some queries that cannot be

answered using the original endpoint. In addition,

the proposed wrapper can used in vocabulary

mapping to facilitate query writing for web clients.

REFERENCES

Axel Polleres, From SPARQL to rules (and back),

Proceedings of the 16th international conference on

World Wide Web (WWW2007), May 2007, Banff,

Canada pages 787–796

Christian Bizer and Andreas Schultz. Benchmarking the

Performance of Storage Systems that expose SPARQL

Endpoints. In Proceedings of the 4th International

Workshop on Scalable Semantic Web knowledge Base

Systems (SSWS), 2008.

Arup Sarkar, Ujjal Marjit, Utpal Biswas “linked data

generation for the university data from legacy

database” International Journal of Web & Semantic

Technology Year: 2011 Vol: 2 Issue: 3 Pages/record

No.: 21-31

Shady Elbassuoni, M. Ramanath, R. Schenkel, and G.

Weikum. Searching rdf graphs with SPARQL and

keywords. IEEE Data Engineering Bulletin, 33(1),

2010

Correndo, G., Salvadores, M., Millard, I., Glaser, H.,

Shadbolt, N.: Sparql query rewriting for implementing

data integration over linked data. In: 1st International

Workshop on Data Semantics (2010)

Olaf Hartig, Christian Bizer, and Johann-Christoph

Freytag: Executing SPARQL Queries over the Web of

Linked Data. In Proceedings of the 8th International

Semantic Web Conference (ISWC), Washington, DC,

USA, Oct. 2009

Holger Knublauch, James A. Hendler, Kingsley Idehen

“SPIN - Overview and Motivation”, http://www.w3.

org/Submission/2011/SUBM-spin-overview-2011022

2/, February 2011.

WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies

732