Listing 1: Dailymed Sample Triples.
dailymeddrug:82 a dailymed:drug
dailymeddrug:82 dailymed:activeingredient dailymeding:Phenytoin
dailymeddrug:82 rdfs:label "Dilantin-125 (Suspension)"
dailymeddrug:201 a dailymed:drug
dailymeddrug:201 dailymed:activeingredient dailymeding:Ethosuximide
dailymeddrug:201 rdfs:label "Zarontin (Capsule)"
dailymedorg:Parke-Davis a dailymed:organization
dailymedorg:Parke-Davis rdfs:label "Parke-Davis"
dailymedorg:Parke-Davis dailymed:producesDrug dailymeddrug:82
dailymedorg:Parke-Davis dailymed:producesDrug dailymeddrug:201
dailymeding:Phenytoin a dailymed:ingredients
dailymeding:Phenytoin rdfs:label "Phenytoin"
dailymeding:Ethosuximide a dailymed:ingredients
dailymeding:Ethosuximide rdfs:label "Ethosuximide"
the Operating System and Fuseki 1.0 as the SPARQL
Endpoint server. For each dataset, we set up Fuseki
on different ports. We re-used the query set from our
previous work (Rakhmawati and Hausenblas, 2012).
We limited the query processing duration to one hour.
Each query was executed three times on two federation
engines, namely SPLENDID (Görlitz and Staab, 2011)
and DARQ (Quilitz and Leser, 2008). These engines
were chosen because SPLENDID employs VoID
(http://www.w3.org/TR/void/) as a data catalogue
that contains a list of predicates and entities, while
DARQ keeps a list of predicates in the Service Description
(http://www.w3.org/TR/sparql11-service-description/).
Apart from using VoID, SPLENDID also sends a SPARQL
ASK query to determine whether or not a source can
potentially return an answer. We explain the details of
our dataset generation and metrics as follows:
4.1 Data Distribution
To determine the correlation between the communication
cost of the federated SPARQL query and the data
distribution, we generate 9 datasets by dividing the
Dailymed dataset
(http://wifo5-03.informatik.uni-mannheim.de/dailymed/)
into three partitions based on the following strategies:
4.1.1 Graph Partition
Inspired by data clustering for a single RDF storage
(Huang et al., 2011), we performed graph partition
over our dataset by using METIS (Karypis and Ku-
mar, 1998). The aim of this partition scheme is to
reduce the communication needed between machines
during the query execution process by storing the con-
nected components of the graph in the same machine.
We initially identify the connections of subject and
object in different triples. We only consider the URI
object which is also a subject in other triples. Intu-
itively, the reason is that the object which appears as
the subject in other triples can create a connection if
the triples are located in different dataset partitions.
V(D) denotes the set of pairs of subject and object that
are connected in the dataset D, which can be formally
specified as
$V(D) = \{(s, o) \mid \exists\, s, o, p, p' \in U : (s, p, o) \in D \wedge (o, p', o') \in D'\}$.
We assign a numeric identifier to each s, o ∈ V(D).
After that, we create a list of sequential adjacent
vertexes for each vertex and use it as the input of the
METIS API. We then run METIS to divide the vertexes
and obtain the partition number of each vertex as output.
Finally, we distribute each triple based on the partition
number of its subject and object. For example, taking
Listing 1 as a dataset sample, we obtain
V(D)={(dailymeddrug:82,
dailymeding:Phenytoin),(dailymeddrug:201,
dailymeding:Ethosuximide),(dailymedorg:Parke-Davis,
dailymeddrug:82),(dailymedorg:Parke-Davis,
dailymeddrug:201)}
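The construction of V(D) for the sample can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the tuple encoding of triples, the quoting of literals and the function name connected_pairs are assumptions, and the sketch treats the sample as a single dataset D.

```python
# Triples of Listing 1 as (subject, predicate, object) tuples.
# Assumption: literals keep their double quotes so they differ from URIs.
TRIPLES = [
    ("dailymeddrug:82", "a", "dailymed:drug"),
    ("dailymeddrug:82", "dailymed:activeingredient", "dailymeding:Phenytoin"),
    ("dailymeddrug:82", "rdfs:label", '"Dilantin-125 (Suspension)"'),
    ("dailymeddrug:201", "a", "dailymed:drug"),
    ("dailymeddrug:201", "dailymed:activeingredient", "dailymeding:Ethosuximide"),
    ("dailymeddrug:201", "rdfs:label", '"Zarontin (Capsule)"'),
    ("dailymedorg:Parke-Davis", "a", "dailymed:organization"),
    ("dailymedorg:Parke-Davis", "rdfs:label", '"Parke-Davis"'),
    ("dailymedorg:Parke-Davis", "dailymed:producesDrug", "dailymeddrug:82"),
    ("dailymedorg:Parke-Davis", "dailymed:producesDrug", "dailymeddrug:201"),
    ("dailymeding:Phenytoin", "a", "dailymed:ingredients"),
    ("dailymeding:Phenytoin", "rdfs:label", '"Phenytoin"'),
    ("dailymeding:Ethosuximide", "a", "dailymed:ingredients"),
    ("dailymeding:Ethosuximide", "rdfs:label", '"Ethosuximide"'),
]


def connected_pairs(triples):
    """Return the (s, o) pairs whose object o is also the subject of a triple."""
    subjects = {s for s, _, _ in triples}
    pairs = []
    for s, _, o in triples:
        # Class URIs and literals never occur as subjects, so they are skipped.
        if o in subjects and (s, o) not in pairs:
            pairs.append((s, o))
    return pairs


for pair in connected_pairs(TRIPLES):
    print(pair)
```

Class URIs such as dailymed:drug and the label literals drop out because they never occur in subject position, which leaves the four pairs listed above.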
Starting from an identifier value of one and incrementing
it for each new vertex, we set dailymeddrug:82 = 1,
dailymeding:Phenytoin = 2, dailymeddrug:201 = 3,
dailymeding:Ethosuximide = 4 and
dailymedorg:Parke-Davis = 5. After that, we can create
the list of sequential adjacent vertexes of V(D), which is
{(2,5),1,(4,5),3,(1,3)}. Suppose that we divide the
dataset sample into 2 partitions; then the output of the
METIS partition is {1,1,2,2,1}, where each value is the
partition number of the corresponding vertex. According
to the METIS output, dailymeddrug:82 belongs to
partition 1, dailymeding:Phenytoin belongs to partition 1,
dailymeddrug:201 belongs to partition 2 and so on. In the
end, we have the following two partitions:
Partition 1: all triples that contain dailymeddrug:82,
dailymeding:Phenytoin and dailymedorg:Parke-Davis
Partition 2: all triples that contain dailymeddrug:201 and
dailymeding:Ethosuximide
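The worked example can be retraced with a short Python sketch, given here as an illustration rather than the authors' code. The function names are hypothetical, METIS itself is not called (the membership vector {1,1,2,2,1} is copied from the text), and serialising the adjacency lists in the plain METIS graph file format is an assumption, since the text only states that the METIS API is used.

```python
from collections import defaultdict

# V(D) for Listing 1, in the order given in the text.
PAIRS = [
    ("dailymeddrug:82", "dailymeding:Phenytoin"),
    ("dailymeddrug:201", "dailymeding:Ethosuximide"),
    ("dailymedorg:Parke-Davis", "dailymeddrug:82"),
    ("dailymedorg:Parke-Davis", "dailymeddrug:201"),
]

# Partition number per vertex id, copied from the text (not computed here).
MEMBERSHIP = [1, 1, 2, 2, 1]


def assign_ids(pairs):
    """Number the vertices 1, 2, 3, ... in order of first appearance."""
    ids = {}
    for s, o in pairs:
        for term in (s, o):
            ids.setdefault(term, len(ids) + 1)
    return ids


def adjacency_lists(pairs, ids):
    """For each vertex id in ascending order, list its sorted neighbour ids."""
    neighbours = defaultdict(set)
    for s, o in pairs:
        neighbours[ids[s]].add(ids[o])
        neighbours[ids[o]].add(ids[s])
    return [sorted(neighbours[v]) for v in sorted(ids.values())]


def metis_graph_text(adjacency):
    """Serialise as a METIS graph file: header 'n m', then one line per vertex."""
    n_edges = sum(len(nbrs) for nbrs in adjacency) // 2
    lines = [f"{len(adjacency)} {n_edges}"]
    lines += [" ".join(map(str, nbrs)) for nbrs in adjacency]
    return "\n".join(lines)


def vertices_by_partition(ids, membership):
    """Group vertex terms by the partition number of their vertex id."""
    groups = defaultdict(list)
    for term, vid in sorted(ids.items(), key=lambda kv: kv[1]):
        groups[membership[vid - 1]].append(term)
    return dict(groups)


ids = assign_ids(PAIRS)
adjacency = adjacency_lists(PAIRS, ids)
print(adjacency)
print(metis_graph_text(adjacency))
print(vertices_by_partition(ids, MEMBERSHIP))
```

Numbering vertices by first appearance in V(D) reproduces the identifiers used above, and grouping the vertices by their partition number yields the same vertex sets as Partition 1 and Partition 2.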
4.1.2 Entity Partition
The goal of this partition is to distribute the entities
evenly across the partitions. Different classes can
be located in a single partition. However, the entities
of the same class should be grouped in the same partition
until the number of entities reaches the maximum
number of entities for each source. We initially create
a list of the subjects along with their classes (E(D)). The
set E(D) of pairs of a subject and its class in the dataset
D is defined as
$E(D) = \{(s, o) \mid \exists\,(s, \texttt{rdf:type}, o) \in D\}$.
Then, we sort E(D) by class o and store each pair
WEBIST 2014 - International Conference on Web Information Systems and Technologies