common variable of the two triple patterns
determines the different join cases in Figure 3.
Note that executing the queries presented in Figure
and in Figure retrieves the same bindings of the
variables: the two queries differ only in the order of
their triple patterns and are therefore equivalent.
Thus, case 3 and case 7 (and analogously case 2 and
case 4, and case 6 and case 8) of Figure 3 are
symmetric cases, where only the order of the triple
patterns is exchanged, which does not influence the
final result. Therefore, we only have to consider
cases 1, 2, 3, 5, 6 and 9 if we exchange the order
of the triple patterns for cases 4, 7 and 8.
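The reduction from nine join cases to six can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes (inferred from the positions listed in Figure 7) that the join cases are numbered case = 3*i + j + 1, where i and j are the positions of the join variable (0 = subject, 1 = predicate, 2 = object) in the first and second triple pattern. The function names are hypothetical.

```python
def join_case(i: int, j: int) -> int:
    """Case number for a join of position i in pattern 1 with position j in pattern 2.

    Assumed numbering (consistent with Figure 7): case = 3 * i + j + 1,
    with 0 = subject, 1 = predicate, 2 = object.
    """
    return 3 * i + j + 1


def canonical_case(i: int, j: int) -> tuple[int, bool]:
    """Return the case to use and whether the two triple patterns must be swapped."""
    if i <= j:
        return join_case(i, j), False
    # Swapping the triple patterns turns case (i, j) into the symmetric case (j, i).
    return join_case(j, i), True


canonical = sorted({canonical_case(i, j)[0] for i in range(3) for j in range(3)})
print(canonical)  # [1, 2, 3, 5, 6, 9]
```

Under this numbering, cases 4 (p, s), 7 (o, s) and 8 (o, p) map onto cases 2, 3 and 6 after swapping the patterns, which matches the symmetric pairs named above.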
Keys are unambiguous objects that identify the
entries in indices. Indices are data structures,
together with methods, that administer pairs of keys
and their corresponding entries, which here are the
results of the joins. In particular, accessing an
entry by its key should be fast.
Variables in the triple patterns specify which of
the subjects, predicates and objects of two
considered triples of the RDF data are bound to the
variables. Literals in triple patterns are fixed
constraints in the triple patterns. Thus, the sequence
of literals of the two triple patterns serves as the key
for the matching triples of the RDF data. If we ignore
the positions of the literals in the triple patterns,
retrieval of the relevant triples of the RDF data
becomes ambiguous, because different constraints can
in general yield the same key. For example, consider
two queries Q1 and Q2. Let us assume that Q1 has the
literals L1 and L2 at the subject and object position
of the first triple pattern, and Q2 has the literals L1
and L2 at the predicate and object position of the
second triple pattern. The two queries should in
general retrieve different results, but both yield the
same key <L1, L2>, so the key alone cannot
distinguish their results. We therefore propose to use
a separate index for each possible combination of
literal positions and to select the index for a query
from the positions of the literals in that query.
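The ambiguity in the Q1/Q2 example can be reproduced in a few lines. The sketch below is illustrative only: it assumes a hypothetical representation of the two triple patterns as the six positions (s1, p1, o1, s2, p2, o2), with None marking a variable, and the helper names are not from the paper.

```python
from typing import Optional

# Hypothetical representation: the two triple patterns of a query as the six
# positions (s1, p1, o1, s2, p2, o2); None marks a variable.
Query = tuple[Optional[str], ...]


def literal_key(q: Query) -> tuple[str, ...]:
    """Key built from the literals only, ignoring where they occur."""
    return tuple(t for t in q if t is not None)


def positional_key(q: Query) -> tuple[tuple[int, ...], tuple[str, ...]]:
    """Key that also records the positions of the literals."""
    positions = tuple(i for i, t in enumerate(q) if t is not None)
    return positions, literal_key(q)


q1: Query = ("L1", None, "L2", None, None, None)  # literals at s1 and o1
q2: Query = (None, None, None, None, "L1", "L2")  # literals at p2 and o2

print(literal_key(q1) == literal_key(q2))        # True: the key is ambiguous
print(positional_key(q1) == positional_key(q2))  # False: positions disambiguate
```

Here the positions are folded into a combined key for brevity; the approach described above achieves the same disambiguation by using the literal positions to choose one of several separate indices instead.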
As we consider queries containing one join
expressed by a common variable, we only have to
consider four relevant positions in the two triple
patterns together: the join variable occupies one
position in each triple pattern, which leaves two
positions per triple pattern that may hold literals.
Figure 7 contains the relevant positions for the
different join cases of Figure 3 (except the eliminated
symmetric join cases 4, 7 and 8). As each of these
four positions either holds a literal or not, we have to
construct and use $2^4 = 16$ different indices for each
join case of Figure 3 except the eliminated symmetric
join cases 4, 7 and 8. Therefore, we have to
administer $6 \cdot 2^4 = 96$ different indices, which is
practical as shown in the experimental evaluation. We
can determine the index for a specific query by
computing

$\sum_{i=0}^{3} B_i \cdot 2^i$,

where $B_i = 1$ if there is a literal at position i (see
Figure 7) in the considered triple patterns without the
join partners as declared in Figure; otherwise $B_i = 0$.
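The index-number formula and the count of 96 indices can be checked with a short sketch. It assumes, for illustration, that a query has already been reduced to its four relevant positions in the order of Figure 7, given as a list whose entry i is the literal at position i or None for a variable; the function and constant names are hypothetical.

```python
from typing import Optional, Sequence


def index_number(literals: Sequence[Optional[str]]) -> int:
    """Compute sum_{i=0}^{3} B_i * 2^i for the four relevant positions.

    literals[i] is the literal at relevant position i (ordered as in
    Figure 7), or None if that position holds a variable, so B_i = 1
    exactly when literals[i] is not None.
    """
    if len(literals) != 4:
        raise ValueError("expected exactly four relevant positions")
    return sum(2 ** i for i, lit in enumerate(literals) if lit is not None)


# Each B_i is 0 or 1, so one join case has 2**4 = 16 index numbers (0..15),
# and the six remaining join cases need 6 * 16 = 96 indices in total.
INDICES_PER_CASE = 2 ** 4
TOTAL_INDICES = 6 * INDICES_PER_CASE

print(index_number(["<a>", None, "<b>", None]))  # 5
print(TOTAL_INDICES)  # 96
```

Because the B_i are the bits of a 4-bit number, every combination of literal positions maps to a distinct index number between 0 and 15.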
        Position 0  Position 1  Position 2  Position 3
Case 1  p1          o1          p2          o2
Case 2  p1          o1          s2          o2
Case 3  p1          o1          s2          p2
Case 5  s1          o1          s2          o2
Case 6  s1          o1          s2          p2
Case 9  s1          p1          s2          p2
Figure 7: The relevant positions 0 to 3 for the join cases of
Figure 3 except the eliminated symmetric cases, where sx
represents the subject position, px the predicate position
and ox the object position in triple pattern x∈{1, 2}.
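The rows of Figure 7 follow mechanically from the join positions: each row lists the two positions of each triple pattern that are not taken by the join variable. The sketch below regenerates the table under the same assumption as before, namely that case = 3*i + j + 1 with i and j the join positions (0 = s, 1 = p, 2 = o) in triple patterns 1 and 2; this numbering is inferred, not stated in the text.

```python
POSITIONS = ("s", "p", "o")


def relevant_positions(case: int) -> tuple[str, ...]:
    """Positions 0..3 of Figure 7 for a join case, assuming case = 3*i + j + 1."""
    i, j = divmod(case - 1, 3)  # join positions in triple patterns 1 and 2
    first = [POSITIONS[k] + "1" for k in range(3) if k != i]
    second = [POSITIONS[k] + "2" for k in range(3) if k != j]
    return tuple(first + second)


for case in (1, 2, 3, 5, 6, 9):
    print(f"Case {case}", *relevant_positions(case))
```

Each printed row matches the corresponding row of Figure 7, which supports the assumed case numbering.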
After determining the correct index, we can access
the triple set for the two triple patterns in that index
by using the key

$\mathop{\bigcirc}_{i=0}^{3} L_i = L_0 \circ L_1 \circ L_2 \circ L_3$,

where $\circ$ is the concatenation operator for keys and
$L_i$ is the literal at position i (see Figure 7) in the
considered triple patterns if there is a literal there;
otherwise $L_i$ is the empty key. For example, we
compute the key
“<http://purl.org/dc/elements/1.1/title>|<http://purl.org/dc/elements/1.1/title>”
from the query of Figure.
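The key computation can be sketched as below. Two details are assumptions rather than statements from the text: "|" is used as the separator because it appears in the example key, and empty keys contribute nothing (not even a separator), which is the reading that reproduces the example. The positions of the two title literals in the usage line are hypothetical.

```python
from typing import Optional, Sequence


def index_key(literals: Sequence[Optional[str]]) -> str:
    """Concatenate L_0 ... L_3 into one key; None stands for the empty key.

    Assumption: '|' separates non-empty literals (as in the example key)
    and empty keys add no separator.
    """
    return "|".join(lit for lit in literals if lit is not None)


TITLE = "<http://purl.org/dc/elements/1.1/title>"
print(index_key([TITLE, None, TITLE, None]))
```

Dropping the empty keys is safe here because the index number already fixes which positions hold literals, so all keys within one index contain the same number of literals; this holds as long as no literal itself contains "|".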
When using a hash map as index, we can retrieve
the results of a join over two triple patterns in
constant time on average.
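One way to organize the indices is sketched below. The layout is a hypothetical choice for illustration: one hash map per (join case, index number) pair, so at most 6 * 16 = 96 maps, each mapping a key to a list of precomputed join results, here modelled as pairs of matching triples. The triples and the key in the usage lines are toy data.

```python
# Hypothetical layout: one hash map per (join case, index number), i.e. at
# most 6 * 16 = 96 maps; each maps a key to the precomputed join results.
Triple = tuple[str, str, str]
JoinResult = tuple[Triple, Triple]
Indices = dict[tuple[int, int], dict[str, list[JoinResult]]]


def add_result(indices: Indices, case: int, number: int, key: str,
               result: JoinResult) -> None:
    """Store one precomputed join result under its index and key."""
    indices.setdefault((case, number), {}).setdefault(key, []).append(result)


def lookup(indices: Indices, case: int, number: int, key: str) -> list[JoinResult]:
    """Two hash-map lookups, i.e. expected constant time; no join is computed."""
    return indices.get((case, number), {}).get(key, [])


index: Indices = {}
t1 = ("<book1>", "<http://purl.org/dc/elements/1.1/title>", '"RDF"')
t2 = ("<book1>", "<http://purl.org/dc/elements/1.1/creator>", '"Alice"')
key = "<http://purl.org/dc/elements/1.1/title>|<http://purl.org/dc/elements/1.1/creator>"
# Case 1 (join on s1 = s2) with literals at p1 and p2: index number 1 + 4 = 5.
add_result(index, 1, 5, key, (t1, t2))
print(lookup(index, 1, 5, key))
```

The lookup returns the stored list itself rather than a copy, so its cost does not grow with the number of results.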
If there are two or three joins between two triple
patterns, then we can use the index for one join and
additionally check the constraints of the second or
third join on the retrieved triple set, or we can also
construct and use indices for these (rarer) cases.
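The first option, retrieving via the index for one join and then checking the remaining join conditions, might look like the following sketch. It assumes the retrieved triple set is available as pairs (t1, t2) of matching triples; the function name and the representation of further joins as position pairs are hypothetical.

```python
Triple = tuple[str, str, str]
POSITION = {"s": 0, "p": 1, "o": 2}


def check_further_joins(pairs: list[tuple[Triple, Triple]],
                        further_joins: list[tuple[str, str]]) -> list[tuple[Triple, Triple]]:
    """Keep only pairs (t1, t2) that satisfy every additional join condition.

    further_joins lists (position in t1, position in t2) pairs, e.g.
    [("o", "o")] for a second join between o1 and o2.
    """
    return [
        (t1, t2)
        for t1, t2 in pairs
        if all(t1[POSITION[a]] == t2[POSITION[b]] for a, b in further_joins)
    ]


# Pairs as they might be retrieved from an index for the join s1 = s2.
pairs = [
    (("<x>", "<p>", "<v>"), ("<x>", "<q>", "<v>")),
    (("<x>", "<p>", "<v>"), ("<x>", "<q>", "<w>")),
]
print(check_further_joins(pairs, [("o", "o")]))  # only the first pair remains
```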
If there are two or more joins over more than
two different triple patterns, one approach is to split
the joins into several partial joins over two triple
patterns each, to retrieve their results separately by
using our approach (each access saves the processing
time of one join), and then to join these results.
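The final step, joining the results of the partial joins, can be done with an ordinary hash join over the variable bindings. The sketch below is a generic technique, not necessarily the one the authors use; it assumes each partial result is a list of bindings (variable name to value) and that all bindings within one list bind the same variables. The names are hypothetical.

```python
from collections import defaultdict

Binding = dict[str, str]


def natural_join(left: list[Binding], right: list[Binding]) -> list[Binding]:
    """Hash join two lists of bindings on the variables they share.

    Assumes every binding within one list binds the same variables.
    """
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Build phase: bucket the right-hand bindings by their shared values.
    buckets: defaultdict[tuple[str, ...], list[Binding]] = defaultdict(list)
    for b in right:
        buckets[tuple(b[v] for v in shared)].append(b)
    # Probe phase: combine each left binding with all compatible right ones.
    result: list[Binding] = []
    for a in left:
        for b in buckets.get(tuple(a[v] for v in shared), []):
            result.append({**a, **b})
    return result


left = [{"x": "a", "y": "b"}]                        # result of one partial join
right = [{"y": "b", "z": "c"}, {"y": "d", "z": "e"}]  # result of another
print(natural_join(left, right))  # [{'x': 'a', 'y': 'b', 'z': 'c'}]
```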
Another approach is to extend our approach to
multi-joins over different triple patterns, which we
present in Section 2.4.
2.3 Constructing the Index
In order to construct the index from the input RDF
data, we first construct three indices to access the
triples with a common subject, predicate or object,
respectively. For this purpose, we iterate once through
all triples of the input RDF data set and add the current
triple to three hash maps, where we use the subject
as key for the first hash map, the predicate as key for
the second hash map, and the object as key for the
third hash map. Note that these hash maps store not
only one triple per key but a list of all triples with
that key. The construction of each of these three
ICEIS 2007 - International Conference on Enterprise Information Systems