tains results from SPARQL queries, and transforms
the results into QEF tuples [3]. Currently, QEF-LD
returns results in XML, JSON or HTML.
QEF-LD offers a set of Linked Data algebraic
operators, which capture the application semantics.
Complementarily, QEF-LD includes a set of control
operators, which access the data sources or cache in-
termediate results. The QEF-LD operators were im-
plemented using a consumer-producer strategy, defin-
ing a pipeline of results from one operator to another.
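The consumer-producer strategy can be illustrated with a minimal sketch in which each operator pulls tuples from its producer and hands results to its consumer. This is not the QEF-LD implementation: the function names, the dictionary representation of a tuple, and the sample data are assumptions made only for illustration.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical tuple representation: variable name -> value.
Row = Dict[str, str]


def source(rows: Iterable[Row]) -> Iterator[Row]:
    """Stand-in for a data source operator: produces tuples one at a time."""
    for row in rows:
        yield row


def union(*producers: Iterator[Row]) -> Iterator[Row]:
    """Consumes each producer in turn and forwards its tuples unchanged."""
    for producer in producers:
        yield from producer


def project(producer: Iterator[Row], variables: List[str]) -> Iterator[Row]:
    """Consumes tuples from its producer and keeps only the given variables."""
    for row in producer:
        yield {v: row[v] for v in variables if v in row}


# Illustrative data; in a real plan the sources would wrap SPARQL endpoints.
rows_a = [{"s": "ex:a", "name": "A"}]
rows_b = [{"s": "ex:b", "name": "B"}]

# Operators are chained so that results flow from one operator to the next.
pipeline = project(union(source(rows_a), source(rows_b)), ["name"])
results = list(pipeline)
```

Because every operator is a generator, no tuple is produced until the consumer at the end of the chain asks for it.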
In more detail, QEF-LD implements the following
operators: SPARQL Endpoint Data Source, Service
operator, Project operator, BindJoin operator,
SetBindJoin operator, and Union operator. The SetBindJoin
operator offers scalability to large result sets by the
dynamic partitioning of result sets and parallel eval-
uation. It outputs results from the parallel process-
ing of tuple sets generated by its left producer. The
grouping of tuples obtained from the left producer of
the join in sets allows a reduction in the number of
remote requests to SPARQL Endpoints related to the
right producer of the join. It also limits the number
of returned tuples, since the binding of common vari-
ables used in producers leads to the formulation of a
query with lower selectivity, i.e. a more restrictive
query.
QEF-LD stores each federated query plan as an XML
file, identified by a URI. A plan may have named
parameters, extracted from a URI, which are used to filter
the query execution results. QEF-LD also allows
stored plans to be pre-loaded into a cache during startup,
or loaded on demand, when the plan is requested for the
first time.
4 ALGORITHMS
SetBindJoin Algorithm
The SetBindJoin algorithm outputs results from the
parallel processing of tuple sets generated by its left
producer. The grouping of tuples obtained from the
left producer of the join in sets allows a reduction in
the number of remote requests to SPARQL Endpoints
related to the right producer of the join. It also lim-
its the number of returned tuples, since the binding of
common variables used in producers leads to the for-
mulation of a query with lower selectivity, i.e. a more
restrictive query.
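The effect of grouping on the number of remote requests can be made concrete with a small calculation. The sketch below assumes that one remote request is issued per tuple set, and that a join without grouping would issue one request per left tuple (equivalent to a set size of 1). Both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
import math


def remote_requests(left_tuples: int, set_size: int) -> int:
    """Number of requests sent to the right producer, assuming one per tuple set."""
    if set_size < 1:
        raise ValueError("set_size must be at least 1")
    return math.ceil(left_tuples / set_size)


# One request per left tuple versus one request per set of 50 tuples.
per_tuple = remote_requests(1000, 1)
per_set = remote_requests(1000, 50)
```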
The processing of each set can be briefly divided
into the following steps:
(i) Create a tuple set S with elements retrieved
from the left producer of the join.
(ii) Retrieve tuples from the right producer of the
join that are related with tuples from the tuple set S.
(iii) Return the join results between tuples from
the set S and tuples retrieved from the right producer.
The steps are detailed below:
(i) Create a Tuple Set S with Elements Retrieved
from the Left Producer of the Join. The
SetBindJoin algorithm (Algorithms 1 and 2) groups
the tuples retrieved from the left producer of the join
in sets (Lines 6–16 of Algorithm 2). The sets have
a maximum number of tuples that is pre-configured
in the SetBindJoin operator in the query plan. That
configuration is represented in our algorithm by the
variable leftTuplesSetSize.
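Step (i) can be sketched as follows. The sketch only illustrates the grouping of left tuples into sets of bounded size; it does not reproduce Algorithm 2, and the function name and the tuple representation are assumptions.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

# Hypothetical tuple representation: variable name -> value.
Row = Dict[str, str]


def tuple_sets(left_producer: Iterable[Row],
               left_tuples_set_size: int) -> Iterator[List[Row]]:
    """Group tuples from the left producer into sets of at most the given size."""
    if left_tuples_set_size < 1:
        raise ValueError("left_tuples_set_size must be at least 1")
    iterator = iter(left_producer)
    while True:
        # Take up to left_tuples_set_size tuples without reading further ahead.
        tuple_set = list(islice(iterator, left_tuples_set_size))
        if not tuple_set:
            return
        yield tuple_set


left_rows = [{"drug": f"ex:d{i}"} for i in range(5)]
sets = list(tuple_sets(left_rows, 2))
```

Only the last set may hold fewer tuples than the configured maximum.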
(ii) Retrieve Tuples from the Right Producer of the
Join that are Related with Tuples from the Tu-
ple Set S. The right producer of the join is cloned
and existing queries in the right producer are refor-
mulated to bind the values of common variables be-
tween the left and right producers of the join. The
reformulation ensures that the right producer will
only retrieve results related to tuples from the tuple
set S. Cloning and reformulation are performed by
the cloneAndReformulate method (Line 17 of
Algorithm 2). The reformulation changes the original
query using the UNION and FILTER features of the
SPARQL query language in order to bind variables.
Other reformulation strategies were tested, but
they were not feasible either due to some incompat-
ibility with most available SPARQL Endpoints or be-
cause their performance was worse than the adopted
strategy.
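The text does not give the exact shape of the reformulated query, so the following is only one plausible form, stated as an assumption: the original graph pattern is repeated once per distinct binding, each copy restricted by a FILTER, and the copies are combined with UNION. The function name and the example IRIs are hypothetical, and values are assumed to be already serialized as SPARQL terms.

```python
from typing import Dict, List

# Hypothetical tuple representation: variable name -> serialized SPARQL term.
Row = Dict[str, str]


def reformulate(pattern: str, select_vars: List[str],
                common_vars: List[str], tuple_set: List[Row]) -> str:
    """Build one query whose results are restricted to the bindings in tuple_set."""
    seen = set()
    branches = []
    for row in tuple_set:
        key = tuple(row[v] for v in common_vars)
        if key in seen:
            continue  # identical bindings need only one UNION branch
        seen.add(key)
        condition = " && ".join(f"?{v} = {row[v]}" for v in common_vars)
        branches.append(f"{{ {pattern} FILTER ({condition}) }}")
    if not branches:
        raise ValueError("tuple_set must not be empty")
    projection = " ".join(f"?{v}" for v in select_vars)
    return f"SELECT {projection} WHERE {{ " + " UNION ".join(branches) + " }"


query = reformulate(
    "?drug <http://www.w3.org/2000/01/rdf-schema#label> ?name .",
    ["drug", "name"],
    ["drug"],
    [
        {"drug": "<http://example.org/d1>"},
        {"drug": "<http://example.org/d2>"},
        {"drug": "<http://example.org/d1>"},
    ],
)
```

A single request built this way covers the whole tuple set, which is what allows one remote call per set instead of one per tuple.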
All the tuples retrieved by the left producer
of the join are stored in a hash table called
leftTupleHashTable (Lines 4, 8, 11 and 17 of
Algorithm 2). The hash table key is a representation of
the values of the common variables between the join
producers and its value is a list of tuples that share the
key.
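A minimal sketch of such a hash table is given below. The key representation (a Python tuple of the common-variable values) and the function names are assumptions, since the text only states that the key is "a representation" of those values.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical tuple representation: variable name -> value.
Row = Dict[str, str]
Key = Tuple[str, ...]


def join_key(row: Row, common_vars: List[str]) -> Key:
    """Represent the values of the common variables as a hashable key."""
    return tuple(row[v] for v in common_vars)


def build_left_tuple_hash_table(tuple_set: List[Row],
                                common_vars: List[str]) -> Dict[Key, List[Row]]:
    """Map each key to the list of left tuples that share it."""
    table: DefaultDict[Key, List[Row]] = defaultdict(list)
    for row in tuple_set:
        table[join_key(row, common_vars)].append(row)
    return dict(table)


left_set = [
    {"drug": "ex:d1", "disease": "ex:flu"},
    {"drug": "ex:d1", "disease": "ex:cold"},
    {"drug": "ex:d2", "disease": "ex:flu"},
]
left_table = build_left_tuple_hash_table(left_set, ["drug"])
```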
(iii) Return the Join Results between Tuples from
the Set S and Tuples Retrieved from the Right
Producer. For each tuple from the right producer of the
join, we retrieve a list with all left side tuples from the
leftTupleHashTable that share the same key. Next,
we go over the list to join each of its elements with
the element retrieved from the right in order to return
the final result of the operation (Lines 20–30 of
Algorithm 2).
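Step (iii) amounts to probing the hash table with each right tuple. The sketch below assumes the same dictionary-based tuple representation and tuple-valued keys as an illustration; the function name and the sample data are hypothetical, and merging two tuples is modelled as merging their variable bindings.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical tuple representation: variable name -> value.
Row = Dict[str, str]
Key = Tuple[str, ...]


def join_with_right(left_table: Dict[Key, List[Row]],
                    right_producer: Iterable[Row],
                    common_vars: List[str]) -> Iterator[Row]:
    """For each right tuple, emit one joined tuple per left tuple sharing its key."""
    for right in right_producer:
        key = tuple(right[v] for v in common_vars)
        for left in left_table.get(key, []):
            # Common variables hold equal values, so merging is unambiguous.
            yield {**left, **right}


left_table = {
    ("ex:d1",): [
        {"drug": "ex:d1", "disease": "ex:flu"},
        {"drug": "ex:d1", "disease": "ex:cold"},
    ],
    ("ex:d2",): [{"drug": "ex:d2", "disease": "ex:flu"}],
}
right_rows = [
    {"drug": "ex:d1", "name": "Aspirin"},
    {"drug": "ex:d3", "name": "Unmatched"},
]
joined = list(join_with_right(left_table, right_rows, ["drug"]))
```

A right tuple whose key is absent from the table simply contributes nothing to the output.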
The resulting tuples from all sets processed in
parallel are stored in a single linked blocking queue
called resultBuffer. The take method from the
resultBuffer queue (Line 7 of Algorithm 1) retrieves
and removes its first element if the queue is not empty.
If the queue is empty, the take method waits until a
new element is added. The put method is used to
insert an element at the end of the queue (Lines 27 and
QEF-LD - A Query Engine for Distributed Query Processing on Linked Data