D(w, sqi) = Π(a), according to the information given by the term index of wq.
- if qi ∈ ΠA, then we add an edge (w,s) with a new vertex s. The inserted edge relates exactly to w ∈ Sc and is labeled D(w,s) = Π(w) ∈ Ei.
- if qi ∈ Er or qi ∈ Ec, then we verify whether ∃ qi = Sln. If so, we add an edge (w,s) with a vertex s, labeled D(w,s) = Sln.
- ∀ qi ∈ Er, if ∃ a property ∈ UN, then the corresponding triple is eliminated from the graph.
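As an illustration of these insertion rules, here is a minimal Python sketch, assuming a networkx MultiDiGraph for D and simple sets standing in for ΠA, Er and UN (all names and storage conventions are ours, not the paper's implementation):

import networkx as nx

# Hypothetical stand-ins for the schema sets used above (names are ours).
PI_A = set()  # attribute properties (ΠA)
ER = set()    # relation labels (Er)
UN = set()    # uncertainty properties

def insert_matched_element(D, w, s, qi, pi, storylink=None):
    """Apply the edge-insertion rules above for a matched element qi.
    D is an nx.MultiDiGraph, w ∈ Sc the matched entity vertex, s the
    vertex added for qi, and pi a dict standing in for the label Π."""
    if qi in PI_A:
        # qi is an attribute property: edge (w, s) labeled Π(w) ∈ Ei
        D.add_edge(w, s, label=pi[w])
    elif qi in ER and storylink is not None:
        # qi matches a StoryLink value Sln: edge (w, s) labeled Sln
        D.add_edge(w, s, label=storylink)

def prune_uncertain(D):
    """Eliminate every triple whose property belongs to UN."""
    for u, v, k, data in list(D.edges(keys=True, data=True)):
        if data.get("label") in UN:
            D.remove_edge(u, v, key=k)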
The graph that is kept is the least connected one among the matched elements qi, i.e. the one with the lowest score SCO. SCO is defined as the average distance between pairs of matched elements: SCO = ∑ distance(qi, qj), ∀ i, j = 1, ..., n and i ≠ j. To reduce SCO, we look for the shortest path in the graph D. However, when calculating this shortest path, qi may in some cases be an edge, so the distance between two matched elements is computed including the edges and vertices leading to qi. Finally, we combine all shortest distances to form a connected sub-graph. To find the relevant graph, we keep only the candidates with the lowest scores.
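A minimal sketch of the SCO computation under these definitions, assuming a networkx graph and treating distance(qi, qj) as shortest-path length (the handling of edge matches is simplified here):

import itertools
import networkx as nx

def sco(D, matched):
    """Sum of shortest-path distances between every pair of matched
    elements qi, qj (i != j), per the formula above. Matched edges
    are represented here by one of their endpoint vertices, a
    simplification of the edge case described in the text."""
    G = D.to_undirected()
    return sum(nx.shortest_path_length(G, qi, qj)
               for qi, qj in itertools.combinations(matched, 2))

# Keep the candidate with the lowest SCO, assuming `candidates`
# is a list of (graph, matched_elements) pairs:
# best_graph, _ = min(candidates, key=lambda gm: sco(*gm))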
At this level, we can generate a SPARQL query based on the kept graph. Prior to this, we have to apply keyword preprocessing; as a result of this preprocessing, derived terms may be added to the original ones. All the terms and keywords are then submitted for querying.
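As one possible realization of this preprocessing step, the sketch below derives terms with NLTK's Porter stemmer and WordNet synonyms; the paper does not prescribe a specific tool, so this choice is our assumption:

# Requires the WordNet corpus: nltk.download('wordnet')
from nltk.stem import PorterStemmer
from nltk.corpus import wordnet

def expand_keywords(keywords):
    """Return the original keywords plus derived terms (stems and
    WordNet synonyms), all of which are submitted for querying."""
    stemmer = PorterStemmer()
    expanded = set(keywords)
    for kw in keywords:
        expanded.add(stemmer.stem(kw))
        for synset in wordnet.synsets(kw):
            expanded.update(lemma.name() for lemma in synset.lemmas())
    return expanded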
4.2 Query Construction
We generate SPARQL queries for every graph while considering the following rules:
- if (s,o) ∈ A, where s, o ∈ Ss, L(o) ∈ Es and Π(s,o) ∈ Ei, then we generate the skeleton of the following triple: var(s) Π(s,o) var(o) FILTER (var(o) = Sln) (for StoryLink triples)
- ∀ (s,o) ∈ A, where s, o ∈ Ss and Π(s,o) = cx, we generate the skeleton of the following triple: var(s) Π(s,o) var(o) FILTER (var(o) ∈ {cx1, ..., cxn}) (for Context triples)
For all the 6 semantic relationships defined in
our Contextual Schema:
- if s ∈ Sc, then we associate the vertex s with a new variable var(s) and generate the skeleton of the following triple: var(s) rdf:type Π(s), where Π(s) ∈ Ec.
- if (s,o) ∈ A, from a vertex s ∈ Sc to a vertex o ∈ Sc (inter-entities property), where Π(s,o) ∈ Er, then we generate the skeleton of the following triple: var(s) L(e) var(o), where e = (s,o).
- if (s,o) ∈ A, from a vertex s ∈ Sc to a vertex o ∈ Ss (entity-attribute property), where L(o) ∈ Es and Π(s,o) ∈ Ei, then we generate the skeleton of the following triple: var(s) Π(s,o) var(o) FILTER (var(o) = L(o)).
- if (s,o) ∈ A, from a vertex s ∈ Sc to a vertex o ∈ Ss (entity-attribute property), where Π(s,o) ∈ Ei, then we generate the skeleton of the following triple: var(s) Π(s,o) var(o).
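Putting the rules above together, here is a minimal sketch of skeleton generation, assuming edge labels carry Π(s,o) and node attributes carry the entity type and the literal L(o); these storage conventions and function names are our assumptions, not the paper's implementation:

def var(vertex):
    """Map a vertex to a SPARQL variable name (illustrative)."""
    return f"?{vertex}"

def build_query(D, Sc, Ss, Er, Ei):
    """Assemble triple-pattern skeletons from the kept graph D."""
    patterns = []
    for s in D.nodes:
        if s in Sc:
            # entity vertex: var(s) rdf:type Π(s)
            patterns.append(f"{var(s)} rdf:type {D.nodes[s]['type']} .")
    for s, o, data in D.edges(data=True):
        p = data["label"]  # Π(s,o)
        if s in Sc and o in Sc and p in Er:
            # inter-entities property
            patterns.append(f"{var(s)} {p} {var(o)} .")
        elif s in Sc and o in Ss and p in Ei:
            # entity-attribute property
            patterns.append(f"{var(s)} {p} {var(o)} .")
            lit = D.nodes[o].get("literal")  # L(o), when known
            if lit is not None:
                patterns.append(f'FILTER ({var(o)} = "{lit}")')
    body = "\n  ".join(patterns)
    return f"SELECT * WHERE {{\n  {body}\n}}"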
The last step before query generation is to add further restrictions concerning the shared context of the keywords and the uncertainty property.
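As one hedged reading of the uncertainty restriction, the sketch below appends a FILTER NOT EXISTS clause per uncertainty property, consistent with the earlier rule that triples with a property in UN are eliminated; the concrete encoding is ours, not the paper's:

def add_uncertainty_restriction(query_body, un_properties):
    """Append a restriction excluding solutions that involve an
    uncertainty property from UN (illustrative encoding)."""
    filters = [f"FILTER NOT EXISTS {{ ?s {p} ?o . }}" for p in un_properties]
    return query_body + "\n  " + "\n  ".join(filters)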
5 EVALUATION
In this section, we present a brief evaluation of this work.
First of all, we have to distinguish between different kinds of results. They fall into two groups: results containing the user's exact terms, and results that also take the user's intent into account. This implies the existence of results that are merely correct and results that are entirely relevant. For this reason, we identify the number of keywords best matching elements from the triplestore in order to find the best sub-graph satisfying all user requirements. In addition, we have to distinguish the degree of correctness of the results. To this end, we introduce measures already used in this kind of evaluation (Shekarpour, 2013).
Let us take the same example described previously. The generated query may return correct, partially correct, or wrong results. The user's keywords may match resources exactly, yet those resources may not correspond to the user's intent in terms of meaning. From this viewpoint, results may deviate towards resources that merely mention Hillary in another context.
We introduce the following three measures:
Correctness Rate (CR): This measure assigns a score to the results matching the user's expectations for a query q. CR(q) is the fraction of correct terms over all terms:
CR(q) = |correct terms| / |all terms|   (2)
Average Correctness Rate (ACR): It represents the arithmetic mean of the CR values of the individual answers Aq for a query q:
ACR(q) = (1 / |Aq|) * ∑a∈Aq CR(a)   (3)
Fuzzy Precision (FP): The ACR is the basis for this measure, which measures the overall correctness of a template's corresponding answers Aq with respect to a set of keyword queries Q.
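A minimal sketch of the three measures follows; taking FP as the mean ACR over the query set Q is our reading of the definition above, not a formula given by the paper:

def cr(correct_terms, all_terms):
    """Correctness Rate (Eq. 2): fraction of correct terms over all terms."""
    return len(correct_terms) / len(all_terms)

def acr(answers):
    """Average Correctness Rate (Eq. 3) over the answers Aq of one query q.
    `answers` is a list of (correct_terms, all_terms) pairs."""
    return sum(cr(c, t) for c, t in answers) / len(answers)

def fp(per_query_answers):
    """Fuzzy Precision over a set of keyword queries Q, taken here as
    the mean ACR across queries (our assumption, see lead-in)."""
    return sum(acr(a) for a in per_query_answers) / len(per_query_answers)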