taxonomy-based semantic vectors are used as inputs
for this analysis. As in taxonomy-based semantic
vector creation, there are two processes involved in
the ontological relationship analysis: the first boosts
weights belonging to concepts within the input
semantic vector, depending on the ontology relations
between them; the second adds concepts that are not
present in the input vector, according to ontological
relations they might have with concepts belonging to
the vector (Nagarajan et al., 2007).
As in taxonomy-based semantic vector creation,
the new concept is added to the semantic vector only
if the ontological relation importance is greater than
or equal to a pre-defined threshold, for the same
constraint purposes. The ontological relation’s
importance, or relevance, is not computed
automatically; rather, it is retrieved from an
ontological relation vector, which is composed of a
pair of concepts and the weight associated with that
pair’s relation.
In the case of the second process (an ontological
relation between a concept within the input
semantic vector and another concept not comprised
in that vector), and again as in the taxonomy-based
semantic vector creation process, the weight of the
concept already present in the vector is not
modified, and the new concept is added to the
semantic vector.
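The two processes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `relation_weights` dict stands in for the ontological relation vector, and the boost factor and the weighting scheme for added concepts are assumptions.

```python
def analyse_relations(vector, relation_weights, threshold=0.15, boost=1.2):
    """Apply the two processes of the ontological relationship analysis.

    vector           -- input semantic vector as a concept -> weight dict
    relation_weights -- stand-in for the ontological relation vector: a dict
                        mapping a (concept, concept) pair to the importance
                        of their ontological relation
    """
    out = dict(vector)
    for (a, b), importance in relation_weights.items():
        if importance < threshold:
            continue  # relation below the pre-defined threshold: skip it
        if a in vector and b in vector:
            # Process 1: boost the weights of related in-vector concepts
            # (boost factor is an assumed value).
            out[a] *= boost
            out[b] *= boost
        elif a in vector and b not in vector:
            # Process 2: add the missing concept; the weight of the concept
            # already in the vector is left unmodified. Weighting the new
            # concept by relation importance is an assumed scheme.
            out[b] = vector[a] * importance
        elif b in vector and a not in vector:
            out[a] = vector[b] * importance
    return out
```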
4 ASSESSMENT
This section describes the assessment of the
approach proposed in this work. First, the
knowledge source indexation process is assessed;
then, an example query and its results are presented.
4.1 Treating Queries
As mentioned before, queries are treated as
pseudo-documents, which means that every query
undergoes an indexation process similar to the one
applied to documents.
For the purpose of this assessment, a corpus of
sixty-five knowledge sources was used, randomly
selected but all with a strong focus on the building
and construction domain. As an example, a test
query search for “door”, “door frame”, “fire
surround”, “fireproofing” and “heating” is inserted
in the interface’s keyword-based search field,
meaning that the user is looking for doors and
respective components that are fireproof or that
provide fire protection. In this case, keyword “door”
is matched with concept “Door”, “door frame” is
matched with “Door Component”, and so on, as
shown in Table 4. Weights for matched ontological
concepts are all equal to 0.2, because each concept
only matches with one keyword. Hence, the
semantic vector for this query will be the one of
Table 4.
Table 4: Example of a query's semantic vector.

#  Keyword         Ontology concept          Weight
1  door            Door                      0.2
2  door frame      Door Component            0.2
3  fire surround   Fireplace And Stove       0.2
4  fireproofing    Fireproofing              0.2
5  heating         Complete Heating System   0.2
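Each of the five keywords matches exactly one concept, so each concept receives 1/5 = 0.2 of the total weight. This can be sketched as follows, assuming a hypothetical `match_concept` lookup in place of the actual keyword-to-concept matching step:

```python
def build_query_vector(keywords, match_concept):
    """Build a query semantic vector: each matched concept receives an
    equal share of the weight (1 / number of keywords), accumulated when
    several keywords map to the same concept."""
    vector = {}
    share = 1.0 / len(keywords)
    for kw in keywords:
        concept = match_concept(kw)
        if concept is not None:
            vector[concept] = vector.get(concept, 0.0) + share
    return vector

# Hypothetical keyword -> concept mapping mirroring Table 4.
MATCHES = {
    "door": "Door",
    "door frame": "Door Component",
    "fire surround": "Fireplace And Stove",
    "fireproofing": "Fireproofing",
    "heating": "Complete Heating System",
}
query = build_query_vector(list(MATCHES), MATCHES.get)
```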
4.2 Comparing and Ranking
Documents
Our approach to vector similarity uses the cosine
similarity (Deza and Deza, 2009) between two
vectors, i.e. their cosine, calculated as the
Euclidean dot product of the two vectors, together
with the sparse-matrix multiplication method, which
is based on the observation that the scalar product of
two vectors depends only on the coordinates for
which both vectors have nonzero values.
The cosine of two vectors is defined as the inner
product of those vectors after they have been
normalized to unit length. Let d be the semantic
vector representing a document and q the semantic
vector representing a query. The cosine of the angle
between d and q is given by:

\cos(d, q) = \frac{d \cdot q}{\lVert d \rVert \, \lVert q \rVert} = \frac{\sum_{i=1}^{m} d_i q_i}{\sqrt{\sum_{i=1}^{m} d_i^2}\,\sqrt{\sum_{i=1}^{m} q_i^2}} \quad (2)

where m is the size of the vectors, d_i is the weight
of each concept representing d, and q_i is the weight
of each concept present in the query vector
(Castells et al., 2007; Li, 2009).
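Equation (2) can be written out directly for two equal-length weight lists; a minimal sketch (real semantic vectors would be sparse):

```python
from math import sqrt

def cosine(d, q):
    """Cosine of the angle between document vector d and query vector q,
    both given as equal-length lists of concept weights (Eq. 2)."""
    dot = sum(di * qi for di, qi in zip(d, q))
    norm_d = sqrt(sum(di * di for di in d))
    norm_q = sqrt(sum(qi * qi for qi in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # a zero vector has no direction; define similarity as 0
    return dot / (norm_d * norm_q)
```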
A sparse-matrix multiplication approach is
adopted here to compute the cosine similarity, since
it is one of the most commonly used similarity
measures for vectors and can be decomposed into
three values: one depending on the nonzero values
of the document vector, another depending on the
nonzero values of the query vector, and the third
depending on the nonzero coordinates shared by
both.
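This sparse decomposition can be sketched with dict-based vectors, where the dot product visits only the coordinates that are nonzero in both vectors (the dict representation is an assumption for illustration):

```python
from math import sqrt

def sparse_cosine(d, q):
    """Cosine similarity for sparse vectors stored as concept -> weight
    dicts; the dot product only visits shared nonzero coordinates."""
    if len(d) > len(q):
        d, q = q, d  # iterate over the smaller vector
    dot = sum(w * q[c] for c, w in d.items() if c in q)
    norm_d = sqrt(sum(w * w for w in d.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)
```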
Document ranking is based on the similarity
between each document and the query. More
specifically, because the result of the cosine
function always lies between 0 and 1, the system
expresses the cosine result as a percentage value.
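Since the cosine of two non-negative weight vectors lies in [0, 1], it maps directly to a percentage. A ranking sketch with hypothetical helper names, using the same dict-based vectors as above:

```python
from math import sqrt

def cosine(d, q):
    """Cosine similarity between two concept -> weight dicts."""
    dot = sum(w * q[c] for c, w in d.items() if c in q)
    nd = sqrt(sum(w * w for w in d.values()))
    nq = sqrt(sum(w * w for w in q.values()))
    return dot / (nd * nq) if nd and nq else 0.0

def rank(documents, query):
    """Rank documents (name -> semantic vector dict) against the query,
    reporting relevance as a percentage, highest first."""
    scored = [(name, round(100 * cosine(vec, query)))
              for name, vec in documents.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```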
The first results for the documents’ test set are
very satisfactory: the first knowledge source
returned by the search presents a relevance of 84% to the
Information Retrieval in Collaborative Engineering Projects - A Vector Space Model Approach
237