relations with semantic associations among
concepts. Hence, the main contribution of this work
is not to develop new classification algorithms or
to improve existing ones, but to modify the document
term vectors in a way that lets us measure the effect
of such semantic enrichment on existing classifiers.
This paper is structured as follows. Section 2
presents the related work. Section 3 describes the
domain ontology used in this work. Section 4
describes the process of enriching KSs. Section 5
presents the empirical evidence for the work
addressed so far. Finally, Section 6 concludes the
paper and points out future work.
2 RELATED WORK
The presented work continues the work presented
in (Figueiras et al., 2012) and (Costa et al.,
2012). Regarding the issue addressed here, Castells
et al. (Castells et al., 2007) propose an approach
based on an ontology and supported by an
adaptation of the Vector Space Model (VSM),
similarly to our approach. It uses the tf-idf (term
frequency–inverse document frequency) algorithm,
matches documents’ keywords with ontology
concepts, creates semantic vectors, and uses cosine
similarity to compare the resulting vectors. A key
difference between this approach and the presented
work is that Castells’ work considers neither
semantic relations nor hierarchical relations
between concepts (whether taxonomic and/or
ontological).
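The tf-idf weighting and cosine comparison mentioned above can be sketched as follows. This is a minimal, term-level illustration, assuming length-normalised term frequency and a natural-log idf; it is not the exact weighting used by Castells et al., and the toy corpus is hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # tf = count / document length; idf = ln(N / df).
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus of already-tokenised documents.
docs = [["architect", "design", "phase"],
        ["architect", "design", "budget"],
        ["contractor", "budget"]]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shares 'architect' and 'design'
print(cosine(vecs[0], vecs[2]))  # no shared terms -> 0.0
```

In the semantic variant described above, the vector keys would be ontology concepts matched from keywords rather than raw terms; the cosine step is unchanged.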
Li (Sheng, 2009) presents a way of
mathematically quantifying such hierarchical or
taxonomic relations between ontological concepts,
based on relations’ importance and on the co-
occurrence of hierarchically related concepts, and
reflects this quantification in documents’ semantic
vectors. Li’s work aims at creating an Information
Retrieval (IR) model based on semantic vectors for
personal desktop documents; unlike the presented
work, it is not concerned with Web IR applications.
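One simple way to reflect taxonomic relations in a semantic vector is to pass a decaying share of each concept’s weight up its &lt;is_a&gt; chain. The sketch below is a hypothetical illustration of that idea only: the `decay` factor and the single-parent, acyclic taxonomy are assumptions, and it does not reproduce Li’s formulas based on relation importance and co-occurrence.

```python
def propagate_up(weights, parent_of, decay=0.5):
    """Spread each concept's weight to its taxonomic ancestors.

    weights:   {concept: weight} taken from a document's semantic vector
    parent_of: {concept: direct <is_a> parent}; assumed acyclic, single parent
    decay:     hypothetical factor applied once per <is_a> step upwards
    """
    enriched = dict(weights)
    for concept, weight in weights.items():
        share = weight
        parent = parent_of.get(concept)
        while parent is not None:
            share *= decay  # farther ancestors receive smaller shares
            enriched[parent] = enriched.get(parent, 0.0) + share
            parent = parent_of.get(parent)
    return enriched


parent_of = {"Architect": "Design Actor", "Design Actor": "Actor"}
print(propagate_up({"Architect": 0.8}, parent_of))
# {'Architect': 0.8, 'Design Actor': 0.4, 'Actor': 0.2}
```

A document mentioning only ‘Architect’ thus also gains some weight on ‘Design Actor’ and ‘Actor’, which lets it match queries or classes phrased at a more general level.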
On the other hand, Nagarajan et al. (Nagarajan
et al., 2007) propose a document indexing system
based on the VSM and supported by Semantic Web
technologies, just as we do here. They also propose
ways of quantifying ontological relations between
concepts and represent that quantification in
documents’ semantic vectors. There are, however,
some differences between Nagarajan’s work and our
approach. For instance, Nagarajan et al. do not
distinguish between taxonomic and ontological
relations. Also, our semantic vectors do not include
the terms from documents themselves, but the
ontology concepts to which such terms were
previously semantically mapped.
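The distinction above, a semantic vector indexed by ontology concepts rather than by document terms, can be sketched as a re-indexing step. The keyword-to-concept mapping and the weights here are hypothetical, and summing the weights of terms mapped to the same concept is an assumed aggregation rule.

```python
from collections import defaultdict


def concept_vector(term_weights, keyword_to_concept):
    """Re-index a {term: weight} vector by ontology concept.

    Each mapped term adds its weight to its concept; unmapped terms
    are dropped, so the returned vector holds concepts only.
    """
    vec = defaultdict(float)
    for term, weight in term_weights.items():
        concept = keyword_to_concept.get(term)
        if concept is not None:
            vec[concept] += weight
    return dict(vec)


# Hypothetical keyword-to-concept mapping and term weights.
mapping = {"architect": "Architect", "architectural firm": "Architect",
           "budget": "Cost"}
terms = {"architect": 0.3, "architectural firm": 0.2, "weather": 0.1}
print(concept_vector(terms, mapping))
```

Because unmapped terms such as ‘weather’ are discarded, the resulting vector lives entirely in the ontology’s concept space.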
Focusing on more recent works, Xia and Du (Xia
and Du, 2011) propose a document classification
mechanism based on title-vector document
representations, which assume that the terms in a
document’s title represent its main topics, so the
weights of title terms should be amplified.
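The title amplification idea can be sketched as a term-weighting function in which every title occurrence counts more than a body occurrence. The `boost` value is a hypothetical choice for illustration, not a parameter reported by Xia and Du.

```python
from collections import Counter


def title_weighted_tf(title_terms, body_terms, boost=2.0):
    """Term weights where each title occurrence counts `boost` times.

    `boost` is a hypothetical amplification factor chosen for illustration.
    """
    weights = Counter()
    for term in body_terms:
        weights[term] += 1.0
    for term in title_terms:
        weights[term] += boost  # amplify terms assumed to carry the main topic
    return dict(weights)


print(title_weighted_tf(["bridge", "design"],
                        ["bridge", "steel", "design", "bridge"]))
# {'bridge': 4.0, 'steel': 1.0, 'design': 3.0}
```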
Finally, García et al. (García et al., 2010)
propose new metrics to measure relationships
among classes in an ontology. In an OWL ontology,
relationships among classes are given by object
properties, each defined as a binary relation
linking classes in its domain to classes in its
range. The proposal of García et al. is based on the
coupling metric defined in the software engineering
field, adapted to the needs of the Semantic Web.
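To make the notion of class coupling through object properties concrete, the sketch below counts, for each class, how many distinct other classes it is linked to via domain/range pairs. It is a simplified, hypothetical measure in the spirit of software-engineering coupling counts, not the metrics defined by García et al.; it also assumes a single domain and range per property, and the property names are invented.

```python
def class_coupling(object_properties):
    """Number of distinct other classes each class is linked to.

    object_properties: iterable of (property, domain_class, range_class)
    triples; one domain and one range per property is assumed.
    A link counts for both the domain and the range class.
    """
    linked = {}
    for _prop, domain, range_cls in object_properties:
        linked.setdefault(domain, set())
        linked.setdefault(range_cls, set())
        if domain != range_cls:  # ignore self-links
            linked[domain].add(range_cls)
            linked[range_cls].add(domain)
    return {cls: len(others) for cls, others in linked.items()}


# Hypothetical object properties for a construction ontology.
props = [("hasActor", "Project", "Actor"),
         ("hasPhase", "Project", "Phase"),
         ("participatesIn", "Actor", "Project")]
print(class_coupling(props))
# {'Project': 2, 'Actor': 1, 'Phase': 1}
```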
3 THE ONTOLOGY
The domain-specific ontology used in this work was
developed entirely with the Protégé ontology editor
(Stanford Center for Biomedical Informatics
Research, s.d.) and is written in the OWL-DL
language (Sean et al., s.d.). The ontology
comprises two major pillars, namely concepts
and relations. The first covers specific elements
(classes) of building and construction related areas,
for example, type of project, project phase, and
similar data. The second specifies how such
concepts are related to each other.
Several levels of specificity are given for all
concept families, as described for the ‘Actor’
concept. These specificity levels represent concept
hierarchies and, ultimately, taxonomic relations such
as ‘Architect’ <is_a> ‘Design Actor’ and ‘Design
Actor’ <is_a> ‘Actor’. Every class, or concept, has a
corresponding instance that holds the keywords or
expressions gathered for that concept, linked
through an ontological datatype property named
‘has Keyword’.
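The concept hierarchy and the ‘has Keyword’ values described above can be mirrored in a small in-memory structure. This is a hypothetical sketch rather than the OWL-DL ontology itself: it collapses each class and its instance into one `Concept` object, assumes a single &lt;is_a&gt; parent, and the keyword ‘architectural firm’ is invented. Concept names are indexed as keywords too, since the ontology treats concepts as keywords.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Concept:
    name: str
    parent: "Optional[Concept]" = None  # direct <is_a> superclass
    keywords: Set[str] = field(default_factory=set)  # 'has Keyword' values

    def ancestors(self):
        """Names of all superclasses, nearest first, following <is_a>."""
        chain, node = [], self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


def keyword_index(concepts):
    """Map each keyword, and each concept name, to its concept."""
    index = {}
    for concept in concepts:
        index[concept.name.lower()] = concept
        for kw in concept.keywords:
            index[kw.lower()] = concept
    return index


actor = Concept("Actor")
design_actor = Concept("Design Actor", parent=actor)
architect = Concept("Architect", parent=design_actor,
                    keywords={"architectural firm"})  # hypothetical keyword
print(architect.ancestors())  # ['Design Actor', 'Actor']
print(keyword_index([actor, design_actor, architect])["architectural firm"].name)
```

Looking a matched keyword up in the index and then walking `ancestors()` recovers both the concept and its taxonomic context.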
All concepts are themselves keywords, because
they are expressions or terms that may occur in a
knowledge source. In addition, concepts have
equivalent terms, i.e., terms or expressions that
capture different semantic aspects of those
concepts. For instance, the
Classification of Knowledge Representations using an Ontology-based Approach