Authors:
Ruben Costa (1); Paulo Figueiras (1); Pedro Maló (1) and Celson Lima (2)
Affiliations:
(1) Universidade Nova de Lisboa, Portugal; (2) Federal University of Western Pará, Brazil
Keyword(s):
Ontology Engineering, Unsupervised Document Classification, Vector Space Models, Semantic Vectors.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Data Engineering; Enterprise Information Systems; Enterprise Ontology; Information Systems Analysis and Specification; Knowledge Engineering and Ontology Development; Knowledge Representation; Knowledge-Based Systems; Natural Language Processing; Ontologies and the Semantic Web; Ontology Engineering; Pattern Recognition; Symbolic Systems
Abstract:
One of the primary research challenges in knowledge representation is formalizing document contents using dependent metadata, and in particular how the classifiers are derived. Most approaches to determining appropriate classifiers are limited because they take account only of the explicit, word-based information in the document. The research described in this paper explores classifier enrichment for unsupervised document classification by combining the information presented in documents with implicit information derived from the complex relationships (Semantic Associations) in domain ontologies. The paper introduces a novel conceptual framework for representing knowledge sources, in which each knowledge source is semantically represented (within its domain of use) by a Semantic Vector (SV). The SV is enriched using the classical vector space model extended with ontological support, employing ontology concepts and their relations in the enrichment process. The approach is assessed in the Building and Construction domain, using an appropriate available ontology. Preliminary results, collected using a clustering algorithm for document classification, indicate that the proposed approach improves the precision and recall of classifications. Future work and open issues are also discussed.
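To make the enrichment idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a plain TF-IDF weighting for the keyword-based vector and a toy dictionary standing in for the domain ontology. The names `ONTOLOGY_RELATIONS`, `tf_idf_vectors`, `enrich`, and `cosine`, as well as the association strengths, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy ontology: concept -> {related concept: association strength}.
# A real domain ontology and its Semantic Associations would replace this.
ONTOLOGY_RELATIONS = {
    "concrete": {"cement": 0.5, "foundation": 0.3},
    "roof": {"insulation": 0.4},
}


def tf_idf_vectors(docs):
    """Classical vector space model: one {term: TF-IDF weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def enrich(vector, relations):
    """Add weight to concepts that the ontology relates to terms already present."""
    enriched = dict(vector)
    for term, weight in vector.items():
        for related, strength in relations.get(term, {}).items():
            enriched[related] = enriched.get(related, 0.0) + weight * strength
    return enriched


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = ["concrete slab poured", "cement mix delivered"]
keyword_vectors = tf_idf_vectors(docs)
semantic_vectors = [enrich(v, ONTOLOGY_RELATIONS) for v in keyword_vectors]

print(cosine(keyword_vectors[0], keyword_vectors[1]))    # no shared words
print(cosine(semantic_vectors[0], semantic_vectors[1]))  # linked via the ontology
```

In this toy case the two documents share no words, so their keyword-based vectors are orthogonal. After enrichment, the hypothetical association between "concrete" and "cement" gives them a non-zero similarity, which is the kind of implicit signal a clustering algorithm could then use.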