Authors:
Saeed Sarencheh and Andrea Schiffauerova
Affiliation:
Concordia University, Canada
Keyword(s):
Ontology, Web Mining, Data Mining, Crawling, Machine Learning, TF-IDF, NLP, Concepts, Taxonomy, Non-taxonomy.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Data Engineering; Domain Analysis and Modeling; Enterprise Information Systems; Information Systems Analysis and Specification; Knowledge Engineering and Ontology Development; Knowledge Representation; Knowledge-Based Systems; Ontologies and the Semantic Web; Ontology Engineering; Symbolic Systems
Abstract:
Scientists use knowledge representation techniques to transfer knowledge from humans to machines. An ontology is a well-known representation technique for transferring knowledge to machines. Creating a new ontology is a complex task, and most proposed algorithms for creating an ontology from documents have difficulty detecting complex concepts and their non-taxonomic relationships. Moreover, previous algorithms cannot analyze a multidimensional context, in which each concept might have different meanings. This study proposes a framework that separates the process of finding important concepts from the linguistic analysis in order to extract more taxonomic and non-taxonomic relationships. In this framework, we use a modified version of the Term Frequency – Inverse Document Frequency (TF-IDF) weight to extract important concepts from an online encyclopedia. Data mining algorithms, such as labeling semantic classes, are used to connect concepts, categorize attributes, and label them. An online encyclopedia is used to create a structure for the knowledge of the given domain. Part-of-speech (POS) tagging and the dependency trees of sentences are used to extract concepts and their relationships (i.e., taxonomic and non-taxonomic). We then evaluate the framework by comparing its results with an existing ontology in the area of “biochemy”. The results show that the proposed method detects more detailed information and performs better.
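As an illustration of the concept-extraction step, the following is a minimal sketch of the standard (unmodified) TF-IDF weight in Python. The abstract does not describe the authors' modification, so this is not their method. The function names `tf_idf` and `top_terms`, the whitespace tokenization, and the sample sentences are assumptions made for this example only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using textbook TF-IDF.

    Assumption: plain whitespace tokenization and the unsmoothed formula
    tf * log(N / df); the paper's modified weighting is not reproduced here.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(docs, k=3):
    """Return the k highest-weighted terms per document (ties broken alphabetically)."""
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in tf_idf(docs)]


# Hypothetical mini-corpus standing in for encyclopedia articles.
sample_docs = [
    "enzyme binds substrate",
    "enzyme catalyzes reaction",
    "protein folds",
]
candidate_concepts = top_terms(sample_docs, k=2)
```

Terms that occur in every document receive a weight of zero under this formula, which is why frequent but uninformative words drop out of the candidate concept list.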