tagging technique and dependency tree of sentences
are used to analyze the dependency relationships
between sentence components to identify the
taxonomic and non-taxonomic relationships between
concepts. Finally, the TF-IDF weights of the concepts
and the status of each concept in the dependency tree
are used to create the ontology structure.
This paper is organized as follows. In Section 2,
we review previous studies and assess the gaps in the
research. In Section 3, we present our framework and
explain the algorithms in detail. In Section 4, we
describe our experiment and implementation and
evaluate our method. In Section 5, we present the
results of the experiment and compare them with
previous research. Finally, in Section 6, we conclude
by revisiting our research goals and discussing the
results of the experiment.
2 LITERATURE REVIEW
Many studies have looked at extracting ontologies
from plain text. SnowBall (Agichtein and Gravano,
2000), Textrunner (Aroyo et al., 2002), OntoGen
(Fortuna et al., 2006), OntoLearn (Navigli and
Velardi, 2004), OntoLT (Buitelaar et al., 2004), and
Mo’k (Bisson et al., 2000), for example, all attempted
to generate a domain ontology from plain text, with
some using machine learning to identify concepts
(namely OntoGen, SnowBall, and OntoLearn). However,
none of these studies has focused on extracting the
non-taxonomic relationships between concepts.
Some studies have used frequency-based techniques
to extract concepts from plain text. Maedche et al.
(2001) introduced Text-To-Onto, a semi-automatic
framework for extracting an ontology from plain text.
In Text-To-Onto, concepts are extracted using a term
frequency algorithm, hierarchical clustering is used
to link related concepts, and a modified version of
the association rules algorithm is used to extract the
non-taxonomic relationships between concepts. The
TF-IDF algorithm used in their study to identify
concepts, however, detects only single nouns as
concepts. In a similar work, Anantharangachar et al.
(2013) proposed an approach for extracting an
ontology from unstructured text. They use Natural
Language Processing (NLP) techniques to extract
concepts and the taxonomic and non-taxonomic
relationships from documents. In their approach, the
document theme is extracted by applying the equation
below:
_∩
∩
This algorithm is not able to detect the correct
theme of descriptive documents because most writers
explain the main topics in the first paragraph and
describe sub-topics in the other paragraphs.
Moreover, Anantharangachar et al. (2013) treat every
noun as a concept, which decreases the performance of
the algorithm. Only some noun phrases denote a
concept, but the proposed algorithm extracts concepts
from all noun phrases.
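To make the single-noun limitation of TF-IDF noted above concrete, the following is a minimal sketch of TF-IDF weighting over whitespace-tokenized documents. It is not the implementation used in Text-To-Onto; the function name `tf_idf`, the toy documents, and the plain `tf * log(N / df)` weighting are assumptions made for illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency).
    This is one common TF-IDF variant, chosen here as an assumption.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus (hypothetical); each document is a list of single-word tokens.
docs = [
    "the ontology links each concept to another concept".split(),
    "the parser tags each sentence".split(),
    "the ontology stores the hierarchy".split(),
]
weights = tf_idf(docs)
```

Because each token is a single word, a multi-word concept such as "dependency tree" would be scored as two unrelated terms unless noun phrases are detected before weighting.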
Zavitsanos et al. (2010) introduced a framework
for extracting an ontology from plain text. In this
framework, stopwords are removed from the documents
and feature vectors are created for the remaining
words. The Latent Dirichlet Allocation (LDA)
algorithm is then applied to extract latent topics
from the documents, and the mutual information rate
is used to build a hierarchical structure
iteratively. This framework does not perform well
when the documents and paragraphs are short.
Drymonas et al. (2010) proposed a multi-layer
framework for extracting an ontology from
unstructured text. In this framework, noun phrases are
extracted in the first layer. Association rules and
probabilistic techniques are then applied to extract
the taxonomic and non-taxonomic relationships. The
technique proposed in this study is able to extract
more complex phrases.
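Association rules, which several of the frameworks reviewed here apply to concept pairs, can be illustrated with a short sketch that computes the support and confidence of pairwise rules. This is a generic textbook formulation and not the procedure of any cited study; the function name `pair_rules`, the `min_support` threshold, and the toy transactions (one set of concepts per sentence) are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5):
    """Map each rule (a, b), read as a -> b, to (support, confidence)."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions containing both items
        if support >= min_support:
            # Confidence of a -> b: co-occurrences over occurrences of a.
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules

# Hypothetical input: the concepts that co-occur in each sentence.
transactions = [
    {"ontology", "concept"},
    {"ontology", "concept", "relation"},
    {"ontology", "relation"},
    {"concept"},
]
rules = pair_rules(transactions)
```

In this toy input, the rule from "relation" to "ontology" has confidence 1.0 because every sentence that mentions "relation" also mentions "ontology". Such a rule signals that two concepts are related but does not name the type of relationship.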
Serra et al. (2013) developed an algorithm to
extract non-taxonomic relationships. They categorize
information into three different groups: the sentence
rule (SR), the sentence rule with verb phrase (SRVP),
and the apostrophe rule (AR). An intelligent
algorithm is used to detect noun or verb phrases
around concepts, to refine the extracted phrases, and
to specify the regular expression used in each step
to extract the non-taxonomic relationships between
concepts. An ontology specialist has to evaluate the
non-taxonomic relationships, and the algorithm cannot
be used to create an ontology from a large number of
documents and the relationships within them.
Moreover, the non-taxonomic relationship is extracted
independently of the verb, which indicates the type
of relationship. As Villaverde et al. (2009) have
illustrated, two phrases that do not share any words
might be related by one verb. Thus, the verb is an
important factor in identifying a non-taxonomic
relationship when creating an ontology that is used
for inference.
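The role of the verb as the label of a non-taxonomic relationship can be sketched with a small regular-expression example. This is only an illustration of the idea and not the method of Serra et al. (2013) or Villaverde et al. (2009); the function `extract_relations`, the three-word gap limit, and the hand-supplied concept and verb lists are assumptions, and a realistic system would rely on a part-of-speech tagger or a dependency parser instead.

```python
import re

def _alternation(words):
    """Build a regex alternation, longest entries first."""
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(word) for word in ordered)

def extract_relations(sentence, concepts, verbs):
    """Return (concept, verb, concept) triples found in one sentence.

    A triple is reported when a known verb appears between two known
    concepts, with at most three other words on either side of the verb.
    """
    c = _alternation(concepts)
    v = _alternation(verbs)
    pattern = re.compile(
        rf"\b({c})\b\s+(?:\w+\s+){{0,3}}?\b({v})\b\s+(?:\w+\s+){{0,3}}?({c})\b",
        re.IGNORECASE,
    )
    return [
        tuple(part.lower() for part in match.groups())
        for match in pattern.finditer(sentence)
    ]

# Hypothetical concepts and verb; the two concepts share no words.
triples = extract_relations(
    "The engine drives the pump.", {"engine", "pump"}, {"drives"}
)
# triples == [("engine", "drives", "pump")]
```

The two concepts in the example have no words in common, so a similarity-based method would not link them, whereas the verb both connects them and names the relationship.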
Villaverde et al. (2009) proposed a solution to
this problem. They extracted concepts from plain