method to extract non-taxonomic relations, and is
used in ontology learning tools such as Text2Onto
(Cimiano et al. 2005) and OntoLearn (Velardi et al.
2005). These ontology learning tools use association
rule mining with the traditional confidence measure to
extract non-taxonomic relations. However, the
confidence measure has two noted limitations: it
(a) is sensitive to the frequency of the concepts in
the data set and may return pairs of concepts even if
there is no association between them, and (b) suffers
from the rare itemset problem, whereby an association
rule representing an important relationship between
concepts is pruned altogether simply because it is
rare (Sheikh et al. 2005).
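Both limitations can be seen on a small example. The following is a minimal Python sketch, not the implementation used in Text2Onto or OntoLearn. It assumes that each "transaction" is the set of concepts found in one text window, uses hypothetical toy data, and applies Apriori-style thresholds (`min_support` and `min_confidence` are illustrative values) with conf(X -> Y) = supp(X and Y) / supp(X).

```python
from itertools import combinations

# Hypothetical toy data: each set holds the concepts found in one text window.
windows = [
    {"fish", "temperature"},
    {"fish", "salinity"},
    {"fish", "temperature"},
    {"fish", "plankton"},
    {"fish", "larvae", "upwelling"},
    {"fish"},
    {"fish", "temperature"},
    {"fish", "salinity"},
    {"fish", "plankton"},
    {"larvae", "upwelling"},
]

def support(itemset):
    """Fraction of windows that contain every concept in `itemset`."""
    return sum(itemset <= w for w in windows) / len(windows)

def confidence(x, y):
    """conf(X -> Y) = supp({X, Y}) / supp({X})."""
    return support({x, y}) / support({x})

def frequent_rules(min_support, min_confidence):
    """Keep rules X -> Y whose pair support and confidence pass both thresholds."""
    concepts = sorted(set().union(*windows))
    rules = {}
    for a, b in combinations(concepts, 2):
        if support({a, b}) < min_support:
            continue  # rare pairs are pruned here, before confidence is checked
        for x, y in ((a, b), (b, a)):
            c = confidence(x, y)
            if c >= min_confidence:
                rules[(x, y)] = c
    return rules

print(frequent_rules(min_support=0.3, min_confidence=0.8))
print(confidence("larvae", "upwelling"), support({"larvae", "upwelling"}))
```

With these toy windows, `frequent_rules(0.3, 0.8)` reports only temperature -> fish, with confidence 1.0, largely because fish occurs in 9 of the 10 windows. The concepts larvae and upwelling always appear together (`confidence("larvae", "upwelling")` is 1.0), yet the pair is pruned because its support of 0.2 falls below the minimum support threshold.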
In this paper, we address the extraction, from
unstructured text, of concept pairs that have a
non-taxonomic relation between them. We present a
concept correlation search framework that employs a
statistical approach extending the traditional
association rule mining used in ontology learning
tools for non-taxonomic relation extraction. Our
approach to searching for correlated concepts has
three distinct elements: (i) we
investigate the use of the lift measure (Sheikh et al.
2005), as opposed to the traditional support and
confidence measures, to establish the interestingness
of correlated concept pairs. The key advantage of
the lift measure is that it quantifies how many times
more often concepts X and Y occur together than
would be expected if they were statistically
independent. Lift does not suffer from the rare
itemset problem (Sheikh et al. 2005); (ii) when
searching for correlated concept pairs, we look
beyond the traditional one-sentence window to
include multiple adjacent sentences. This is based
on the observation that scientific authors often
discuss correlated concepts across several sentences;
we therefore search for correlated concepts across
two adjoining sentences; (iii) we employ a
domain ontology, as background knowledge, to filter
out the correlated concepts that have a taxonomic
relationship between them. This leaves us with a set
of non-taxonomic concept pairs that serve as
candidates for non-taxonomic relations during
ontology learning. We apply our framework to
search for non-taxonomic concept pairs in the
domain of marine biology, using 374 Fisheries
Oceanography journal publications spanning 10
years (1999-2008). We extracted 130 concept pairs,
of which 108 were identified as non-taxonomic.
The results were validated by domain experts.
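The three elements can be combined in a minimal Python sketch. It is not the authors' implementation: it assumes naive concept spotting by exact word match against a small hypothetical vocabulary (`CONCEPTS`), a toy is-a table (`IS_A`) standing in for the domain ontology, overlapping windows of two adjoining sentences, and lift estimated over those windows as lift(X, Y) = P(X and Y) / (P(X) P(Y)). All function names and the `min_lift` threshold are illustrative.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical concept vocabulary and is-a links (child -> parent), standing in
# for the domain ontology used as background knowledge. Assumed acyclic.
CONCEPTS = {"fish", "cod", "herring", "larvae", "upwelling", "temperature"}
IS_A = {"cod": "fish", "herring": "fish"}

def ancestors(concept):
    """All ancestors of a concept, found by following is-a links upward."""
    found = set()
    while concept in IS_A:
        concept = IS_A[concept]
        found.add(concept)
    return found

def is_taxonomic(a, b):
    """True if one concept is an ancestor of the other in the toy ontology."""
    return a in ancestors(b) or b in ancestors(a)

def concepts_in(sentence):
    """Naive concept spotting: lower-cased word tokens found in the vocabulary."""
    return set(re.findall(r"[a-z]+", sentence.lower())) & CONCEPTS

def windows(sentences, size=2):
    """Concept sets for every run of `size` adjoining sentences (sliding window)."""
    sets = [concepts_in(s) for s in sentences]
    if len(sets) < size:
        return [set().union(*sets)]
    return [set().union(*sets[i:i + size]) for i in range(len(sets) - size + 1)]

def lift_scores(window_sets):
    """lift(X, Y) = P(X and Y) / (P(X) * P(Y)), with probabilities over windows."""
    n = len(window_sets)
    single = Counter(c for w in window_sets for c in w)
    pair = Counter(p for w in window_sets for p in combinations(sorted(w), 2))
    return {
        (x, y): (k / n) / ((single[x] / n) * (single[y] / n))
        for (x, y), k in pair.items()
    }

def candidate_pairs(sentences, min_lift=1.0):
    """Correlated pairs (lift > min_lift) with no taxonomic link between them."""
    scores = lift_scores(windows(sentences))
    return {p: s for p, s in scores.items() if s > min_lift and not is_taxonomic(*p)}

sentences = [
    "Upwelling intensified along the shelf.",
    "Larvae of cod were abundant.",
    "Temperature was recorded daily.",
    "Herring is a pelagic fish.",
    "Temperature fell in winter.",
]
print(candidate_pairs(sentences))
```

On these five toy sentences, the cross-sentence pairs cod/upwelling and larvae/upwelling are kept with lift 2.0, cod/temperature is dropped because its lift (about 0.67) is below 1, and fish/herring is dropped by the ontology filter even though its lift is also 2.0, because herring is-a fish in the toy table.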
2 LITERATURE REVIEW - RELATED WORK
Ontology learning involves Machine Learning (ML)
and advanced Natural Language Processing (NLP)
technologies, ranging from term extraction and
concept definition to more complex tasks such as
learning taxonomic and non-taxonomic relations. In
this section, we review the state-of-the-art in
ontology learning tools specific to non-taxonomic
relation extraction.
From a statistical perspective, the pioneering
research work in non-taxonomic relation extraction
was performed by Maedche & Staab (2000) using
association rule mining. Subsequently, ontology
learning tools such as Text2Onto (Cimiano et al.
2005) and OntoLearn (Velardi et al. 2005) also
approach the non-taxonomic relation extraction task
from the statistical point of view using association
rule mining with the traditional confidence measure.
Hasti (Shamsfard & Barforoush 2004), another
ontology learning tool, extracts non-taxonomic
relations from the semantic analysis point of view.
Hasti combines logical, linguistic-based, template-
driven and semantic analysis methods in its
non-taxonomic relation extraction. RelExt (Schutz &
Buitelaar 2005) takes a hybrid of both approaches:
relevant terms and verbs are first extracted from a
given text collection, and a combination of
linguistic and statistical processing is then used to
compute relations between them. A limitation of
these methods is that they depend on sentence
structure, so the search window for correlated
concepts is short and constrained to a single
sentence. Such a short search window often proves
deficient in discovering relations (Chagnoux et al.
2008).
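The effect of window size can be made concrete with a minimal Python sketch. It assumes naive sentence splitting on terminal punctuation and concept spotting by exact word match against a small hypothetical vocabulary; `concept_sets` and `windows_with_pair` are illustrative names, not part of any of the tools reviewed above.

```python
import re

def concept_sets(text, vocabulary):
    """Split text into sentences and spot vocabulary concepts in each (naive)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [set(re.findall(r"[a-z]+", s.lower())) & vocabulary for s in sentences]

def windows_with_pair(sets, a, b, size):
    """Count sliding windows of `size` adjoining sentences containing both a and b."""
    count = 0
    for i in range(max(len(sets) - size + 1, 1)):
        window = set().union(*sets[i:i + size])
        if a in window and b in window:
            count += 1
    return count

# Hypothetical passage in which the two concepts never share a sentence.
text = (
    "Upwelling intensified along the shelf in April. "
    "Cod larvae were abundant near the front. "
    "Upwelling also increased nutrient supply."
)
sets = concept_sets(text, {"upwelling", "larvae"})
for size in (1, 2):
    print(size, windows_with_pair(sets, "upwelling", "larvae", size))
# prints "1 0", then "2 2"
```

A single-sentence window never sees upwelling and larvae together, while a window of two adjoining sentences finds the pair twice.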
From the literature review, it is clear that
ontology learning, especially the extraction of
non-taxonomic relations from unstructured text, is a
challenging yet much pursued area. Our work
extends the traditional association rule mining used
in some of the above-mentioned tools: we look
beyond the single-sentence window and use lift as
the interestingness measure to identify concept
pairs that represent potential non-taxonomic
relations in an ontology learning context.
WEBIST 2011 - 7th International Conference on Web Information Systems and Technologies