generated tool named KAON uses an inference engine
to answer conjunctive queries expressed in SPARQL.
The search is based on WordNet and lexical matching
of element names, which provides more intuitive
searching (Mädche, 2008).
Another study is SEWISE, an ontology based web
information search engine. It was evaluated in the
financial domain, where an ontology was developed
for web finance news. Statistical text mining
techniques (text indexing, text categorization, text
summarization, keyword extraction) are used to
refine HTML pages and extract them into
semantically tagged XML files. These XML files
include semantic knowledge about the finance news,
which enriches the textual information. The XML
repository can then be queried with rich XQuery
expressions to gather relevant information
(Gardarin, 2003).
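To make the idea of querying semantically tagged XML concrete, the following is a minimal sketch rather than the SEWISE implementation. It uses Python's standard `xml.etree.ElementTree` with its limited XPath subset as a stand-in for XQuery; the tag names (`news`, `company`, `topic`) and the sample items are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantically tagged finance news. The tag names and values
# are illustrative assumptions, not the actual SEWISE schema.
XML_REPOSITORY = """
<repository>
  <news id="n1">
    <company>Acme Bank</company>
    <topic>merger</topic>
  </news>
  <news id="n2">
    <company>Beta Corp</company>
    <topic>earnings</topic>
  </news>
  <news id="n3">
    <company>Acme Bank</company>
    <topic>earnings</topic>
  </news>
</repository>
"""


def find_news(root, company, topic):
    """Return the ids of news items whose semantic tags match both values."""
    # ElementTree supports [tag='text'] predicates, which can be chained.
    path = f".//news[company='{company}'][topic='{topic}']"
    return [item.get("id") for item in root.findall(path)]


root = ET.fromstring(XML_REPOSITORY)
matches = find_news(root, "Acme Bank", "earnings")
print(matches)
```

Because the query addresses the semantic tags rather than the raw text, it can combine conditions on company and topic that a plain keyword search over the HTML page could not express.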
Another study, conducted by Buitelaar et al.,
focuses on ontology based information extraction
from soccer web pages. In that study, a tool named
the SmartWeb Ontology Based Annotation (SOBA)
component was developed; it automatically populates
a knowledge base by extracting information from
soccer match reports found on the web. Information
is extracted from heterogeneous sources such as
tabular structures, text and image captions in a
semantically integrated way by using SProUT, which
is a multilingual NLP platform. It implements a
novel paradigm in which information extraction,
knowledge base updates and reasoning are tightly
interleaved (Buitelaar, 2006).
Another study, conducted by Alan et al., describes
a video annotation and querying system that is
capable of semi-automatic annotation of videos from
text. The domain is soccer, and video segments are
enriched with metadata (textual content). This
approach provides semantic search in videos and
lets users query videos according to important
parts of games (e.g. all goals by Hakan or Nihat)
(Alan, 2008).
Similar research conducted by Kara et al. (2012)
addresses ontology based information extraction and
retrieval. The study is conducted in the soccer
domain, with content from UEFA and SporX. The
extraction process is achieved by using a domain
specific ontology. The retrieval system is enhanced
with semantic indexing that uses the domain
specific information extraction. Domain specific
queries (e.g. fouls committed by Daniel) give
better results than traditional keyword based
information retrieval. In addition, the system can
answer queries without the need for SPARQL. This
study is the basis of our research in the finance
domain.
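The difference between semantic indexing and keyword matching can be illustrated with a small sketch. It is not the system of Kara et al.; the extracted events, sentences and function names below are hypothetical, and the index is simply keyed by (event type, acting player).

```python
from collections import defaultdict

# Hypothetical output of information extraction:
# (event type, player who performs the event, source sentence).
EVENTS = [
    ("foul", "Daniel", "Daniel brings down Marco near the box."),
    ("foul", "Marco", "The referee whistles for a foul on Daniel by Marco."),
]


def build_semantic_index(events):
    """Index source sentences by (event type, acting player)."""
    index = defaultdict(list)
    for event_type, player, sentence in events:
        index[(event_type, player)].append(sentence)
    return index


def keyword_search(events, keywords):
    """Return sentences that contain every keyword, ignoring case."""
    return [
        sentence
        for _, _, sentence in events
        if all(k.lower() in sentence.lower() for k in keywords)
    ]


index = build_semantic_index(EVENTS)
semantic_hits = index[("foul", "Daniel")]
keyword_hits = keyword_search(EVENTS, ["foul", "Daniel"])
print(semantic_hits)
print(keyword_hits)
```

In this toy data the semantic index returns the sentence in which Daniel commits the foul even though the word "foul" never appears in it, while the keyword search returns only the sentence in which Daniel is fouled.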
5 METHODOLOGY
In this section, the steps of the proposed ontology
development procedure are explained. After the
ontology development step, the ontology population
and semantic search methodologies are explained.
5.1 Ontology Building
The ontology extraction methodology involves
several transitions from words to the ontology.
First, domain knowledge is gathered by discussing
key terms that are used in budgeting and
accounting. Then these terms (more than 1500) are
reduced to the key terms (about 120) that are
generally used in this domain. The key terms are
defined, and all of them are discussed in detail
through regular meetings with domain experts. This
process also includes grouping the key terms into
manageable categories so that the ontology can be
easily divided into small and easily definable
sub-ontologies.
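The grouping step can be sketched as follows. This is only an illustration under stated assumptions: the category names come from the ontology groups named in this section, but the individual terms and their assignments are hypothetical, since in practice they are decided in the meetings with domain experts.

```python
from collections import defaultdict

# Hypothetical (key term, category) assignments; the actual key terms and
# their categories are decided by the domain experts.
TERM_CATEGORIES = [
    ("appropriation", "budgeting"),
    ("ledger", "accounting"),
    ("allocation", "budgeting"),
    ("internal audit", "audit"),
]


def group_terms(pairs):
    """Group key terms by category; each group seeds one sub-ontology."""
    groups = defaultdict(set)
    for term, category in pairs:
        groups[category].add(term)
    # Sort the terms so that the result is deterministic.
    return {category: sorted(terms) for category, terms in groups.items()}


sub_ontology_terms = group_terms(TERM_CATEGORIES)
print(sub_ontology_terms)
```

Each resulting group is small enough to be defined and reviewed on its own before relations are added.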
Afterwards, we develop audit, budgeting, control,
accounting, performance, planning, process and
risk ontologies according to the clustering of the
key terms and the relations between them. We
discuss the key terms and relations within the
ontologies and, where necessary, simplify the
ontologies by further consolidating terms or
relations. After determining all groups of
ontologies, we merge relations and obtain a small
set of relations. Lastly, we combine all groups of
ontologies into one consolidated ontology and
define new relations between these sub-ontologies.
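The merging and consolidation steps can be sketched with ontologies represented as sets of (subject, relation, object) triples. This is a minimal illustration, not the procedure actually used: the triples, the relation names and the mapping of similar relations to a canonical one are all hypothetical.

```python
# Hypothetical sub-ontologies as (subject, relation, object) triples.
BUDGETING = {
    ("budget", "consists_of", "appropriation"),
    ("budget", "has_part", "allocation"),
}
ACCOUNTING = {("ledger", "includes", "account")}

# Assumed mapping that merges near-synonymous relations into a small set.
RELATION_MAP = {"consists_of": "has_part", "includes": "has_part"}

# Assumed new relation defined between two sub-ontologies.
CROSS_RELATIONS = {("budget", "recorded_in", "ledger")}


def normalize(triples, relation_map):
    """Rewrite each relation to its canonical name, if one is defined."""
    return {(s, relation_map.get(r, r), o) for s, r, o in triples}


def consolidate(sub_ontologies, relation_map, cross_relations):
    """Merge normalized sub-ontologies and add cross-ontology relations."""
    merged = set()
    for triples in sub_ontologies:
        merged |= normalize(triples, relation_map)
    return merged | cross_relations


ontology = consolidate([BUDGETING, ACCOUNTING], RELATION_MAP, CROSS_RELATIONS)
relations = sorted({r for _, r, _ in ontology})
print(relations)
```

Normalizing relations before the union keeps the consolidated ontology to a small relation set, and the cross-ontology relations are added only after the sub-ontologies have been merged.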
While extracting the budgeting and accounting
ontologies, some groups seem to be less important
at first sight. For example, the “risk”, “audit” or
“control” groups contain more general terms. The
reason is that we focus on the budget and account
groups. Moreover, ontology extraction is an
evolving process: new terms can always be added to
these groups while decision making and performance
management models are developed. Similarly, the
relation between performance and budgeting can be
extended. The main aim of this initial ontology is
to capture domain knowledge and to identify the
core relations in finance.
The ontology covers a number of domains, such as
budgeting and accounting. The main problem here is
how to construct a combined ontology that includes
all the related domains. As a solution, instead of
one big and unreadable ontology, related ontologies
are developed separately for each domain.
Ontology based Knowledge Extraction with Application to Finance