Exploiting Users’ Feedbacks
Towards a Task-based Evaluation of Application Ontologies Throughout
Their Lifecycle
Perrine Pittet and Jérôme Barthélémy
Articque Software, 149 avenue Général de Gaulle, 37230 Fondettes, France
Keywords: Application Ontology, Task-based Ontology Evaluation, Ontology Revision, Semantic Annotation,
Ontology Lifecycle, Crowdsourcing.
Abstract: This paper presents the basis of our approach to the evaluation of application ontologies. By adapting an existing task-based evaluation method, we show how crowdsourcing involving application users can efficiently support the improvement of an application ontology throughout its lifecycle. A real-case experiment on an application ontology designed for the semantic annotation of geobusiness user data illustrates the proposal.
1 INTRODUCTION
Ontology development is becoming a common task that is no longer a matter for ontologists alone. Many ontology development methodologies have been proposed in the literature to help non-experts build their own ontologies, such as (Noy and McGuinness, 2001), (Sure et al., 2002), (Sure et al., 2009) and (Suarez-Figueroa et al., 2012). However, according to (Neuhaus et al., 2013), there is currently no agreement on a methodology for the development of ontologies, and no consensus on how ontologies should be evaluated.
As stated by (Brank et al., 2005), ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion of application. It can help ontology developers evaluate their results and possibly guide the construction process and any refinement steps. According to (Vrandečić, 2009), this would make them feel more confident about their results, and thus encourage them to share their work with the community and to reuse the work of others for their own purposes. However, because of this lack of consensus, evaluation techniques and tools are not widely used in the development of ontologies (Neuhaus et al., 2013). This can lead to ontologies of poor quality and is an obstacle to the successful deployment of ontologies as a technology.
In this paper we focus on the evaluation of application ontologies. Application ontologies describe the domain of specific applications (Malone and Parkinson, 2010). They are often built from scratch so that they closely fit the specific requirements of the application. Consequently, a pertinent evaluation consists in assessing their effectiveness against the different tasks they have to solve within the application for which they have been built (Porzel and Malaka, 2004). This evaluation step is crucial to guide their refinement, but in practice it is often skipped. This may be due to the difficulty of distinguishing which part of the application outputs really depends on the ontology itself rather than on the application. Moreover, few studies have addressed the evaluation of application ontologies through their specific uses within the application (Brewster et al., 2004). We therefore attempt to promote the systematic evaluation of the effectiveness of application ontologies by proposing a simple adaptation of the task-based approach of (Porzel and Malaka, 2004) that uses crowdsourcing throughout their lifecycle.
The rest of the paper is organized as follows. The second section presents a state of the art on application ontologies and ontology evaluation, and the related works. The third section presents our proposal and a real-case experiment. The fourth section presents our conclusions.
2 BACKGROUND
This section introduces the related works on application ontologies and ontology evaluation, and the obstacles identified for the evaluation of application ontologies.
2.1 Application Ontologies
According to (Malone and Parkinson, 2010), an application ontology is an ontology engineered for a specific use or application focus and whose scope is specified through testable use cases. Application ontologies usually reuse, derive or reference recognized ontologies to construct ontological classes and relationships between classes (Shaw et al., 2008). According to (Guarino, 1998), this top-down approach promotes the reuse of ontologies. However, in practice, building reusable ontologies is a costly process. Consequently, a frequent alternative consists in building application ontologies from scratch and then generalizing them into domain and task ontologies (bottom-up approach). For instance, a bottom-up approach called the Goal-Oriented Application Ontology Development Technique, presented in (Santos et al., 2013), has been designed to guide the development of application ontologies from the explicit specification of their application goals, translated into rules and facts. Generally, both top-down and bottom-up approaches require a clear definition of the tasks the application ontology has to solve within the application. These tasks can be multiple (conceptual similarity calculation, disambiguation, knowledge extraction, semantic annotation, etc.) but are generally closely related to the application processes.
2.2 Ontology Evaluation
According to (Gómez-Pérez, 2001), ontologies, like any other resource used in software applications, should be evaluated before being (re)used in other ontologies or applications. Evaluation of the ontology content, i.e. its concept definitions, its taxonomy and its axioms, as well as evaluation of the software environment, are therefore critical before integrating them in final applications. Many ontology evaluation methodologies have been proposed since 1995, among them the well-known works of (Gómez-Pérez, 1995) and (Guarino and Welty, 2002). In 2005, the survey of (Brank et al., 2005) identified the main types of ontology evaluation approaches: those based on comparing the ontology to a gold standard (Maedche and Staab, 2002), those based on using the ontology in an application and evaluating the results (Porzel and Malaka, 2004), those involving comparisons with a source of data about the domain to be covered by the ontology (Brewster et al., 2004), and those where evaluation is done by humans (Lozano-Tello and Gómez-Pérez, 2004). In addition, the authors grouped these approaches by level of evaluation: vocabulary level, taxonomy level, semantic relations level, context or application level, syntactic level, and structure, architecture and design level. All approach types are suited to the first three levels, but only human-based evaluation can cover the other three. Several issues are still open today, such as the need for a detailed methodology allowing evaluation throughout the entire ontology lifecycle (Staab and Studer, 2013). The evaluation of application ontologies has also received little attention (Malone and Parkinson, 2010).
2.3 Application Ontology Evaluation
The authors of (Malone and Parkinson, 2010) state that application ontologies should be evaluated against a set of use cases and competency questions, which represent the scope and requirements of the particular application. This approach, called application-based evaluation, is founded on the fact that the outputs of the application, or its performance on a given task, might be better or worse depending partly on the ontology used in it. According to (Brank et al., 2005), one might argue that a good ontology is one that helps the application in question produce good results on the given task. However, the authors identify some drawbacks of application-based evaluation approaches: (1) as an ontology is good or bad with respect to a particular task, it is difficult to generalize the approach; (2) if the ontology is only a small component of the application, its effect on the output may be relatively small and indirect; (3) different ontologies cannot be compared if they cannot all be plugged into the same application. In (Vrandečić, 2009), the author minimizes these drawbacks by stating that ontologies are often tightly interwoven with an application, and that the user never accesses an ontology directly but always through this application. Therefore the application often needs to be evaluated together with the ontology, regarding the ontology as merely another component of the tool in use. He adds that such a situation has the advantage that well-known software evaluation methods can be applied.
2.4 The Task-based Approach
One of the major works addressing application-based ontology evaluation is the task-based approach of (Porzel and Malaka, 2004). This approach provides an evaluation scheme on three basic ontological levels: vocabulary (concepts), taxonomy and semantic relations. For these three levels, the authors define three types of shortcomings the evaluation results can reveal: insertion, deletion and substitution errors. Insertion errors indicate superfluous concepts, isa relations and semantic relations, deletion errors indicate missing ones, and substitution errors indicate off-target ones. Given appropriate tasks, maximally independent algorithms operating on the ontology to solve these tasks, and a gold standard, the approach allows calculating the error rates corresponding to specific ontological shortcomings at each ontology level. A gold standard is the set of “perfect” outputs the task is expected to provide on the corpus on which it is run. A gold standard must therefore be defined for each task, as well as an explicit definition of the tasks, the ontology and the application. A task is required to be sufficiently complex to constitute a suitable benchmark for examining a given ontology: the performance results must substantially depend on the way relations are modelled within the ontology. One ontology is sufficient, as it is evaluated in terms of its performance on a given task within the application. The application is defined as a specific algorithm that uses the ontology to perform the task. However, the algorithm can have more or less influence on the application output, and it is sometimes difficult to distinguish whether the algorithm or the ontology is responsible. Therefore, to obtain meaningful performance results, the algorithm must be kept constant within the ontology evaluation/revision cycle. Once all these components are defined, the evaluation can be launched. The revision phase of the ontology then simply consists in undertaking the changes corresponding to the identified errors before running another evaluation round.
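To make this evaluation scheme concrete, the following minimal sketch (in Python) shows how such error rates could be computed at the semantic relation level, with relations modelled as (subject, relation, object) triples. The data layout and classification rules are our own assumptions and are not taken from (Porzel and Malaka, 2004).

def error_rates(task_outputs, gold_standard):
    """Insertion, deletion and substitution rates of a task output w.r.t. a gold standard."""
    outputs, gold = set(task_outputs), set(gold_standard)
    gold_subjects = {s for (s, _, _) in gold}
    correct_subjects = {s for (s, _, _) in outputs & gold}
    output_subjects = {s for (s, _, _) in outputs}

    wrong = outputs - gold
    # Off-target relation: the subject should be annotated, but received a wrong triple.
    substitutions = {t for t in wrong
                     if t[0] in gold_subjects and t[0] not in correct_subjects}
    # Superfluous relation: extra triple beyond (or outside) the expected ones.
    insertions = wrong - substitutions
    # Missing relation: an expected triple whose subject received no output at all.
    deletions = {g for g in gold - outputs if g[0] not in output_subjects}

    n = len(gold) or 1
    return {"insertion": len(insertions) / n,
            "deletion": len(deletions) / n,
            "substitution": len(substitutions) / n}

gold = [("Femmes chomage", "hasIndicator", "Femme"),
        ("Hommes chomage", "hasIndicator", "Homme")]
produced = [("Femmes chomage", "hasIndicator", "Femme"),   # correct
            ("Hommes chomage", "hasIndicator", "Femme")]   # off-target
print(error_rates(produced, gold))
# {'insertion': 0.0, 'deletion': 0.0, 'substitution': 0.5}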
In the next section, we propose an adaptation of this approach that takes user contributions into account in order to improve the evaluation accuracy throughout the use of the ontology.
3 PROPOSAL
This section presents the basis of our proposal and
illustrates it through a real case experiment.
3.1 A Task-based Evaluation throughout the Ontology Lifecycle
Our approach consists in adapting the task-based approach of (Porzel and Malaka, 2004) by delegating the evaluation job to the users, using the application as a crowdsourcing platform, instead of relying on a gold standard. In (Porzel and Malaka, 2004), the gold standard is produced by trained annotators who agree on mutually satisfactory solutions for the cases of disagreement. Its accuracy therefore clearly depends on the performance of these annotators. Besides, if the application domain changes and the ontology needs to evolve, a new gold standard has to be produced, which makes this approach inconvenient for evaluating an ontology during its entire lifecycle. Instead, a crowdsourced evaluation, in which users participate, can be carried out all along the application lifecycle. Crowdsourcing is defined in (Hosseini et al., 2014) as a business model where tasks are accomplished by the general public, called the crowd. This model has recently been promoted in the domain of information systems analysis and design, notably through the involvement of users in evaluating the software (Ali et al., 2012) (Pagano and Brügge, 2013). In our approach, we consider the application users as the crowd, the evaluation procedures as the crowdsourcing tasks and the application as the crowdsourcing platform.
Our task-based evaluation methodology comprises the following 6 steps:
1. Definition of the application used as a crowdsourcing platform: distinction between tasks driven by the ontology and other tasks.
2. Definition of the (consistent) ontology: domain and scope, role within the application, and level of contribution to each task it drives.
3. Definition of the tasks driven by the ontology.
4. For each task, choice of the ontology levels to evaluate: vocabulary, taxonomy, semantic relations.
5. For each ontological level, definition of the corresponding error types: deletion, insertion and substitution errors.
6. For each task to evaluate, definition of the crowdsourcing task: the explicit process each user participating in the evaluation has to follow, guidelines for the qualification of the task outputs and their evaluation w.r.t. error types, and indications for potential revision. Within this step users can contribute to the ontology revision by proposing a change (e.g. the addition of a missing concept).
If a majority of users agrees on a proposed revision, it can be translated into an ontological change and applied to the ontology. A reasoner can then check the consistency of the ontology after the change has been applied, in order to decide whether to commit it or roll it back. Conversely, if no consensus can be reached, the different points of view can be taken into account by applying the suggested changes one by one until the ontology is no longer consistent, as sketched below.
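The following sketch (in Python) illustrates this decision step. The ontology and reasoner operations are passed in as callables because they are platform-specific; apply_change, rollback_change and is_consistent are hypothetical placeholders, not an existing API.

def revise(ontology, suggestions, approval_ratio,
           apply_change, rollback_change, is_consistent, majority=0.5):
    """Apply user-suggested changes, committing only those that keep the ontology consistent."""
    agreed = [s for s in suggestions if approval_ratio[s] > majority]
    # With a majority, only the agreed changes are applied; without a consensus,
    # the different points of view are tried one by one, most supported first.
    candidates = agreed or sorted(suggestions, key=approval_ratio.get, reverse=True)

    for suggestion in candidates:
        apply_change(ontology, suggestion)
        if is_consistent(ontology):              # reasoner check
            continue                             # commit: keep the change
        rollback_change(ontology, suggestion)    # inconsistency: roll the change back
        if not agreed:
            break        # no consensus: stop once consistency would be lost
    return ontology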
We believe this approach has several advantages. First, the effectiveness of the ontology is tested in the real production environment, by real users, on real data. Second, the evaluation can be done during the use of the application if users are given the ability to revise the task outputs; a decision algorithm can then be implemented to assess the user suggestions and translate them into ontological changes (Klein, 2004). Third, as the users' point of view on the application domain can change over time, they may revise application outputs they previously considered correct. In this case the ontology is able to evolve and stay up to date with the application's community of users all along its lifecycle.
3.2 Experiment: Semantic Annotation Task Evaluation
Here we describe the application of this methodology in a real-case experiment.
1. The application considered is CD7Online, a SaaS geobusiness decision software in which users build maps by processing and representing statistical data on geographical basemaps stored in their workspaces.
2. The ontology considered describes the CD7Online application domain, including descriptive statistics data files (tables of data), geographical data files (basemaps) and map projects (organization charts), but also the platform-related business processes and uses. It is specified in OWL DL and contains about 10,000 axioms.
3. The ontology has been developed to drive a semantic annotation task supporting a recommender system that suggests relevant data and processes to users. This task extracts metadata such as geographical level, year, theme or statistical indicator from the user data workspaces and represents their relations with the user data as RDF annotations. On top of a triplestore containing these annotations, a visualization tool, available in the form of an interactive 2D graph (cf. Fig. 1), allows users to intuitively browse their own workspaces and navigate through the annotations, starting from their "Home" (cf. “Mes cartes et mes données” in Fig. 1, step 1). We chose to use this existing tool to establish and perform the evaluation procedure.
4. The evaluation level: as in (Porzel and Malaka, 2004), the performance of the ontology can be evaluated at the semantic relation level, since annotations are semantic relations between data and metadata instantiated from the ontology model.
5. Three semantic relation error types are defined. A correct annotation corresponds to a correct non-taxonomic relation between a data item and a metadata item. An annotation with one of the following errors is assessed as incorrect: missing annotation (deletion), superfluous annotation (insertion), and wrong annotation (substitution).
6. The evaluation procedure given to the users comprises 5 steps: (1) choosing a data item to search for, (2) navigating in the graph according to the metadata they consider the most related to this data, (3) assessing the accuracy, (4) if not accurate, identifying the error type and, possibly, (5) proposing a revision. During the process, if the users manage to find the data within the first direct path they take in the graph, the corresponding annotation is considered accurate. If they need to use at least one other path to find it, the annotation corresponding to the expected data is missing: this is a deletion error. If they find the data but it is only related to a wrong metadata, this is a substitution error. And if they find an unexpected metadata linked to the data in addition to the accurate ones, this is an insertion error. A sketch of this qualification logic is given below.
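The sketch below summarizes this qualification logic in Python; the boolean flags are our own naming and do not come from the CD7Online implementation.

def qualify_annotation(found_on_first_path, only_via_wrong_metadata, extra_metadata):
    """Map a user's data-search outcome to an annotation quality label."""
    if only_via_wrong_metadata:
        return "substitution"   # the data is only reachable through an off-target metadata
    if extra_metadata:
        return "insertion"      # an unexpected metadata is also linked to the data
    if found_on_first_path:
        return "accurate"       # the first direct path leads to the data
    return "deletion"           # another path was needed: the expected annotation is missing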
Fig. 1 describes an application example of the process. The first step shows the default graph, on which the user can see the metadata extracted from his workspace (cf. Fig. 1, step 1). If the user considers that his data is related to population, he selects the “th_population” category. Second, the graph morphs to show the statistical indicators related to this category (cf. Fig. 1, step 2). Among these, the user chooses the “Femme” indicator (i.e. “Woman”), because the data he is looking for relates to a population of women. Third, the graph morphs to show the statistical data corresponding to this indicator (cf. Fig. 1, step 3). The user finds the data “Femmes chômage” (i.e. “Unemployed women”) and validates the annotation.
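For illustration, the annotation validated in this example could correspond to RDF triples of the following shape. This is a sketch built with rdflib; the namespace and property names are our own assumptions, and only the data/metadata structure follows the description above.

from rdflib import Graph, Literal, Namespace, RDFS

EX = Namespace("http://example.org/cd7/")        # hypothetical namespace
g = Graph()
g.bind("ex", EX)

data = EX["Femmes_chomage"]                      # the user's data series
g.add((data, RDFS.label, Literal("Femmes chômage", lang="fr")))
g.add((data, EX.hasTheme, EX.th_population))     # theme metadata
g.add((data, EX.hasIndicator, EX.Femme))         # statistical indicator metadata

print(g.serialize(format="turtle"))              # the annotation as stored in the triplestore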
Figure 1: Data search test within the interactive semantic visualization of a working group.
3.3 Results
A first evaluation round, conducted with a sample of ten voluntary users on a set of 571 real data series, showed 35.2% accurate annotations with statistical indicators (against 38.4% deletions, 26.5% substitutions and 0% insertions). These results showed that the ontology needed to be enriched and refined to obtain better outcomes. After the revision guided by these results, a second evaluation, conducted with a sample of ten other users, reached 82.8% accurate annotations.
4 CONCLUSIONS
This paper presented a simple task-based ontology evaluation approach adapted from the methodology of (Porzel and Malaka, 2004). Dedicated to application ontologies, it aims at facilitating the evaluation and revision of application ontologies during their entire lifecycle, by delegating these tasks to voluntary users provided with an explicit procedure. Following the recommendation of (Staab and Studer, 2013), evaluation can be done all along the ontology lifecycle and can fit the users' evolving points of view.
The first results of the experiment on a semantic annotation task, conducted on small samples of voluntary users, are promising. We are currently extending this evaluation to a true crowdsourcing setting by opening it to all users of CD7 in order to assess the relevance of the approach across the application lifecycle. The next step will consist in automating the inclusion of users' revision suggestions by implementing a decision algorithm and translating the results into ontological changes. A future step would be to generalize this approach to different types of tasks in order to establish template procedures.
REFERENCES
Ali, R., Solis, C., Omoronyia, I., Salehie, M., & Nuseibeh,
B. (2012). Social adaptation: when software gives
users a voice.
Brank, J., Grobelnik, M., & Mladenic, D. (2005, October).
A survey of ontology evaluation techniques. In
Proceedings of the conference on data mining and
data warehouses (SiKDD 2005) (pp. 166-170).
Brewster, C., Alani, H., Dasmahapatra, S., & Wilks, Y.
(2004). Data driven ontology evaluation.
Gómez-Pérez, A. (1995, February). Some ideas and
examples to evaluate ontologies. In Artificial
Intelligence for Applications, 1995. Proceedings, 11th
Conference on (pp. 299-305). IEEE.
GómezPérez, A. (2001). Evaluation of ontologies.
International Journal of intelligent systems, 16(3),
391-409.
Guarino, N. (1998). Formal ontology in information
systems: Proceedings of the first international
conference (FOIS'98), June 6-8, Trento, Italy (Vol.
46). IOS press.
Guarino, N., & Welty, C. (2002). Evaluating ontological
decisions with OntoClean. Communications of the
ACM, 45(2), 61-65.
Hosseini, M., Phalp, K., Taylor, J., & Ali, R. (2014, May).
The four pillars of crowdsourcing: A reference model.
In Research Challenges in Information Science
(RCIS), 2014 IEEE Eighth International Conference
on (pp. 1-12). IEEE.
Klein, M. C. A. (2004). Change management for
distributed ontologies.
Lozano-Tello, A., & Gómez-Pérez, A. (2004). Ontometric:
A method to choose the appropriate ontology. Journal
of Database Management, 2(15), 1-18.
Maedche, A., & Staab, S. (2002). Measuring similarity
between ontologies. In Knowledge engineering and
knowledge management: Ontologies and the semantic
web (pp. 251-263). Springer Berlin Heidelberg.
Malone, J., & Parkinson, H. (2010). Reference and
Application Ontologies. Ontogenesis.
http://ontogenesis.knowledgeblog.org/295.
Neuhaus, F., Vizedom, A., Baclawski, K., Bennett, M.,
Dean, M., Denny, M., & Yim, P. (2013). Towards
ontology evaluation across the life cycle. Applied
Ontology, 8(3), 179-194.
Noy, N. F., & McGuinness, D. L. (2001). Ontology
development 101: A guide to creating your first
ontology.
Pagano, D., & Brügge, B. (2013, May). User involvement
in software evolution practice: a case study. In
Proceedings of the 2013 international conference on
Software engineering (pp. 953-962). IEEE Press.
Porzel, R., & Malaka, R. (2004, August). A task-based
approach for ontology evaluation. In ECAI Workshop
on Ontology Learning and Population, Valencia,
Spain.
Santos, L. E., Girardi, R., & Novais, P. (2013, April). A
Case Study on the Construction of Application
Ontologies. In Information Technology: New
Generations (ITNG), 2013 Tenth International
Conference on (pp. 619-624). IEEE.
Shaw, M., Detwiler, L. T., Brinkley, J. F., & Suciu, D.
(2008). Generating application ontologies from
reference ontologies. In AMIA Annual Symposium
Proceedings (Vol. 2008, p. 672). American Medical
Informatics Association.
Staab, S., & Studer, R. (Eds.). (2013). Handbook on
ontologies. Springer Science & Business Media.
Suarez-Figueroa, M. C., Gomez-Perez, A., & Fernandez-
Lopez, M. (2012). The NeOn methodology for
ontology engineering. In Ontology engineering in a
networked world (pp. 9-34). Springer Berlin
Heidelberg.
Sure, Y., Erdmann, M., Angele, J., Staab, S., Studer, R., &
Wenke, D. (2002). OntoEdit: Collaborative ontology
development for the semantic web (pp. 221-235).
Springer Berlin Heidelberg.
Sure, Y., Staab, S., & Studer, R. (2009). Ontology
engineering methodology. In Handbook on ontologies
(pp. 135-152). Springer Berlin Heidelberg.
Vrandečić, D. (2009). Ontology evaluation (pp. 293-313).
Springer Berlin Heidelberg.