proportion of terms that overlap between the on-
tology and the corpus. This is a particularly well-
suited measure for evaluating ontology learning
algorithms. Our methods expand this measure-
ment approach to cover term relations through
both the violation and volatility measures.
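For concreteness, such a term-overlap measure can be sketched as the fraction of ontology terms that also appear in the corpus vocabulary. The snippet below is a minimal illustration only; the term normalization and the example terms are hypothetical placeholders, not the exact procedure used in our experiments.

```python
def lexical_overlap(ontology_terms, corpus_vocabulary):
    """Proportion of ontology terms that also occur in the corpus vocabulary.

    Minimal sketch: both arguments are assumed to be collections of
    already-normalized term strings (e.g. lower-cased and lemmatized).
    """
    ontology_terms = set(ontology_terms)
    if not ontology_terms:
        return 0.0
    return len(ontology_terms & set(corpus_vocabulary)) / len(ontology_terms)

# Hypothetical example: two of the three ontology terms occur in the corpus.
print(lexical_overlap({"epistemology", "ethics", "qualia"},
                      {"epistemology", "ethics", "kant", "logic"}))  # ~0.67
```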
This collection of evaluation paradigms and contextual backdrops finally allows us to consider the type of information content being evaluated. A
“computational ontology”, such as the InPhO, is a formally encoded specification of the concepts and a
collection of directed taxonomic and non-taxonomic
relations between them (Buckner et al., 2010; Gruber,
1995; Noy and McGuinness, 2001). When evaluating information content, we must be careful to distinguish measures that are node-centric (focusing on concepts) from those that are edge-centric (focusing on relations). Many authors (Maedche and Staab,
2002; Guarino and Welty, 2004; Brewster et al., 2004;
Gómez-Pérez, 1999; Velardi et al., 2005) focus upon
node-centric techniques, asking “Are the terms speci-
fied representative of the domain?” These investigate
the lexical content of an ontology. However, the se-
mantic content of an ontology is not defined solely
by the collection of terms within it, but rather by the
relations of these terms. Maedche & Staab (2002)
take this initial lexical evaluation and extend it to an
edge-based approach which measures the number of
shared edges in two taxonomies. The proposed violation and volatility scores (Section 4) are novel edge-based measures that address the semantic content of an ontology by comparing its relations to statistics derived from a relevant corpus, used as a proxy for domain knowledge. Additionally, these scores can provide insight into the ontology design process by showing how controversial the domain content is and how it converges towards a relatively stable structure over time.
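To make the contrast with the node-centric view concrete, the edge-based idea behind Maedche & Staab's comparison can be sketched as the agreement between the edge sets of two taxonomies. The snippet below is a deliberately simplified illustration (their published measure compares semantic cotopies rather than raw edge sets), and the example taxonomies are hypothetical.

```python
def shared_edge_overlap(taxonomy_a, taxonomy_b):
    """Jaccard overlap of the directed (child, parent) edges of two taxonomies.

    Simplified sketch: each taxonomy is given as a set of (child, parent)
    term pairs; only exact edge matches are counted.
    """
    edges_a, edges_b = set(taxonomy_a), set(taxonomy_b)
    union = edges_a | edges_b
    if not union:
        return 0.0
    return len(edges_a & edges_b) / len(union)

# Hypothetical taxonomies that agree on one of their three distinct edges.
a = {("ethics", "philosophy"), ("metaethics", "ethics")}
b = {("ethics", "philosophy"), ("logic", "philosophy")}
print(shared_edge_overlap(a, b))  # ~0.33
```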
3 OUR DYNAMIC ONTOLOGY
A wide variety of projects can benefit from the de-
velopment of a computational ontology of some sub-
ject domain. Ontology science has evolved in large
part to suit the needs of large projects in medicine,
business, and the natural sciences. These domains share a cluster of features: there is a relatively stable consensus about their underlying structure, projects are amply funded, and a primary goal is often to make large bodies of data interoperable. In these projects, best practices often require hiring so-called "double experts" – knowledge modelers highly trained in both ontology design and the subject domains – to produce, in the early stages of a project, a representation that is optimally comprehensive and technically precise.
There is another cluster of applications, however,
for which these practices are not ideal. These include projects committed to open access and domains without the ample funding of the natural sciences. They also include ontologies for domains in which our structural understanding is controversial or constantly evolving, and projects which use computational ontologies to enhance search or navigation through asynchronously updated digital resources. Both must account for the dynamic nature of their resources, whether that dynamism lies in the underlying corpus or in the judgments of the experts providing feedback on domain structure. On the positive side, these areas often have
more opportunities to collect feedback from users
who are domain experts but lack expertise in ontol-
ogy design.
For the latter type of project we have recom-
mended an approach to design which we call dynamic
ontology. While a project in the former group prop-
erly focuses the bulk of its design effort on the pro-
duction of a single, optimally correct domain repre-
sentation, the latter cluster is better served by treating
the domain representation as tentative and disposable,
and directing its design efforts towards automating as
much of the design process as possible. Dynamic on-
tology, broadly speaking, tries to take advantage of
many data sources to iteratively derive the most useful
domain representation obtainable at the current time.
Two primary sources of data are domain experts and
text corpora. Domain experts provide abstract information about presently held assumptions and emergent trends within a field, drawing on a source, namely their own ideas, that is hard to examine directly. Text corpora make it possible to quantify what is meant by "domain", providing a concrete encoding of the semantic space that is available for empirical analysis, in contrast to the ill-defined abstraction of "the domain is whatever the experts conceive it to be". From both
kinds of sources many types of data may be gathered:
statistical relationships among terms, feedback from
domain experts, user search and navigation traces, ex-
isting metadata relationships (e.g. cross-references or
citations), and so on. As more data become available
and our understanding of the subject domain contin-
ues to evolve, the domain representation will be
dynamically extended, edited, and improved.
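As one concrete example of the first kind of data, statistical relationships among terms can be estimated directly from a corpus, for instance from document-level co-occurrence counts. The sketch below illustrates only that general idea; the particular statistic and the toy corpus are hypothetical, not the estimators used by the InPhO.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents, vocabulary):
    """Count how often each pair of vocabulary terms appears in the same document.

    Minimal sketch: `documents` is an iterable of token lists and
    `vocabulary` is the set of ontology terms of interest.
    """
    counts = Counter()
    vocabulary = set(vocabulary)
    for tokens in documents:
        present = sorted(set(tokens) & vocabulary)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Hypothetical toy corpus of three tokenized documents.
docs = [["kant", "ethics", "duty"], ["ethics", "virtue"], ["kant", "ethics"]]
print(cooccurrence_counts(docs, {"kant", "ethics", "virtue"}))
# Counter({('ethics', 'kant'): 2, ('ethics', 'virtue'): 1})
```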
In dynamic ontology, problems of validation loom
especially large due to the combination of heterogeneous data sources. Each step in the design process
presents modelers with a panoply of choices for in-
consistency mitigation – e.g., which sources of data
to favor over others, how to settle feedback disagree-