EVALUATING ONTOLOGIES WITH RUDIFY

Amanda Hicks and Axel Herold

Berlin-Brandenburgische Akademie der Wissenschaften, Jägerstr. 22/23, 10117 Berlin, Germany

Keywords:

Ontology, Ontology development, Ontology evaluation, Rigidity, Type, Role, Wordnet.

Abstract:

In this paper we present Rudify, a set of tools designed for the semi-automatic evaluation of ontological

meta-properties based on lexical realizations of these meta-properties in natural language. We describe the

development of Rudify, provide an evaluation of initial output, and describe how this output can be used in

conjunction with OntoClean (Guarino and Welty, 2002) to produce clean ontological hierarchies. In particular

we show how a Rudify evaluation of concepts for the meta-property of rigidity can facilitate modelling types

and roles.

1 INTRODUCTION

Developing an ontology requires paying special attention to the hierarchical relations. In particular, taking into consideration certain meta-properties

of the concepts modelled in the ontology can help

the developer avoid formal contradiction and un-

sound inheritance of properties (Guarino and Welty,

2004). However, manually determining ontological

meta-properties of concepts within large ontologies

is time consuming and has been shown to produce

a low level of agreement amongst human annotators

(Völker et al., 2005). A further difficulty around the

annotation of meta-properties is that evaluating the

meta-properties of concepts can be difﬁcult for non-

ontologists while evaluating technical concepts from

a speciﬁc domain may be difﬁcult for ontologists who

are not trained in this domain.

In this paper we present Rudify, a set of tools

for the semi-automatic determination of ontological

meta-properties. Rudify has been used for ontology

development within the Kyoto project (Herold et al.,

2009a; Herold et al., 2009b).

Section 2 of this paper provides an overview of the

Kyoto project with particular emphasis on the role of

the ontology. Section 3 contains a brief description

of OntoClean, a method for evaluating hierarchical

relations in an ontology (Guarino and Welty, 2002).

Section 4 discusses the meta-property of rigidity and

its relation to the type–role distinction. Section 5 dis-

cusses the development of Rudify. In section 6 the

notion of base concepts is brieﬂy introduced. A set

of base concepts was used for the evaluation of the

Rudify output (section 7). Finally, section 8 provides

speciﬁc examples of how Rudify output can be used

to “clean up” hierarchical relations within an ontol-

ogy.

2 THE KYOTO PROJECT

The Kyoto project is a content enabling system that

performs deep semantic analysis and searches and

that models and shares knowledge across different do-

mains and different language communities. Seman-

tic processors are used for concept and data extrac-

tion, and the resulting knowledge can be used across

the different linguistic communities. A wiki environ-

ment allows domain specialists to maintain the sys-

tem. Kyoto is currently being targeted toward the en-

vironmental domain and will initially accommodate

seven languages, namely, English, Dutch, Spanish,

Italian, Basque, Chinese, and Japanese. The system

depends on an ontology that has been linked to lex-

ical databases (wordnets) for these languages. The

role of the ontology is to provide a coherent, stable

and uniﬁed frame of reference for the interpretation

of concepts used in automatic inference. For more

information on the Kyoto project see (Vossen et al.,

2008) and http://www.kyoto-project.eu/.

Kyoto should be able to accommodate, not only a

variety of languages and domains of knowledge, but

also changes in scientiﬁc theories as both the world

and our knowledge of the world change. We, there-

fore, require an ontology that is not idiosyncratic, but

Hicks A. and Herold A. (2009).

EVALUATING ONTOLOGIES WITH RUDIFY.

In Proceedings of the International Conference on Knowledge Engineering and Ontology Development, pages 5-12

DOI: 10.5220/0002275800050012

 SciTePress

rather one that can accommodate

1. a variety of languages and their wordnets,

2. a variety of scientiﬁc domains,

3. a variety of research communities,

4. future research in these domains, and

5. can serve as the basis of sound, formal reasoning.

Because the end users will be able to maintain and

extend the ontology, it is crucial that the ontology is

extended in a clean and consistent manner by non-

ontology experts.

With this aim in mind we have developed Ru-

dify. We are using Rudify in conjunction with Onto-

Clean in order to build and maintain a clean ontology.

By evaluating the ontological meta-properties of con-

cepts, Rudify facilitates a major step in the construc-

tion and maintenance of clean hierarchies.

3 ONTOCLEAN

OntoClean (Guarino and Welty, 2002) is a method for

evaluating ontological taxonomies. It is based on on-

tological meta-properties of the concepts that appear

in the ontological hierarchy. These meta-properties –

namely, rigidity, unity, identity, and dependence – are

both highly general and based on philosophical no-

tions. Although OntoClean uses meta-properties to

evaluate ontological taxonomies, it is not intended

to provide a way of determining the meta-properties

themselves. Instead it shows the logical consequences

of the user's modelling choices, most notably ontological errors that may result in taxonomies after modelling choices have been made (Guarino and

Welty, 2004). Rudify helps ﬁll this gap by semi-

automatically assigning meta-properties to concepts

based on how the concepts are expressed in natural

language.

Of the four types of ontological meta-properties

used by OntoClean, we focus on rigidity. There are

several reasons for this choice. First, and most important in the context of the Kyoto project, the notion of rigidity plays a large role in the distinction between types and roles, since every type is a rigid concept and every role is a non-rigid concept. Second, it

is relatively easy to ﬁnd lexical patterns for rigidity.

The lexical patterns are a crucial prerequisite for the

programmatical determination of meta-properties as

done by Rudify (see section 5). Third, AEON (Völker et al., 2008) also concentrated on rigidity, so there is

a basis of comparison of data.

4 RIGIDITY

The notion of rigidity relies on the philosophical no-

tion of essence. An essential concept is one that nec-

essarily holds for all of its instances. For example,

being an animal is essential to being a cat since it is

impossible for a cat to not be an animal, while being

a pet is not essential because any cat can, in theory,

roam the streets and, thereby, not be a pet. The idea

of essence contains an idea of permanence; Fluffy the

cat is an animal for the entire duration of his life.

However, the notion of essence is stronger than per-

manence. While Fluffy can be a pet for his entire life,

it nevertheless remains possible for him to cease be-

ing a pet.

Armed with the notion of essence, we can now

deﬁne rigidity. A rigid concept is a concept that is es-

sential to all of its possible instances, i. e., every thing

that could be a cat is in fact a cat. Therefore, “cat” is a

rigid concept. However, “pet” is a non-rigid concept

since there are individual pets that do not have to be a

pet.

Non-rigidity further subdivides into two meta-

properties: semi-rigidity and anti-rigidity. Those con-

cepts that are essential to some, but not all, of their

instances are semi-rigid, while those that are not es-

sential to any of their instances are anti-rigid. We do

not focus on this distinction in our work although Ru-

dify can be used to evaluate these meta-properties as

well.

Roles vs. Types

We are currently using Rudify to develop a central on-

tology, to separate type- and role-hierarchies in On-

toWordNet (Gangemi et al., 2003), and also to help

the end user keep type- and role-hierarchies in the do-

main ontology separate. This section provides a dis-

cussion of the relation between rigidity and type–role

hierarchies.

Types and roles are the two main subdivisions of

sortal concepts. A sortal concept is a concept that

describes what sort of thing an entity is. For ex-

ample “cat,” “hurricane,” and “milk” are sortal con-

cepts while “red,” “heavy,” and “singing” are not. In

an ontology, sortal concepts are those concepts that

carry the meta-property identity (for a discussion of

identity, see (Guarino and Welty, 2004)). Further-

more, sortals usually correspond to nouns in natural

language. We work on the assumption that the con-

cepts represented in the noun hierarchy of WordNet

(Fellbaum, 1998, see also section 6) are sortal terms,

since this is generally the case. Types are rigid sor-

tals, while non-rigid sortals are generally roles. Fur-

thermore, roles cannot subsume types.

KEOD 2009 - International Conference on Knowledge Engineering and Ontology Development

In order to see that roles should not subsume

types, we can consider the following (erroneous) hi-

erarchy:

animal
  pet
    cat

According to this hierarchy, if Fluffy ceases to be a

pet, then Fluffy also ceases to be a cat, which is im-

possible.

From this last point in conjunction with the above

assumption that nouns usually represent sortals, it fol-

lows from the OntoClean principles that amongst sor-

tal terms, non-rigid sortals should not subsume rigid

sortals. In other words, non-rigid nouns generally

should not subsume rigid nouns. There are excep-

tions to this rule. However, this general conclusion

allows us to evaluate concepts only for rigidity and

non-rigidity, which in turn saves us the computationally expensive task of evaluating non-rigid terms as

either semi- or anti-rigid over large sets of concepts.
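The constraint that a non-rigid sortal should not subsume a rigid one can be checked mechanically once every concept carries a rigidity tag. The following is a minimal sketch, not part of Rudify or OntoClean: the function name `find_rigidity_violations`, the single-parent dictionary encoding of the hierarchy, and the choice to treat untagged ancestors as rigid are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def find_rigidity_violations(
    parent: Dict[str, Optional[str]],
    rigid: Dict[str, bool],
) -> List[Tuple[str, str]]:
    """Return (ancestor, descendant) pairs where a non-rigid concept subsumes a rigid one."""
    violations = []
    for concept, is_rigid in rigid.items():
        if not is_rigid:
            continue  # only rigid concepts can be wrongly subsumed
        ancestor = parent.get(concept)
        while ancestor is not None:
            # Assumption: ancestors without a tag are treated as rigid.
            if not rigid.get(ancestor, True):
                violations.append((ancestor, concept))
            ancestor = parent.get(ancestor)
    return violations


# The erroneous hierarchy from the text: animal > pet > cat.
parent = {"animal": None, "pet": "animal", "cat": "pet"}
rigid = {"animal": True, "pet": False, "cat": True}
violations = find_rigidity_violations(parent, rigid)
print(violations)  # [('pet', 'cat')]
```

The reported pair corresponds to the problem described above: the role "pet" subsumes the type "cat".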

5 RUDIFY DEVELOPMENT

The general idea behind Rudify is the assumption that

a preferred set of linguistic expressions is used when

talking about ontological meta-properties. Thus, one

can deduce a concept’s meta-properties from the us-

age of the concept’s lexical representation (LR) in

natural language. This idea has been developed and

programmatically exploited first in the AEON (Automatic Evaluation of ONtologies) project (Völker et al., 2008). AEON was developed for the automatic tagging of existing ontologies in terms of OntoClean meta-properties. The Kyoto project decided to rewrite the software based on the principles published by Völker et al. (2005) for several reasons: the tool was no longer under active development and had been released as a development snapshot only; the web service interface had to be changed because Google had stopped maintaining the one originally implemented; and a more flexible input facility was needed in place of the purely OWL-based one.

In the following technical description and discus-

sion of Rudify we focus on the meta-property of rigid-

ity as this has been the most important property in the

context of the Kyoto project so far.

The ﬁrst step in the Rudify process is the identi-

ﬁcation of adequate LRs for the concepts that are to

be tagged. Due to polysemous word forms there is no one-to-one mapping between concepts and LRs. Also, the actual number of recorded senses for a given LR may vary across lexical databases and across versions of a specific lexical database. The results reported here are based on the English WordNet (Fellbaum, 1998) version 3.0. A further complication is that some concepts do not have LRs at all. This applies mostly to concepts at the top levels of ontologies, although there are some (rare) examples elsewhere, such as the missing English antonym for “thirsty” meaning “not thirsty,” which constitutes a lexical gap.

A set of linguistic patterns that represent positive

or negative evidence for a single meta-property needs

to be developed. Each pattern specifies a fixed sequence of word forms. For weakly inflecting languages like English, with relatively fixed word order, this approach works reasonably well. Further refinement of the patterns will be needed for languages with freer word order. For rigidity, we found only patterns representing evidence against rigidity. Thus, the

default assumption when tagging for rigidity is that

rigidity applies. A concept C is considered non-rigid

only if enough evidence against rigidity has been collected for C. Obviously, sparse data for occurrences of the LR for C will distort the results and produce a skew in the direction of rigidity.

For rigidity, a typical pattern reads “would make

a good X” where X is a slot for a concept’s LR. This

may be a single token, a multiword or even a com-

plex syntactic phrase (as is frequently the case in Ro-

mance languages). Over-generation of patterns is pre-

vented by enumerating and excluding extended pat-

terns. The non-rigid pattern “is no longer (—/a/an) X”

over-generates phrases like “there is no longer a cat

(in the yard/that could catch mice/. . . )” from which

we cannot deduce non-rigidity for “cat.”

Another frequent over-generation is found for LRs

that occur as part of a more complex compound noun

as in “is no longer an animal shelter” where animal

is not an instance of the concept “animal.” As the results returned from web search engines are often mere fragments of sentences, such instances can only be excluded based on part-of-speech tagging and not based on (chunk) parsing.
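The interplay of a pattern and its excluded extensions can be illustrated with a small sketch. It is not the Rudify implementation: the function names are hypothetical, and where the text above relies on part-of-speech tagging to reject compound nouns, this sketch swaps in a plain word list (`excluded_continuations`) so that it runs without a tagger.

```python
import re
from typing import List, Sequence


def instantiate(lr: str) -> List[str]:
    """Expand 'is no longer (-/a/an) X' into one query string per determiner option."""
    return [f"is no longer {det}{lr}" for det in ("", "a ", "an ")]


def counts_as_evidence(snippet: str, lr: str,
                       excluded_continuations: Sequence[str] = ()) -> bool:
    """Return True if the snippet matches the pattern and no exclusion applies."""
    pattern = rf"\bis no longer (?:an? )?{re.escape(lr)}\b"
    match = re.search(pattern, snippet, flags=re.IGNORECASE)
    if match is None:
        return False
    # Excluded extension 1: existential "there is no longer a cat (in the yard)".
    before = snippet[:match.start()].rstrip().lower()
    if before.endswith("there"):
        return False
    # Excluded extension 2 (assumed word list instead of POS tagging):
    # the LR is the first part of a compound noun such as "animal shelter".
    after = snippet[match.end():].strip().lower()
    return not any(after.startswith(word) for word in excluded_continuations)


print(instantiate("cat"))
print(counts_as_evidence("Fluffy is no longer a pet.", "pet"))
print(counts_as_evidence("there is no longer a cat in the yard", "cat"))
print(counts_as_evidence("It is no longer an animal shelter", "animal", ["shelter"]))
```

Only the first of the three snippets counts as evidence against rigidity; the other two are the over-generated cases discussed above.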

Learning to Detect Rigidity

Rudify currently uses 25 different patterns as evi-

dence against rigidity. The results of the web search

queries based on these patterns form a feature vector

for each LR that is then used for classiﬁcation, i. e.

the mapping from the feature vector to the appropri-

ate rigidity tag. Technically this is a ternary decision

between rigid, non-rigid and uncertain.
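As a rough illustration of how per-pattern query results could be turned into a feature vector, consider the sketch below. The text does not specify how hit counts are scaled, so dividing each pattern's count by the count for the bare LR is an assumption, as are the function name `feature_vector`, the `{X}` slot syntax, and the stubbed `hit_count` callable that stands in for a web search.

```python
from typing import Callable, List, Sequence


def feature_vector(lr: str, patterns: Sequence[str],
                   hit_count: Callable[[str], int]) -> List[float]:
    """One feature per pattern: pattern hits relative to hits for the LR alone (assumed scaling)."""
    baseline = hit_count(f'"{lr}"')
    vector = []
    for pattern in patterns:
        query = '"' + pattern.format(X=lr) + '"'
        hits = hit_count(query)
        vector.append(hits / baseline if baseline else 0.0)
    return vector


# Two of the patterns quoted in the text; the counts are invented for illustration.
PATTERNS = ["would make a good {X}", "is no longer a {X}"]
FAKE_COUNTS = {
    '"pet"': 1000,
    '"would make a good pet"': 50,
    '"is no longer a pet"': 10,
}
vec = feature_vector("pet", PATTERNS, lambda query: FAKE_COUNTS.get(query, 0))
print(vec)
```

An LR with no hits at all yields an all-zero vector, which mirrors the remark above that sparse data skews the result toward rigidity.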

All classiﬁers were trained on a hand crafted and

hand tagged list of 100 prototypical LRs of which 50

denote rigid concepts and 50 denote non-rigid con-

cepts. They cover a broad range of domains and are

recorded as monosemous (having a single sense) in

WordNet.

Four different algorithms have been used for clas-

siﬁcation:

• decision tree (J48, an implementation of C4.5)

• multinomial logistic regression

• nearest neighbor with generalization (NNge)

• locally weighted learning, instance based

In evaluating the output we considered the results of

all four classifiers and ranked the results according to the degree of consensus amongst them (see section 7.2 for

more details).

Both Rudify and AEON rely on the World Wide

Web as indexed by Google as the largest repository of utterances that is accessible to the research community. This is done in order to minimize sparse data effects. We are aware of the theoretical implications of relying on data extracted from Google or other commercial web search engines. The most crucial problems are:

• Results are unstable over time. The indexing pro-

cess is rerun regularly and results retrieved at any

given point in time may not be exactly repro-

ducible later.

• The query syntax may be unstable over time and

implements boolean searches rather than linguis-

tic searches.

• There are arbitrary limitations of the maximum

number of results returned and of the meta-data

associated with each result. These may also

change over time.

• The data repository is in principle uncontrolled as

write access to the World Wide Web and other

parts of the Internet is largely unrestricted. Com-

mercial search engines work as additional ﬁlters

on the raw data with their ﬁlter policy often left

undocumented and subject to changes as well.

From a linguist’s point of view, the ﬁrst three of these

problems are discussed in more detail by (Kilgarriff,

2007).

Rudify is now a highly configurable modular tool

with parameter sets developed for English and Dutch.

Work is under way for the development of parame-

ter sets for the remaining European languages of the

Kyoto consortium (Italian, Spanish, Basque). The

software is written in Python and NLTK (Bird et al.,

2009) is used as the linguistic backend. Classiﬁer cre-

ation, training and application is done using Weka 3

(Witten and Frank, 2005), but can be easily delegated

to any software suite capable of manipulating ARFF

ﬁles. Rudify will be released as free and open source

software.
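Because the classifiers exchange data through ARFF files, any tool that can write this format can feed them. The sketch below serializes feature vectors into ARFF text; the attribute names (`pattern_0`, ...), the relation name, and the class labels `rigid` and `nonrigid` are assumptions chosen for illustration and are not taken from the Rudify sources.

```python
from typing import List, Sequence, Tuple


def to_arff(relation: str, n_patterns: int,
            rows: Sequence[Tuple[Sequence[float], str]]) -> str:
    """Serialize (feature vector, class label) rows as ARFF text."""
    lines: List[str] = [f"@relation {relation}", ""]
    for i in range(n_patterns):
        lines.append(f"@attribute pattern_{i} numeric")
    # Nominal class attribute; the label names are hypothetical.
    lines.append("@attribute class {rigid,nonrigid}")
    lines += ["", "@data"]
    for features, label in rows:
        values = ",".join(str(value) for value in features)
        lines.append(f"{values},{label}")
    return "\n".join(lines) + "\n"


arff_text = to_arff(
    "rigidity",
    2,
    [([0.05, 0.01], "nonrigid"), ([0.0, 0.0], "rigid")],
)
print(arff_text)
```

Each data row holds the feature values in attribute order followed by the class label, which is the layout an ARFF reader expects.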

6 BASE CONCEPTS

(Rosch et al., 1976) empirically showed the presence

of basic level concepts (BLC) in human cognition. In

a conceptual taxonomy, for each concept C its subordinate concepts C′ are typically more specific than C. The increase in specificity is due to at least one added feature for C′ that is compatible with C but allows for discrimination between all C′. BLCs mark the border

between the most general concepts comprising only

few features and the most feature rich concepts.

Base concepts (BC) are described by (Izquierdo

et al., 2007) as those concepts within a semantically

structured lexical data base that “play the most im-

portant role” in that data base. This intuitive but

vague notion is effectively a rephrasing of the BLC notion.

BCs, though, are conceived as a purely computation-

ally derived set based on semantic relations encoded

in hierarchical lexical databases. BCs are those con-

cepts that are returned by the following algorithm:

for each path p from a leaf node (a node with no

hyponym relation to other nodes) up to a root node

(a node with no hypernym relation to other nodes)

choose the ﬁrst node C with a local maximum of

speciﬁc relations to other nodes as a BC. This al-

gorithm can be adapted by deﬁning the set of spe-

ciﬁc relations (e. g. only hyponymy, all encoded re-

lations including lexical relations) and by deﬁning a

minimally required number of subsumed concepts a

possible BC must contain. BC sets depend on the

speciﬁed parameters and the hierarchical structure of

the lexical database. Thus, different sets are com-

puted for different versions of WordNet and for other

national wordnets. Software and data for comput-

ing BCs from WordNet are freely available online at

http://adimen.si.ehu.es/web/BLC.
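The selection procedure described above can be sketched on a toy hierarchy. This is not the software distributed at the URL above. Several simplifications are assumed: every node has at most one hypernym, only direct hyponyms are counted as relations, a "local maximum" is read as a count strictly larger than that of the node below and at least as large as that of the node above, the leaf itself is never selected, and the minimum number of subsumed concepts is ignored.

```python
from collections import Counter
from typing import Dict, List, Mapping, Optional, Set


def base_concepts(hypernym: Dict[str, Optional[str]],
                  relation_count: Mapping[str, int]) -> Set[str]:
    """For every leaf-to-root path, select the first node with a local maximum of relations."""
    inner = {p for p in hypernym.values() if p is not None}
    leaves = [node for node in hypernym if node not in inner]
    selected: Set[str] = set()
    for leaf in leaves:
        # Collect the path from the leaf up to the root.
        path: List[str] = []
        node: Optional[str] = leaf
        while node is not None:
            path.append(node)
            node = hypernym[node]
        counts = [relation_count.get(n, 0) for n in path]
        # Assumed reading of "local maximum"; the leaf (index 0) is skipped.
        for i in range(1, len(path)):
            above = counts[i + 1] if i + 1 < len(path) else -1
            if counts[i] > counts[i - 1] and counts[i] >= above:
                selected.add(path[i])
                break
    return selected


# Toy hierarchy (invented): each key maps to its single hypernym.
HYPERNYM = {
    "entity": None,
    "animal": "entity",
    "insect": "animal",
    "moth": "insect",
    "luna moth": "moth",
    "gypsy moth": "moth",
    "hawk moth": "moth",
    "cat": "animal",
    "siamese": "cat",
}
# Relations considered here: direct hyponyms only.
HYPONYM_COUNT = Counter(p for p in HYPERNYM.values() if p is not None)
bcs = base_concepts(HYPERNYM, HYPONYM_COUNT)
print(sorted(bcs))  # ['animal', 'moth']
```

In this toy data "moth" is selected because it has many recorded hyponyms, while the path through "cat" climbs on to "animal" before reaching a local maximum.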

WordNet (Fellbaum, 1998) is an electronic lexical database for English. It organizes words

in terms of semantic relations including synonymy

(“car”–“automobile”), hyponymy (the relation among

general and speciﬁc concepts, like “animal” and

“cat,” that results in hierarchical structures), and

meronymy (the part-whole relation, as between “cat”

and “claw”). Linking words via such relations results

in a huge semantic network.

We have added a set of BCs to the middle level

of the Kyoto ontology thereby providing the ontol-

ogy with a generic set of concepts that can be used

for inter-wordnet mappings and wordnet to ontology

mappings.

Rudify was evaluated on the set of BCs derived

from WordNet 3.0 considering only hypernym rela-

tions and with a minimum of 50 subsumed concepts

for each BC (BC-50). These parameters result in a

set of 297 concepts. Inspecting the BC-50 set we

found LRs that are highly unlikely BLCs though they

fulﬁll the formal criteria for BCs. A striking exam-

ple is “moth.” In WordNet, much effort was spent

to record a high number of different insects as dis-

tinguished concepts thus effectively shifting the basic

level downwards in the taxonomic tree. (Tanaka and

Taylor, 1991) report on a similar effect of basic level

shifts for BLCs that can be shown for experts in their

respective domain.

7 EVALUATION OF OUTPUT

We tested Rudify on four different English language

data sets:

• 50 region terms (handcrafted by an environmental domain specialist)

• 236 Latin species names (selected by an environmental domain specialist)

• 201 common species names (selected by an environmental domain specialist)

• 297 base concepts (BC-50)

7.1 Domain Speciﬁc Terms

Classiﬁers correctly classiﬁed all region terms and all

Latin species names as rigid concepts. This holds also

for the common English species names with three ex-

ceptions: “wildcat” was misclassiﬁed as denoting a

non-rigid concept by all four classiﬁers and “wolf”

and “apollo” (a butterfly) were misclassified by all classifiers except NNge. These misclassifications arise because those LRs do not denote a single concept (a species) but are polysemous and are also frequently used in figurative language (examples are taken from our log files):

• “Mount Si High School teacher Kit McCormick

is no longer a Wildcat.” (generalization from a

school mascot to a school member)

• “Also the 400 CORBON is no longer a wildcat.”

(a handgun)

• “He nearly gave in and became a Wildcat before

ﬁnally deciding to honor his original commitment

to the Ducks.” (a football team’s (nick)name)

• “For example, the dog is no longer a wolf, and

is now a whole seperate species.” (example dis-

cusses changing relations between concepts over

time)

• “For four years, the space agency had been plan-

ning, deﬁning, or defending some facet of what

led up to and became Apollo.”

(a space mission’s name)

• “Others ﬁguring prominently in the county’s his-

tory were Edward Warren, who established a trad-

ing post near what is now Apollo [. . . ]”

(a geographical name)

• “The patron of the city is now Apollo, god of light,

[. . . ]”

(a Greek deity)

7.2 BC-50

We classify the Rudify output on the BC-50 set ac-

cording to the agreement amongst the four classiﬁers

used. We refer to those cases in which all four clas-

siﬁers reached agreement as decisive. Rudify yielded

decisive output for 215 BCs. Whenever there is dis-

agreement amongst the classiﬁers, we refer to this

output as difﬁcult. There are 82 difﬁcult cases that

subdivide into two further cases. When three out of

four classiﬁers reached agreement, we refer to this

output as indecisive. Rudify yielded indecisive out-

put for 56 BCs. When two classiﬁers evaluate a term

as rigid and two as non-rigid, we refer to this as un-

decided. Rudify is undecided with respect to 26 BCs.

These ﬁgures are summarized in table 1.

Table 1: General overview of the classification on the BC-50 set.

Rudify output            number of cases
decisive                 215
difficult                82
  difficult: indecisive  56
  difficult: undecided   26
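The agreement categories used in this section can be stated compactly in code. The sketch below is illustrative only: the function name `consensus` is hypothetical, and it assumes exactly four votes that are each either "rigid" or "non-rigid".

```python
from collections import Counter
from typing import List, Optional, Tuple


def consensus(votes: List[str]) -> Tuple[str, Optional[str]]:
    """Map four classifier votes to (agreement category, majority label or None)."""
    if len(votes) != 4:
        raise ValueError("this sketch assumes exactly four classifiers")
    label, top = Counter(votes).most_common(1)[0]
    if top == 4:
        return "decisive", label
    if top == 3:
        return "indecisive", label
    return "undecided", None  # two votes each way: no majority label


print(consensus(["rigid", "rigid", "rigid", "rigid"]))
print(consensus(["rigid", "non-rigid", "rigid", "rigid"]))
print(consensus(["rigid", "non-rigid", "non-rigid", "rigid"]))

# The three categories partition the BC-50 set reported in table 1.
print(215 + 56 + 26)  # 297
```

Indecisive and undecided outputs together form the difficult cases (56 + 26 = 82).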

An evaluation of Rudify output for the 215 deci-

sive cases indicates that Rudify produces a high level

of accuracy for decisive cases (see table 2): 85 % of the terms evaluated as rigid and 75 % of the terms evaluated as non-rigid were correctly evaluated. Many of the Rudify errors either

came from high level concepts, e. g., “artifact” and

“unit of measurement,” which are ordinarily dealt

with manually, or else they dealt with polysemous

words, which was an anticipated difﬁculty (see sec-

tion 5).

For five concepts (3 % of the terms decisively evaluated as rigid; see table 2) we let Rudify determine whether the concept is rigid or non-rigid, e. g. for “furniture.” Since not every concept is ontologically clear cut, and since some concepts lie within areas of ontology in which the alternative theories have not yet been properly worked out (e. g., the ontology of artefacts), we have determined that Rudify can occasionally be helpful in making modelling choices based on the common sense uses of the concepts in language. For these cases the evaluation remains unclear.

For 56 concepts Rudify yielded indecisive output.

Exactly 50 % of these cases are incorrect (28 out of

56). For this reason we do not regard the indecisive

output to be usable data.

The decisive Rudify output on the BCs within the

OntoWordNet (OWN) hierarchy yields five OntoClean errors if we

count the hypernyms, and 22 errors if we count in-

stances of hypernym relations. This is based only on

the Rudify output prior to evaluating the correctness

of this output, but it gives us an idea of the OntoClean

results if we uncritically use Rudify to evaluate con-

cepts in the ontology (for more details, see (Herold

et al., 2009b)). In short, Rudify output coupled with

the OntoClean methodology provides a useful tool for

drawing attention to problems in the backbone hierar-

chy.

In summary, our evaluation of Rudify output on

BCs is that Rudify is successful with respect to the

decisive output. It produces decisive output with a rel-

atively high degree of accuracy (83 %) and an overall

accuracy on the BC-50 set of 69 % (table 3). Further-

more, Rudify has also proven useful in deciding how

to model a few concepts.

Table 2: Overview of the decisively classified BC-50 concepts (215 concepts).

class      evaluation  number of cases
rigid      incorrect   20 (12 %)
           correct     142 (85 %)
           unclear     5 (3 %)
non-rigid  incorrect   12 (25 %)
           correct     36 (75 %)

Table 3: Summary of evaluation.

classification          number of cases
correct                 206 (69 %)
incorrect               60 (20 %)
undecided               26 (9 %)
decision left to Rudify 5 (2 %)
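The percentages in tables 2 and 3 follow from the reported counts, as the short calculation below shows. All numbers are taken from this section; only the variable names are introduced here.

```python
# Decisive cases by class and evaluation (table 2).
rigid = {"correct": 142, "incorrect": 20, "unclear": 5}
non_rigid = {"correct": 36, "incorrect": 12}

# Indecisive cases: 28 of 56 are incorrect, so 28 are correct. Undecided: 26.
indecisive_correct, indecisive_incorrect = 28, 28
undecided = 26

decisive_total = sum(rigid.values()) + sum(non_rigid.values())
decisive_correct = rigid["correct"] + non_rigid["correct"]
total = decisive_total + indecisive_correct + indecisive_incorrect + undecided

overall_correct = decisive_correct + indecisive_correct
overall_incorrect = rigid["incorrect"] + non_rigid["incorrect"] + indecisive_incorrect

print(decisive_total, total)                        # 215 297
print(round(100 * decisive_correct / decisive_total))  # 83
print(overall_correct, round(100 * overall_correct / total))      # 206 69
print(overall_incorrect, round(100 * overall_incorrect / total))  # 60 20
```

The 83 % figure is therefore the accuracy over the 215 decisive cases only, whereas the 69 % figure is measured against all 297 concepts, including those for which Rudify was undecided.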

8 APPLICATION OF OUTPUT

In this section we illustrate with two examples how

Rudify results can be used to inform ontology design.

The ﬁrst example uses Rudify independently, the sec-

ond uses Rudify in conjunction with OntoClean prin-

ciples.

Example 1

We consider BCs that can reasonably be considered

amounts of matter. Amounts of matter are generally referred to by mass nouns; ‘milk,’ ‘mud,’ and ‘beer’ are a few examples. We begin by provisionally modelling the concepts taken from WordNet under the upper level concept “amount of matter” in the following hierarchy, which includes rigidity assignments from Rudify. R+ indicates a rigid concept, R− indicates a non-rigid concept.

amount of matter
  drug (R−)
    antibiotic (R+)
  chemical compound (R+)
  oil (R+)
  nutriment (R−)

Using the Rudify data, we can clean up this hierar-

chy. First we notice that Rudify has evaluated “nutri-

ment” as non-rigid. This indicates that it is probably

a role rather than a type. In order to verify this, we re-

fer to the deﬁnition taken from WordNet: “a source of

materials to nourish the body.” That is, the milk in my

refrigerator is a nutriment only if it nourishes a body.

If you bathe in milk, like Cleopatra, it is a cosmetic.

“Nutriment,” therefore, is a role that milk can play, so

it does not belong in the type hierarchy. We therefore move it to the role hierarchy as a subclass of “amount of

matter role.” We pause to notice that in this case, the

decision was made using Rudify results and human

veriﬁcation of the output. This case does not invoke

OntoClean, i. e., there would be no OntoClean errors

if “nutriment” were subsumed by “amount of matter.”

This contrasts with the second example, which yields

a formal error within the hierarchy itself.

Example 2

Notice that Rudify evaluates “drug” as non-rigid, and

“antibiotic” as rigid. However, the current hierarchy

subsumes the rigid concept under the non-rigid con-

cept. This results in a formal error in the hierarchy.

Because “drug” and “antibiotic” are both sortal terms,

this means a role subsumes a type, which, as we have

seen above, leads to inconsistency. Consider the antibiotic penicillin. Penicillin is only a drug if it is administered to a patient, but it is always an antibiotic

due to its molecular structure. By subsuming “an-

tibiotic” under “drug,” the ontology erroneously states

that if some amount of penicillin is not administered

to a patient, then it is not an antibiotic. The solution

then, is to move “drug” out of the type hierarchy and

into the role hierarchy. “Drug” then becomes an “amount of matter role,” and “antibiotic” is a subclass of “amount of matter” whose instances can play the role “drug.”

Because “chemical compound” and “oil” are both

evaluated as rigid we do not need to make any changes

to this part of the ontology.

The result is the following pair of hierarchy fragments under “amount of matter” and “amount of matter role.”

amount of matter
  antibiotic
  chemical compound
  oil

amount of matter role
  drug
  nutriment
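The two clean-up steps of this section can be combined into one mechanical pass, sketched below under strong assumptions. It moves every concept tagged non-rigid under the role root and re-attaches each rigid concept to its nearest rigid ancestor. The function name `separate_types_and_roles` and the dictionary encoding are hypothetical, and the sketch deliberately omits the human verification of Rudify output that the examples above rely on.

```python
from typing import Dict, Optional, Tuple


def separate_types_and_roles(
    parent: Dict[str, Optional[str]],
    rigid: Dict[str, bool],
    type_root: str,
    role_root: str,
) -> Tuple[Dict[str, Optional[str]], Dict[str, str]]:
    """Split one hierarchy into a type hierarchy and a flat role hierarchy."""
    types: Dict[str, Optional[str]] = {}
    roles: Dict[str, str] = {}
    for concept, ancestor in parent.items():
        if concept == type_root:
            continue
        if rigid[concept]:
            # Skip non-rigid ancestors until a rigid one (or the root) is reached.
            while ancestor is not None and ancestor != type_root and not rigid[ancestor]:
                ancestor = parent[ancestor]
            types[concept] = ancestor
        else:
            roles[concept] = role_root
    return types, roles


# Provisional hierarchy and Rudify tags from the examples in this section.
parent = {
    "amount of matter": None,
    "drug": "amount of matter",
    "antibiotic": "drug",
    "chemical compound": "amount of matter",
    "oil": "amount of matter",
    "nutriment": "amount of matter",
}
rigid = {
    "amount of matter": True,  # assumed; the root is not tagged in the text
    "drug": False,
    "antibiotic": True,
    "chemical compound": True,
    "oil": True,
    "nutriment": False,
}
types, roles = separate_types_and_roles(parent, rigid, "amount of matter", "amount of matter role")
print(types)
print(roles)
```

On this input the three rigid concepts end up directly under "amount of matter" and the two non-rigid ones under "amount of matter role", matching the fragments shown above.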

9 CONCLUSIONS

We presented Rudify – a system for automatically de-

riving ontological meta-properties from large collec-

tions of text based on the lexical representation of in-

dividual concepts in natural language. This approach

yields valuable results for use in consistency checking of general large scale ontologies such as the Kyoto core ontology. On the basis of 297 base concepts derived from the English WordNet, 69 % agreement with human judgement could be demonstrated. This closely matches the figures reported by Völker et al. (2008) for human inter-annotator agreement. For specialized domain terms, agreement was substantially higher: only 3 out of 201 English species terms had been misclassified.

The evaluation of the results reported here shows

potential for further improvement. Word sense disambiguation will increase the accuracy for polysemous words. First experiments involving hypernyms of LRs in the retrieval of evidence for or against ontological meta-properties already give promising results.

For future reference and stability of the results it

will be beneﬁcial to use a controlled linguistic corpus

of appropriate size instead of commercial web search

engines.

ACKNOWLEDGEMENTS

The development of Rudify and its application to the

Kyoto core ontology has been carried out in the EU’s

7th framework project Knowledge Yielding Ontolo-

gies for Transition-based Organizations (Kyoto, grant

agreement no. 211423).

The authors would like to thank Christiane Fell-

baum for many fruitful discussions and the Kyoto

members for their kind collaboration.

REFERENCES

Bird, S., Klein, E., and Loper, E. (2009). Natural Language

Processing with Python. O’Reilly.

Fellbaum, C., editor (1998). WordNet: An Electronic Lexi-

cal Database. The MIT Press.

Gangemi, A., Guarino, N., Masolo, C., and Oltramari, A.

(2003). Sweetening wordnet with dolce. AI Magazine,

24(3):13–24.

Guarino, N. and Welty, C. (2002). Evaluating ontologi-

cal decisions with ontoclean. Communications of the

ACM, 45(2):61–65.

Guarino, N. and Welty, C. (2004). An overview of onto-

clean. In Staab, S. and Studer, R., editors, Handbook

on Ontologies, pages 151–172.

Herold, A., Hicks, A., and Rigau, G. (2009a). Central on-

tology version 1. Kyoto project deliverable D6.2.

Herold, A., Hicks, A., Segers, R., Vossen, P., Rigau, G.,

Agirre, E., Laparra, E., Monachini, M., Toral, A., and

Soria, C. (2009b). Wordnets mapped to central ontol-

ogy version 1. Kyoto project deliverable D6.3.

Izquierdo, R., Suárez, A., and Rigau, G. (2007). Exploring the automatic selection of basic level concepts. In Proceedings of the International Conference

on Recent Advances on Natural Language Processing

(RANLP’07), Borovetz, Bulgaria.

Kilgarriff, A. (2007). Googleology is bad science. Compu-

tational Linguistics, 33:147–151.

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., and

Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8:382–439.

Tanaka, J. W. and Taylor, M. (1991). Object categories and

expertise: Is the basic level in the eye of the beholder?

Cognitive Psychology, 23:457–482.

Völker, J., Vrandečić, D., and Sure, Y. (2005). Automatic evaluation of ontologies (AEON). In Proceedings of the 4th International Semantic Web Conference

(ISWC2005), volume 3729 of LNCS, pages 716–731,

Berlin/Heidelberg. Springer.

Völker, J., Vrandečić, D., Sure, Y., and Hotho, A. (2008).

AEON — an approach to the automatic evaluation of

ontologies. Applied Ontology, 3(1-2):41–62.

Vossen, P., Agirre, E., Calzolari, N., Fellbaum, C., Hsieh,

S., Huang, C., Isahara, H., Kanzaki, K., Marchetti,

A., Monachini, M., Neri, F., Raffaelli, R., Rigau, G.,

and Tescon, M. (2008). Kyoto: A system for min-

ing, structuring and distributing knowledge across lan-

guages and cultures. In Proceedings of LREC 2008,

Marrakech, Morocco.

Witten, I. H. and Frank, E. (2005). Data Mining: Practi-

cal machine learning tools and techniques. Morgan

Kaufmann, San Francisco, 2nd edition.
