FUZZY KEYWORD ONTOLOGY FOR ANNOTATING AND SEARCHING EVENT REPORTS

Juhani Hirvonen, Teemu Tommila, Antti Pakonen

VTT Systems Research, Vuorimiehentie 3, Espoo, Finland

Christer Carlsson, Mario Fedrizzi, Robert Fullér

Institute for Advanced Management Systems Research, Åbo Akademi University

ICT House A 4053, 20520 Turku, Finland

Keywords: Fuzzy ontology, Fuzzy partonomy, Fuzzy reasoning schemes, Knowledge mobilisation, Semantic web.

Abstract: This paper defines and applies a fuzzy keyword ontology to annotate and search event reports in a database.

The ontology is developed by superimposing a fuzzy partonomy on fuzzy classifications. The claim is that

fuzzy keywords will help us find event reports even if the event description is incomplete or imprecise and

that this will provide benefits in finding the relevant problem reports. This will save time and costs when

working with queries on large data- and knowledge bases.

1 INTRODUCTION

The following hypothetical situation was selected as

a starting point: A company writes and stores pieces

of knowledge, called "golden nuggets", in the form

of problem reports, models, recommendations, etc.

Nuggets are documents and they can contain data

extracted from the client’s information systems.

While creating a report, the expert author annotates it with suitable keywords. The internal structure of the document can thus be ignored, and the problem scales down to the definition of a fuzzy keyword ontology.

A knowledge base of golden nuggets of different types is a generic approach applied by many organisations, for example in incident reporting and electronic diaries. While trying to preserve some general applicability, our paper takes a narrower view of the topic by assuming that the users are experts, so that the meaning of the keywords will be familiar to them.

The main goals of the paper are (i) to develop a fuzzy keyword ontology for an industrial application; (ii) to show that the fuzzy ontology will create effective keyword combinations for database queries; (iii) to introduce a tool (KnowMob) that implements (i) and (ii). The theory and methods we introduce in this paper implement a new concept called knowledge mobilisation (cf. Carlsson et al. (2010a,b); Romero (2008)). Knowledge mobilisation

represents a change of paradigm in the creation,

building, handling and distribution of knowledge.

We will show that this differs from the classical

large, complete ontology approach. We will use

fuzzy sets as a basis. This will allow imprecise queries and repeated iterations, and it supports learning to understand problems which are not sufficiently understood from the beginning. Similar approaches

have been worked out by Calegari and Ciucci (2006,

2010), Lee et al (2005), and Parry (2006) but our

project is one of the first to work out the methods

and the theory for actual industry applications.

2 KEYWORD CATEGORIES

We identified the most important entities used in searching problem reports that are relevant for describing problems in a specific engineering context, which in this case is the paper-making process. We

defined keyword types that are almost independent

of each other. The goal was to characterise problem

situations by a combination of events, systems and

functions affected, materials involved, and process

variables. A further goal was to reduce the number of keywords. The adopted keyword categories are shown in Figure 1.

Hirvonen J., Tommila T., Pakonen A., Carlsson C., Fedrizzi M. and Fullér R. FUZZY KEYWORD ONTOLOGY FOR ANNOTATING AND SEARCHING EVENT REPORTS. DOI: 10.5220/0003091002510256. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD-2010), pages 251-256. ISBN: 978-989-8425-29-4. © 2010 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: Keyword categories.

A system is considered to be a real-world entity that is designed and built for a purpose. Systems consist e.g. of buildings, mechanical and electrical equipment, software and people. Figure 2 below shows some subsystems of a paper-making line and also demonstrates how the system decomposition is often imprecise, depending on the viewpoint taken: e.g. if the viewpoint is “retention control” then the effect of “Dry end” on paper quality is negligible. This means that the effective size of any part of the paper machine depends on the viewpoint taken.

Figure 2: A “decomposition” of paper line.

The various activities of a system are called

plant functions. In many cases, a function refers to a

purposeful activity. Functions can also be under-

stood as physical and chemical phenomena.

The term process variable refers to attributes of plant systems, functions, and substances that characterise their performance or state. Very often a variable is measured, but it can also have a purely qualitative character without a numerical scale.

The term event refers to an “episode” in the op-

eration of the plant. Therefore, an event has a dura-

tion that is usually rather short but can continue for

weeks or even months. Quite often, an event is inter-

esting (i.e. valuable for knowledge management)

because it may be unanticipated and unwanted, i.e. a

problematic situation.

An industrial plant processes and handles mate-

rials and substances that have various chemical and

physical properties and purposes in the production

chain.

3 THE FUZZY KEYWORD

CLASSIFICATION

Keywords can be understood as representatives of

sets of real-world events, systems etc. that overlap

and are related in many ways. This complexity is

formalized in a way that serves our purpose, i.e.

finding relevant information from a knowledge base.

This is why, instead of strict subsethood, we have adopted another approach, which is shown in Figure 3. The set C is fully included in A, but the set B contains elements not included in A. Furthermore, B contains a larger part of the elements of A than C does. In this way we want to show that some set C of keywords is included in another set A of keywords, while a second set B of keywords is partly included in A. There is another aspect to the overlapping of keywords: the set B partly covers the set A and the set A fully covers the set C. With the help of this intuitive description of inclusion and coverage (which will be replaced with a formal description in section 4) we have been able to work out fuzzy keyword classifications that we will show to be a fuzzy keyword ontology (cf. Carlsson et al. (2010a)).

We will use these inclusion and coverage rela-

tions to classify all Keyword categories.

Figure 3: Inclusion and coverage.

3.1 Event Types

Figure 4 shows a fragment of the fuzzy hierarchy of

Event_types. At the top level generic Event is classi-

fied into problems, neutral observations and suc-

cesses on the basis of the value of the Event. At

lower levels other items are used to categorise prob-

lems into more concrete Event_types.

The two numbers (not all shown) beside the arrows (also not all shown) indicate the inclusion and coverage values of the related keywords, e.g. “Design_flaw” is included with degree 0.60 in “System_fault” and correspondingly with degree 0.40 in “Function_failure”. The numbers at the lower part of the arrow give the coverage values, e.g. “Technical_problem” covers 0.80, “Operational_problem” 0.40, and “Quality_problem” 0.50, etc. of “Problem” events. This implies that these keywords overlap, since the coverage values sum to more than 1.00 (0.80 + 0.40 + 0.50 = 1.70).

KEOD 2010 - International Conference on Knowledge Engineering and Ontology Development


Figure 4: A fragment of the Event_type classification.
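The inclusion and coverage annotations quoted in this section can be held in a small edge table. The following Python sketch is only an illustration: the `Edge` type and the helpers `total_coverage` and `coverage_overlaps` are hypothetical names, not part of KnowMob, and only the values quoted in the text are used (values not given in the text are left as `None`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """One arrow of the classification graph: child keyword -> parent keyword."""
    child: str
    parent: str
    inclusion: Optional[float] = None  # degree to which child is included in parent
    coverage: Optional[float] = None   # degree to which child covers parent


# Only the values quoted in the text; the rest of Figure 4 is not reproduced.
EDGES = [
    Edge("Design_flaw", "System_fault", inclusion=0.60),
    Edge("Design_flaw", "Function_failure", inclusion=0.40),
    Edge("Technical_problem", "Problem", coverage=0.80),
    Edge("Operational_problem", "Problem", coverage=0.40),
    Edge("Quality_problem", "Problem", coverage=0.50),
]


def total_coverage(parent: str) -> float:
    """Sum of the known coverage values of all children of `parent`."""
    return sum(e.coverage for e in EDGES
               if e.parent == parent and e.coverage is not None)


def coverage_overlaps(parent: str) -> bool:
    """Children must overlap if together they cover more than the whole parent."""
    return total_coverage(parent) > 1.0
```

With these values, `coverage_overlaps("Problem")` is true because the three quoted coverage degrees add up to more than one.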

3.2 System Types

Figure 5 shows a few examples of generic system types within an engineering taxonomy that classifies the parts of a production line in the form of a precise taxonomy. However, there are several engineering ontologies and hence we have adopted a fuzzy classification in the style used for Event_type.

Figure 5: Classification of System_type, examples.

We are going to need an additional decomposition of Systems. We call this decomposition a partonomy. The partonomy fuzzifies the classical whole-part relationship. For System_type keywords both the engineering classification and the partonomy are important.

3.3 Function Types

Figure 6 shows some examples of Function_type

keywords and their classification into “Operations

and Management”, “Processing”, “Phenomenon”,

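and “Control”. This classification clearly shows how the independence of the categories restricts the number of keywords. We do not have separate keywords for e.g. pH control, retention control, formation control etc., because such a concept can be expressed by combining a Function_type keyword with a keyword from another category.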

Figure 6: Function type keywords, examples.

3.4 Variable Types

Variable names can be added as keywords in order

to say that an event is associated with the variable.

Their values can characterise the situation. Exact

numerical values would not support fuzzy reasoning.

The KnowMob tool (cf. section 5) cannot know

which numerical values should be considered low

and high in a given operational state. The solution is

to let the expert user associate a linguistic value

classification label like “normal”, “high” or “very

low” to a process variable name.
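As a minimal sketch of this idea, a process variable keyword can be stored together with an expert-assigned linguistic label instead of a raw number. The label scale below is an assumption (the text only names “normal”, “high” and “very low”), and `VariableAnnotation` and `annotate` are hypothetical names rather than KnowMob API.

```python
from dataclasses import dataclass

# Assumed label scale; the text only names "normal", "high" and "very low".
LABELS = ("very low", "low", "normal", "high", "very high")


@dataclass(frozen=True)
class VariableAnnotation:
    """A process variable keyword with an expert-assigned linguistic value."""
    variable: str
    label: str

    def __post_init__(self) -> None:
        # Reject anything that is not one of the agreed linguistic labels.
        if self.label not in LABELS:
            raise ValueError(f"unknown linguistic label: {self.label!r}")


def annotate(variable: str, label: str) -> VariableAnnotation:
    """Attach a linguistic label (not a raw number) to a variable name."""
    return VariableAnnotation(variable, label)
```

For example, `annotate("pH", "high")` succeeds, whereas `annotate("pH", "7.2")` raises a `ValueError`, which keeps exact numerical values out of the annotation.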

3.5 Dependencies between Categories

In addition to the keyword categories the fuzzy on-

tology must model functional dependencies between

keyword categories. As an example, systems play

various roles in carrying out one or more functions.

These dependencies will be expressed as fuzzy rela-

tions (cf. section 4).

4 FUZZY ONTOLOGY AND

REASONING SCHEMES

We have so far introduced our key concepts and

basic reasoning with an intuitive and “common

sense” approach. In this section we need to become

a bit more precise and introduce more formal defini-

tions of the essential parts of our fuzzy keyword

ontology.

4.1 Fuzzy Ontology

We have as a starting point a basic keyword classification which is built on the engineering knowledge of the paper machine; without loss of generality, this keyword classification can be represented as a directed graph (cf. Figures 4-6). Keywords are organized in five categories <event, system, function, variable, material> based on the engineering knowledge. For each category the classification is built on specialisation/generalisation relations (i.e. inclusion/coverage relations): moving to the next lower level of the directed graph, each category is specified in subclasses (and further in sub-subclasses etc., down to specific concepts, i.e. “system elements” if we follow the “System” category); moving to the next higher level of the directed graph, subclasses (or individual concepts) are generalised to the next level of subclasses (or a class).

Keywords are going to be used to quickly find documents through queries of (very) large databases; this should be possible by building keyword combinations without following the predefined structure of the classification but using the relations.

We superimpose a partonomy on the keyword

classification, or more precisely a fuzzy partonomy;

this will allow us to find keywords which are partly

the same for a query regardless of where they are

defined in the underlying keyword classification (or

where they are located in the directed graph).

A partonomy that is built on part-of relationships

is a primitive of the formal theory of parthood rela-

tions; parthood relations specify part-of and overlap

within a whole; part-of is reflexive, anti-symmetric

and transitive (the transitivity is sometimes difficult

to justify) and overlap between x and y is defined as $O(x, y) := \{z \mid z \le x \text{ and } z \le y\}$, where the symbol “$\le$” now denotes part-of.
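For the crisp (non-fuzzy) case, the overlap O(x, y) defined above can be computed directly from a set of part-of facts. The sketch below is an illustration under stated assumptions: the part names in `DIRECT_PARTS` are hypothetical and are not taken from Figure 2, and transitivity is simply assumed to hold.

```python
from itertools import product

# Hypothetical direct part-of facts (part, whole); not taken from Figure 2.
DIRECT_PARTS = {
    ("Head_box", "Wet_end"),
    ("Wire_section", "Wet_end"),
    ("Wet_end", "Paper_machine"),
    ("Dry_end", "Paper_machine"),
    ("Wire_section", "Retention_control_loop"),
}


def part_of_closure(direct):
    """Reflexive, transitive closure of the direct part-of facts."""
    items = {x for pair in direct for x in pair}
    closure = set(direct) | {(x, x) for x in items}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def overlap(x, y, closure):
    """O(x, y): everything that is part of both x and y."""
    return {z for (z, whole) in closure if whole == x and (z, y) in closure}
```

Here `overlap("Wet_end", "Retention_control_loop", part_of_closure(DIRECT_PARTS))` contains only the shared part, while two disjoint subsystems yield the empty set.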

The fuzzy keyword classification and partonomy are built on inclusion and coverage, which are understood to be relations between fuzzy subsets. The classifications and part-of relations are collected in matrices of coverage/inclusion of keywords; the cells of the matrix are numbers in [0, 1] which show the degree of coverage and inclusion.
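One way such a matrix could support a query is sketched below in Python. This is a hedged illustration, not the KnowMob implementation: the sparse dictionary layout, the function `expand_query` and the threshold are assumptions, and apart from the two “Design_flaw” degrees quoted in section 3.1 the keywords and numbers are made up.

```python
# Hypothetical inclusion matrix: INCLUSION[a][b] = degree to which a is included in b.
# The two Design_flaw values are quoted in section 3.1; the others are made up.
INCLUSION = {
    "Design_flaw": {"System_fault": 0.60, "Function_failure": 0.40},
    "Wear": {"System_fault": 0.90},
    "Control_error": {"Function_failure": 0.70, "System_fault": 0.20},
}


def expand_query(keyword, matrix, threshold=0.5):
    """Return keywords related to `keyword` with degree >= threshold, best first.

    Looks both along the row (what `keyword` is included in) and down the
    column (what is included in `keyword`).
    """
    related = dict(matrix.get(keyword, {}))
    for other, row in matrix.items():
        if keyword in row:
            related[other] = max(related.get(other, 0.0), row[keyword])
    hits = [(k, d) for k, d in related.items() if d >= threshold]
    # Sort by descending degree, then alphabetically for a stable order.
    return sorted(hits, key=lambda kd: (-kd[1], kd[0]))
```

A query for "System_fault" would then also retrieve reports annotated with sufficiently included keywords, regardless of where those keywords sit in the classification graph.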

A fuzzy ontology is a relation on fuzzy sets, i.e. a relation associated with a membership function. Let $K_i$ be a finite fuzzy set of keywords identified with a level of the directed graph and a category <event, system, function, variable, material>, hence i = 1, …, 5. A membership function is a mapping of $K_i$ on $L$, a lattice or a partially ordered set. The set of linguistic labels {negligible, weak, moderate, strong, perfect} is a lattice, which means that a relation between two sets of keywords can be stated and described with a linguistic label.
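Because the five labels are totally ordered, they form a lattice in which the meet is the weaker and the join the stronger of two labels. A minimal Python sketch, assuming the order negligible < weak < moderate < strong < perfect (the names `Label`, `meet` and `join` are illustrative only):

```python
from enum import IntEnum


class Label(IntEnum):
    """The five linguistic labels, ordered from weakest to strongest."""
    NEGLIGIBLE = 0
    WEAK = 1
    MODERATE = 2
    STRONG = 3
    PERFECT = 4


def meet(a: Label, b: Label) -> Label:
    """Greatest lower bound of two labels (the weaker of the two)."""
    return min(a, b)


def join(a: Label, b: Label) -> Label:
    """Least upper bound of two labels (the stronger of the two)."""
    return max(a, b)
```

The integer values carry only the order of the labels; they are not membership degrees.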

4.2 Fuzzy Reasoners

We need to find a way to combine linguistic labels and numbers for the following reasoning schemes so that we can use them to get numbers for the inclusion/coverage matrix; this can be done in the following way (the linguistic labels can be defined according to the context; the labels can also be overlapping; cf. Carlsson et al. (2010b) for details). Let us consider a domain $D = \{x_1, \ldots, x_n\}$ of keywords that have been classified based on some property with real numbers in [0, 1]; we will consider three fuzzy subsets A, B and C of keywords (similar to $K_i$) in the domain D; we will first work with the fuzzy subsets A and B. We say that A is a fuzzy subset of B (both defined in the domain D) and write

$$A \subseteq B \quad \text{if} \quad \mu_A(x_i) \le \mu_B(x_i) \ \text{ for all } x_i \in D. \qquad (1)$$

We can then define the two concepts inclusion and coverage in terms of these fuzzy subsets (as both are defined in the same domain) by following the intuitive understanding we have in Figure 3; it should be noted that the min-operator is one of a class of t-norms that can be used to express the combinations (cf. Carlsson et al. (2010b)).

Degree of subsethood (inclusion) of A in B:

$$\mathrm{inc}(A, B) = \frac{\sum_{i} \min\{\mu_A(x_i), \mu_B(x_i)\}}{\sum_{i} \mu_A(x_i)} \qquad (2)$$

Degree of supersethood (coverage):

$$\mathrm{cov}(A, B) = \frac{\sum_{i} \min\{\mu_A(x_i), \mu_B(x_i)\}}{\sum_{i} \mu_B(x_i)} \qquad (3)$$

Now we can combine the two concepts into a categorisation of the two subsets which can be used to order the subsets of keywords. For this we have several possibilities, but we can use the following simple characterisation:

Degree of similarity:

$$\mathrm{sim}(A, B) = \frac{\sum_{i} \min\{\mu_A(x_i), \mu_B(x_i)\}}{\sum_{i} \max\{\mu_A(x_i), \mu_B(x_i)\}} \qquad (4)$$

It is clear that $\mathrm{sim}(A, B) = \mathrm{sim}(B, A)$.
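As a short worked note (a sketch for orientation, not a result stated in the paper), the three degrees are tied together by the identity min(u, v) + max(u, v) = u + v. With m, a and b introduced here as shorthand for the summed minima and the two summed membership degrees, the similarity can never exceed the inclusion or the coverage:

```latex
\begin{align*}
m &= \sum_i \min\{\mu_A(x_i), \mu_B(x_i)\}, \quad
a = \sum_i \mu_A(x_i), \quad
b = \sum_i \mu_B(x_i), \\
\sum_i \max\{\mu_A(x_i), \mu_B(x_i)\} &= a + b - m
  \quad \text{since } \min\{u, v\} + \max\{u, v\} = u + v, \\
\text{similarity} = \frac{m}{a + b - m} &\le \min\left\{\frac{m}{a}, \frac{m}{b}\right\}
  \quad \text{because } m \le \min\{a, b\} \text{ implies } a + b - m \ge \max\{a, b\}.
\end{align*}
```

Here m/a is the degree of inclusion and m/b the degree of coverage, so a pair of keyword sets is only highly similar when both its inclusion and its coverage are high.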

We will get a similar representation of the fuzzy

subset C as it is fully a subset of A (cf. Figure 3).

We can now illustrate these concepts with some

numerical examples; the numbers would be similar

to those used in Figure 4.

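Let $A = \{0.4, 0.6, 0.8, 0.3\}$ and $B = \{0.5, 0.4, 0.8, 0.6\}$ be the membership degrees of A and B over $x_1, \ldots, x_4$. Then A is almost a subset of B since $\mu_A(x_i) \le \mu_B(x_i)$ for $i = 1, 3, 4$, but not quite since $\mu_A(x_2) > \mu_B(x_2)$. The sum of the membership degrees in the fuzzy set A is

$$\sum_{i} \mu_A(x_i) = 0.4 + 0.6 + 0.8 + 0.3 = 2.1$$

and correspondingly $\sum_{i} \mu_B(x_i) = 2.3$, while

$$\sum_{i} \min\{\mu_A(x_i), \mu_B(x_i)\} = 1.9, \qquad \sum_{i} \max\{\mu_A(x_i), \mu_B(x_i)\} = 2.5.$$

Therefore the degree of inclusion is $\mathrm{inc}(A, B) = 1.9 / 2.1 \approx 0.905$, the degree of coverage is $\mathrm{cov}(A, B) = 1.9 / 2.3 \approx 0.826$, and the degree of similarity is $\mathrm{sim}(A, B) = 1.9 / 2.5 = 0.76$.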
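The numerical example can be checked with a few lines of Python. This is a minimal sketch assuming that inclusion, coverage and similarity are the ratios of summed minima to the summed degrees of A, the summed degrees of B and the summed maxima, respectively; the function names are illustrative and not taken from KnowMob.

```python
def is_fuzzy_subset(a, b):
    """True if every membership degree in a is at most the one in b."""
    return all(x <= y for x, y in zip(a, b))


def _sum_min(a, b):
    """Sum of element-wise minima of two membership vectors."""
    return sum(min(x, y) for x, y in zip(a, b))


def inclusion(a, b):
    """Degree of subsethood of a in b: summed minima over summed degrees of a."""
    return _sum_min(a, b) / sum(a)


def coverage(a, b):
    """Degree of supersethood: summed minima over summed degrees of b."""
    return _sum_min(a, b) / sum(b)


def similarity(a, b):
    """Summed minima over summed maxima; symmetric in a and b."""
    return _sum_min(a, b) / sum(max(x, y) for x, y in zip(a, b))


# Membership degrees from the example in the text.
A = [0.4, 0.6, 0.8, 0.3]
B = [0.5, 0.4, 0.8, 0.6]
```

Evaluating the three functions on `A` and `B` gives the ratios 1.9/2.1, 1.9/2.3 and 1.9/2.5, and `is_fuzzy_subset(A, B)` is false because of the second element.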

Next, let the domain D represent the set of keywords shown in the partial graph in Figure 5. We can then find a subset of keywords <Technical_Problem> in this domain, which has the fuzzy subsets <System_fault> and <Function_failure> of keywords for which we can work out the inclusion and coverage relations. In this way we can establish a fuzzy partonomy over the classification of engineering keywords.

We can then work with the fuzzy partonomy using so-called approximate reasoning (AR) schemes to find and assign summary values to the <Technical_Problem> subset of keywords, representing how similar they are to a diagnosis used to identify problems in the Problem part of the Event partial graph shown in Figure 4. As we do not for the moment have enough empirical data, we will use a linear AR-scheme (which may be too simplified for the context), where S_f stands for System_fault, F_f for Function_failure and T_P for Technical_Problem; the scheme would then be something like the following:

If S_f is negligible and F_f is negligible then T_P is negligible
If S_f is weak and F_f is weak then T_P is weak
If S_f is moderate and F_f is moderate then T_P is moderate
If S_f is strong and F_f is strong then T_P is strong
If S_f is perfect and F_f is perfect then T_P is perfect

If we now denote inclusion with [inc] and coverage with [cov], then we should write the AR-scheme in the following way using (2) and (3):

If [inc] S_f is <negligible, weak, moderate, strong, perfect> and [inc] F_f is <negligible, weak, moderate, strong, perfect> then T_P is min ([inc] S_f, [inc] F_f)

If [cov] S_f is <negligible, weak, moderate, strong, perfect> and [cov] F_f is <negligible, weak, moderate, strong, perfect> then T_P is max ([cov] S_f, [cov] F_f)

Then we will have that

[sim] T_P = min ([inc] S_f, [inc] F_f) / max ([cov] S_f, [cov] F_f)

which now shows how similar (or “good”) T_P is for identifying the problem at hand.

If we now assume for a moment that we have collected the necessary data, we can insert numbers and get:

If [inc]S_f is 0.5 and [inc]F_f is 0.4 then T_P is 0.4
If [inc]S_f is 0.6 and [inc]F_f is 0.8 then T_P is 0.6
If [inc]S_f is 0.9 and [inc]F_f is 0.8 then T_P is 0.8
If [inc]S_f is 0.3 and [inc]F_f is 0.5 then T_P is 0.3

In a similar way we can also work out the [cov] scheme, but now we use the max instead of the min.

If [cov]S_f is 0.5 and [cov]F_f is 0.4 then T_P is 0.5
If [cov]S_f is 0.4 and [cov]F_f is 0.3 then T_P is 0.4
If [cov]S_f is 0.6 and [cov]F_f is 0.8 then T_P is 0.8
If [cov]S_f is 0.6 and [cov]F_f is 0.5 then T_P is 0.6

The [inc] conclusions (0.4, 0.6, 0.8, 0.3) and the [cov] conclusions (0.5, 0.4, 0.8, 0.6) are the same numbers as in the example above, so as we found there, inc_T_P = 1.9/2.1 ≈ 0.905, cov_T_P = 1.9/2.3 ≈ 0.826, and sim_T_P = 1.9/2.5 = 0.76.
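The numeric scheme can be replayed in a few lines of Python. This is a sketch under the assumption that each [inc] rule fires with min, each [cov] rule with max, and the summary similarity is the ratio of summed minima to summed maxima of the two result lists; the helper names `fire_rules` and `summed_similarity` are hypothetical.

```python
# (degree for S_f, degree for F_f), one pair per rule, as listed in the text.
INC_PAIRS = [(0.5, 0.4), (0.6, 0.8), (0.9, 0.8), (0.3, 0.5)]
COV_PAIRS = [(0.5, 0.4), (0.4, 0.3), (0.6, 0.8), (0.6, 0.5)]


def fire_rules(pairs, combine):
    """Apply `combine` (min or max) to each (S_f, F_f) pair to get T_P."""
    return [combine(s_f, f_f) for s_f, f_f in pairs]


def summed_similarity(inc_values, cov_values):
    """Ratio of summed element-wise minima to summed element-wise maxima."""
    num = sum(min(i, c) for i, c in zip(inc_values, cov_values))
    den = sum(max(i, c) for i, c in zip(inc_values, cov_values))
    return num / den


inc_tp = fire_rules(INC_PAIRS, min)  # conclusions of the [inc] rules
cov_tp = fire_rules(COV_PAIRS, max)  # conclusions of the [cov] rules
sim_tp = summed_similarity(inc_tp, cov_tp)
```

The two result lists coincide with the membership degrees of A and B in the earlier numerical example, which is why the same summary values reappear.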

We should realize that in most cases we do not have linear AR-schemes and need to have a more general form for the conclusions. Here sim_T_P is found as the ratio of the summed min- and max-values of the membership values of the keywords in the fuzzy subsets.

This simple version of a fuzzy reasoner can be developed into more complete reasoning schemes. Straccia (2006) has worked out some classes of reasoners in his fuzzy description logics (fuzzy DL), which have the added bonus of extending the description logics on which the OWL 2 standard is built. Stoilos et al. (2010) worked out fuzzy extensions to OWL, going in the opposite direction, and showed that they reduce to fuzzy DL.

5 KNOWMOB TOOL

The KnowMob tool implements the fuzzy ontology and the fuzzy reasoning. The tool is implemented in Java. The Protégé ontology editor was used to define and maintain the fuzzy ontology in OWL format. The problem-solving reports on the chemistry and process control of the “wet end” of a paper machine were collected from our industrial partners. Industrial experts have assisted in evaluating the results.

5.1 UI for Knowledge Base Query

When browsing the knowledge base for reports that

describe situations similar to the current (problem)

situation, the user first has to describe the situation

at hand. To facilitate an ontology-based query, the

situation must be described using the predefined

keywords. Accordingly, the user interface must help

the user to quickly find the appropriate terms.


Figure 7: Concept of a user interface for describing the

situation at hand.

The user selects descriptive keywords in categories such as system (e.g. "paper machine" or "head box"), function (e.g. "water removal", "hydration"), event (e.g. "instability" or "drift") and variable (e.g. "pH", "Brightness"). Because the number of available keywords can be staggering, the user is assisted in finding the particular keyword(s), e.g. by advancing from more generic keywords to more exact subclasses. Since fuzzy ontologies enable multiple inheritance, the keyword "Web breaks" can be discovered through different branches, once again making it easier to find.
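The effect of multiple inheritance on browsing can be illustrated with a small keyword graph in which one keyword has two parents. The Python sketch below is only an illustration: the parent keywords chosen for "Web breaks" are hypothetical and do not reproduce the KnowMob ontology, and `paths_to_root` is an assumed helper name.

```python
# Hypothetical fragment of a keyword graph: child -> list of parent keywords.
PARENTS = {
    "Web breaks": ["Runnability_problem", "Quality_problem"],
    "Runnability_problem": ["Operational_problem"],
    "Quality_problem": ["Problem"],
    "Operational_problem": ["Problem"],
    "Problem": ["Event"],
}


def paths_to_root(keyword, parents):
    """All browsing paths from a top-level keyword down to `keyword`."""
    ups = parents.get(keyword, [])
    if not ups:
        # No parents: this keyword is a root of the classification.
        return [[keyword]]
    return [path + [keyword]
            for up in ups
            for path in paths_to_root(up, parents)]
```

Calling `paths_to_root("Web breaks", PARENTS)` yields one path per branch, so a user drilling down from either branch arrives at the same keyword.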

6 CONCLUSIONS

In this paper we showed that we can build a fuzzy

ontology – developed from keyword classification

and a fuzzy partonomy - as a basis for knowledge

mobilization, and we showed that we can form good

keyword combinations to retrieve relevant docu-

ments to deal with process problems in a paper mak-

ing production line.

The aim of the development work was to study

the possibilities that a fuzzy ontology can provide

for knowledge retrieval in the domain of industrial

process plants. We used a fuzzy ontology framework

to describe knowledge related to a paper mill, and

implemented a demo tool for running extended que-

ries against stored reports of knowledge.

The next steps will basically be to generalize several parts of the results we have shown in this paper. We need to show that a fuzzy ontology, and the fuzzy description logic that several authors have now shown should be used at its core, can be enhanced with the introduction of AR-schemes to work with real-world data and observations. This will offer a way to build a connection to the semantic web standards.

ACKNOWLEDGEMENTS

The research project KnowMobile was a joint ven-

ture of IAMSR of Åbo Akademi University, and

VTT Technical Research Centre of Finland; the

project was funded by Tekes (Finnish Funding

Agency for Technology and Innovation) and indus-

trial partners. We are very grateful for the time and

help we got from the industrial partners.

REFERENCES

Calegari, S. and Ciucci, D. (2006): Integrating Fuzzy Logic in Ontologies. In Proceedings of the 8th International Conference on Enterprise Information Systems, pp. 66-73.

Calegari, S. and Ciucci, D. (2010): Granular computing applied to ontologies. International Journal of Approximate Reasoning, 51(4), 391-409.

Carlsson, C., Brunelli, M. and Mezei, J. (2010a): Fuzzy Ontology and Knowledge Mobilisation. Turning Amateurs into Wine Connoisseurs. In Proceedings of the FUZZ/IEEE 2010 Conference, Barcelona.

Carlsson, C., Brunelli, M. and Mezei, J. (2010b): Fuzzy Ontology and Information Granulation. An Approach to Knowledge Mobilisation. In IPMU 2010 Proceedings, Dortmund.

Lee, C.-S., Jian, Z.-W. and Huang, L.-K. (2005): A fuzzy ontology and its application to news summarization. IEEE Transactions on Systems, Man and Cybernetics, Part B, 35(5), 859-880.

Parry, D. (2006): Fuzzy ontologies for information retrieval on the WWW. In Fuzzy Logic and the Semantic Web, Vol. 1, Bouchon-Meunier, B., Gutierrez Rios, J., Magdalena, L., Yager, R. R. (eds.), Elsevier, Capturing Intelligence Series.

Romero, J. G. (2008): Knowledge Mobilization: Architectures, Models and Applications. PhD Thesis, University of Granada.

Stoilos, G., Stamou, G. and Pan, J. Z. (2010): Fuzzy Extensions of OWL: Logical Properties and Reduction to Fuzzy Description Logics. International Journal of Approximate Reasoning, 51, 656-679.

Straccia, U. (2006): A fuzzy description logic for the Semantic Web. In Fuzzy Logic and the Semantic Web, Vol. 1, E. Sanchez (ed.), Elsevier, Capturing Intelligence Series, 73-90 (Chapter 4).
