Attribute Value Ontology
Using Semantics in Data Mining
Tomasz Łukaszewski, Joanna Józefowska and Agnieszka Ławrynowicz
Poznan University of Technology, Poznan, Poland
Keywords:
Ontologies, Imperfect Data, Naïve Bayesian Classifier.
Abstract:
We propose a new concept to represent attribute values as an ontology that models different levels
of abstraction. In this way, more or less precise values may be used instead of missing or erroneous data. The
goal is to use this representation in order to improve the analysis of imperfect data. The proposed attribute value
ontology (AVO) makes it possible to increase the precision of information not only with positive observations but also
with negative ones. We show how to classify a new example using a set of training examples described at the
same or a more precise level. Another advantage of the proposed approach is that it provides an efficient way to avoid
overfitting.
1 INTRODUCTION
The concept of using ontologies to enhance data with semantics has resulted in a new vision of computing - semantic computing. On the one hand, semantic computing makes it possible to integrate heterogeneous data sources. On the other hand, it makes it possible to improve data analysis, e.g. the analysis of imperfect data.
Imperfect data (e.g. erroneous or missing attribute values) are very common in the field of Data Mining and have a negative effect on the mining results. Note that some erroneous or missing attribute values may be introduced by users who are required to provide very specific values but whose knowledge of the domain is only very general, so that they are unable to precisely describe the observation by an appropriate value of an attribute. Even if a person is an expert in the domain, erroneous or missing attribute values can be introduced as a consequence of a lack of time or other resources needed to precisely describe the observation by an appropriate value.
In this paper we present a semantic approach to Data Mining (Witten et al., 2011) aiming at improving the analysis of imperfect data using background knowledge. The most common approach to exploiting background knowledge in data mining is generalization of attribute values (Han et al., 1992). It allows abstract concepts to be defined as generalizations of the primitive ones. Background knowledge used in this approach has the form of taxonomies, categories or more general relationships between attributes.
We introduce an attribute value ontology (AVO) in order to improve the expressiveness of the knowledge representation language. Firstly, an AVO models different levels of abstraction - precise and imprecise values. As a result, users may use (more or less) imprecise values instead of erroneous or missing values. Secondly, an AVO allows values to be made more precise not only by indicating a subconcept of the current concept (a positive observation) but also by rejecting some subconcepts of the current concept (a negative observation). This is a common technique of describing what something is by explaining what it is not. Moreover, an AVO helps to avoid overfitting by analysing results not only for the current value but also for less precise values. This approach is analogous to analysing results for special cases (exceptions) and for general ones. Finally, we present a naïve Bayesian classifier extended to AVO.
2 ATTRIBUTE VALUE ONTOLOGY
Let us assume that an ontology representing the domain knowledge is given. In particular, it expresses a multilevel subsumption hierarchy of concepts (ISA hierarchy) representing different levels of abstraction - precise and imprecise values. We define an attribute value ontology (AVO) as follows:
Definition 1. An attribute value ontology (AVO) is a pair A = ⟨C, R⟩, where: C is a set of concepts (primitive and abstract ones), R is a subsumption relation over C, and the subset C_P ⊆ C of concepts without predecessors is a finite set of primitive concepts of A.
For simplicity of presentation we consider the hi-
erarchy of concepts such that each concept has at most
one predecessor (tree structure). In general concepts
may have multiple predecessors (DAG structure).
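For illustration, the following is a minimal sketch (our own, not part of the paper) of how such a concept hierarchy could be encoded; the mapping name PARENT and the helper descendants() are assumptions made for this example. The hierarchy corresponds to the infectious-agent AVO of Fig. 1, introduced in the next subsection.

# A minimal encoding of an AVO as parent (subsumption) links; names are illustrative.
PARENT = {
    "Infectious Agent": None,
    "Bacteria": "Infectious Agent",
    "Fungi": "Infectious Agent",
    "Virus": "Infectious Agent",
    "Gram-positive Bacteria": "Bacteria",
    "Gram-negative Bacteria": "Bacteria",
    "Streptococcus": "Gram-positive Bacteria",
    "E.Coli": "Gram-negative Bacteria",
    "Salmonella": "Gram-negative Bacteria",
}

def descendants(concept):
    """All concepts that are more specific than `concept` (its descendants in the AVO)."""
    kids = [c for c, p in PARENT.items() if p == concept]
    result = set(kids)
    for k in kids:
        result |= descendants(k)
    return result

print(descendants("Bacteria"))
# {'Gram-positive Bacteria', 'Gram-negative Bacteria', 'Streptococcus', 'E.Coli', 'Salmonella'}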
2.1 Levels of Abstraction
Given is an attribute A and a set V = {v_1, v_2, ..., v_n}, n > 1, of specific values of this attribute. These specific values can be interpreted as a single level of abstraction. A user that is unable to precisely describe the observation by a specific value of A has to resort to a missing or erroneous value.
Introducing AVO we add new levels of abstrac-
tion. We assume that primitive concepts of AVO rep-
resent these specific values of A (the original level
of abstraction). Abstract concepts of AVO are used
when the users are unable to precisely describe the
observation by a specific value of A (the new levels
of abstraction). Therefore, we call primitive and ab-
stract concepts: precise and imprecise values of AVO,
respectively.
Figure 1: Example of an attribute value ontology. The root Infectious Agent subsumes Bacteria, Fungi and Virus; Bacteria subsumes Gram-positive Bacteria (with Streptococcus) and Gram-negative Bacteria (with E.Coli and Salmonella).
Example 1. Let us consider the following medical problem. In order to determine the correct treatment, the agent that caused the infection needs to be specified. Although all viral infections determine the same treatment (and similarly for infections caused by fungi), identification of the bacteria type is important in order to decide on the appropriate treatment. Thus, the specific values of this attribute are the following: Streptococcus, E.Coli, Salmonella, Fungi, Virus. An AVO describing the domain of infectious agents is presented in Fig. 1. Primitive concepts of the AVO represent these specific values. The abstract concepts are the following: Infectious Agent, Bacteria, Gram-positive Bacteria, Gram-negative Bacteria.
A user that is unable to precisely describe the infectious agent can use the most abstract concept of this AVO (Infectious Agent) or one of its abstract subconcepts (Bacteria, Gram-positive Bacteria, Gram-negative Bacteria). In the next subsection we show that an AVO allows values to be made more precise not only by indicating a subconcept of the current concept but also by rejecting some subconcepts of the current concept.
Let us observe that Streptococcus is not the only Gram-positive Bacteria in the real world, and our hierarchy, for some reason, does not contain concepts for the other Gram-positive Bacteria. In order to represent this we make the open world assumption (OWA). Therefore, the concept Gram-positive Bacteria should correctly be interpreted as: Streptococcus or other Gram-positive Bacteria. Similarly, the concept Gram-negative Bacteria should be interpreted as: E.Coli or Salmonella or other Gram-negative Bacteria. In the next subsection we show the consequences of assuming the open world.
2.2 Increasing the Precision
In the previous section we have shown that a user that is unable to precisely describe the observation may use abstract concepts (imprecise values) of the AVO. A person that knows nothing can use the most abstract concept of the AVO. Considering our medical problem, the user would use the abstract concept Infectious Agent - Fig. 1.
An AVO allows descriptions to be made more precise explicitly, by indicating a subconcept of the current concept. Considering our medical problem, let us assume that the user has made a positive observation that this infectious agent is Bacteria - Fig. 2.
Figure 2: Positive observation: the infectious agent is Bacteria.
An AVO also allows descriptions to be made more precise implicitly, by rejecting some subconcepts of the current concept. This is a common technique of describing what something is by explaining what it is not. Continuing our medical example, let us assume that the user has made a negative observation that this Bacteria is not Gram-negative Bacteria - Fig. 3.
Let us notice that, by making this negative observation, all but one of the subconcepts of Bacteria have been rejected.
ICEIS2012-14thInternationalConferenceonEnterpriseInformationSystems
330
Under the closed world assumption (CWA) we would be allowed to conclude automatically that the Bacteria is Gram-positive Bacteria. However, we have made the open world assumption (OWA) and we are not allowed to refine the description in this way.
Figure 3: Negative observation: the Bacteria is not Gram-negative Bacteria.
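As a small illustration (ours, not from the paper), an imprecise observation can be encoded as the most specific concept asserted positively together with the subconcepts rejected by negative observations; under the OWA, rejecting all but one subconcept does not let us assert the remaining one. The sketch below assumes the PARENT mapping and descendants() helper from the earlier sketch in Section 2; the function name admissible() is an assumption.

def admissible(positive, rejected=()):
    """Concepts still compatible with the observation: the asserted concept and its
    descendants, minus every rejected subconcept and its descendants (under the OWA,
    nothing more can be concluded automatically)."""
    allowed = {positive} | descendants(positive)
    for r in rejected:
        allowed -= {r} | descendants(r)
    return allowed

# Fig. 2: positive observation "Bacteria".
print(admissible("Bacteria"))
# Fig. 3: additionally, the negative observation "not Gram-negative Bacteria".
print(admissible("Bacteria", rejected=["Gram-negative Bacteria"]))
# {'Bacteria', 'Gram-positive Bacteria', 'Streptococcus'} remains admissible.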
2.3 Avoidance of Overfitting
Many Data Mining algorithms construct rules or trees
that are overfitted to the training data. The best
way to avoid overfitting is to simplify the learned
model (Witten et al., 2011). This simplification can
be reached by generating sensible rules or pruning
trees. Both approaches are aimed at rejecting rules
that cover too few training examples.
An AVO helps to avoid overfitting by analysing results not only for the current level of precision but also for less precise values (superconcepts of the current concept). This approach is analogous to analysing results for a special case (an exception) and for the general case.
3 EXTENDING NAÏVE BAYESIAN CLASSIFIER BY AVO
Using AVO we are able to represent training and
testing examples with precise and imprecise descrip-
tions. In this section we show how to extend the naïve
Bayesian classifier by AVO in order to learn this clas-
sifier from precisely and/or imprecisely described ex-
amples and classify precisely and/or imprecisely de-
scribed examples.
3.1 Naïve Bayesian Classifier
The most straightforward and widely tested method for probabilistic induction is known as the naïve Bayesian classifier. Despite its simplicity and the strong conditional independence assumptions it makes, the naïve Bayesian classifier often performs remarkably well, competitively with other well-known induction techniques such as decision trees and neural networks. The naïve Bayesian classifier is often used for classification problems, in which a learner attempts to construct a classifier from a given set T of training examples with class labels.
Assume that given is a set of n attributes A_1, A_2, ..., A_n. A (training or testing) example is represented by a vector (v_1, v_2, ..., v_n), where v_i is the specific value of A_i. Let C represent the class variable and C_j represent the value it takes (a class label).
The Bayesian classifier (and also the naïve Bayesian classifier) is a classification method which classifies a new observation E by selecting the class C_j with the largest posterior probability P(C_j | E), as indicated below:

P(C_j | E) = P(C_j) P(E | C_j) / P(E) .   (1)

P(E) is ignored, since it is the same for all classes and does not affect the relative values of their probabilities:

P(C_j | E) ∝ P(C_j) P(E | C_j) .   (2)
Since E is a composition of n discrete values, one can expand this expression:

P(C_j | v_1, v_2, ..., v_n) ∝ P(C_j) P(v_1, v_2, ..., v_n | C_j) ,   (3)

where P(v_1, v_2, ..., v_n | C_j) is the conditional probability of the example E given the class C_j, and P(C_j) is the prior probability that one will observe class C_j. All these parameters are estimated from the training set. However, a direct application of these rules is difficult due to the lack of sufficient data in the training set to reliably obtain all the probabilities needed by the model. The naïve Bayesian classifier assumes that the attributes are conditionally independent given the class variable, which gives us:

P(C_j | v_1, v_2, ..., v_n) ∝ P(C_j) ∏_i P(v_i | C_j) .   (4)
P(v_i | C_j) is the probability of an instance of class C_j having the observed value v_i of attribute A_i. The probabilities in the above formula must be estimated from training examples, e.g. using relative frequency:

P(C_j) = n_j / n ,   P(v_i | C_j) = n_ij / n_j ,   (5)

where n is the number of training examples, n_j is the number of training examples with class label C_j, and n_ij is the number of training examples with the value of the attribute A_i = v_i and class label C_j.
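For reference, the following is a minimal sketch (ours, with illustrative names) of the classical naïve Bayesian classifier over specific attribute values, using the relative-frequency estimates of Eq. (5) and the decision rule of Eq. (4); it assumes discrete values and a single list of labelled examples.

from collections import Counter

def train(examples):
    """examples: list of (value_vector, class_label) pairs.
    Returns the relative-frequency estimates of Eq. (5)."""
    n = len(examples)
    n_class = Counter(label for _, label in examples)              # n_j
    n_value = Counter()                                            # n_ij, keyed by (i, v_i, C_j)
    for values, label in examples:
        for i, v in enumerate(values):
            n_value[(i, v, label)] += 1
    prior = {c: n_class[c] / n for c in n_class}                   # P(C_j)
    likelihood = lambda i, v, c: n_value[(i, v, c)] / n_class[c]   # P(v_i | C_j)
    return prior, likelihood

def classify(values, prior, likelihood):
    """Select the class maximizing P(C_j) * prod_i P(v_i | C_j), as in Eq. (4)."""
    score = dict(prior)
    for c in score:
        for i, v in enumerate(values):
            score[c] *= likelihood(i, v, c)
    return max(score, key=score.get), score

# Tiny usage example with a single attribute (hypothetical data).
data = [(("Virus",), "C1"), (("Virus",), "C1"), (("Fungi",), "C2")]
prior, likelihood = train(data)
print(classify(("Virus",), prior, likelihood))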
AttributeValueOntology-UsingSemanticsinDataMining
331
3.2 Inference with AVO
In the proposed approach with abstract concepts (with AVO), the naïve Bayesian classifier needs to be generalized to estimate P(c_i | C_j), where c_i is a primitive or an abstract concept of A_i. Let us recall that, for a given concept c_i in A_i, all the concepts that are more specific than c_i are the descendants of this concept. In order to estimate P(c_i | C_j), e.g. by relative frequency, we use the following property:

P(c_i | C_j) = (n_ij + ∑_{c_k ∈ desc(c_i)} n_kj) / n_j ,   (6)

where n_j is the number of training examples with class label C_j, n_ij is the number of training examples with the value of the attribute A_i = c_i and class label C_j, n_kj is the number of training examples with the value of the attribute A_i = c_k and class label C_j, and desc(c_i) is the set of concepts that are descendants of the concept c_i.
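A possible realization of the estimate in Eq. (6) is sketched below; this is our own illustration rather than the authors' implementation. It treats a single attribute whose values are concepts of the AVO, and takes a descendants() function (such as the one sketched in Section 2) as a parameter.

def p_concept_given_class(concept, cls, examples, descendants):
    """Relative-frequency estimate of P(c_i | C_j) as in Eq. (6):
    count the examples of class `cls` labelled with `concept` or any of its
    descendants, and divide by the number of examples of that class."""
    n_j = sum(1 for _, label in examples if label == cls)
    covered = {concept} | set(descendants(concept))
    n_covered = sum(1 for value, label in examples
                    if label == cls and value in covered)
    return n_covered / n_j

For several attributes, these estimates simply replace P(v_i | C_j) in the product of Eq. (4).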
The proposed approach is a generalization of the classical approach (without abstract concepts). In the classical approach, specific attribute values can be interpreted as a single level of knowledge granularity, and a new example is classified using only the training examples described by the same specific attribute value. In the proposed approach (with abstract concepts) each descendant of a given concept 'is' this concept. Therefore, in the classification of a new example described by a concept c_i we also use all training examples described by descendants of c_i.
The algorithm has been implemented and tested
on a series of examples available in the literature. The
results are promising, although more extensive exper-
iments are necessary in order to present statistically
valid conclusions.
3.3 Illustrative Example
Let us consider the medical problem presented in Example 1. In order to determine the correct treatment, the agent that caused the infection needs to be specified. For simplicity of presentation we consider only one attribute. The training data is given in Table 1. The first column of the table presents the number of training examples described by a given value (precise or imprecise) of the infectious agent for each class. For example, the imprecise value Bacteria is used to describe 6 instances with the class label C_1 and 7 instances with the class label C_2. In the considered example each class is described by exactly the same number of instances; therefore, the prior probability that one will observe class C_j is equal to 0.5 for both C_1 and C_2.
Table 1: Medical diagnosis training data.
Instances Infectious Agent Class
6 Bacteria C1
3 Gram-positive Bacteria C1
1 Gram-negative Bacteria C1
7 Bacteria C2
1 Streptococcus C2
2 Gram-negative Bacteria C2
Infectious Agent is not Known. Let us consider the following scenario: there is a patient and the diagnosis is not known. Knowing nothing, we are allowed to use the most abstract concept of the AVO - Infectious Agent. We estimate the posterior probability P(C_j | Infectious Agent). Therefore, we concentrate on those examples that are described by the value Infectious Agent or its descendants. From (6) we have: P(Infectious Agent | C_1) = (0 + (6+3+1))/10 = 1, P(Infectious Agent | C_2) = (0 + (7+1+2))/10 = 1. From (4) we have: P(C_1 | Infectious Agent) ∝ 0.5 · 1 = 0.5, and P(C_2 | Infectious Agent) ∝ 0.5 · 1 = 0.5. As we can see, both classes are equally probable.
Infectious Agent is Bacteria. Let us assume that the user has made a positive observation that this infectious agent is Bacteria - Fig. 2. We estimate the posterior probability P(C_j | Bacteria). Therefore, we concentrate on those examples that are described by the value Bacteria or its descendants. From (6) we have: P(Bacteria | C_1) = (6 + (3+1))/10 = 1, P(Bacteria | C_2) = (7 + (1+2))/10 = 1. From (4) we have: P(C_1 | Bacteria) ∝ 0.5 · 1 = 0.5, and P(C_2 | Bacteria) ∝ 0.5 · 1 = 0.5. As we can see, both classes are still equally probable.
Bacteria is not Gram-negative Bacteria. Let us assume that the user has made a negative observation that the Bacteria is not Gram-negative Bacteria (shortly: Bacteria&notGnB) - Fig. 3. We estimate the posterior probability P(C_j | Bacteria&notGnB). Therefore, we again concentrate on those examples that are described by the value Bacteria or its descendants. However, this time we do not take into consideration the training examples that are described by the concept Gram-negative Bacteria or its subconcepts. From (6) we have: P(Bacteria&notGnB | C_1) = (6 + 3)/10 = 0.9, P(Bacteria&notGnB | C_2) = (7 + 1)/10 = 0.8. From (4) we have: P(C_1 | Bacteria&notGnB) ∝ 0.5 · 0.9 = 0.45, and P(C_2 | Bacteria&notGnB) ∝ 0.5 · 0.8 = 0.4. As we can see, there is a small difference between the probabilities of these classes. Therefore, additional observations are recommended.
Avoidance of Overfitting. Let us assume that the user has made a positive observation that the Bacteria is Streptococcus.
ICEIS2012-14thInternationalConferenceonEnterpriseInformationSystems
332
We estimate the posterior probability P(C_j | Streptococcus). Therefore, we concentrate on those examples that are described by the value Streptococcus or its descendants. From (6) we have: P(Streptococcus | C_1) = (0 + 0)/10 = 0, P(Streptococcus | C_2) = (1 + 0)/10 = 0.1. From (4) we have: P(C_1 | Streptococcus) ∝ 0.5 · 0 = 0, and P(C_2 | Streptococcus) ∝ 0.5 · 0.1 = 0.05.
The result indicates that only class C_2 is probable. However, this result is based on a very small set of training examples. Therefore, it should rather be treated as an exception to the general rule. In our example the general rule indicates that both classes are equally probable. However, awareness of this exception may be very valuable in some cases.
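The numbers in this section can be reproduced with the following self-contained sketch (ours; the names and the data encoding are illustrative). It combines the AVO of Fig. 1, the training data of Table 1 and the estimate of Eq. (6), and handles a negative observation by excluding the rejected subtree, as described above.

PARENT = {"Infectious Agent": None,
          "Bacteria": "Infectious Agent", "Fungi": "Infectious Agent", "Virus": "Infectious Agent",
          "Gram-positive Bacteria": "Bacteria", "Gram-negative Bacteria": "Bacteria",
          "Streptococcus": "Gram-positive Bacteria",
          "E.Coli": "Gram-negative Bacteria", "Salmonella": "Gram-negative Bacteria"}

def desc(c):
    kids = [k for k, p in PARENT.items() if p == c]
    out = set(kids)
    for k in kids:
        out |= desc(k)
    return out

# Training data of Table 1: (infectious agent value, class label, number of instances).
DATA = [("Bacteria", "C1", 6), ("Gram-positive Bacteria", "C1", 3), ("Gram-negative Bacteria", "C1", 1),
        ("Bacteria", "C2", 7), ("Streptococcus", "C2", 1), ("Gram-negative Bacteria", "C2", 2)]
PRIOR = {"C1": 0.5, "C2": 0.5}

def likelihood(concept, cls, rejected=()):
    """P(concept | cls) as in Eq. (6); examples under a rejected subconcept are excluded."""
    covered = {concept} | desc(concept)
    for r in rejected:
        covered -= {r} | desc(r)
    n_j = sum(cnt for _, label, cnt in DATA if label == cls)
    n_cov = sum(cnt for value, label, cnt in DATA if label == cls and value in covered)
    return n_cov / n_j

for query, rejected in [("Infectious Agent", ()), ("Bacteria", ()),
                        ("Bacteria", ("Gram-negative Bacteria",)), ("Streptococcus", ())]:
    scores = {cls: PRIOR[cls] * likelihood(query, cls, rejected) for cls in PRIOR}
    print(query, rejected, scores)
# Expected: 0.5/0.5, 0.5/0.5, 0.45/0.4 and 0/0.05, matching the computations above.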
4 RELATED WORK
Data Mining with background knowledge has been extensively studied in the past. One aspect of background knowledge is the relations between attribute values. Generalization of attribute values is the simplest relation considered in this context. It allows abstract concepts to be obtained as generalizations of the primitive ones. Background knowledge used in this approach has the form of taxonomies, categories or more general relationships between concepts. Abstract concepts are used in data mining tasks in various ways.
Compactness and Generality of Results. In the
early approaches generalization was carried out in or-
der to get more compact and more general data min-
ing results. Two groups of methods may be distin-
guished in this line. The first group consists of meth-
ods where abstract concepts replace the data values
in the original database before applying the core data
mining algorithm. This approach is used, for exam-
ple, in: (Walker, 1980), (Han et al., 1992) and (Ku-
doh et al., 2003). In the methods of the second group
generalization is integrated with the data mining al-
gorithm. Among others, this approach was applied
in: (Núñez, 1991), (Almuallim et al., 1996), (Tanaka, 1996), (Taylor et al., 1997). In (Núñez, 1991) an al-
gorithm EG2 (Economic Generalizer 2) was proposed
to build a decision tree. The background knowledge
contains ISA hierarchies of attribute values. At each
node of the decision tree, this algorithm builds a union
of abstract values and primitive values. In (Almual-
lim et al., 1996) an algorithm was proposed to find a
multiple-split test on hierarchical attributes (ISA hi-
erarchies) in decision tree learning. The proposed
multiple-split test is a cut through a hierarchy, which
maximizes the gain-ratio measure. The idea of cut
was proposed in (Haussler, 1988). The cut through a hierarchy makes it possible to reduce multiple levels of abstraction to a single level of abstraction and to apply classical algorithms. However, the number of possible cuts (split tests) grows exponentially with the number of leaves of the hierarchy. It turns out that this task is very similar to the task of decision tree pruning, which makes it possible to employ the decision tree pruning technique introduced in (Breiman et al., 1984). In
(Tanaka, 1996) a very similar approach was proposed
to build decision trees using structured attributes (ISA
hierarchies), called LASA (Learning Algorithm with Structured Attributes). This approach
defines the unique and complete cover node set which
corresponds to the cut through a hierarchy. A mea-
sure of generalization goodness was proposed, which
takes into account two mutually conflicting factors: a
generalization level and a penalty for the induced er-
rors. An algorithm to find optimum generalization,
that transforms the original problem to the shortest
path problem, was also proposed. A simple experi-
ment showed, that the classification results of the pro-
posed approach are better than a standard approach in
terms of classification accuracy. (Taylor et al., 1997)
applied a tool ParkaDB to integrate databases and on-
tologies in order to generate classification rules based
on generalized concepts from an ontology. The level
of generalization is determined by gathering frequency counts and evaluating so-called strong indicators for class membership.
Handling Imprecise Descriptions. A more recent approach is to use abstract concepts (imprecise values) in order to represent real objects that cannot be precisely described by the available specific values of an attribute. The use of attribute value taxonomies (AVT) in decision tree learning (AVT-DTL) and in the naïve Bayesian classifier (AVT-NBL) is presented in (Zhang et al., 2002) and (Zhang et al., 2006), respectively. AVT-DTL and AVT-NBL are, to the best of our knowledge, the only existing approaches for learning classifiers from imprecisely specified instances and classifying imprecisely specified instances. AVT-DTL and AVT-NBL use a cut through a hierarchy of concepts. When instances are described by abstract values below the cut through a taxonomy, they are aggregated upwards and stored in abstract values of the cut. When they are described by abstract values above the cut, they are propagated proportionally to their descendants in the cut.
Our approach (the naïve Bayesian classifier extended to AVO) improves on AVT-DTL and AVT-NBL in three directions. Firstly, AVT-DTL and AVT-NBL use a cut through a taxonomy. This cut reduces the considered taxonomy to a single level of abstraction only.
AttributeValueOntology-UsingSemanticsinDataMining
333
This reduction was necessary in order to apply a classical decision tree learning algorithm that was designed for a single level of abstraction. The use of this cut in AVT-NBL is just a tradeoff between the complexity and the accuracy of the classifier (Zhang et al., 2006). We show that the naïve Bayesian classifier can be extended to AVO without any reduction of a given ontology. Secondly, the semantics of AVO allows descriptions to be made more precise both explicitly (positive observations) and implicitly (negative observations). Thirdly, the semantics of AVO makes it possible to avoid overfitting in a very effective way, by analyzing results for less precise values. Overfitting avoidance is a very important issue in Data Mining, also from the practical point of view.
5 CONCLUSIONS
In this paper we proposed an extension of the naïve Bayesian classifier by an attribute value ontology (AVO), aiming at the improvement of the analysis of imperfect data. In the proposed approach, every attribute is a hierarchy of concepts from the domain knowledge base (ISA hierarchy). This semantic approach to Data Mining allows examples to be described either very precisely or, when this is not possible, in a more general way (using a concept from higher levels of the hierarchy). As a result, users that are unable to precisely describe the observation by a specific value of an attribute may use (more or less) imprecise values.
Note that each imprecise value of AVO, except the most abstract concept, is more precise than a missing value, which is represented by this most abstract concept. Therefore, by introducing these abstract concepts we improve the analysis of imperfect data. This improvement grows with each increase in the precision of information. We showed that even negative observations improve this precision: "knowing what we do not know" is already information.
One could ask: how precise should the description be? There is no single answer to this question. Each cut through a hierarchy seems to be a tradeoff between complexity and accuracy. Therefore, maintaining all the levels of abstraction is an alternative approach to this problem. It allows results to be compared across many levels of abstraction, which can be an efficient way to avoid overfitting. Moreover, this comparison can be utilized by cost-sensitive computing: high precision carries a high cost, and the challenge is to exploit the tolerance for imprecision. Further research aims at an experimental evaluation of the proposed approach.
ACKNOWLEDGEMENTS
Research supported by the Polish Ministry of Science
and Higher Education grant No. N N516 186437.
REFERENCES
Almuallim, H., Akiba, Y., and Kaneda, S. (1996). An
efficient algorithm for finding optimal gain-ratio
multiple-split tests on hierarchical attributes in deci-
sion tree learning. In Proceedings of the Thirteenth
National Conference on Artificial Intelligence. AAAI
Press.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone,
C. J. (1984). Classification and Regression Trees.
Wadsworth, Belmont, California, 3rd edition.
Han, J., Cai, Y., and Cercone, N. (1992). Knowledge dis-
covery in databases: An attribute-oriented approach.
In Proceedings of the 18th International Conference
on Very Large Data Bases. Morgan Kaufmann.
Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. In Artif. Intell., Vol. 36(2). Elsevier.
Kudoh, Y., Haraguchi, M., and Okubo, Y. (2003). Data
abstractions for decision tree induction. In Theoretical
Computer Science, Vol. 292(1). Elsevier.
Núñez, M. (1991). The use of background knowledge in
decision tree induction. In Machine Learning, Vol.
6(3). Springer.
Tanaka, H. (1996). Decision tree learning algorithm with
structured attributes: Application to verbal case frame
acquisition. In Proceedings of the 16th International
Conference on Computational Linguistics. Center for
Sprogteknologi, Danmark.
Taylor, M. G., Stoffel, K., and Hendler, J. A. (1997).
Ontology-based induction of high level classification
rules. In Proceedings of the Workshop on Research
Issues on Data Mining and Knowledge Discovery.
ACM.
Walker, A. (1980). On retrieval from a small version of a
large data base. In Proceedings of the Sixth Interna-
tional Conference on Very Large Data Bases. IEEE
Computer Society.
Witten, I., Frank, E., and Hall, M. (2011). Data Mining.
Practical Machine Learning Tools and Techniques.
Morgan Kaufmann, Burlington, 3rd edition.
Zhang, J., Kang, D., Silvescu, A., and Honavar, V. (2006).
Learning accurate and concise naive bayes classifiers
from attribute value taxonomies and data. In Knowl.
Inf. Syst., Vol. 9(2). Springer.
Zhang, J., Silvescu, A., and Honavar, V. (2002). Ontology-
driven induction of decision trees at multiple levels of
abstraction. In LNCS Vol. 2371. Springer.
ICEIS2012-14thInternationalConferenceonEnterpriseInformationSystems
334