ANALYSIS OF ONTOLOGICAL INSTANCES

A Data Warehouse for the Semantic Web

Roxana Danger and Rafael Berlanga

Department of Informatics Languages and Systems, Universitat Jaume I

Keywords:

Semantic Web, Data Warehouse, Description Logic.

Abstract:

New data warehouse tools for Semantic Web are becoming more and more necessary. The present paper

formalizes one such a tool considering, on the one hand, the semantics and theorical foundations of Description

Logic and, on the other hand, the current developments of information data generalization. The presented

model is constituted by dimensions and multidimensional schemata and spaces. An algorithm to retrieve

interesting spaces according to the data distribution is also proposed. Some ideas from Data Mining techniques

are incorporated in order to allow users to discover knowledge from the Semantic Web.

1 TOWARD PATTERN

RECOGNITION IN THE

SEMANTIC WEB

The Semantic Web is a new form of web conceived

for allowing human users and software tools to pro-

cess and share the same sources of information. The

Semantic Web relies on a set of standards which pro-

vide syntactic consistency and semantic value to all

of its content. For example, Description Logic is used

as the theoretic base for the description of web items,

and the languages RDF and OWL for their syntac-

tic representation. Description Logic deﬁnes a family

of knowledge representation languages which can be

used to represent, in a well-understood formal way,

the knowledge of an application domain. This knowl-

edge, known as ontology, ranges over the termino-

logical cognition of the domain (the interesting object

classes, or concepts, its Tbox) and its examples (the

instances of the object classes, its Abox).

Data analysis in the Semantic Web will be the

most important process when the population of on-

tologies

becomes a reality. Its ﬁnal goal is the recog-

nition of patterns amongst the values of the attributes

The population of ontologies is the process of adding

instances to an ontology in order to enrich it with examples

of its domain knowledge.

of the ontological instances, which could turn into

knowledge or allow its discovery. OWL tools allow

users to create new concepts related to one or more

existing ontologies, and to determine the instances as-

sociated to such concepts.

An additional tool is needed if we want to perform

in a versatile way customized data analysis, either full

or partial, so that each object can be studied from dif-

ferent points of view focusing on distinct particular

features. Such a tool should allow users to navigate

through an instance set and its properties, being able

to discriminate between relevant and superﬂuous in-

formation. Moreover, it should compute and display

statistical indexes able to describe and report about

the extracted patterns.

The formalization of such a tool, following the

framework described below, is the purpose of the

present work. Starting from an available ontology,

this is enriched with information provided by the data

analyst, for example specifying the atomic data com-

bination functions. These functions provide the way

for combining atomic data to form a generalized in-

stance representing a set of instances. Then, two main

structures have to be built: the conceptual dimensions

and the multidimensional conceptual spaces. The

conceptual dimensions are partial order speciﬁcations

between objects, which allow to browse through their

semantic relations. The multidimensional conceptual

Danger R. and Berlanga R. (2007).

ANALYSIS OF ONTOLOGICAL INSTANCES - A Data Warehouse for the Semantic Web.

In Proceedings of the Second International Conference on Software and Data Technologies - Volume ISDM/WsEHST/DC, pages 13-20

DOI: 10.5220/0001332200130020

 SciTePress

spaces can be seen as “intelligent object containers”.

They make use of a subset of conceptual dimensions,

a speciﬁcation of relevant abstraction levels and a

set of atomic data combination functions in order to

(re)construct appropriate generalized instances. Dif-

ferent statistical indexes (e.g. frequencies), associated

to the conceptual dimensions, can be used to char-

acterize patterns in the conceptual spaces. The most

suitable data analysis technique for carrying out this

proposal is data warehousing.

The present work is not the ﬁrst attempt to formal-

ize a data warehouse for the Semantic Web. Within

the Data Warehouse Quality (DWQ) project (Hacid

and Sattler, 1998) a formalization for the multidimen-

sional modeling based on an extension of the con-

structors of description logic is proposed. In this way,

new object classes could be described by specify-

ing aggregability operations, and the traditional rea-

soning over ontological instances could be applied.

However, the demonstration of the undecidability of

minimal languages that operate with aggregate opera-

tors(Baader and Sattler, 2003) makes the proposal of

the DWQ project unfeasible.

On the other hand, the ideas of the traditional data

warehouse (and OLAP techniques) has been extended

to object oriented modeling, (Buzydlowski et al.,

1998; Trujillo et al., 2001; Nguyen et al., 2000; Binh

and Tjoa, 2001; Abell

o, 2002). Considering that de-

scription logic was designed as an extension to frames

and semantic networks, the basis of object-oriented

data warehouse could be applied in order to deﬁne

a data warehouse for the Semantic Web. However,

the ﬂexibility of object-oriented formalization causes

a more sparse structure in object-oriented databases

that in traditional ones. Moreover, the restrictions of

OLAP implementations drastically reduce the useful

set of objects to be used in the analysis.

Unlike these previous works, this paper proposes

a multidimensional model for the analysis of ontolog-

ical instances that merge both approaches. The idea

is the creation of meta-ontologies in order to enrich

the knowledge of ontologies with data analysis infor-

mation. This data analysis information focuses on the

description of interesting object classes and on the ag-

gregation process. The reasoning of description logic

is used in a preliminary phase to 1) recover the satisﬁ-

able

object classes that can be used on analysis pro-

cesses, 2) discover the hierarchical and aggregate or-

ders between the classes, and 3) assign each instance

to the set of object classes to which it belongs.

This paper describe our proposal in detail. Firstly,

A concept (or object class) is satisﬁable if it is consis-

tent and there exists an interpretation on which appears at

least an instance of this concept.

the data analysis information is introduced. Then, the

proposed model is described, starting from the deﬁni-

tion of dimensions and their operators (section 3) and

following with the speciﬁcation of the multidimen-

sional conceptual space (section 4). The two follow-

ing sections are focused on the extraction of interest-

ing conceptual spaces and their use, respectively. The

last section gives some conclusions and future work.

2 ANALYSIS METADATA

Information descriptions useful for the analysis are

those available in the ontologies in form of instances.

However, they are not enough to analyze data and dis-

cover patterns. New interesting concepts and partic-

ular issues related to the generalization process are

essential in order to generate descriptions that repre-

sent relevant and realistic visions of the application

domains of the analyzed ontologies. We call all this

information analysis metadata, which comprises the

following elements:

• description of new concepts, which it is used to in-

troduce additional levels of abstraction in the con-

cept hierarchies expressed in an ontology, and/or

to link concepts from different ontologies. New

concepts may be obtained extending old ones with

paths to previously unrelated concepts. They can

also semantically represent hierarchical clusters

obtained using clustering algorithms.

• description of the combination functions (see def-

inition below); it is used to specify ways for gen-

eralizing sets of data of the same type during the

instance generalization process. The data analyst

is responsible for deciding the combination func-

tions that are semantically suitable for a given data

set. For example, the combination function which

computes the average of a set of values is seman-

tically suitable for a temporal sequence of temper-

atures of a town, but not for a set of temperatures

of different towns.

Although it is perfectly plausible to deﬁne such

descriptions for every new multidimensional concep-

tual space, a better solution is to keep this semantic

information always available and to apply it accord-

ing to the requirements of each case. This goal can be

achieved building a meta-ontology containing the sort

of information described above, again using Descrip-

tion Logic. In this way, analysts can proceed more ef-

ﬁciently as they can reuse the analysis metadata. Even

more importantly, in this way the coherence of differ-

ent studies is granted, providing an ontology with an

intrinsic robustness toward analysis processes. Thus,

ICSOFT 2007 - International Conference on Software and Data Technologies

further studies can be more easily performed by com-

paring different analysis on the same knowledge do-

main and/or the point of views of different analysts.

The description of the combination functions can

be speciﬁed through instances associated to the notion

of Combinable Concept of this meta-ontology:

CombinableConcept ≡

≡ ∃hasConcept.URI ⊓ ∀hasRelation.URI⊓

∃hasCombinationFunction.CombinationFunction

CombinationFunction ⊑

⊑ ∃hasName.String⊓ ∃hasImplementation.URI

Combinable concepts are those for which a com-

bination function can be deﬁned. A combinable con-

cept can be a datatype, a named concept (deﬁned via

a URI), or a concept derived from a composition of

relations beginning with a named concept (speciﬁed

by the URI where the start concept is deﬁned and the

relations from it).

3 DIMENSIONS AND THEIR

OPERATIONS

A dimension is described by a set of concepts and the

way to browse through them. Such browsing is per-

formed using the operators of abstraction and general-

ization between ontological instances, and the selec-

tion operators deﬁned below.

We consider that an abstract ontology is consti-

tuted only by the terminological knowledge. An on-

tology that contains a set of instance axioms (the Abox

of the ontology, composed by axioms specifying the

class C of an instance a -C(a)- or the relations be-

tween two instances a and b -R(a,b)-) is called con-

crete ontology. As it is usual in description logic, the

interpretation of the ontology is (I = ∆

), where ∆

denotes the set of instances belonging to an ontology

O , and .

the interpretation of the concepts deﬁned on

O ; I  x represents that x is deduced from I; ⊤ rep-

resents the top concept: thing. We denote with

A the

Abox of

O , with R the set of axioms associated with

the relations of an ontology, with N

and N

the set

of named concepts and relations of the ontology, re-

spectively. Besides, the interpretation of a datatype is

deﬁned by I

= (Φ

), where Φ

denotes the data

set belonging to a datatype, Φ, of the ontology, and

associates each datatype Φ with a strict subset of

data in Φ

, ∆

∩Φ

0. All representable data in the

ontology belongs to the set

U = ·

∪ ⊕

The deﬁnition of path between two concepts C

and C

′

and that of dimensional partial order are given

below. Intuitively, the former is the set of lists of

relation-concept pairs that links C and C

′

by using

consistent ontological deﬁnitions, and the latter is

used to relate concepts using both the aggregate (as

deﬁned bellow) and the hierarchical order between

concepts implicitly deﬁned in a given ontology

O .

Deﬁnition 1. Path(C,C

′

) = ⊕

1≤i≤n

is an ag-

gregation path from concept C to concept C

′

of the

ontology

O if C

= C, C

= C

′

, and there exists an in-

terpretation I = (∆

) of

O such that ∃x

∈ ∆

,0 ≤

i ≤ n, such that x

∈ C

y x

∈ C

i−1

∈ R

, for

1 ≤ i ≤ n, R

∈ RS

The process of path retrieval must be exhaustive

enough to allow the recovering of all aggregation re-

lations between two concepts. It can be performed

by using an extension of the tableau algorithm for the

SHOIQ(D) language (Danger, 2007). It is worth em-

phasizing that these paths not only describe the aggre-

gation relations between two concepts, but also the

aggregation order between all concepts of an ontol-

ogy.

Deﬁnition 2. Let

O be an ontology. A dimensional

partial order, denoted as

⊏

→ is a partial order between

all possible pairs of concepts C,C

′

∈ N

deﬁned ac-

cording to the following constraints:

• C

⊏

→ C

′

if C

′

⊏ C, or

• C

⊏

→ C

′

if ∃Path(C,C

′

)

The symbol

⊏

→ ∗ is the reﬂexive and transitive closure

for relation

⊏

→.

Deﬁnition 3. Let

O be an ontology. The pair D =

⊏

→) is a conceptual dimension, being C

a set of

satisﬁable concepts in

O , ⊤ ∈ C

and

⊏

→ the relation

of dimensional partial order for the elements in C

Example 1. In Figure 1 a workplace dimension

which combines hierarchical and aggregation rela-

tions is shown. This dimension can be used to identify

a speciﬁc place with different levels of granularity.

Workplace

Sector

Stratigraphic

Unit

Archaeological

Site

Municipality

Country

Continent

Figure 1: Workplace. dimension.

Operations

The ontological instances of each dimension can

be represented by using different point of views of

ANALYSIS OF ONTOLOGICAL INSTANCES - A Data Warehouse for the Semantic Web

(concepts associated with) the dimension. It is thus

necessary to deﬁne two different kinds of operations

over such instances. The ﬁrst one is the selection op-

erator, used to specify the interest portion of the in-

stance that must be shown (for example, when the

concept represented by a dimension is replaced by a

concept related to the ﬁrst one by an aggregate re-

lation). The second important operation is the gen-

eralization, used to generalize a set of instances (for

example, when the concept represented by a dimen-

sion is replaced by a concept related with the ﬁrst one

by a hierarchical relation). The following deﬁnitions

formalize these operators.

Deﬁnition 4. Let

O be a concrete ontology with Abox

A ; the description of an instance a ∈ A is the set

d(a) = {R(a,b), R(a,b) ∈

A }. This instance is said

to be of type C if C is the most speciﬁc concept that

can be deducted from I for a, i.e., ∀C∗ such that

I  C ∗ (a), C ⊏ C∗.

Deﬁnition 5. Instance a

′

is called the specialization

of an instance a of class C towards class C

′

, if its de-

scription d(a ↑

′

) is not undeﬁned, and if

A \C(a) ∪

′

),d(a)}∪d(a ↑

′

) is consistent. The descrip-

tion of d(a ↑

′

), is deﬁned as follows:

′

)|R(a,b) ∈ d(a),

fe(C,C

′

)(R) = {

′

d(b ↑

′′

′

) 6= unde fined},

if ∃ f

(C,C

′

d(a ↑

′

)

d(a)

unde fined, otherwise

where f

is a specialization function of the con-

cept C to the conceptC

′

. This function deﬁnes how to

transform each relation on the abstract concept to the

appropriate relation on the specialized concept (Dan-

ger, 2007).

The operation of abstraction of an instance, de-

noted by d(a ↓

′

), can be deﬁned in a similar way.

Deﬁnition 6. Let

O be an ontology. Let ℓ be an un-

deﬁned data that represent any data in

U . A pseudo-

instance a

of type C is a selector if its description,

d(a), satisﬁes that:

∀b

∃{R

,...,R

} ⊆ N

(a,a

),...,R

n−1

,b) ∈ d(a) ⇒

b ∈ {ℓ}

∀C

′

,∃Path(C,C

′

)and R

,..., R

is the order of the relations on Path(C,C

′

)

′I

Deﬁnition 7. Let

O be a concrete ontology with Abox

A , a ∈ U . Let ℓ be an undeﬁned data that represent

any data in

U . An instance a

′

∈ U is selected by an

instance selector a if:

We call a pseudo-instance because ℓ does not belong to

the ontology, although in order to improve the clarity of the

explanation we will call it instance selector.

• a

′

∈ Φ, a

′

∈ {a,ℓ} or

• a ∈ C

,C ∈ N

y a

′′

computed for d(a

′

↓

′′

) is

such that:

∀b ∈ Φ∪ {ℓ} such that ∃R

(a,a

),...,

n−1

),R

,b) ∈ d(a) ⇒

∃R

′′

),...,R

n−1

′′

n−1

′′

),R

′′

) ∈

∈ d(a

′′

) ∧ (b

′′

= b∨ b = ℓ)

Example 2. In Figure 2 a fragment of an arche-

ology ontology is represented. A selector in-

stance a constituted by the set {has

morphology(a, a’),

has

group(a’, “open”), has order(a’, ℓ), has decoration(a,

ℓ), has

color(a, “gray”) } allows the users to recover

from the ontology the descriptive fragments of ce-

ramic artifacts instances according to the properties

group, order, decoration and color, but notice that the

morphologic group of the ceramic must be open, and

its color gray.

Deﬁnition 8. Let c

be a function (called a com-

bination of simple data) which allows each datatype

Φ to be mapped to another function rep

, which in

turn maps subsets of Φ in a compact representation

of the input subset

. Let d, d

′

be two data in ∆

∪ Φ

A complete combination of data d and d

′

is the data

∪

′

computed as follows:

(Φ)({d, d

′

}), if {d, d

′

} ⊆ Φ,

′

∪

if d ∈

∗

C, d

′

∈

∗

′

C ⊑ C

′

d(d ↑

′

) ∪ d(d

′

\{R(d

′

),...,R(d

′

,...,b

} ⊆

∗

′′

}∪

∪{R(d

′

∪

...

∪

{R(d

′

),...,R(d

′

,...,b

} ⊆

∗

′′

}

if d ∈

∗

C, d

′

∈

∗

′

⊑ C and

during the process

no indeﬁnitions

are obtained

unde fined,

otherwise

In particular, two types of functions for data com-

bination can be identiﬁed:

• uniﬁcation functions which map data from 2

Φ. They can be oriented to statistical indexes,

such as means or deviations. Of special interest

is the function restrictive uniﬁcation deﬁned for

all types of data as:

f({d

,...,d

}) =



d, if d

= ... = d

unde fined, otherwise

For example, if Φ

= {Z }, c

{

Z , rangeOfIntegerSets

}, where the function

rangeOfIntegerSets has domain Z and as images the most

compact representations of integer sets, 2

, using integer

range sets, then rangeOfIntegerSets({1,2, 3, 4,5, 7}) =

[1,5] ∪ [7, 7].

ICSOFT 2007 - International Conference on Software and Data Technologies

Archaeological

Site

Municipality

Country

Archaeological

Context

Workplace

Artifact

Id_Wp

Id_SU

Stone Artifacts

Morphology

Ceramic Part

Body

Neck

Base

ovoid, cylinder,

cone, ...

open,

closed

deep, flat

gray, ochre,

Black,...

weight

Excavation

date

Period

Interpretation

Culture

…

Ceramic

Artifacts

Id_Sector

Sector

Continent

polished, stamping,

incision,...

Figure 2: Fragment of an archeology ontology. The concepts are represented in ellipses, the shady ones correspond to root

concepts of different hierarchies of the ontology and the dashed-line ones correspond to datatypes. The thick lines represent

the hierarchical relations between concepts, the thin lines aggregation relations and the dashed lines represent hierarchies

between relations.

• generalization functions: which map data from 2

to a compact notation.

Example 3. Let d be an instance represented by the

set {has

color(a, “black”), has decoration(“incisions”),

has

weight(20g)} and d

′

an instance represented by

{has

color(a, “ochre red”), has decoration(“incisions”),

has

weight(12g)}. The outcome of the combination

of d and d

′

by using the union of sets as combina-

tion function for the has

color and has decoration

relations and the maximum of values as com-

bination function for has weight is {has color(a,

{“black”, “ochre red”}), has

decoration({“incisions”}),

has

weight(20g)}. However, if all relations have to be

combined using uniﬁcation functions, the result is un-

deﬁned, because d and d

′

have different values for the

same relations.

4 MULTIDIMENSIONAL

CONCEPTUAL SPACES

The deﬁnitions of multidimensional conceptual

schema and multidimensional conceptual space are

given below. The former can be seen as the structure

which deﬁnes how to analyze the information. The

latter is the container where the analyzed instances

are described according to the speciﬁcations of the

schema.

Deﬁnition 9. Let

O be an ontology, C ∈ N

and

CC = path

,..., path

a set of paths from concept C

toward concepts C

∗

,...,C

∗

, respectively, (i.e., path

⊕

1≤ j≤n

i−1

∗

⊕

∗

). The tuple E =

,...,D

,...,c

) is an n-dimensional (or sim-

ply multidimensional) conceptual schema of

O as-

sociated to C using paths CC, where it is satisﬁed

that∀D

= (Cd

⊏

→), ∀C

∈ Cd

, C

∗

⊏

→ ∗C

and c

is a function which assigns a combination function to

each simple type of data that can be reached from a

concept in Cd

Deﬁnition 10. Let

O be a concrete ontology with

Abox

A . Let E be an m-dimensional conceptual

schema of

O associated to C using the paths in CC.

The set of tuples {t

,...,t

}, t

= (d

,...,d

), t

,∀i, j ∈ {1,..., n} is called m-dimensional concep-

tual space of

O with respect to E, if for each data:

1. d

∈ C

∈ Cd

, or

2. ∃d ∈ C

such that d

is an instance selected with

respect to a selector instance of class C

, or

3. d

is a generalized data of a dataset selected with

respect to a selector instance of class C

∈

Each t

represents a generalized instance of a set of

instances of A selected with respect to a selector in-

stance of class C which contains all paths in CC.

ANALYSIS OF ONTOLOGICAL INSTANCES - A Data Warehouse for the Semantic Web

Table 1: Analysis and generalization of ceramic artifacts.

Each row maintains the number of instances represented by

the associated description (values between parenthesis).

Decoration Morphology Weight [g]

incisions

group:{open}

order:{ovoid, spheroid} 20 (3)

polished

group:{open, closed}

order:{cone, ovoid} 500 (3)

stamping

group:{open}

order:{cone} 5 (2)

without dec.

group:{closed}

order:{spheroid} 12 (2)

Example 4. The multidimensional schema shown

in Table 1 has been constructed by using the

following multidimensional conceptual space as-

sociated to the concept CeramicArtif act, E =

union

max

), where:

= ({Decoration},

0),

= ({Morphology},

0),

= ({Weight},

0),

union

= {

Φ,

∪

|Φ ∈ Φ

max

= {

ℜ,Max

∪ is deﬁned by

∪(d

,...,d

) = {d

} ∪ ... ∪ {d

} and

Max represent the maximum function for real num-

bers (in this case to compute the maximum weight in

each generalization).

Algorithm for the generation of multidimensional

conceptual spaces

Algorithm 1 describes how an m-dimensional

conceptual space is obtained from a given ontology

O and a conceptual schema of O associated to the

class C, E. The technique of attribute-oriented induc-

tion (Carter and Hamilton, 1998; Han et al., 1998)

was taken as inspiration for its simplicity and ﬂexi-

bility. One remarkable common feature between such

technique and this one is that no restrictions are put

on the data. The ﬁrst step of the algorithm is to de-

ﬁne a mapping between each data of each dimension

and its generalized value, according to the conceptual

schema for instances of typeC. Then, the generalized

instances are formed, substituting each m-tuple with a

generalized m-tuple that constitutes the generalization

of the instances of the same type for each dimension.

5 INTERESTING CONCEPTUAL

SPACES

A conceptual space as previously deﬁned allows users

to freely browse the conglomerate of objects and re-

view the aspects they consider more interesting. If the

Algorithm 1 Generation of multidimensional concep-

tual spaces.

Require: O , E, C, β

, β

{ O , instantiated ontology with Abox A ,

C, reference class to generate multidimensional spaces,

E, multidimensional schema,

, β

, minimum and maximum percentage of different

values in each dimension}

Ensure: E

′

, multidimensional space associated to the schema E

and the ontology O }

First part: Eliminate the irrelevant dimensions and create

the generalization mappings between the data.

= {a|a ∈ C

Iterate A

and group different data associated to each di-

mension of E.

if β

≥

≥ β

must be removed) then

E = (D

,..., D

i−1

i+1

,..., D

,..., c

i−1

i+1

,..., c

)

else

Generate mappings (d,d

′

) for each value d collected

in D

, d(d ↓

sup

′

) with C

sup

being one of the classes

direct ancestors of C

, d ∈ C

Second part: Creation of the multidimensional space

′

= {}

for all a ∈ A

Let t = (d

,...,d

) be the tuple of data associated to a,

according to E.

′

= (d

′

,...,d

′

), computed from the mapping gener-

ated in the previous step, being C

′

the type of data d

if ∃t

′′

= (d

′′

,...,d

′′

) ∈ E

′

, being C

′′

the type of data d

′′

such that ∀i ∈ {1,..,n} then

′′

= (d

′′

∪

′

,...,d

′′

∪

′

)

else

′

= E

′

∪ {t

′

}

data analyst were not informed on the features of the

object distribution in the domain, or if the number of

such features were too high, her analysis capabilities

would be strongly affected. Nevertheless, this prob-

lem can be overcome with an analysis tool able to

suggest to the user some interesting analysis dimen-

sions. This can be obtained with a customized feature

selection process. The concept of feature selection

was introduced for the task of dimensionality reduc-

tion originally deﬁned in Statistics and widely studied

in Machine Learning.

Anyway, it is necessary to deﬁne a way of assess-

ing the importance of a given feature subset. The

measures more used in the literature are the informa-

tion gain , the Gini index , the uncertainty and the

correlation coefﬁcients. Nevertheless, the large num-

ber of studies that argue in favor of decision trees and

information gain (like ID3 and C4.5), made us decide

to choose such a combination for our feature selec-

tion process. More exactly, in this work we propose

ICSOFT 2007 - International Conference on Software and Data Technologies

Let Cp be an ancestor concept in a dimension,

,...C

} concepts directly specialized of Cp, and

objs a function which associates each concept with its

objects set. Let β be the percentage of maximum cor-

relation between the number of objects of descendant

and ancestor concepts.

If ∃C

such that |objs(C

)| ≥ β|objs(C

Promote concepts C

,...,C

to level of concept C

Delete C

from the dimension.

Figure 3: Rules for ﬁltering out uninteresting concepts.

to compute interesting conceptual multidimensional

schemata associated to a concept C by way of algo-

rithm 2, an adaptation of the one proposed by (Han

and Kamber, 2001). The purpose of such customized

algorithm is that of using the distributions of a set of

concepts in relation to a set of classes, in order to se-

lect the compositions of relations (paths) that assure

the highest information gain with respect to the distri-

bution. The main block of the algorithm is procedure

ComputePseudoSchemata which selects, as a ﬁrst

step, the paths with highest information gain. Then,

for each path, the initial distribution is subdivided ac-

cording to the possible values of the data associated to

the objects through such path, and the process of sub-

division of the clusters is repeated while the informa-

tion gain is maintained in a desirable range. A parallel

task performed during this process is the computation

of the weights which indicate the interest estimation

for each conceptual schema that may be generated for

each path set.

A further ﬁltering step can be done for each di-

mension taking into account the relation between the

quantity of objects associated to a concept and to its

ancestor concept. In this way, an uninteresting an-

cestor concept can be removed from the dimension,

following the rule described in Figure 3.

6 USING A MULTIDIMENSIONAL

CONCEPTUAL SPACE

As explained in the introduction, the major advantage

of a multidimensional space is that a user can see her

data from different points of view. Tabular models in

3D, function graphs, histograms and relational graphs

are the most natural tools to use for the analysis of

results. The possibility of realizing generalizations

and selections at each level also represents a powerful

analysis skill. In this way, it is possible to character-

ize object classes in relation to others, allowing for

the comparison and discovering of class features.

Although these are the analysis methods that have

traditionally been used, an analyst may be interested

Algorithm 2 Generation of interesting schemata of

multidimensional conceptual spaces.

Require: O , C, I, γ, α

{O , ontology with Abox A I

C, reference class for generating a schema of multidimen-

sional spaces,

γ, minimum allowed information gain

α, minimum number of objects in a description}

Ensure: SP

{SP, set of paths associated to C whose subsets can be

used to form interesting multidimensional schemata}

Let Paths

be the dictionary of paths starting fromC with

key in the destination concept.

S = {S

= {a ∈ C

},∀C

,i ∈ {1,...,n},C ⊑ C

6= C

, j ∈ {1,..., n}, i 6= j}

SP = ComputePseudoSchemata(Paths

,S,γ)

function ComputePseudoSchemata(Paths

,S,γ) :

{Outputs a pair, in which the ﬁrst element is a set of

paths and the second a real value indicating the impor-

tance of the set of paths for the generation of interesting

schemata}

First phase: Compute the importance of the current clus-

tering

ve =

∑

∈S

imp(S

), where

imp(S

) =



1, if |S

| > α

1− |S

|/α, otherwise

Second phase: Retrieve the subsets with highest informa-

tion gains

if |S| = 1 then

Output (

0,ve)

else

SP =

Compute information gain, G, for each destination

concept Paths

according to the classiﬁcation in S

Let {path

,..., path

} be the set of paths which allow

a high discrimination between objects ordered accord-

ing to the gain value: G(path

) ≥ ... ≥ G(path

) >

γ and C

the destination concept associated to path

path

, k ∈ {1, ..., m}.

for all k ∈ {1,..., m} do

CD = {C

′

⊑ C

}

for all C

′

∈ CD do

S = {S

′

= {a ∈ C

∈ {1, ...,n},C ⊑ C

; a is

related to some data d according to path

∧ d ∈

′I

}}

SP = SP ∪

(sp,v)∈ComputePseudoSchemata(Paths

−{path

},S,γ)

{

∪ sp,v+ ve

}

Output SP

in other more complex insights about the behavior of

her data. Various pattern analysis tools have been de-

scribed in the literature, especially with the develop-

ment of data mining research. It is thus plausible to

create new algorithms for the extraction of interest-

ing patterns in the multidimensional conceptual envi-

ronment (Han and Kamber, 2001). Some of the most

ANALYSIS OF ONTOLOGICAL INSTANCES - A Data Warehouse for the Semantic Web

interesting patterns to extract are:

• of characterization: they represent rules for char-

acterizing a class of objects according to the val-

ues of a subset of its dimensions. They can be ex-

pressed by: class X ⇒ Condition[p

], where p

100×count(Condition)

, which means that, in class X,

Condition occurs in a p

percentage of the cases,

Count is a function that counts the number of

times in which a certain condition occurs, and n

is the total numbers of analyzed objects. p

known as characterization coefﬁcient.

• of discrimination: they represent rules for char-

acterizing a class of objects for which a given

pattern is not observable with a certain fre-

quency in any other class. They can be ex-

pressed by: class X ⇐ Condition[p

], where p

100×count(Condition∧classX)

count(Condition)

. p

is known as discrim-

ination coefﬁcient.

• association rules at different levels: they represent

rules for characterizing a class of objects in which

co-occurrence relations can be found in the at-

tributes of the multidimensional space. They can

be expressed by: Condition

⇒ Condition

[s,c],

where s =

100×count(Condition

∧Condition

)

,c =

100×count(Condition

)

count(Condition

∧Condition

)

, which means that in a

s% of the objects bothCondition

and Condition

are observed and that c% of objects satisfying

Condition

also satisfy Condition

. s is known as

the support of the rule and c as its conﬁdence.

In order to customize these results to the model

we presented, the multidimensional conceptual model

must take into account how many objects in the con-

crete ontology are characterized by the description of

each cell.

7 CONCLUSION

The proposal of this paper is the formalization of a

data warehouse tool for the Semantic Web. The tool

is based on the theoretical foundations of Description

Logic and on the current developments of information

data generalization. Besides, an algorithm to gener-

ate interesting conceptual spaces according to the data

distribution is proposed. Ideas for adapting Data Min-

ing techniques in order to allow users a better knowl-

edge discovering from the Semantic Web have also

been exposed. Implementation of the proposal frame-

work, on which we are now working, consists of two

main components: 1) a reasoner (which works in an

off-line way) that retrieve instance models and ab-

straction functions from an ontology; and 2) a data

warehouse processor that use such models and func-

tions in order to perform all the necessary generaliza-

tions. This second module has been optimized con-

sidering some of the OLAP solutions.

ACKNOWLEDGEMENTS

This research has been partially supported by the

project TIN2005-09098-C05-04 (2006-2008).

REFERENCES

Abell

o, A. (2002). YAM

: A Multidimensional Concep-

tual Model. PhD thesis, Universitat Polit

ecnica de

Catalunya.

Baader, F. and Sattler, U. (2003). Description logics

with aggregates and concrete domains. Inf. Syst.,

28(8):979–1004.

Binh, N. T. and Tjoa, A. M. (2001). Conceptual mul-

tidimensional data model based on object-oriented

metacube. In SAC ’01: Proceedings of the 2001

ACM symposium on Applied computing, pages 295–

300. ACM Press.

Buzydlowski, J. W., Song, I.-Y., and Hassell, L. (1998).

A framework for object-oriented on-line analytic pro-

cessing. In DOLAP ’98: Proceedings of the 1st

ACM international workshop on Data warehousing

and OLAP, pages 10–15. ACM Press.

Carter, C. L. and Hamilton, H. J. (1998). Efﬁcient attribute-

oriented generalization for knowledge discovery from

large databases. IEEE Transactions on Knowledge

and Data Engineering, 10(2):193–208.

Danger, R. (2007). Extracci

on y an

alisis de informaci

desde la perspectiva de la Web Sem

antica (Informa-

tion extraction and analysis from the viewpoint of

Semantic Web, in spanish). PhD thesis, Universitat

Jaime I.

Hacid, M.-S. and Sattler, U. (1998). Modeling multidimen-

sional databases: A formal object-centered approach.

In Proceedings of the Sixth European Conference on

Information Systems.

Han, J. and Kamber, M. (2001). Data Mining: Concepts

and Techniques. Morgan Kaufmann.

Han, J., Nishio, S., Kawano, H., and Wang, W. (1998).

Generalization-based data mining in object-oriented

databases using an object cube model. Data Knowl-

edge Engineering, 25(1-2):55–97.

Nguyen, T. B., Tjoa, A. M., and Wagner, R. (2000). An ob-

ject oriented multidimensional data model for OLAP.

In Web-Age Information Management, pages 69–82.

Trujillo, J., Palomar, M., G

omez, J., and Song, I.-Y. (2001).

Designing data warehouses with OO conceptual mod-

els. Computer, 34(12):66–75.

ICSOFT 2007 - International Conference on Software and Data Technologies