languages created to model ontologies, such as RDF
Schema, DAML or OIL.
2.2 Data Integration
Data integration is concerned with unifying data that
shares common semantics but originates from
heterogeneous sources. The level of unification
depends on the type of heterogeneity: structural,
when the source data models are different; syntactic,
when the source data models use different
languages; or semantic, when there are different
concepts with similar meaning or similar concepts
with different meanings.
Most of the problems related to syntactic and
semantic heterogeneity have been solved with
ontologies that map concepts between different data
models. In these cases, the ontologies translate
between the sources so that the data arrive unified at
the target data model. An example of this type of
integration is shown in Kedad & Métais (2002), where
a domain ontology was defined to unify data from
sources whose terminology differs syntactically but is
semantically related. In this type of problem the
ontology is not used to conceptualize the entire
domain, but only those areas that have syntactic or
semantic problems.
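The translator role described above can be sketched in a few lines. This is a minimal illustration, not the method of Kedad & Métais (2002): the source names ("crm", "billing"), the field names and the shared concepts are all hypothetical, and the "ontology" is reduced to a lookup table that covers only the conflicting terms.

```python
# Hypothetical partial domain ontology: only the terms that clash across
# sources are mapped to a shared concept; everything else passes through.
TERM_TO_CONCEPT = {
    ("crm", "client"): "Customer",
    ("billing", "cust_name"): "Customer",
    ("crm", "turnover"): "Revenue",
    ("billing", "sales_total"): "Revenue",
}


def unify(source, record):
    """Rename a record's fields to the shared concepts of the target model."""
    return {TERM_TO_CONCEPT.get((source, field), field): value
            for field, value in record.items()}


crm_row = unify("crm", {"client": "ACME", "turnover": 1200, "region": "EU"})
billing_row = unify("billing", {"cust_name": "ACME", "sales_total": 1200})
```

Fields that have no entry in the table (here "region") are left untouched, which mirrors the point that the ontology need not conceptualize the entire domain.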
For the problem of structural heterogeneity, the
ontology is used not as a translator but as a reference
data model to which all sources must conform.
Bioinformatics is one of the areas that has made
extensive use of this type of ontology to integrate
knowledge; there, semantic and structural
heterogeneity is solved as shown in Clusters & Smith
Fielding (2004) through a case study.
In this context, a data warehouse, being
task-independent and defining a reference model
that allows multiple sources to be integrated, can be
seen as an ontology that solves the problem of
structural integrity of organizational databases. The
compatibility between data warehouses and
ontologies is so close that the concept of a data
warehouse can be materialized through an ontology.
3 PROPOSED ARCHITECTURE: ONTOLOGY-BASED DATA WAREHOUSES
The principle of the architecture is that a data
warehouse should be designed to reflect the domain
as closely to reality as possible, regardless of the
complexity of the resulting model, because for
presentation purposes the model can be reduced to
the level of simplicity required by the final user.
In the proposed architecture (shown in figure 1),
the data warehouse is filled with data from
operational systems and data obtained from external
ontologies, which are processed in an intermediate
preparation layer. The objective of this layer is to
transform the data and generate the correct
structures so that they can be loaded into the
warehouse. It is in this layer that the taxonomic
ontologies, which allow semantically and
syntactically heterogeneous data to be integrated,
are located.
The data warehouse is built upon ontologies that
allow the world to be represented through structures
of great semantic power, yielding a model much
closer to reality than the dimensional model.
The warehouse is accessed through a mediator
which generates the correct views (virtual or
materialized) based on the level of comprehension
and detail required by each type of user. Depending
on the type of tool that each user employs, the data
warehouse is accessed either directly or through the
mediator.
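As a rough illustration of the mediator's role, the sketch below filters the warehouse content according to a user profile and optionally caches the result as a materialized view. The class name Mediator, the profile names, and the facts are assumptions made for this example; the paper does not prescribe an interface.

```python
# Hypothetical warehouse content: (subject, property, value) facts.
FACTS = [
    ("order1", "type", "Order"),
    ("order1", "customer", "ACME"),
    ("order1", "amount", 1200),
    ("order1", "shippedVia", "carrier7"),
]

# Hypothetical user profiles: the properties each type of user needs to see.
PROFILES = {
    "manager": {"customer", "amount"},
    "analyst": {"type", "customer", "amount", "shippedVia"},
}


class Mediator:
    def __init__(self, facts, profiles):
        self.facts = facts
        self.profiles = profiles
        self.materialized = {}  # cached views, keyed by profile name

    def view(self, profile, materialize=False):
        """Return the facts visible to a profile, as a virtual or cached view."""
        if profile in self.materialized:
            return self.materialized[profile]
        wanted = self.profiles[profile]
        rows = [f for f in self.facts if f[1] in wanted]
        if materialize:
            self.materialized[profile] = rows
        return rows


mediator = Mediator(FACTS, PROFILES)
manager_view = mediator.view("manager")                    # virtual view
analyst_view = mediator.view("analyst", materialize=True)  # cached view
```

A virtual view is recomputed on every call, while a materialized one is stored; a real mediator would also need to refresh cached views when the warehouse changes, which this sketch omits.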
The data warehouse consists of a descriptive
ontology (a kind of ontology that, according to
Kedad & Métais (2002), contains instances of its
classes that are stored in a database or other
semi-structured storage medium) that represents the
world domain. This ontology is administered by an
Ontology Management System (OMS) which,
according to Cullot & Parent et al. (2003), offers four
functionalities: data modeling, efficient storage
services and instance management, reasoning tools,
and queries over the model and its instances. The
OMS provides inference engines that enrich the
model even further, because from the facts
originating in the sources they can infer additional
facts, called derived facts (Lee & Goodwin et al.,
2003).
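The notion of derived facts can be illustrated with a small forward-chaining loop. This is a sketch under stated assumptions, not the inference engine of any particular OMS: facts are plain (subject, property, object) tuples, the class names are hypothetical, and only two rules are applied (transitivity of subClassOf and inheritance of class membership).

```python
def derive(facts):
    """Forward-chain two illustrative rules until no new fact appears."""
    known = set(facts)
    while True:
        new = set()
        for (a, p, b) in known:
            for (c, q, d) in known:
                if b != c:
                    continue
                # Rule 1: subClassOf is transitive.
                if p == "subClassOf" and q == "subClassOf":
                    new.add((a, "subClassOf", d))
                # Rule 2: an instance of a class is also an instance
                # of every superclass of that class.
                if p == "type" and q == "subClassOf":
                    new.add((a, "type", d))
        if new <= known:
            # Fixed point reached: return only the facts that were inferred.
            return known - set(facts)
        known |= new


source_facts = {
    ("Laptop", "subClassOf", "Computer"),
    ("Computer", "subClassOf", "Product"),
    ("item42", "type", "Laptop"),
}
derived_facts = derive(source_facts)
```

From three facts loaded from the sources, the loop infers three more, for instance that item42 is a Product, without that statement ever being stored explicitly.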
The warehouse can be built incrementally by adding
more classes, properties and restrictions to the
ontology in accordance with the business process that
is being modeled. The integration of data from the
different business processes is guaranteed by the
preparation layer and by the equivalence properties
provided by OWL, such as equivalentClass and
equivalentProperty, together with class constructors
such as unionOf and intersectionOf, among others.
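To make the role of equivalentClass concrete, the sketch below pools the instances of classes that two business processes have declared equivalent. It is only an approximation: triples are plain tuples rather than real OWL, the prefixes "sales:" and "crm:" are hypothetical, and the constructors unionOf and intersectionOf are not handled.

```python
def merge_equivalent(triples):
    """Group classes linked by equivalentClass and pool their instances."""
    parent = {}

    def find(c):
        # Union-find lookup with path halving.
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # First pass: merge every pair of classes declared equivalent.
    for s, p, o in triples:
        if p == "equivalentClass":
            parent[find(s)] = find(o)

    # Second pass: attach each instance to its class's representative.
    instances = {}
    for s, p, o in triples:
        if p == "type":
            instances.setdefault(find(o), set()).add(s)
    return instances


triples = [
    ("sales:Client", "equivalentClass", "crm:Customer"),
    ("c1", "type", "sales:Client"),
    ("c2", "type", "crm:Customer"),
    ("s1", "type", "crm:Supplier"),
]
pooled = merge_equivalent(triples)
```

Here c1 and c2 end up in the same group even though they were loaded under different class names, which is the effect the equivalence properties are meant to guarantee when the warehouse grows one business process at a time.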
ICEIS 2006 - DATABASES AND INFORMATION SYSTEMS INTEGRATION