obviously faces a double complexity:
- data sources’ organization;
- elaboration process for multidimensional
schemas.
We propose a hybrid-driven approach to assist
the decision-maker in elaborating his
multidimensional schemas himself and in handling their evolution.
3 DATA-SOURCES AND DECISION-MAKERS’ REQUIREMENTS
The source is a conceptual schema, represented with
a UML class diagram (a widely recognized schema
in the database community). Figure 1 presents the
source schema (our running example). This example
describes product stock and sales.
Decision-makers, who want to analyse data, can
express their requirements in informal terms,
without making reference directly to the data source
schema. For example, it is possible to analyse:
- the number of Orders by Families and
Products;
- the turnover (sum of amounts of orders) by
month and by product;
- the number of orders with a product that has
a sales price between two values.
Requirements are expressed here in natural
language in terms of analysis subjects and analysis
axes. This type of expression is used in the industrial
domain as shown in a field study (Annoni et al.
2006).
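As an illustration only, the first two requirements listed above can be written down as an analysis subject, an aggregated measure and a list of analysis axes. The `Requirement` structure, its field names and the `axes_by_subject` helper are hypothetical and are not part of the proposed process; this is a minimal sketch of one possible encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Requirement:
    """Hypothetical encoding of one informally stated requirement."""
    subject: str            # analysis subject, e.g. "Orders"
    measure: str            # what is measured for that subject
    aggregation: str        # aggregation function, e.g. "COUNT" or "SUM"
    axes: Tuple[str, ...]   # analysis axes


# The first two example requirements from the text (names are illustrative).
requirements: List[Requirement] = [
    Requirement("Orders", "order", "COUNT", ("Family", "Product")),
    Requirement("Orders", "amount", "SUM", ("Month", "Product")),
]


def axes_by_subject(reqs: List[Requirement]) -> Dict[str, Set[str]]:
    """Group the analysis axes requested for each analysis subject."""
    grouped: Dict[str, Set[str]] = {}
    for req in reqs:
        grouped.setdefault(req.subject, set()).update(req.axes)
    return grouped
```

Grouping the axes by subject gives a first, rough view of which subjects and axes the requirements call for.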
4 THE ELABORATION PROCESS
Our work aims at allowing a decision-maker to
elaborate data-mart schemas himself from available
data-sources and his analysis requirements. Our
objective is to eliminate, as much as possible, the
need for an administrator or a computer specialist
who would be responsible for elaborating data-marts
from specifications provided by the decision-maker.
In this paper, we do not address issues related to
multiple sources. Our process is based on a hybrid
approach. It starts from a source schema and
gradually integrates the requirements (in terms of facts,
dimensions and hierarchies) for generating a
multidimensional schema.
The Class Diagram (CD) that corresponds to the
source schema is analysed and transformed to make
it usable. Many-to-one associations are kept as they
are. Each many-to-many association becomes a class
(with no attributes) linked to its related classes.
Each association class attached to a link becomes a
standard class linked to each of the related classes.
Composite aggregations are considered as
associations and treated as such. For generalizations,
each sub-class is separated to generate a class.
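The many-to-many rule above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Association` structure, the `normalize` function and the naming scheme for the generated class are hypothetical, and the sketch assumes that the new class is linked to each related class by a many-to-one association, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Association:
    """Hypothetical binary association between two classes of the CD."""
    source: str   # for many-to-one: the class on the "many" side
    target: str   # for many-to-one: the class on the "one" side
    kind: str     # "many-to-one" or "many-to-many"


def normalize(classes: List[str],
              associations: List[Association]) -> Tuple[List[str], List[Association]]:
    """Replace each many-to-many association by an attribute-less class
    linked to both related classes (assumed here: via many-to-one links)."""
    out_classes = list(classes)
    out_assocs: List[Association] = []
    for assoc in associations:
        if assoc.kind == "many-to-many":
            link = f"{assoc.source}_{assoc.target}"  # hypothetical naming scheme
            out_classes.append(link)
            out_assocs.append(Association(link, assoc.source, "many-to-one"))
            out_assocs.append(Association(link, assoc.target, "many-to-one"))
        else:
            # Many-to-one associations are kept as they are.
            out_assocs.append(assoc)
    return out_classes, out_assocs
```

Association classes, composite aggregations and generalizations are not covered by this sketch.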
Figure 2: Our design process, which allows a decision-maker
to build a data-mart schema.
The process includes four successive steps; each
step produces a new schema more complete than the
one of the previous step. The last schema
corresponds to the expected data-mart. Thus user
requirements are incrementally added.
The first step consists in extracting from the
source CD a limited set of candidate facts and
displaying them in the first of three intermediate
schemas, noted IS1. The choice of the facts is based
on personalization techniques (see § 4.4).
In IS1, the decision-maker chooses the fact that
he wants to analyse from the ones proposed in this
intermediate schema; he then specifies the required
aggregation functions. He can designate several facts
and thus elaborate a constellation schema.
In a second step, the system automatically
elaborates the second intermediate schema, noted
IS2; it proposes all possible dimensions associated
with the chosen fact.
In IS2, the decision-maker indicates the
dimensions, that is, the analysis axes according to
which he wishes to analyse the fact.
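One plausible way to compute the dimensions proposed in the second intermediate schema is sketched below. The text does not say how "all possible dimensions" are derived; the sketch assumes, as a labeled simplification, that a candidate dimension is any class directly reachable from the chosen fact through a many-to-one association of the transformed CD. The function name and the class names in the usage note are hypothetical.

```python
from typing import Iterable, List, Tuple


def candidate_dimensions(fact: str,
                         many_to_one: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the classes that `fact` references through a many-to-one link.

    Each pair (source, target) reads "many `source` objects relate to one
    `target` object". This selection rule is an assumption, not taken from
    the text.
    """
    return sorted({target for source, target in many_to_one if source == fact})
```

For example, with the hypothetical links `[("OrderLine", "Order"), ("OrderLine", "Product"), ("Product", "Family")]`, the call `candidate_dimensions("OrderLine", links)` proposes `Order` and `Product`, from which the decision-maker would then pick his analysis axes.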
The third step generates the third intermediate
schema, noted IS3, presenting the decision-maker with
all possible hierarchies for each dimension.
In IS3, the decision-maker chooses each
hierarchy that corresponds to his needs.
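The enumeration of possible hierarchies for a dimension can be sketched in the same spirit. The text does not define how hierarchies are derived; this sketch assumes that a hierarchy is a chain of classes obtained by following many-to-one associations from the dimension until no further link exists. The function name and the class names in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def candidate_hierarchies(dimension: str,
                          many_to_one: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Enumerate every path that starts at `dimension` and follows
    many-to-one links up to a class with no further outgoing link."""
    parents: Dict[str, List[str]] = {}
    for source, target in many_to_one:
        parents.setdefault(source, []).append(target)

    def walk(path: List[str]) -> List[List[str]]:
        # Classes already on the path are skipped to avoid looping on cycles.
        nexts = [p for p in parents.get(path[-1], []) if p not in path]
        if not nexts:
            return [path]
        result: List[List[str]] = []
        for nxt in nexts:
            result.extend(walk(path + [nxt]))
        return result

    return walk([dimension])
```

With the hypothetical links `[("Product", "Family"), ("Family", "Category"), ("Product", "Supplier")]`, the dimension `Product` yields two candidate hierarchies, `Product, Family, Category` and `Product, Supplier`; the decision-maker keeps those that match his needs.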
In the final, fourth step of the process, the system
elaborates the data-mart schema which
corresponds to the decision-makers’ requirements.
Personalization meta-data will be memorized here.
The interest of this incremental process is in the
meta-data which the system saves progressively.
These meta-data will allow the correspondence