work and providing means for decision support. The term has been popularized since
the late 1980’s by Howard Dresner and the Gartner Group [6].
The data managed by Business Intelligence systems have certain specific
characteristics, reflecting in the way they are gathered, stored and retrieved, whitch
will be briefly explained in the next sessions.
3.1 Extract, Transform and Load
The process of obtaining and modifying the data for feeding a Business Intelligence
database is called ETL, in respect to the three steps it involves: Extract, Transform
and Load.
In the Extract step, data are typically queried from other systems of the company,
the so-called OLTP – On Line Transactional Processing – that supports the day-by-
day organization operations. Spreadsheets and plain text files can also be used as data
sources for Extraction.
In the Transform step, the data are handled aiming to fit in the view the users of
the decision support system have about the process and facts they represent. This
means unity conversions, codes standardization and data filtering, categorization and
so forth.
In the Load step the data produced by the prior steps are stored in a special
database structure called Data Warehouse, which is described as follows.
3.2 Data Warehousing and Data Mart
The Data Warehouse (DW) is a large data repository [7], obtained from all the
relevant sections of the organization. “The Data Warehouse contains the raw material
for management’s decision support system”. [8]
When the Data Warehouse is updated from ETL, no data is deleted or overwritten.
Instead, the data are accumulated, constructing the history of the data involved on the
company’s operations.
The data structure of a DW often does not follow the common database systems
techniques, that uses normalization to ensure data integrity and less storage space.
Instead, the data are de-normalized and arranged in such a way that helps to query for
reports and analysis.
From the organization’s Data Warehouse, the information related to each area of
interest is separated, forming Data Marts. Each Data Mart (DM) is a subset of the
whole DW focused on a special subject of the decision support systems’ users.
Data on a DM is arranged according to the star schema: each relevant data from
the organization, known as fact, is stored in a structure called cube, which can be
viewed or decomposed according to pre-defined categorization of the data, known as
dimensions of the cube. The dimensions are typically some meaningful attributes of
the fact, such as time, customer type, products and geography of the organization.
The dimensions can have several hierarchies for classifying the facts according to
many points of view, each hierarchy being composed by different levels of
granularity.
157