Authors:
Rodrigo Rocha Silva (1); Celso Massaki Hirata (1) and Joubert de Castro Lima (2)
Affiliations:
(1) Aeronautics Institute of Technology, Brazil; (2) Federal University of Ouro Preto, Brazil
Keyword(s):
OLAP, Data Cube, Inverted Index, High Dimension, External Memory.
Related Ontology Subjects/Areas/Topics:
Data Engineering; Data Warehouses and OLAP; Databases and Data Security; Databases and Information Systems Integration; Enterprise Information Systems; Large Scale Databases
Abstract:
Approaches based on inverted indexes, such as Frag-Cubing, are considered efficient in terms of runtime
and main memory usage for high-dimension cube computation and querying. These approaches do not compute
all aggregations a priori. Instead, they index information about the occurrences of attributes so that
multidimensional queries can be answered efficiently. Like any other main-memory-based cube solution, Frag-Cubing
is limited by the available main memory; thus, if the size of the cube exceeds main memory capacity, external
memory is required. The challenge of using external memory is to define criteria to select which fragments
of the cube should be in main memory. In this paper, we implement and test an approach that is an
extension of Frag-Cubing, named H-Frag, which selects fragments of the cube, according to attribute
frequencies and dimension cardinalities, to be stored in main memory. In our experiment, H-Frag
outperforms Frag-Cubing in both query response time and main memory usage. A massive cube with 60
dimensions and 10^9 tuples was computed by H-Frag sequentially using 110 GB of RAM and 286 GB of
external memory, taking 64 hours. This data cube answers complex queries in less than 40 seconds.
Frag-Cubing could not compute such a cube on the same machine.
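
The inverted-index idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (build_inverted_index, point_query), the in-memory set representation, and the toy relation are assumptions made for illustration only. Each (dimension, value) pair is mapped to the IDs of the tuples that contain it, and a multidimensional point query is answered by intersecting those ID sets instead of reading a precomputed aggregate. The criteria H-Frag uses to decide which fragments stay in main memory are not shown.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each (dimension position, attribute value) pair to the set of tuple IDs containing it."""
    index = defaultdict(set)
    for tid, row in enumerate(rows):
        for dim, value in enumerate(row):
            index[(dim, value)].add(tid)
    return index


def point_query(index, constraints):
    """Return the IDs of tuples that match every {dimension: value} constraint."""
    if not constraints:
        raise ValueError("at least one constraint is required")
    # Intersect the smallest ID sets first to keep intermediate results small.
    id_sets = sorted(
        (index.get((dim, value), set()) for dim, value in constraints.items()),
        key=len,
    )
    result = set(id_sets[0])
    for ids in id_sets[1:]:
        result &= ids
    return result


# Hypothetical toy relation with three dimensions (A, B, C).
rows = [
    ("a1", "b1", "c1"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
    ("a1", "b1", "c2"),
]
index = build_inverted_index(rows)
matches = point_query(index, {0: "a1", 1: "b1"})
print(sorted(matches), len(matches))  # prints [0, 3] 2 (tuple IDs and the COUNT aggregate)
```

In this sketch the COUNT aggregate is simply the size of the intersection; other measures would be computed by fetching the matching tuples by ID.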