A Hybrid Memory Data Cube Approach for High Dimension Relations
Rodrigo Rocha Silva, Celso Massaki Hirata, Joubert de Castro Lima
2015
Abstract
Approaches based on inverted indexes, such as Frag-Cubing, are considered efficient in terms of runtime and main memory usage for high dimension cube computation and querying. These approaches do not compute all aggregations a priori. Instead, they index the occurrences of attribute values in a way that makes answering multidimensional queries time efficient. Like any other main memory based cube solution, Frag-Cubing is limited by the available main memory; if the size of the cube exceeds main memory capacity, external memory is required. The challenge of using external memory is to define criteria for selecting which fragments of the cube should reside in main memory. In this paper, we implement and test an extension of Frag-Cubing, named H-Frag, which selects the fragments of the cube to be stored in main memory according to attribute frequencies and dimension cardinalities. In our experiments, H-Frag outperforms Frag-Cubing in both query response time and main memory usage. A massive cube with 60 dimensions and 10^9 tuples was computed by H-Frag sequentially using 110 GB of RAM and 286 GB of external memory, taking 64 hours. This data cube answers complex queries in less than 40 seconds. Frag-Cubing could not compute such a cube on the same machine.
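The inverted-index idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation of Frag-Cubing or H-Frag. It only assumes that each attribute value maps to the set of tuple IDs in which it occurs, and that a point query is answered by intersecting those sets. The function names, the sample rows, and the use of COUNT as the aggregate are hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map (dimension index, attribute value) -> set of tuple IDs (hypothetical helper)."""
    index = defaultdict(set)
    for tid, row in enumerate(rows):
        for dim, value in enumerate(row):
            index[(dim, value)].add(tid)
    return index


def point_query(index, n_rows, constraints):
    """Count the tuples that match every {dimension: value} constraint."""
    if not constraints:
        return n_rows  # no constraint: every tuple qualifies
    # Intersect the smallest ID sets first to keep intermediate results small.
    tid_sets = sorted(
        (index.get((dim, value), set()) for dim, value in constraints.items()),
        key=len,
    )
    result = set(tid_sets[0])
    for tids in tid_sets[1:]:
        result &= tids
    return len(result)


# Illustrative relation with three dimensions (made-up values).
rows = [
    ("a1", "b1", "c1"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
    ("a1", "b1", "c2"),
]
index = build_inverted_index(rows)
count_a1_b1 = point_query(index, len(rows), {0: "a1", 1: "b1"})
count_c1 = point_query(index, len(rows), {2: "c1"})
```

In this sketch no aggregate is stored in advance: each query pays only for the intersections it needs, which is the trade-off the abstract describes. Deciding which ID sets stay in main memory and which go to external memory is the part H-Frag addresses, and it is not modeled here.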
Paper Citation
in Harvard Style
Silva R., Hirata C. and Lima J. (2015). A Hybrid Memory Data Cube Approach for High Dimension Relations. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-096-3, pages 139-149. DOI: 10.5220/0005371601390149
in Bibtex Style
@conference{iceis15,
author={Rodrigo Rocha Silva and Celso Massaki Hirata and Joubert de Castro Lima},
title={A Hybrid Memory Data Cube Approach for High Dimension Relations},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2015},
pages={139-149},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005371601390149},
isbn={978-989-758-096-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A Hybrid Memory Data Cube Approach for High Dimension Relations
SN - 978-989-758-096-3
AU - Silva R.
AU - Hirata C.
AU - Lima J.
PY - 2015
SP - 139
EP - 149
DO - 10.5220/0005371601390149