Implementing Multidimensional Data Warehouses into NoSQL

Max Chevalier, Mohammed El Malki, Arlind Kopliku, Olivier Teste, Ronan Tournier

Abstract

Not only SQL (NoSQL) databases are becoming increasingly popular and offer strengths such as scalability and flexibility. In this paper, we investigate the use of NoSQL systems for implementing OLAP (On-Line Analytical Processing) systems. More precisely, we are interested in instantiating OLAP systems (from the conceptual level to the logical level) and in instantiating an aggregation lattice (optimization). We define a set of rules to map star schemas into two NoSQL models: column-oriented and document-oriented. The experiments are carried out using the TPC reference benchmark. They show that our rules can effectively instantiate such systems (star schema and lattice). We also analyze the differences between the two NoSQL systems considered. In our experiments, HBase (column-oriented) is faster than MongoDB (document-oriented) in terms of loading time.
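The abstract does not spell out the mapping rules themselves, so the following Python sketch only illustrates the general idea under stated assumptions. It takes a hypothetical star-schema row (all names and values are invented for illustration), lays it out once as a nested document and once as column-family cells, and enumerates the nodes of an aggregation lattice as the subsets of the dimension set. It is not the authors' rule set and uses no HBase or MongoDB API.

```python
from itertools import combinations

# Hypothetical star-schema row: one fact joined with its dimension attributes.
# All names and values are illustrative assumptions, not taken from the paper.
row = {
    "measures": {"quantity": 3, "amount": 29.97},
    "dimensions": {
        "customer": {"city": "Toulouse", "country": "France"},
        "date": {"month": "2014-03", "year": 2014},
    },
}


def to_document(row):
    """Document-oriented layout: one nested sub-document per fact and dimension."""
    doc = {"fact": dict(row["measures"])}
    for dim, attrs in row["dimensions"].items():
        doc[dim] = dict(attrs)
    return doc


def to_column_families(row):
    """Column-oriented layout: one 'family:qualifier' cell per attribute."""
    cells = {f"fact:{name}": value for name, value in row["measures"].items()}
    for dim, attrs in row["dimensions"].items():
        for name, value in attrs.items():
            cells[f"{dim}:{name}"] = value
    return cells


def lattice(dimensions):
    """List every lattice node (subset of dimensions), most detailed first."""
    dims = sorted(dimensions)
    return [subset
            for k in range(len(dims), -1, -1)
            for subset in combinations(dims, k)]


if __name__ == "__main__":
    print(to_document(row))
    print(to_column_families(row))
    for node in lattice(row["dimensions"]):
        print(node)
```

With n dimensions the lattice has 2^n nodes, which is why pre-computing it is treated as an optimization step rather than something done per query.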



Paper Citation


in Harvard Style

Chevalier M., El Malki M., Kopliku A., Teste O. and Tournier R. (2015). Implementing Multidimensional Data Warehouses into NoSQL. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-096-3, pages 172-183. DOI: 10.5220/0005379801720183


in Bibtex Style

@conference{iceis15,
author={Max Chevalier and Mohammed El Malki and Arlind Kopliku and Olivier Teste and Ronan Tournier},
title={Implementing Multidimensional Data Warehouses into NoSQL},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2015},
pages={172-183},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005379801720183},
isbn={978-989-758-096-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Implementing Multidimensional Data Warehouses into NoSQL
SN - 978-989-758-096-3
AU - Chevalier M.
AU - El Malki M.
AU - Kopliku A.
AU - Teste O.
AU - Tournier R.
PY - 2015
SP - 172
EP - 183
DO - 10.5220/0005379801720183