A Performance Analysis for Efficient Schema Design in Cloud-Based Distributed Data Warehouses

Fred Ferreira, Robson do Nascimento Fidalgo

2024

Abstract

Data Warehouses (DWs) have become an indispensable asset for companies to support strategic decision-making. In a world where enterprise data grows exponentially, however, new DW architectures are being investigated to overcome the deficiencies of traditional relational Database Management Systems (DBMS), driving a shift towards more modern, cloud-based DW solutions. To enhance efficiency and ease of use, the industry has seen the rise of next-generation analytics DBMSs, such as NewSQL, a hybrid storage class of solutions that support both complex analytical queries (OLAP) and transactional queries (OLTP). To our knowledge, few studies explore whether the way data is denormalized affects the performance of these solutions when processing OLAP queries in a distributed environment. This paper investigates the role of data modeling in the processing time and data volume of a distributed DW. The Star Schema Benchmark was used to evaluate the performance of a Star Schema and a Fully Denormalized Schema in three different market solutions (Singlestore, Amazon Redshift, and MariaDB Columnstore), each under two different memory availability scenarios. Our results show that data denormalization does not guarantee improved performance, as the solutions performed very differently depending on the schema. Furthermore, we show that a hybrid-storage (HTAP) NewSQL solution can outperform an OLAP solution in terms of mean execution time.


Paper Citation


in Harvard Style

Ferreira F. and do Nascimento Fidalgo R. (2024). A Performance Analysis for Efficient Schema Design in Cloud-Based Distributed Data Warehouses. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7, SciTePress, pages 39-49. DOI: 10.5220/0012546200003690


in Bibtex Style

@conference{iceis24,
author={Fred Ferreira and Robson do Nascimento Fidalgo},
title={A Performance Analysis for Efficient Schema Design in Cloud-Based Distributed Data Warehouses},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={39-49},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012546200003690},
isbn={978-989-758-692-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A Performance Analysis for Efficient Schema Design in Cloud-Based Distributed Data Warehouses
SN - 978-989-758-692-7
AU - Ferreira F.
AU - do Nascimento Fidalgo R.
PY - 2024
SP - 39
EP - 49
DO - 10.5220/0012546200003690
PB - SciTePress