Assessing the Lakehouse: Analysis, Requirements and Definition

Jan Schneider, Christoph Gröger, Arnold Lutsch, Holger Schwarz, Bernhard Mitschang

2023

Abstract

The digital transformation opens new opportunities for enterprises to optimize their business processes by applying data-driven analysis techniques. For storing and organizing the required huge amounts of data, different types of data platforms have been employed in the past, with data warehouses and data lakes being the most prominent ones. Since they possess rather contrary characteristics and address different types of analytics, companies typically utilize both of them, leading to complex architectures with replicated data and slow analytical processes. To counter these issues, vendors have recently been making efforts to break the boundaries and to combine features of both worlds into integrated data platforms. Such systems are commonly called lakehouses and promise to simplify enterprise analytics architectures by serving all kinds of analytical workloads from a single platform. However, it remains unclear how lakehouses can be characterized, since existing definitions focus almost arbitrarily on individual architectural or functional aspects and are often driven by marketing. In this paper, we assess prevalent definitions for lakehouses and finally propose a new definition, from which several technical requirements for lakehouses are derived. We apply these requirements to several popular data management tools, such as Delta Lake, Snowflake and Dremio, in order to evaluate whether they enable the construction of lakehouses.


Paper Citation


in Harvard Style

Schneider J., Gröger C., Lutsch A., Schwarz H. and Mitschang B. (2023). Assessing the Lakehouse: Analysis, Requirements and Definition. In Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-648-4, SciTePress, pages 44-56. DOI: 10.5220/0011840500003467


in Bibtex Style

@conference{iceis23,
author={Jan Schneider and Christoph Gröger and Arnold Lutsch and Holger Schwarz and Bernhard Mitschang},
title={Assessing the Lakehouse: Analysis, Requirements and Definition},
booktitle={Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2023},
pages={44-56},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011840500003467},
isbn={978-989-758-648-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Assessing the Lakehouse: Analysis, Requirements and Definition
SN - 978-989-758-648-4
AU - Schneider J.
AU - Gröger C.
AU - Lutsch A.
AU - Schwarz H.
AU - Mitschang B.
PY - 2023
SP - 44
EP - 56
DO - 10.5220/0011840500003467
PB - SciTePress