Paper

Authors: Jan Schneider (1); Christoph Gröger (2); Arnold Lutsch (2); Holger Schwarz (1) and Bernhard Mitschang (1)

Affiliations: (1) Institute of Parallel and Distributed Systems, University of Stuttgart, Universitätsstraße 38, 70569 Stuttgart, Germany; (2) Robert Bosch GmbH, Borsigstraße 4, 70469 Stuttgart, Germany

Keyword(s): Lakehouse, Data Warehouse, Data Lake, Data Management, Data Analytics.

Abstract: The digital transformation opens new opportunities for enterprises to optimize their business processes by applying data-driven analysis techniques. For storing and organizing the huge amounts of data required, different types of data platforms have been employed in the past, with data warehouses and data lakes being the most prominent ones. Since they possess rather contrary characteristics and address different types of analytics, companies typically utilize both of them, leading to complex architectures with replicated data and slow analytical processes. To counter these issues, vendors have recently been making efforts to break these boundaries and to combine features of both worlds into integrated data platforms. Such systems are commonly called lakehouses and promise to simplify enterprise analytics architectures by serving all kinds of analytical workloads from a single platform. However, it remains unclear how lakehouses can be characterized, since existing definitions focus almost arbitrarily on individual architectural or functional aspects and are often driven by marketing. In this paper, we assess prevalent definitions for lakehouses and finally propose a new definition, from which several technical requirements for lakehouses are derived. We apply these requirements to several popular data management tools, such as Delta Lake, Snowflake and Dremio, in order to evaluate whether they enable the construction of lakehouses.

CC BY-NC-ND 4.0

Paper citation in several formats:
Schneider, J.; Gröger, C.; Lutsch, A.; Schwarz, H. and Mitschang, B. (2023). Assessing the Lakehouse: Analysis, Requirements and Definition. In Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-648-4; ISSN 2184-4992, SciTePress, pages 44-56. DOI: 10.5220/0011840500003467

@conference{iceis23,
author={Jan Schneider and Christoph Gröger and Arnold Lutsch and Holger Schwarz and Bernhard Mitschang},
title={Assessing the Lakehouse: Analysis, Requirements and Definition},
booktitle={Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2023},
pages={44-56},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011840500003467},
isbn={978-989-758-648-4},
issn={2184-4992},
}

TY - CONF
JO - Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Assessing the Lakehouse: Analysis, Requirements and Definition
SN - 978-989-758-648-4
IS - 2184-4992
AU - Schneider, J.
AU - Gröger, C.
AU - Lutsch, A.
AU - Schwarz, H.
AU - Mitschang, B.
PY - 2023
SP - 44
EP - 56
DO - 10.5220/0011840500003467
PB - SciTePress
ER -