Integrating Lightweight Compression Capabilities into Apache Arrow

Juliana Hildebrandt; Dirk Habich; Wolfgang Lehner

Topics: Big Data Infrastructure; In-Memory Databases; New Data Standards

In Proceedings of the 9th International Conference on Data Science, Technology and Applications - DATA, Volume 1, pages 55-66, 2020

Authors: Juliana Hildebrandt, Dirk Habich and Wolfgang Lehner

Affiliation: Technische Universität Dresden, Database Systems Group, Dresden, Germany

Keyword(s): Columnar Data, Data Formats, Apache Arrow, Lightweight Compression, Integration.

Abstract: With the ongoing shift to a data-driven world in almost all application domains, the management and in particular the analytics of large amounts of data gain in importance. For that reason, a variety of new big data systems has been developed in recent years. Aside from that, a revision of the data organization and formats has been initiated as a foundation for these big data systems. In this context, Apache Arrow is a novel cross-language development platform for in-memory data with a standardized language-independent columnar memory format. The data is organized for efficient analytic operations on modern hardware, whereby Apache Arrow only supports dictionary encoding as a specific compression approach. However, there exists a large corpus of lightweight compression algorithms for columnar data which helps to reduce the necessary memory space as well as to increase the processing performance. Thus, we present a flexible and language-independent approach integrating lightweight compression algorithms into the Apache Arrow framework in this paper. With our so-called ArrowComp approach, we preserve the unique properties of Apache Arrow, but enhance the platform with a large variety of lightweight compression capabilities.
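
To make the contrast in the abstract concrete, the following is a minimal sketch of dictionary encoding next to one representative lightweight integer compression scheme (frame-of-reference followed by bit packing). It is an illustration only and not the ArrowComp implementation; the abstract does not say which algorithms ArrowComp covers. It does not use the Apache Arrow API, and all function names (`dictionary_encode`, `for_bitpack_encode`, `for_bitpack_decode`) are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value by an index into a list of its distinct values."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def for_bitpack_encode(values: List[int]) -> Tuple[int, int, bytes]:
    """Frame-of-reference plus bit packing (illustrative choice of scheme).

    Stores the column minimum once, then packs each offset from that
    minimum using only as many bits as the largest offset requires.
    """
    if not values:
        raise ValueError("cannot encode an empty column")
    reference = min(values)
    offsets = [value - reference for value in values]
    width = max(1, max(offsets).bit_length())
    packed = 0
    for i, offset in enumerate(offsets):
        packed |= offset << (i * width)
    n_bytes = (len(values) * width + 7) // 8
    return reference, width, packed.to_bytes(n_bytes, "little")


def for_bitpack_decode(reference: int, width: int, payload: bytes, count: int) -> List[int]:
    """Invert for_bitpack_encode for a column of `count` values."""
    packed = int.from_bytes(payload, "little")
    mask = (1 << width) - 1
    return [reference + ((packed >> (i * width)) & mask) for i in range(count)]
```

In this sketch, four integers close to 1000 need only 3 bits each after subtracting the minimum, so the payload fits in 2 bytes, whereas a plain 64-bit column would use 32 bytes. Dictionary encoding, by contrast, pays off when a column has few distinct values.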

CC BY-NC-ND 4.0

Paper citation in several formats:

Hildebrandt, J., Habich, D. and Lehner, W. (2020). Integrating Lightweight Compression Capabilities into Apache Arrow. In Proceedings of the 9th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-440-4; ISSN 2184-285X, SciTePress, pages 55-66. DOI: 10.5220/0009820100550066

@conference{data20,
author={Juliana Hildebrandt and Dirk Habich and Wolfgang Lehner},
title={Integrating Lightweight Compression Capabilities into Apache Arrow},
booktitle={Proceedings of the 9th International Conference on Data Science, Technology and Applications - DATA},
year={2020},
pages={55-66},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009820100550066},
isbn={978-989-758-440-4},
issn={2184-285X},
}

TY - CONF
JO - Proceedings of the 9th International Conference on Data Science, Technology and Applications - DATA
TI - Integrating Lightweight Compression Capabilities into Apache Arrow
SN - 978-989-758-440-4
IS - 2184-285X
AU - Hildebrandt, J.
AU - Habich, D.
AU - Lehner, W.
PY - 2020
SP - 55
EP - 66
DO - 10.5220/0009820100550066
PB - SciTePress