nar in-memory format, organized for efficient analytical operations on modern hardware. It already includes dictionary encoding to map arbitrary data types to integer values. However, Apache Arrow lacks support for more sophisticated lightweight integer compression algorithms, which are suitable to (i) reduce the memory footprint and (ii) speed up columnar data processing. Thus, in this paper we presented an approach to integrate the large corpus of lightweight integer compression algorithms into Apache Arrow. We experimentally showed that this integration leads to a decreased memory footprint and an increased performance of an aggregation function compared to uncompressed data.
The next step in our ongoing research work is the integration of more compression algorithms with different properties, in order to generalize and optimize our integration approach. Another point of ongoing work is to deduce the decompression abstraction corresponding to the compression metamodel, so that decompression code can be generated automatically without explicit knowledge of a decompression algorithm. Future work also includes the integration of data hardening algorithms, i.e., error-detecting codes, which can likewise be achieved by applying the metamodel. Last but not least, we plan to exhaustively evaluate the benefit of our approach with big data systems that use Apache Arrow.
Integrating Lightweight Compression Capabilities into Apache Arrow