4 SUMMARY AND FUTURE WORK
In this article we have proposed a programming
model and system architecture for a data management
system that integrates multiple specialized data do-
mains and programming models into an efficient and
flexible end-to-end solution.
In the language layer, at the top of our architec-
ture, we extend a flexible host language with a set of
embedded domain-specific languages that cover different
use cases.
Underneath the language layer, we use a compiler
framework to translate abstract multi-domain pro-
grams into efficient physical workloads. The frame-
work incorporates domain knowledge which makes it
possible to apply powerful domain-specific optimization
rules. In addition, the compiler generates format
transformation code to enable the composition of op-
erations that are defined on different physical repre-
sentations.
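
The two compiler tasks described above can be illustrated with a minimal Python sketch. It is not the LMS-based compiler of the prototype: the `Node` class, the format names `row` and `matrix`, and the helper functions are hypothetical and chosen only for illustration. The sketch first applies one domain rule from linear algebra (a double transpose cancels out) and then inserts an explicit conversion wherever an input is not in the format its consumer expects.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical IR node: an operator, the physical format it produces,
# the format it expects from its inputs, and the inputs themselves.
@dataclass(frozen=True)
class Node:
    op: str
    out_format: str
    in_format: str = ""
    inputs: Tuple["Node", ...] = ()


def convert(node: Node, target: str) -> Node:
    """Wrap node in a conversion unless it already has the target format."""
    if node.out_format == target:
        return node
    return Node("convert", target, node.out_format, (node,))


def insert_conversions(node: Node) -> Node:
    """Bottom-up pass: make every input match the format its consumer expects."""
    fixed = tuple(convert(insert_conversions(i), node.in_format)
                  for i in node.inputs)
    return Node(node.op, node.out_format, node.in_format, fixed)


def simplify(node: Node) -> Node:
    """Example domain rule: transpose(transpose(A)) is rewritten to A."""
    inputs = tuple(simplify(i) for i in node.inputs)
    if node.op == "transpose" and inputs and inputs[0].op == "transpose":
        return inputs[0].inputs[0]
    return Node(node.op, node.out_format, node.in_format, inputs)


# A relational scan (row format) feeding linear algebra operators (matrix format).
scan = Node("scan", out_format="row")
double_transpose = Node("transpose", "matrix", "matrix",
                        (Node("transpose", "matrix", "matrix", (scan,)),))
program = Node("multiply", "matrix", "matrix", (double_transpose, scan))

# Logical rewrite first, then physical format conversions.
plan = insert_conversions(simplify(program))
```

In the resulting `plan`, both transposes are gone and each input of `multiply` is a `convert` node from `row` to `matrix`. Running the domain rule before the conversion pass avoids generating conversions for operators that are later eliminated.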
At the bottom of the architecture, we propose to
use a storage and processing engine that supports
multiple optimized physical data formats. Actual data
access is implemented by a set of processing operators
that apply user-defined functions to data objects
according to predefined data access patterns. These
patterns enable data-parallel operator implementations
and advanced optimizations such as operator fusion.
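
Operator fusion can be sketched in a few lines of Python, assuming an element-wise ("map") access pattern. The `MapOperator` class and its methods are hypothetical and do not reflect the operator interface of the actual engine.

```python
from functools import reduce
from typing import Callable, Iterable, List


class MapOperator:
    """Hypothetical operator that applies UDFs element-wise (map pattern)."""

    def __init__(self, *udfs: Callable):
        self.udfs = udfs

    def fuse(self, downstream: "MapOperator") -> "MapOperator":
        # Two consecutive element-wise operators become a single operator,
        # so the intermediate result is never materialized.
        return MapOperator(*self.udfs, *downstream.udfs)

    def run(self, data: Iterable) -> List:
        # One pass over the data; each element flows through all UDFs in order.
        return [reduce(lambda value, f: f(value), self.udfs, x) for x in data]


scale = MapOperator(lambda x: x * 2)
shift = MapOperator(lambda x: x + 1)

unfused = shift.run(scale.run([1, 2, 3]))  # two passes, one intermediate list
fused = scale.fuse(shift).run([1, 2, 3])   # one pass, same result
```

Because the access pattern guarantees that each element is processed independently, the input could also be partitioned and the fused operator applied to each partition in parallel.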
We are currently developing a prototype
implementation of the proposed architecture.
To focus fully on the multi-domain aspects
of our approach, we reuse previous work as extensively
as possible. Most importantly, we
use the in-memory data store ERIS (Kissinger et al.,
2014) as the starting point for our multi-format data
management engine. ERIS uses data-parallel processing
operators to achieve vertical scalability on large
shared-memory multiprocessor machines. In addition,
ERIS already provides column and row formats
for the storage of relations and is therefore designed
to support multiple physical storage formats. We plan
to extend ERIS with a matrix format to
support the important use case of relational and linear
algebra integration.
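
As a rough illustration of this use case, the following Python sketch converts selected attributes of a column-format relation into a dense row-major matrix and then applies a linear algebra operation to it. The function names and the dictionary-based relation are assumptions made for this example and are not part of ERIS.

```python
from typing import Dict, List


def columns_to_matrix(columns: Dict[str, List[float]],
                      attrs: List[str]) -> List[List[float]]:
    """Convert the chosen columns of a relation into a dense row-major matrix."""
    return [list(row) for row in zip(*(columns[a] for a in attrs))]


def matvec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Plain matrix-vector product on the matrix format."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Column-format relation with one key attribute and two numeric attributes.
relation = {"id": [1, 2, 3], "x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]}

features = columns_to_matrix(relation, ["x", "y"])  # projection + format change
scores = matvec(features, [0.5, 2.0])               # linear algebra on the result
```

Here the relational projection and the format transformation happen in one step, which is the kind of cross-domain composition the matrix format is meant to support.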
The compilation layer uses the DSL compilation
framework LMS (Rompf and Odersky, 2010). LMS
uses Scala as the host language for embedded DSLs and
generates a tree IR with custom node
types. Further, LMS defines a flexible component-based
extension mechanism, which allows the integration
of new DSL elements, IR nodes, optimization
rules, and code generators.
We propose the multi-domain architecture as our
vision for a data management system that provides
tight and efficient integration of multiple processing
domains. In future work, we want to evaluate the im-
pact of frequent format transformations on the perfor-
mance of the system and expect to find specific multi-
domain optimizations to mitigate these effects.
ACKNOWLEDGEMENT
This work is partly funded by the German Fed-
eral Ministry of Education and Research (BMBF)
in the VAVID project under grant 01IS14005 as well as
by the German Research Foundation (DFG) in the
Collaborative Research Center 912 Highly Adaptive
Energy-Efficient Computing.
REFERENCES
Abadi, D. et al. (2014). The Beckman report on database
research. ACM SIGMOD Record, 43(3):61–70.
Alexandrov, A., Kunft, A., Katsifodimos, A., Schüler, F.,
Thamsen, L., Kao, O., Herb, T., and Markl, V. (2015).
Implicit parallelism through deep language embed-
ding. In Proceedings of the 2015 ACM SIGMOD
International Conference on Management of Data,
pages 47–61. ACM.
Beckmann, O., Houghton, A., Mellor, M., and Kelly, P. H.
(2004). Runtime code generation in C++ as a foundation
for domain-specific optimisation. In Domain-Specific
Program Generation, pages 291–306.
Duggan, J., Elmore, A. J., Stonebraker, M., Balazinska, M.,
Howe, B., Kepner, J., Madden, S., Maier, D., Mattson,
T., and Zdonik, S. (2015). The BigDAWG polystore system.
ACM SIGMOD Record, 44(2):11–16.
Kissinger, T., Kiefer, T., Schlegel, B., Habich, D., Molka,
D., and Lehner, W. (2014). ERIS: A NUMA-aware in-memory
storage engine for analytical workload. In
ADMS Workshop at VLDB, pages 74–85.
Newburn, C. J., So, B., Liu, Z., McCool, M., Ghuloum,
A., Toit, S. D., Wang, Z. G., Du, Z. H., Chen, Y.,
Wu, G., et al. (2011). Intel’s array building blocks:
A retargetable, dynamic compiler and embedded lan-
guage. In Code generation and optimization (CGO),
2011 9th annual IEEE/ACM international symposium
on, pages 224–235. IEEE.
Rompf, T. and Odersky, M. (2010). Lightweight modular
staging: a pragmatic approach to runtime code generation
and compiled DSLs. In ACM SIGPLAN Notices,
volume 46, pages 127–136. ACM.
DATA 2016 - 5th International Conference on Data Management Technologies and Applications