per, we evaluated two hardware-based optimization
opportunities for the BitWeaving scan technique
(Li and Patel, 2013): SIMD extensions and custom
architectures on an FPGA. Both optimizations improve
scan performance, and the FPGA optimization is superior
to the SIMD optimization in terms of both performance
and energy.
REFERENCES
Abadi, D. J., Madden, S., and Ferreira, M. (2006). Integrat-
ing compression and execution in column-oriented
database systems. In SIGMOD, pages 671–682.
Balkesen, C., Alonso, G., Teubner, J., and Özsu, M. T.
(2013). Multi-core, main-memory joins: Sort vs. hash
revisited. PVLDB, 7(1):85–96.
Boncz, P. A., Kersten, M. L., and Manegold, S. (2008).
Breaking the memory wall in MonetDB. Commun.
ACM, 51(12):77–85.
Damme, P., Habich, D., Hildebrandt, J., and Lehner, W.
(2017). Lightweight data compression algorithms: An
experimental survey (experiments and analyses). In
EDBT, pages 72–83.
Feng, Z., Lo, E., Kao, B., and Xu, W. (2015). ByteSlice:
Pushing the envelop of main memory data processing
with a new storage layout. In SIGMOD, pages 31–46.
He, J., Zhang, S., and He, B. (2014). In-cache query
co-processing on coupled CPU-GPU architectures.
PVLDB, 8(4):329–340.
Hildebrandt, J., Habich, D., Damme, P., and Lehner, W.
(2016). Compression-aware in-memory query pro-
cessing: Vision, system design and beyond. In ADMS
Workshop at VLDB, pages 40–56.
István, Z., Sidler, D., and Alonso, G. (2017). Caribou:
Intelligent distributed storage. PVLDB,
10(11):1202–1213.
Lemire, D. and Boytsov, L. (2015). Decoding billions of in-
tegers per second through vectorization. Softw., Pract.
Exper., 45(1).
Li, Y. and Patel, J. M. (2013). BitWeaving: Fast scans for
main memory data processing. In SIGMOD, pages
289–300.
Mueller, R., Teubner, J., and Alonso, G. (2009). Data
processing on FPGAs. PVLDB, 2(1):910–921.
Oukid, I., Booss, D., Lespinasse, A., Lehner, W., Willhalm,
T., and Gomes, G. (2017). Memory management tech-
niques for large-scale persistent-main-memory sys-
tems. PVLDB, 10(11):1166–1177.
Polychroniou, O., Raghavan, A., and Ross, K. A.
(2015). Rethinking SIMD vectorization for in-
memory databases. In SIGMOD, pages 1493–1508.
Sidler, D., István, Z., Owaida, M., and Alonso, G. (2017a).
Accelerating pattern matching queries in hybrid CPU-
FPGA architectures. In SIGMOD, pages 403–415.
Sidler, D., István, Z., Owaida, M., Kara, K., and Alonso, G.
(2017b). doppiodb: A hardware accelerated database.
In SIGMOD, pages 1659–1662.
Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cher-
niack, M., Ferreira, M., Lau, E., Lin, A., Madden,
S., O’Neil, E. J., O’Neil, P. E., Rasin, A., Tran, N.,
and Zdonik, S. B. (2005). C-store: A column-oriented
DBMS. In VLDB, pages 553–564.
Teubner, J. (2017). FPGAs for data processing: Current state.
it - Information Technology, 59(3):125.
Teubner, J. and Woods, L. (2013a). Data Processing on FP-
GAs. Synthesis Lectures on Data Management. Mor-
gan & Claypool Publishers.
Teubner, J. and Woods, L. (2013b). Data Processing on FP-
GAs. Synthesis Lectures on Data Management. Mor-
gan & Claypool Publishers.
Willhalm, T., Popovici, N., Boshmaf, Y., Plattner, H., Zeier,
A., and Schaffner, J. (2009). SIMD-scan: Ultra fast
in-memory table scan using on-chip vector processing
units. PVLDB, 2(1):385–394.
Xilinx (2017). Zynq UltraScale+ MPSoC Data Sheet:
Overview.
Zhao, W. X., Zhang, X., Lemire, D., Shan, D., Nie, J., Yan,
H., and Wen, J. (2015). A general SIMD-based approach
to accelerating compression algorithms. ACM
Trans. Inf. Syst., 33(3).
Zhou, J. and Ross, K. A. (2002). Implementing database op-
erations using SIMD instructions. In SIGMOD, pages
145–156.
Ziener, D., Bauer, F., Becher, A., Dennl, C.,
Meyer-Wegener, K., Schürfeld, U., Teich, J., Vogt, J.-S.,
and Weber, H. (2016). FPGA-based dynamically
reconfigurable SQL query processing. ACM Trans.
Reconfigurable Technol. Syst., 9(4):25:1–25:24.
Zukowski, M., Héman, S., Nes, N., and Boncz, P. A.
(2006). Super-scalar RAM-CPU cache compression.
In ICDE, page 59.
Column Scan Optimization by Increasing Intra-Instruction Parallelism