is defined as the ratio of the energy consumed by SQLite
to the energy consumed by the tested platform. Figure 5
shows the average calculated energy efficiencies.
Figure 5: Energy efficiency: higher is better.
For better readability, only the results obtained with
CuDB in the "Boost" configuration are shown. These
results confirm that the energy efficiency of an embedded
RDBMS can be significantly improved by using a
hybrid CPU/GPU query processing engine.
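The ratio defined above can be sketched in a few lines of Python. This is only an illustration: the function names and the power and duration figures are hypothetical and are not measurements from this paper. Energy is taken as average power multiplied by execution time.

```python
def energy_joules(avg_power_watts, duration_s):
    """Energy consumed by a run, as average power times execution time."""
    return avg_power_watts * duration_s


def energy_efficiency(sqlite_energy_j, platform_energy_j):
    """Energy consumed by SQLite divided by energy consumed by the tested platform.

    A value above 1 means the tested platform used less energy than SQLite
    for the same query (higher is better).
    """
    if platform_energy_j <= 0:
        raise ValueError("platform energy must be positive")
    return sqlite_energy_j / platform_energy_j


# Hypothetical figures, chosen only to illustrate the arithmetic.
sqlite_energy = energy_joules(avg_power_watts=20.0, duration_s=10.0)   # 200 J
platform_energy = energy_joules(avg_power_watts=50.0, duration_s=0.5)  # 25 J
ratio = energy_efficiency(sqlite_energy, platform_energy)
```

With these made-up numbers, a platform that draws more power can still come out ahead, because the shorter execution time dominates the energy total.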
5 CONCLUSION AND FUTURE WORK
This paper has shown that GPU architectures can be
exploited to speed up query processing in an RDBMS.
CuDB, an embedded RDBMS that is a performance
upgrade of SQLite in the context of parallel hybrid
architectures, has been presented. CuDB preserves the
SQL support of SQLite because it retains its API.
Speedups of more than 411x were achieved for queries
on tables containing a million entries. The measurements
have also shown that the most powerful GPU is not
required to achieve satisfactory acceleration. The
weakness of GPGPU solutions when processing small
amounts of data was addressed by using a hybrid engine
in which light workloads remain on the CPU. As future
work, support for some additional SQL clauses will be
considered in order to be compliant with the TPC-H and
SSB benchmarks. A port of CuDB to OpenCL is also
planned to target other GPU manufacturers. Another
important challenge is to overcome the limitation of
GPU memory capacity, which is currently 16 GB for
high-end GPUs. To overcome this size limitation, the
proposal is to pipeline the query processing engine in
order to mask memory transfers, and to transfer the
data through circular buffer mechanisms. The transient
memory required by complex join queries could also
exceed the physical GPU memory size. This will also be
tackled by pipelining and circular buffering.
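The pipelining and circular buffering proposed above can be illustrated with a small host-side sketch. It is an analogy under stated assumptions, not the CuDB design: a bounded `queue.Queue` stands in for a ring of device buffers, a producer thread stands in for the memory transfers, and the names `chunks`, `pipelined_count`, `chunk_size` and `slots` are hypothetical. The point is that a table larger than the available buffer space can still be scanned, because only a few chunks are resident at any time while transfer and processing overlap.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def chunks(rows, chunk_size):
    """Yield consecutive slices of rows, each holding at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def pipelined_count(rows, predicate, chunk_size=4, slots=2):
    """Count the rows matching predicate, streaming them chunk by chunk.

    At most `slots` chunks wait in the bounded queue (plus the one being
    processed), so the full table never has to fit in the buffer at once.
    """
    ring = queue.Queue(maxsize=slots)  # stand-in for a ring of device buffers

    def producer():
        # Stand-in for host-to-device transfers.
        for chunk in chunks(rows, chunk_size):
            ring.put(chunk)  # blocks while every slot is occupied
        ring.put(_DONE)

    transfer = threading.Thread(target=producer)
    transfer.start()

    matched = 0
    while True:
        chunk = ring.get()  # taking a chunk frees a slot for the producer
        if chunk is _DONE:
            break
        matched += sum(1 for row in chunk if predicate(row))

    transfer.join()
    return matched


# Usage: count the even values in a small hypothetical table.
even_rows = pipelined_count(list(range(10)), lambda r: r % 2 == 0,
                            chunk_size=3, slots=2)
```

A real GPU version would replace the thread and queue with asynchronous copies into a fixed set of device buffers, but the back-pressure behaviour sketched here is the same: the producer stalls when all slots are full and resumes as soon as the consumer releases one.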
Boosting an Embedded Relational Database Management System with Graphics Processing Units