the algorithm can run considerably faster than existing algorithms.
Our implementations leave room for further optimization, as they were intended primarily as a proof of concept. We plan to continue testing the algorithms on a broader range of hardware platforms and more diverse data sets, and we expect these tests to yield further insights that can lead to improved variants and new ideas. Another avenue for future work is to study how these algorithms and concepts adapt to forthcoming (and presumably more flexible) architectures and standards.
A major conclusion is that the performance of compaction algorithms may depend strongly on the input data, on the target architecture, and on the interplay between the two. Tuning for an optimal solution, if one exists, is therefore a complex task that is likely to depend on several variables.
Aware of this, and as an avenue for future work, we are devising solutions that can adaptively self-tune in order to achieve better performance. This goal is likely to encompass several approaches, for instance developing new algorithms, developing hybrid algorithms (retaining the best properties of each component), and using a generic optimization framework, as outlined in (Moreira et al., 2006), to enable dynamic optimization of the compaction process.
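The self-tuning idea can be illustrated with a small sketch. The Python below is not the authors' implementation, since the section only announces such solutions as future work. It is a CPU stand-in with two hypothetical compaction variants: `compact_scan`, which is prefix-sum based in the spirit of GPU scan approaches, and `compact_filter`, a sequential baseline. A hypothetical `AdaptiveCompactor` times each variant on a prefix of the input and then runs the faster one on the full input. Using wall-clock time on a sample as the tuning criterion is an assumption made purely for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Sequence

# Hypothetical type aliases for this sketch only.
Predicate = Callable[[Any], bool]
Compactor = Callable[[Sequence[Any], Predicate], List[Any]]


def compact_scan(values: Sequence[Any], keep: Predicate) -> List[Any]:
    """Scan-based compaction: an exclusive prefix sum over the keep flags
    gives every kept element its output position, then a scatter writes it."""
    flags = [1 if keep(v) else 0 for v in values]
    offsets: List[int] = []
    total = 0
    for f in flags:  # exclusive prefix sum (sequential stand-in for a GPU scan)
        offsets.append(total)
        total += f
    out: List[Any] = [None] * total
    for v, f, o in zip(values, flags, offsets):
        if f:
            out[o] = v  # scatter each kept element to its precomputed slot
    return out


def compact_filter(values: Sequence[Any], keep: Predicate) -> List[Any]:
    """Sequential baseline: append kept elements in input order."""
    return [v for v in values if keep(v)]


class AdaptiveCompactor:
    """Hypothetical self-tuning wrapper: times every candidate on a prefix
    of the input and runs the fastest one on the full input."""

    def __init__(self, candidates: Dict[str, Compactor],
                 sample_size: int = 1024, repeats: int = 3) -> None:
        if not candidates:
            raise ValueError("at least one candidate is required")
        self.candidates = candidates
        self.sample_size = sample_size
        self.repeats = repeats
        self.last_choice: Optional[str] = None

    def _fastest(self, sample: Sequence[Any], keep: Predicate) -> str:
        best_name, best_time = "", float("inf")
        for name, fn in self.candidates.items():
            start = time.perf_counter()
            for _ in range(self.repeats):
                fn(sample, keep)
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best_name, best_time = name, elapsed
        return best_name

    def __call__(self, values: Sequence[Any], keep: Predicate) -> List[Any]:
        sample = values[: self.sample_size]  # tune on a prefix only
        self.last_choice = self._fastest(sample, keep)
        return self.candidates[self.last_choice](values, keep)


if __name__ == "__main__":
    compactor = AdaptiveCompactor(
        {"scan": compact_scan, "filter": compact_filter}, sample_size=256)
    print(compactor(list(range(10)), lambda x: x % 2 == 0))  # [0, 2, 4, 6, 8]
    print("chosen variant:", compactor.last_choice)  # timing dependent
```

Both variants preserve input order, so either choice yields the same result and only the running time differs. A practical version would presumably cache the decision, for example per input size, rather than re-tune on every call.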
ACKNOWLEDGEMENTS
This work has been supported by the European Social Fund program, public contest 1/5.3/PRODEP/2003, financing request no. 1012.012, measure 5/action 5.3, Formação Avançada de Docentes do Ensino Superior (Advanced Training of Higher Education Teachers), submitted by Escola Superior de Tecnologia e Gestão do Instituto Politécnico de Viana do Castelo.
REFERENCES
Blelloch, G. (1990). Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
GPGPU (2008). GPGPU.org. http://www.gpgpu.org.
Greß, A., Guthe, M., and Klein, R. (2006). GPU-based collision detection for deformable parameterized surfaces. Computer Graphics Forum, 25(3):497–506.
Harris, M. (2005). GPU Gems 2, chapter Mapping Computational Concepts to GPUs, pages 493–508. Addison-Wesley.
Hensley, J., Scheuermann, T., Coombe, G., Singh, M., and Lastra, A. (2005). Fast summed-area table generation and its applications. Computer Graphics Forum, 24(3):547–555.
Hillis, W. D. and Steele Jr., G. (1986). Data parallel algorithms. Communications of the ACM, 29(12):1170–1183.
Horn, D. (2005). GPU Gems 2, chapter Stream reduction operations for GPGPU applications, pages 573–589. Addison-Wesley.
Lefohn, A. E., Sengupta, S., and Owens, J. D. (2007). Resolution matched shadow maps. ACM Transactions on Graphics, 26(4):20:1–20:17.
Moreira, P. M., Reis, L. P., and de Sousa, A. A. (2006). Best multiple-view selection: Application to the visualization of urban rescue simulations. International Journal of Simulation Modelling (IJSIMM), 5(4):167–173.
Nickolls, J., Buck, I., Garland, M., and Skadron, K. (2008). Scalable parallel programming with CUDA. ACM Queue, 6(2):40–53.
Owens, J. D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A. E., and Purcell, T. J. (2007). A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26(1):80–113.
Roger, D., Assarsson, U., and Holzschuch, N. (2007). Whitted ray-tracing for dynamic scenes using a ray-space hierarchy on the GPU. In Proceedings of the Eurographics Symposium on Rendering’07, pages 99–110.
Sengupta, S., Harris, M., Zhang, Y., and Owens, J. D. (2007). Scan primitives for GPU computing. In GH ’07: Proceedings of the 22nd Symposium on Graphics Hardware, pages 97–106.
Sengupta, S., Lefohn, A. E., and Owens, J. D. (2006). A work-efficient step-efficient prefix sum algorithm. In Proceedings of the Workshop on Edge Computing Using New Commodity Architectures, pages D:26–27.
Ziegler, G., Tevs, A., Theobalt, C., and Seidel, H.-P. (2006). GPU point list generation through histogram pyramids. In 11th Int. Fall Workshop on Vision, Modeling, and Visualization - VMV’06, pages 137–144.
GRAPP 2009 - International Conference on Computer Graphics Theory and Applications