timized. We plan to continue testing the algorithms
on a broader range of hardware platforms and more
diverse data sets, expecting further insights that can
lead to improved variations and ideas. Another avenue
for future work focuses on how these algorithms and
concepts adapt to new architectures and forthcoming
standards.
ACKNOWLEDGEMENTS
This work has been partially supported by the
European Social Fund program, public contest
1/5.3/PRODEP/2003, financing request no.
1012.012, medida 5/acção 5.3 - Formação Avançada
de Docentes do Ensino Superior, submitted by
Escola Superior de Tecnologia e Gestão do Instituto
Politécnico de Viana do Castelo.
GRAPP 2009 - International Conference on Computer Graphics Theory and Applications