NEW ALGORITHMS FOR GPU STREAM COMPACTION - A Comparative Study

Pedro Miguel Moreira, Luís Paulo Reis, A. Augusto de Sousa

Abstract

With the advent of GPU programmability, many applications have moved computationally intensive tasks onto the GPU. Some of these tasks produce intermediate data that mixes elements relevant and irrelevant to further processing. The ability to discard the irrelevant data and preserve the relevant portion is therefore desirable, as it reduces subsequent computational effort, memory use and communication bandwidth. Parallel stream compaction is an operation that, given a discriminator, outputs the valid elements and discards the rest. In this paper we contribute two original algorithms for parallel stream compaction on the GPU. We tested and compared our proposals with state-of-the-art algorithms on different data sets. The results demonstrate that our proposals can outperform prior algorithms. The analysis also shows that no single algorithm is best for all data distributions, and that an optimal choice is difficult to achieve without prior knowledge of the data characteristics.
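
To make the operation concrete, the following is a minimal sequential sketch of order-preserving stream compaction built on an exclusive prefix sum, a commonly used baseline formulation. It is not one of the two algorithms contributed in the paper, and the names `compact` and `is_valid` are hypothetical, chosen only for illustration. On a GPU, each of the three steps would typically run as a data-parallel pass.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def compact(stream: Sequence[T], is_valid: Callable[[T], bool]) -> List[T]:
    """Return the valid elements of `stream`, preserving their order.

    Illustrative only: `is_valid` plays the role of the discriminator.
    """
    # Step 1: flag each element (1 = keep, 0 = discard).
    flags = [1 if is_valid(x) else 0 for x in stream]

    # Step 2: an exclusive prefix sum over the flags gives every kept
    # element its position in the compacted output.
    inclusive = list(accumulate(flags))
    offsets = [s - f for s, f in zip(inclusive, flags)]

    # Step 3: scatter the kept elements to their output positions.
    total = inclusive[-1] if inclusive else 0
    out: List[T] = [None] * total  # type: ignore[list-item]
    for x, f, o in zip(stream, flags, offsets):
        if f:
            out[o] = x
    return out
```

For example, `compact([3, 0, 0, 7, 0, 5], lambda v: v != 0)` keeps the three non-zero values in their original order.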


Paper Citation


in Harvard Style

Moreira P., Reis L. and de Sousa A. (2009). NEW ALGORITHMS FOR GPU STREAM COMPACTION - A Comparative Study. In Proceedings of the Fourth International Conference on Computer Graphics Theory and Applications - Volume 1: GRAPP, (VISIGRAPP 2009) ISBN 978-989-8111-67-8, pages 119-128. DOI: 10.5220/0001783601190128


in Bibtex Style

@conference{grapp09,
author={Pedro Miguel Moreira and Luís Paulo Reis and A. Augusto de Sousa},
title={NEW ALGORITHMS FOR GPU STREAM COMPACTION - A Comparative Study},
booktitle={Proceedings of the Fourth International Conference on Computer Graphics Theory and Applications - Volume 1: GRAPP, (VISIGRAPP 2009)},
year={2009},
pages={119-128},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001783601190128},
isbn={978-989-8111-67-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fourth International Conference on Computer Graphics Theory and Applications - Volume 1: GRAPP, (VISIGRAPP 2009)
TI - NEW ALGORITHMS FOR GPU STREAM COMPACTION - A Comparative Study
SN - 978-989-8111-67-8
AU - Moreira P.
AU - Reis L.
AU - de Sousa A.
PY - 2009
SP - 119
EP - 128
DO - 10.5220/0001783601190128