Authors: Imene Guerfi, Lobna Kriaa and Leila Azouz Saidane

Affiliation: CRISTAL Laboratory, RAMSIS Pole, National School for Computer Sciences (ENSI), University of Manouba, Tunisia

Keywords: GPU Computing, Parallel Programming, Program Optimization, Auto-tuning, Face Detection.

Abstract: With the growing amount of data, computational power has become essential in all fields. To satisfy these requirements, GPUs appear to be an appropriate solution. However, one of their major drawbacks is their varying architectures, which make writing efficient parallel code very challenging because the programmer must master the GPU's low-level design. CUDA gives the programmer more flexibility to exploit the GPU's power with ease. Even so, tuning the launch parameters of its kernels, such as the block size, remains a daunting task: tuning this parameter well requires a deep understanding of the architecture and the execution model. In particular, in the Viola-Jones algorithm, the block size is an important factor in the execution time, but this optimization aspect is not well explored. This paper offers the first steps toward automatically tuning the block size for any input without deep knowledge of the hardware architecture, which ensures automatic portability of performance across different GPU architectures. The main idea is to define techniques for finding the optimum block size to achieve the best performance. We pointed out the impact on overall performance of using a static block size for all input sizes. In light of these findings, we presented two dynamic approaches for selecting the block size best suited to the input size. The first is based on an empirical search; this approach provides optimal performance, but it is demanding for the programmer and time-consuming to deploy. To overcome this issue, we proposed a second approach, a model that automatically selects a block size. Experimental results show that this model can improve the execution time by up to 2.5x over the static approach.
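
The empirical search mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the candidate block sizes, and the `measure` callback are all assumptions made for illustration. Here `measure(grid, block)` stands in for launching a kernel with that configuration and returning its measured run time; any cost function with that shape works.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical candidate block sizes (multiples of the 32-thread warp size).
DEFAULT_CANDIDATES = (32, 64, 128, 256, 512, 1024)


def grid_size(n_items: int, block_size: int) -> int:
    """Number of blocks needed so that grid * block covers n_items."""
    return (n_items + block_size - 1) // block_size


def best_block_size(
    measure: Callable[[int, int], float],
    n_items: int,
    candidates: Iterable[int] = DEFAULT_CANDIDATES,
) -> Tuple[int, float]:
    """Try every candidate block size and keep the cheapest one.

    `measure(grid, block)` is an assumed callback that returns the cost
    (for example, elapsed seconds) of one run with that launch configuration.
    """
    best_block, best_cost = None, float("inf")
    for block in candidates:
        cost = measure(grid_size(n_items, block), block)
        if cost < best_cost:  # strict comparison: the first candidate wins ties
            best_block, best_cost = block, cost
    if best_block is None:
        raise ValueError("candidates must not be empty")
    return best_block, best_cost
```

Because every candidate must be timed for every input size, a search of this kind is costly to deploy, which is the drawback the abstract gives as motivation for the model-based approach.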

CC BY-NC-ND 4.0

Paper citation in several formats:
Guerfi, I. ; Kriaa, L. and Saidane, L. (2022). Towards Automatic Block Size Tuning for Image Processing Algorithms on CUDA. In Proceedings of the 17th International Conference on Software Technologies - ICSOFT; ISBN 978-989-758-588-3; ISSN 2184-2833, SciTePress, pages 591-601. DOI: 10.5220/0011314800003266

@conference{icsoft22,
author={Imene Guerfi and Lobna Kriaa and Leila Azouz Saidane},
title={Towards Automatic Block Size Tuning for Image Processing Algorithms on CUDA},
booktitle={Proceedings of the 17th International Conference on Software Technologies - ICSOFT},
year={2022},
pages={591-601},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011314800003266},
isbn={978-989-758-588-3},
issn={2184-2833},
}

TY - CONF

JO - Proceedings of the 17th International Conference on Software Technologies - ICSOFT
TI - Towards Automatic Block Size Tuning for Image Processing Algorithms on CUDA
SN - 978-989-758-588-3
IS - 2184-2833
AU - Guerfi, I.
AU - Kriaa, L.
AU - Saidane, L.
PY - 2022
SP - 591
EP - 601
DO - 10.5220/0011314800003266
PB - SciTePress