2000). In this study, we use the CUDA platform with a GPU, since CUDA is the most popular platform for increasing GPU utilization. The main difference between the CPU and the GPU, as shown in Figure 1 (Reddy et al, 2017), is the number of processing units: the CPU has a few processing units backed by large cache and control units, while the GPU has many more processing units, each group with its own cache and control logic. GPUs contain hundreds of cores, which yields much higher parallelism than CPUs.
2.2 NVIDIA, CUDA Architecture and
Threads
The GPU follows the SIMD programming model. It contains hundreds of processing cores, called Scalar Processors (SPs). A Streaming Multiprocessor (SM) is a group of eight SPs, and the graphics card is built from several SMs. The SPs within the same SM execute the same instruction at the same time, i.e., in Single Instruction Multiple Thread (SIMT) fashion (Lad et al, 2012). Compute Unified Device Architecture (CUDA), developed by NVIDIA, is a parallel computing architecture that exploits the GPU to deliver significant performance improvements. CUDA-enabled GPUs are widely used in many applications, such as image and video processing in chemistry and biology, fluid dynamics simulations, computerized tomography (CT), etc. (Bozkurt et al, 2015). CUDA is an extension of the C language for execution on the GPU; it exposes parallelism without requiring the program architecture to be restructured into explicit multithreading. It also supports memory scatter, bringing more flexibility to the GPU (Reddy et al, 2017). The CUDA API allows code to be executed by a large number of threads, where threads are grouped into blocks and blocks make up a grid. Blocks are serially assigned for execution on each SM (Lad et al, 2012).
2.3 Gaussian Blur Filter
The Gaussian blur is a convolution technique used as a pre-processing stage in many computer vision algorithms for smoothing, blurring and eliminating noise in an image (Chauhan, 2018). Gaussian blur is a linear low-pass filter in which each pixel value is calculated using the Gaussian function (Novák et al, 2012). The two-dimensional (2D) Gaussian function is the product of two one-dimensional (1D) Gaussian functions, defined as shown in equation (1) (Novák et al, 2012):
G(x, y) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²))    (1)
where (x,y) are coordinates and ‘σ’ is the standard
deviation of the Gaussian distribution.
The linear spatial filter mechanism moves the center of a filter mask from one point to another; the value of each pixel (x, y) is the filter response at that point, i.e., the sum of the products of the filter coefficients and the corresponding neighboring pixels within the filter mask range (Putra et al, 2017). The outcome of the Gaussian blur function is a bell-shaped curve, as shown in Figure 1, since a pixel's weight depends on its distance from the neighboring pixels (Chauhan, 2018).
The filter kernel size is a factor that affects the performance and processing time of the convolution process. In our study, we used odd kernel widths: 7x7, 13x13, 15x15 and 17x17.
Figure 1: The 2D Gaussian Function.
2.4 Related Work
Optimizing image convolution is an important topic in image processing that is being widely explored and developed. Novák et al. (2012) examined the effect of optimizing the Gaussian blur by running the filter on multicore CPU systems and its improvement over a single CPU. Chauhan (2018) likewise explored the effect of running the Gaussian blur filter using CUDA. These previous studies focused on getting the best performance from multicore CPUs or from the GPU compared to the sequential code. Samet et al. (2015) presented a comparison of the speed-up of real-time applications on CPU and GPU using the C++ language and Open Multi-Processing (OpenMP). Reddy et al. (2017) compared the performance of CPU and GPU for an image edge-detection algorithm using the C language and CUDA on an NVIDIA GeForce GTX 970. In our study
we are exploring the performance improvement