plementation, which is not a novelty by itself, we
study the impact of OpenACC directive considerations
on scalability and memory access for GPU and CPU
processors. The remainder of the paper is organized
as follows. Section 2 provides basic background on
the optical flow method and describes the
Lucas-Kanade algorithm. We start with a baseline
multicore implementation in Section 3. Sections 4
and 5 give an overview of GPU architecture and the
OpenACC programming paradigm, respectively. In
Section 6, we describe our OpenACC parallelization
and provide a commented report of our experimental
results. Section 7 concludes the paper and outlines
some perspectives.
2 RELATED WORK
Besides the quality of the estimation, the execution
time is also important (A. Garcia-Dopico, 2014;
S. Baker, 2004), especially under a real-time
constraint. Indeed, since the algorithm is likely to
be applied to consecutive frames of a live video, it
should be as fast as possible.
Implementation of the Lucas-Kanade algorithm on
the graphics processing unit (GPU), from the GPGPU
standpoint, has been seriously considered. In
(J. Marzat, 2009), Marzat, Dumortier and Ducrot
propose a parallel implementation on a GPU to
compute a dense and accurate velocity field using an
NVIDIA GT200 card, which achieved 15 velocity field
estimations per second on 640x480 images. Another
relevant contribution is presented in
(S.A. Mahmoudi, 2014), in
which the authors implemented optical flow motion
tracking using Lucas-Kanade combined with the
Harris corner detector (only corner pixels are
considered) on Full HD video using multiple GPUs. A
thorough implementation study, denoted eFOLKI, is
provided by Plyer, Guy and Champagnat in
(A. Plyer, 2016). It is a robust, accurate and
high-performance method, even on large-format
images. Duvenhage, Delport and Jason
(B. Duvenhage, 2010) also investigated a GPU
implementation using the Open Graphics Library
(OpenGL) and the OpenGL Shading Language (GLSL),
with performance similar to that of a comparable
CUDA implementation. Other
authors have addressed the parallelization of optical
flow computation on FPGAs (A. Garcia-Dopico, 2014;
R. Allaoui, 2017). They conclude that the FPGA and
GPU implementations have similar performance,
although the FPGA implementation took much longer
to develop. An implementation on the CELL processor
(C66x DSP) is provided and discussed by Zhang and
Cao (F. Zhang, 2014).
Regarding multicore parallelization of the
algorithm, (Kruglov, 2016), for instance, describes
an updated method that uses OpenMP to speed up the
estimation of object movement between frames of a
video sequence. Another multicore parallelization
is proposed in (N. Monz, 2012). Pal, Biemann
and Baumgartner (I. Pal, 2014) discuss how the
velocity of vehicles can be estimated using an
optical flow implementation parallelized with
OpenMP. Moreover, another hybrid model mitigates
the bottleneck of motion estimation algorithms with
only a small percentage of the source code modified.
In (N. Martin, 2015),
Nelson and Jorge proposed the first implementation
of the Lucas-Kanade optical flow algorithm based on
OpenACC directives on GPU. Carlos and Guillermo
(C. Garcia, 2015) also evaluated OpenACC directives
in terms of GPU performance. In this context, our
work also evaluates a new OpenACC implementation,
but one that examines and analyzes the memory
access bottlenecks.
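To make the directive-based approach concrete, the following is a minimal sketch, not the kernel of any of the cited works nor of this paper. It assumes a hypothetical helper, frame_derivatives, that computes a central-difference horizontal gradient and a temporal difference for two row-major grayscale frames. The function name, the parameter names, and the clamped border handling are all assumptions made for illustration. The sketch shows how a small number of OpenACC pragmas can annotate an otherwise unchanged C loop, and how data clauses make host-device memory traffic explicit.

```c
#include <assert.h>

/* Hypothetical helper (an assumption for illustration, not from the paper):
 * central-difference horizontal gradient ix and temporal difference it for
 * two w*h row-major grayscale frames, prev and next. */
void frame_derivatives(const float *restrict prev, const float *restrict next,
                       float *restrict ix, float *restrict it, int w, int h)
{
    assert(w > 0 && h > 0);
    int n = w * h;

    /* Data clauses state the transfers explicitly: both input frames are
     * copied to the device once, and both outputs are copied back once. */
    #pragma acc data copyin(prev[0:n], next[0:n]) copyout(ix[0:n], it[0:n])
    {
        /* collapse(2) exposes the whole pixel grid as one parallel loop.
         * x is the fastest-varying index, matching the row-major layout. */
        #pragma acc parallel loop collapse(2)
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                int i  = y * w + x;
                int xl = x > 0     ? x - 1 : x;  /* clamp at left border  */
                int xr = x < w - 1 ? x + 1 : x;  /* clamp at right border */
                ix[i] = 0.5f * (prev[y * w + xr] - prev[y * w + xl]);
                it[i] = next[i] - prev[i];
            }
        }
    }
}
```

When built without an OpenACC-enabled compiler, the pragmas are ignored and the loop runs serially, which allows results to be checked on the CPU before offloading.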
3 LUCAS-KANADE ALGORITHM
3.1 Optical Flow
Optical flow is a computer vision topic whose main
kernel computes the apparent motion of features
across two consecutive frames of a given video, thus
estimating a global parametric transformation and
local deformations. It is based mainly on local
spatio-temporal convolutions that are applied
consecutively. Optical flow has many uses, and it
is an important cue for motion estimation, tracking,
surveillance, and recognition applications. Different
methods have been proposed for optical flow
estimation (B.D. Lucas, 1981; K.P. Horn, 1981; Gibson,
1950; Adelson and Bergen, 1985; Fleet and Jepson,
1995; Kories and Zimmerman, 1986), and they can be
grouped into block-based methods, spatio-temporal
differential methods, frequency-based methods and
correlation-based methods. Each method has its
advantages and disadvantages, but the main
drawbacks are limited speed and the need for a large
memory space. Over the years, the Horn and Schunck
algorithm (K.P. Horn, 1981) and the Lucas-Kanade
algorithm (B.D. Lucas, 1981) have become the most
widely used techniques in computer vision. We have
focused on the Lucas-Kanade approach because it is
the most suitable in terms of computational
complexity and requires fewer computing resources.
The main principle of Lucas-Kanade optical flow
estimation is to assume brightness constancy in
order to find the velocity vector between two
successive frames (t and t+1), as shown in Figure 1,
(a) and (b). The optical flow vectors