presented. Section IV gives a conclusion.
2 DATA REUSE IN TWO-LEVEL HIERARCHICAL MOTION ESTIMATION
2.1 Data Reuse Schemes for Full Search Motion Estimation
The FS ME algorithm checks every candidate inside the SA.
If the horizontal SR is $[-p_H, p_H)$, the vertical SR is
$[-p_V, p_V)$, and the block size is $N \times N$, the
number of positions to check is $4 p_H \times p_V$ and the
size of the SA is $(2p_H + N - 1) \times (2p_V + N - 1)$.
Since adjacent candidate blocks inside an SA, as well as
the SAs of adjacent current blocks, overlap heavily, there
is an opportunity for effective data reuse and EMB
reduction at the expense of a larger on-chip memory.
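As a quick check of the two expressions above, the following minimal sketch evaluates them for a given search range and block size. The function name and argument names are illustrative only; the half-open search range convention is the one stated in the text.

```python
def fs_search_geometry(p_h, p_v, n):
    """Return (candidate count, SA width, SA height) for full search.

    Assumes a half-open search range [-p_h, p_h) x [-p_v, p_v) and an
    n x n current block, as stated in the text.
    """
    candidates = (2 * p_h) * (2 * p_v)  # equals 4 * p_h * p_v positions
    sa_width = 2 * p_h + n - 1          # columns touched by all candidates
    sa_height = 2 * p_v + n - 1         # rows touched by all candidates
    return candidates, sa_width, sa_height


if __name__ == "__main__":
    # Search range and block size of the HD1080p example in this section.
    print(fs_search_geometry(192, 128, 16))
```

For the [-192, 192) × [-128, 128) range with N = 16, this gives 98304 candidate positions and a 399 × 271 pixel SA.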
In (Tuan et al., 2002), four levels of data reuse are
distinguished, from Level A to Level D. The higher the
level, the larger the EMB reduction, but also the larger
the on-chip memory. Level A reuses pixels of two
horizontally adjacent candidate blocks; Level B,
pixels of two vertically overlapped candidate block
strips; Level C, pixels of two horizontally
overlapped SAs of two adjacent current blocks; and
Level D, pixels of two vertically overlapped SA
strips. The most popular scheme, Level C, is
presented in Fig. 1. With HD1080p video at 30 fps,
N = 16, and a [-192, 192) × [-128, 128) SA, the
requirements for the on-chip memory size and EMB are
101 kB and 1.17 GB/s at Level C, and 574 kB and 60 MB/s
at Level D, assuming eight-bit pixel precision. Clearly,
implementing any of these data reuse schemes might be
too costly in terms of either EMB or on-chip memory
size, and further reduction of these parameters is
necessary.
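The Level C overlap can be illustrated with a purely geometric count. The sketch below assumes that horizontally adjacent current blocks are N pixels apart, so their SAs are shifted by N columns and only those N columns are new. The function name is hypothetical, and the count is not meant to reproduce the memory and EMB figures quoted above, which follow the accounting of (Tuan et al., 2002).

```python
def level_c_reuse(p_h, p_v, n):
    """Geometric overlap of the SAs of two horizontally adjacent blocks.

    Assumption: the two current blocks are n pixels apart, so the second
    SA is the first one shifted right by n columns.
    Returns (reused columns, newly loaded pixels, reused fraction).
    """
    sa_width = 2 * p_h + n - 1
    sa_height = 2 * p_v + n - 1
    new_cols = min(n, sa_width)          # columns not present in the old SA
    reused_cols = sa_width - new_cols    # columns shared by both SAs
    new_pixels = new_cols * sa_height    # pixels to fetch for the next block
    return reused_cols, new_pixels, reused_cols / sa_width


if __name__ == "__main__":
    print(level_c_reuse(192, 128, 16))
```

Under these assumptions, 383 of the 399 SA columns (about 96%) are shared with the previous block, and 16 × 271 = 4336 pixels are fetched per block instead of the whole SA.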
2.2 Two-level Hierarchical Search ME Algorithm
Hierarchical search is a popular approach used in
many VLSI architectures to reduce the
computational complexity of ME. Usually, two or
three levels of hierarchy are used, and the MV found on
the higher (coarse) level becomes the search center
for the next (finer) level. The size of the current block
is either kept fixed or increased on each level. When
the size of the current block is kept constant on each
level, the initial MV is obtained from a relatively large
area, which makes it less noise sensitive but often also
less accurate, as a block on the coarse level covers
several blocks on the finer level. Thus, in the
proposed solution, the current block size is scaled
according to the subsampling of the reference frame.
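The coarse-to-fine flow described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the proposed architecture: SAD is assumed as the matching cost, the coarse image is built by plain subsampling, frame and block dimensions are assumed to be multiples of the subsampling factor, and the search ranges use the half-open [-p, p) convention of Section 2.1. All function and parameter names are hypothetical.

```python
def subsample(frame, f):
    """Keep one pixel (the top-left one, an assumption) per f x f block."""
    return [row[::f] for row in frame[::f]]


def sad(cur, ref, bx, by, rx, ry, n):
    """SAD between the n x n block of cur at (bx, by) and of ref at (rx, ry)."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n, cx, cy, p):
    """Full search over offsets [cx - p, cx + p) x [cy - p, cy + p).

    Candidates falling outside the reference frame are skipped.
    Returns the best motion vector (dx, dy).
    """
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (cx, cy)
    for dy in range(cy - p, cy + p):
        for dx in range(cx - p, cx + p):
            rx, ry = bx + dx, by + dy
            if not (0 <= rx <= width - n and 0 <= ry <= height - n):
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


def two_level_search(cur, ref, bx, by, n, p_coarse, p_refine, f=4):
    """Coarse full search, then refinement around the scaled coarse MV.

    The block size is scaled with the subsampling factor (n // f on the
    coarse level); p_coarse is expressed in coarse-level pixels.
    """
    cur_c, ref_c = subsample(cur, f), subsample(ref, f)
    dx_c, dy_c = full_search(cur_c, ref_c, bx // f, by // f, n // f,
                             0, 0, p_coarse)
    # The coarse MV, scaled back to full resolution, is the new search centre.
    return full_search(cur, ref, bx, by, n, f * dx_c, f * dy_c, p_refine)
```

For example, `two_level_search(cur, ref, 16, 16, 16, 8, 8)` would search a coarse range covering roughly ±32 full-resolution pixels with a 4 × 4 coarse block, then refine the result on the full-resolution frames.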
Regarding the number of search levels, two
levels of hierarchy are most beneficial from the data
reuse point of view, since the SAs of adjacent blocks
overlap only on the first level. In general, SAs
on the fine level are disjoint and have to be
fetched from memory separately for each current
block. Thus, any additional level of hierarchy
increases EMB.
To create the coarse-level image, two approaches
have been considered: subsampling with a 4:1 factor
in each direction, and low-pass filtering by simple
averaging. Subsampling is accomplished by
selecting one pixel from each 4×4 block (16 pixels) of
the reference frame and does not require any extra
computation or memory; however, the presence of
noise might degrade the initial MV estimation. On
the other hand, averaging a 4×4 block requires
15 additions and one division by 16 (which can be
easily accomplished by shifting), and the averaged
image must be prepared in advance and stored in
external memory. With the noise reduction, a more
accurate estimation of the initial MV might be
expected.
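The two ways of building the coarse-level image can be sketched as below. With a 4:1 factor in each direction, each coarse pixel corresponds to 16 full-resolution pixels. Which of the 16 pixels subsampling keeps (here the top-left one) and how the average is rounded (here truncation by a right shift) are assumptions made for illustration; frame dimensions are assumed to be multiples of the factor.

```python
def subsample(frame, f=4):
    """Keep one pixel per f x f block (top-left pixel, an assumption)."""
    return [row[::f] for row in frame[::f]]


def average(frame, f=4):
    """Replace each f x f block by its truncated mean.

    For f = 4 this is 15 additions per block and a division by 16,
    done here as a right shift by 4. f is assumed to be a power of two.
    """
    shift = (f * f).bit_length() - 1  # log2(f * f), e.g. 4 for f = 4
    height, width = len(frame), len(frame[0])
    coarse = []
    for y in range(0, height - f + 1, f):
        row = []
        for x in range(0, width - f + 1, f):
            total = sum(frame[y + j][x + i]
                        for j in range(f) for i in range(f))
            row.append(total >> shift)
        coarse.append(row)
    return coarse


if __name__ == "__main__":
    frame = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
    print(subsample(frame))  # keeps the top-left pixel
    print(average(frame))    # sum is 120, shifted right by 4
```

Subsampling touches one pixel per block, whereas averaging reads all 16. This matches the trade-off described above between extra computation and storage on the one hand and noise suppression on the other.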
Figure 1: Level C data reuse scheme for FS algorithm.
Overlapped and reused area is grey coloured.
On the first search level of the hierarchy, FS ME is
performed on a subsampled or averaged SA. On the
next level, the initial MV found in the previous
stage is refined on a full-resolution SA that is
much smaller than the initial one. Initial
experiments with five CIF sequences (Container,
Football, Foreman, Mobile, News), 150 frames each,
using the H.264/AVC JM12.0 reference software with SR
+/-32, quantization parameter QP = 25, one
reference frame, variable block size, and quarter-
pixel ME, showed that a refinement
range of +/-8 is sufficient to achieve performance
close to OLFS for both low- and high-motion
activity sequences. The averaged SA gives better
SIGMAP 2010 - International Conference on Signal Processing and Multimedia Applications