depth 2 indices 4, 5, 6 and 7 (D2CB4 + D2CB5 +
D2CB6 + D2CB7). The cost is updated in D1CB1.
Similarly, in stage 3 and stage 4, split decisions
are taken and the best cost is updated in D1CB2 and
D1CB3. Once stage 4 is completed, stage 5
compares the cost of the depth 0 CB (D0CB0) with
the total cost of the split CTB (D1CB0 + D1CB1 +
D1CB2 + D1CB3). Note that the cost at depth 1 can
itself be the sum of costs at depth 2. A decision is
then made whether or not to split the depth 0 CB.
Figure 5: Different stages in Method-1 flow.
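The bottom-up comparison described before Figure 5
can be sketched as follows. This is a minimal
illustration, not the encoder's implementation: the
helper name best_cost, the cost values and the
tie-breaking rule (do not split when costs are equal)
are assumptions, and the sketch ignores how the
individual costs are obtained from neighbor
information.

```python
def best_cost(own_cost, child_costs):
    """Return (cost, split) for one CB.

    own_cost    : cost of coding the CB unsplit
    child_costs : best costs of its four child CBs
    Ties keep the CB unsplit (an assumption of this sketch).
    """
    split_cost = sum(child_costs)
    if split_cost < own_cost:
        return split_cost, True
    return own_cost, False


# Hypothetical costs, for illustration only.
# Depth 2 costs are grouped by their depth 1 parent:
# D2CB0..3 -> D1CB0, D2CB4..7 -> D1CB1, and so on.
d2_costs = [
    [10, 12, 9, 11],
    [20, 18, 22, 19],
    [5, 6, 7, 8],
    [14, 13, 15, 12],
]
d1_costs = [40, 85, 30, 50]   # D1CB0..D1CB3 coded unsplit
d0_cost = 200                 # D0CB0 coded unsplit

# Stages 1 to 4: decide each depth 1 CB in turn.
d1_best = [best_cost(own, kids) for own, kids in zip(d1_costs, d2_costs)]

# Stage 5: compare D0CB0 with the sum of the best depth 1 costs.
ctb_cost, ctb_split = best_cost(d0_cost, [cost for cost, _ in d1_best])
```

With these made-up numbers, D1CB1 and D1CB2 are split,
D1CB0 and D1CB3 are not, and the CTB is split because
the summed depth 1 cost (195) is below the depth 0
cost (200).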
In this method, all stages are executed
sequentially. The sequential flow has the advantage
that “accurate” information about the neighbor
blocks (Left, Top, Top Left, Top Right and Bottom
Left) is available to the CB being processed.
Accurate neighbor information refers to the
to-be-coded mode information of the neighbors. Since
this information is available to the CB, it is
possible to generate the exact AMVP list and Merge
list for Inter Mode Decision and the exact MPM list
for Intra Mode Decision, which allows the bits
required for coding to be estimated more precisely.
Thus, the rate used in the cost computation can be
estimated with greater accuracy, or even exactly.
This decision process is able to choose the RD
optimal MVP index, Merge index and MPM index
for inter, merge and intra blocks, respectively.
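To illustrate why an exact candidate list matters,
the sketch below picks the index that minimizes a
Lagrangian cost J = D + lambda * R over the
candidates of one list. The cost form is the usual
RDO formulation and is assumed here, since the text
does not spell it out; the function name
pick_candidate and all numeric values are
hypothetical.

```python
def pick_candidate(distortions, index_bits, lam):
    """Return (best_index, best_cost) minimizing J = D + lam * R.

    distortions : distortion D of each candidate in the list
    index_bits  : bits R needed to signal each candidate index
    lam         : Lagrange multiplier (assumed given)
    """
    costs = [d + lam * r for d, r in zip(distortions, index_bits)]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


# Hypothetical values for a three-entry candidate list.
distortions = [120.0, 100.0, 98.0]
index_bits = [1, 2, 4]
best_index, best_cost_value = pick_candidate(distortions, index_bits, lam=10.0)
```

Here the candidate with the lowest distortion (index 2)
is not chosen, because its index costs more bits. If
the list were built from inaccurate neighbor
information, both the candidates and the bits
attributed to each index could differ from what is
finally coded.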
The approach used in this method is best suited
to a high-quality encoder, i.e. one which has to
achieve very good compression at lower bitrates. It
is also suitable for encoders which perform full
rate distortion optimization (RDO) using Lagrangian
optimization or Viterbi algorithms. Although this
method delivers good compression and quality, it
is computationally expensive and unfriendly to
multi-processor/parallel-programmable systems. Its
sequential nature forces each stage to wait for
the preceding stage to complete.
3.2 Method 2 (Performance Friendly)
The approach used in this method is more suitable
for an encoder which has to achieve good
performance (in a multi-processor scenario) with a
small compromise on quality. Although a slight
compromise is made on the neighbor information
available to the current CB during cost estimation,
Method 2 achieves better performance by taking
advantage of parallel processing. The approach used
in Method 2 is shown in Fig. 6.
The example discussed here assumes that a
maximum of 3 depths are allowed and 4 processors
are available for processing. The color in the figure
indicates the processor on which each block of
data is processed. The entire mode decision
process is divided into three stages. Each stage is
executed by one or more processors depending on
the stage or on the availability of the processors.
Figure 6: Different stages in Method-2 flow.
In the first stage, each processor operates on one
depth. Processor 1, processor 2, processor 3 and
processor 4 operate on depth 0, depth 1, depth 2
(top) and depth 2 (bottom), respectively. To balance
the processing load, the cost computation for
depth 2 is split between processor 3 and processor 4.
At the end of stage 1, the cost of each CB is
available at all depths. All the processors need to
sync after stage 1. In stage 2, a decision is made
whether each CB of depth 1 (D1CB0, D1CB1,
D1CB2 and D1CB3) needs to be split to depth 2 or
coded at depth 1. The decision process for each
of the 4 CBs can be parallelized across the 4
processors. Again, all the processors need to sync
after stage 2 is completed. In stage 3, a decision is
made whether the CTB is to be coded as a single
depth 0 CB or the output of stage 2 is to be retained.
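The three stages described above can be sketched
with a thread pool, as below. This is only a
structural illustration under stated assumptions:
the cost table is made up, cb_cost stands in for the
real per-CB cost computation, the "top" half of
depth 2 is taken to be D2CB0 to D2CB7 (the children
of D1CB0 and D1CB1), and ties are resolved in favor
of not splitting.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-CB costs, indexed as COSTS[depth][cb_index].
COSTS = {
    0: [200],
    1: [40, 85, 30, 50],
    2: [10, 12, 9, 11, 20, 18, 22, 19, 5, 6, 7, 8, 14, 13, 15, 12],
}


def cb_cost(depth, index):
    """Stand-in for the real cost computation of one CB."""
    return COSTS[depth][index]


def depth_costs(depth, indices):
    """Compute the costs of the given CBs at one depth."""
    return [cb_cost(depth, i) for i in indices]


def decide(own_cost, child_costs):
    """Return (cost, split); ties keep the CB unsplit."""
    split_cost = sum(child_costs)
    if split_cost < own_cost:
        return split_cost, True
    return own_cost, False


def method2(workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: one task per depth; depth 2 is divided into
        # a top and a bottom half to balance the load.
        jobs = [
            pool.submit(depth_costs, 0, range(1)),
            pool.submit(depth_costs, 1, range(4)),
            pool.submit(depth_costs, 2, range(0, 8)),
            pool.submit(depth_costs, 2, range(8, 16)),
        ]
        # Sync point: wait for every stage 1 task.
        d0, d1, d2_top, d2_bottom = [job.result() for job in jobs]
        d2 = d2_top + d2_bottom

        # Stage 2: one decision per depth 1 CB, run in parallel.
        children = [d2[4 * i:4 * i + 4] for i in range(4)]
        # list() is the second sync point.
        d1_best = list(pool.map(decide, d1, children))

    # Stage 3: depth 0 versus the output of stage 2.
    final = decide(d0[0], [cost for cost, _ in d1_best])
    return final, d1_best
```

Calling method2(workers=1) yields the same decisions
as method2(workers=4), since every task in a stage is
independent of the others. In CPython, threads show
the structure rather than a real speed-up for
CPU-bound work; an actual encoder would map the tasks
to native threads or cores.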
The approach used in Method 2 makes good use of
a multi-core processing environment. Method 2 can
be modified easily to work on a different number of
cores or even a single core. Since Method 2 does not
use accurate neighbor information for estimating the
SIGMAP 2013 - International Conference on Signal Processing and Multimedia Applications