Table 2: Speedups of WPP.

Cfg.  Seq.  QP  HM11       Speedup for N threads
                Time [s]   6     8     10    12    ideal (15 or 20)
(ra)  (1)   22  12866      4.04  4.51  4.62  4.75  4.75
            27  10950      3.99  4.48  4.61  4.77  4.80
            32  9741       4.03  4.43  4.62  4.79  4.84
            37  8921       3.99  4.42  4.62  4.80  4.84
      (2)   22  11925      3.90  4.30  4.43  4.56  4.58
            27  9977       3.92  4.36  4.46  4.62  4.61
            32  8882       3.98  4.39  4.48  4.66  4.66
            37  8255       3.95  4.43  4.56  4.73  4.72
      (3)   22  20177      4.17  4.77  5.02  5.29  5.39
            27  17001      4.17  4.82  5.03  5.33  5.42
            32  14861      4.20  4.81  5.02  5.32  5.40
            37  13393      4.22  4.84  5.03  5.33  5.40
      (4)   22  13430      4.22  4.75  5.01  5.29  5.35
            27  11291      4.25  4.80  5.04  5.32  5.43
            32  10185      4.22  4.80  5.02  5.29  5.42
            37  9608       4.18  4.80  5.02  5.28  5.43
(in)  (1)   22  3216       3.75  4.16  4.37  4.53  4.58
            27  2659       3.74  4.10  4.30  4.48  4.51
            32  2405       3.68  4.06  4.27  4.47  4.51
            37  2274       3.69  4.11  4.32  4.51  4.55
      (2)   22  3910       3.69  4.13  4.32  4.48  4.54
            27  3213       3.72  4.12  4.30  4.48  4.53
            32  2750       3.68  4.12  4.30  4.49  4.53
            37  2432       3.69  4.09  4.28  4.47  4.53
      (3)   22  4548       3.84  4.36  4.58  4.89  5.10
            27  3866       3.78  4.33  4.56  4.84  5.07
            32  3409       3.80  4.29  4.53  4.83  5.02
            37  3091       3.82  4.29  4.51  4.82  4.99
      (4)   22  4465       3.86  4.40  4.68  4.99  4.98
            27  3789       3.83  4.39  4.65  4.97  4.97
            32  3350       3.82  4.37  4.62  4.96  4.97
            37  3045       3.81  4.28  4.60  4.93  4.96
(ld)  (1)   22  19489      4.14  4.59  4.70  4.80  4.83
            27  16811      4.09  4.59  4.72  4.83  4.84
            32  14901      4.13  4.59  4.72  4.86  4.88
            37  13447      4.09  4.59  4.73  4.86  4.89
      (2)   22  18421      4.01  4.45  4.56  4.69  4.73
            27  15242      3.99  4.41  4.52  4.67  4.70
            32  13346      4.00  4.42  4.54  4.70  4.70
            37  12193      4.06  4.45  4.60  4.75  4.75
      (3)   22  28965      4.22  4.85  5.08  5.34  5.45
            27  24653      4.24  4.88  5.10  5.37  5.45
            32  21938      4.24  4.86  5.10  5.38  5.46
            37  20050      4.26  4.86  5.12  5.40  5.50
      (4)   22  19973      4.20  4.83  5.14  5.39  5.41
            27  16724      4.30  4.84  5.13  5.40  5.47
            32  14935      4.31  4.84  5.11  5.37  5.47
            37  13829      4.23  4.84  5.07  5.33  5.45
or has to wait for a memory transfer, the other thread can immediately take over and do its work. The otherwise large overhead caused by context switches is thus substantially reduced. The small improvements in the graphs between 12 and T_ideal threads suggest that these conditions still hold, even though context switches become slower. The switches are slower in this case because, more often than before, the execution contexts of some threads must be temporarily stored in RAM and later restored from it. Finally, intra configurations benefit less from WPP overall. If only intra prediction is used, the CTU loop is less complex and accounts for a considerably smaller proportion of the total processing time. Since the CTU loop is the part that WPP parallelizes, the achievable overall time savings are correspondingly smaller.

Figure 4: Average Speedups of WPP for the Profiles (random access), (intra), and (low delay).
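The two quantitative arguments above, the thread count T_ideal at which WPP saturates and the smaller gains for intra coding, can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' derivation. It assumes 64x64 CTUs, the usual WPP dependency in which a CTU row may start once the row above has finished two CTUs, and frame sizes of 1920x1080 and 2560x1600, which happen to yield the values 15 and 20 in Table 2; the actual sequence resolutions are not stated in this excerpt. For the intra argument it swaps in Amdahl's law as a simple model, with hypothetical parallel fractions (the CTU loop's share of the runtime) chosen only to show the trend. The function names `wpp_max_parallel_rows` and `amdahl_speedup` are hypothetical helpers.

```python
import math


def wpp_max_parallel_rows(width_px: int, height_px: int, ctu_size: int = 64) -> int:
    """Upper bound on the number of CTU rows WPP can process at once.

    Assumption: a CTU row may start once the row above has finished its
    first two CTUs, so consecutive rows start two CTU steps apart.
    """
    ctus_per_row = math.ceil(width_px / ctu_size)
    ctu_rows = math.ceil(height_px / ctu_size)
    # Each row is busy for ctus_per_row steps and a new row starts every
    # 2 steps, so at most ceil(ctus_per_row / 2) rows overlap in time.
    return min(ctu_rows, math.ceil(ctus_per_row / 2))


def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: overall speedup when only a fraction of the runtime
    (here, the CTU loop) is spread across `threads` threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical frame sizes with 64x64 CTUs (assumed, not from the excerpt).
    print(wpp_max_parallel_rows(1920, 1080))  # 15
    print(wpp_max_parallel_rows(2560, 1600))  # 20

    # Hypothetical CTU-loop shares of the total runtime, chosen only to
    # illustrate the trend; they are not measured values from the paper.
    for label, p in (("inter-style", 0.85), ("intra-style", 0.80)):
        print(label, f"{amdahl_speedup(p, 12):.2f}")
```

Under this model, lowering the parallelizable fraction lowers the speedup at every thread count, which is consistent with the observation that intra configurations, whose CTU loop is a smaller share of the total time, gain less from WPP.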
A Multi-Threaded Full-feature HEVC Encoder Based on Wavefront Parallel Processing