H.264/SVC ROI ENCODING WITH SPATIAL SCALABILITY

Lino Ferreira

Instituto de Telecomunicações, Instituto Politécnico de Leiria/ESTG, Leiria, Portugal

Luís Cruz

Instituto de Telecomunicações, Universidade de Coimbra/DEEC, Coimbra, Portugal

Pedro Assunção

Instituto de Telecomunicações, Instituto Politécnico de Leiria/ESTG, Leiria, Portugal

Keywords: ROI, Spatial scalability.

Abstract: This paper proposes two H.264/SVC compliant methods for encoding Regions-of-Interest (ROI) with

spatial scalability and evaluates their respective rate-distortion-complexity performance. The base layer is

kept unchanged and provides lower resolution images with roughly constant quality, without identification

of the ROI. In the proposed methods there is no need to encode contour information because the ROI is

implicitly defined in the upper layer of the spatial resolution in a transparent way by using different

encoding parameters for the ROI and its complementary region. It is shown that spatial scalability in ROI

can be efficiently used to enhance specific regions of an image sequence in both spatial resolution and

quality with low coding complexity. The proposed encoding scheme is suitable for remote surveillance,

medical applications and entertainment, where higher resolution and higher quality ROI is a useful

functionality for object/face recognition, selective encryption, detail analysis, etc.

1 INTRODUCTION

The region-of-interest (ROI) functionality in visual

information representation and transmission systems

defines a set of methods and tools which allow

selection, extraction and specific processing of

important regions within the image acquisition/display

area. Support for region specific differentiated coding

has long been sought as a desirable feature for both

image and video compression algorithms, as evidenced

by any overview of the existing literature on this topic.

In image coding several solutions have been proposed,

taking different forms depending on the underlying

coding principles, e.g. whether the coding algorithm is

based either on block transforms or wavelets. In video

transmission and storage ROI-based coding has also

been studied in the recent video coding standards (e.g.,

H.264/AVC) which already offer some limited support

for this functionality through the use of Flexible

Macroblock Ordering (FMO) into different slice groups

(Thang, 2005), (Bae, 2006), (Lambert, 2006) and (Van

Leuven, 2006). In recent years, along with the

development efforts of the scalable extension of H.264,

H.264/SVC, availability of scalable ROI coding

functionalities was identified as an important

requirement to be fulfilled in future standards (ISO/IEC
JTC1/SC29/WG11, 2005).

The H.264/SVC standard supports scalability in

terms of spatial and temporal resolution as well as the

variation of reconstruction quality (SNR). This type of

encoding is more flexible and adjustable for different

communication technologies and user requirements

(bandwidth, resolution, etc.). In scalable ROI a single

frame can be split into several independent regions

which in turn may be encoded at different SNR, spatial

and temporal scalabilities. In general, spatial or

temporal qualities can be assigned to ROI in order to

guarantee a predefined quality level while the

background region can be encoded at lower quality.

This paper deals with spatially scalable ROI coding

where the aim is to achieve efficient encoding of ROI

with both better quality than the background area and

higher spatial resolution than the base layer. The base

layer is intended to provide a low resolution signal with

an acceptable spatial quality in the whole image while


Ferreira L., Cruz L. and Assunção P. (2008).

H.264/SVC ROI ENCODING WITH SPATIAL SCALABILITY.

In Proceedings of the International Conference on Signal Processing and Multimedia Applications, pages 212-215

DOI: 10.5220/0001938302120215

 SciTePress

in the enhancement layer the ROI is the only useful

image area. Therefore spatial and quality scalability is

only achieved for the ROI, which should contain the

image area of interest for target applications. In the

following sections, the Rate-Distortion and Complexity

performance of two methods, compliant with

H.264/SVC, is evaluated and compared with

straightforward encoding without ROI.

2 H.264/SVC ROI WITH SPATIAL SCALABILITY

The underlying idea to achieve efficient encoding of

the ROI in the higher resolution layer is to minimise

the number of bits spent in the background region of

the higher resolution images. In the base layer there is

no distinction between ROI and background. One of

the methods proposed in this work is based on coarse

quantisation of the background region and finer

quantisation of the ROI in the high resolution layer. In

this method, the macroblocks (MBs) of the background

region, i.e., outside the ROI, are encoded with the

maximum quantisation scale allowed by H.264/SVC

(Qp=51) in order to maximise the number of null

coefficients. The other method is based on setting to

zero the transform coefficients of the MBs outside the

ROI regardless of their value. Note that in this case

quantisation is avoided for these MBs. In both

methods, the ROI is defined by a mask, providing a

ROI map (ROImap) which is used by the encoder to

identify the ROI MBs though it is not encoded into the

video stream.

2.1 Qp51 Outside ROI

The functional implementation of this method is

depicted in Figure 1. For each MB of the high resolution
layer, the QP value is set to 51 if the MB lies outside the
ROI and to the QP value selected for the current MB if it
lies within the ROI. The ROI is not defined in the base layer,

thus the whole image is normally encoded at a lower

resolution.

Therefore, the quality of ROI MBs is much higher

than that of the MBs outside the ROI and consequently

most of the bits used in the high resolution layer are

assigned to the ROI. Note that in the high resolution

layer the only useful information that needs to be

encoded is the ROI itself, because the lower quality and

resolution of the background region provided by the

base layer should be enough for the envisaged

application.

Figure 1: Qp51 functional diagram.
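The QP switching described in Section 2.1 can be illustrated with a minimal sketch. It is not taken from the reference software: the function name `select_mb_qp`, the boolean representation of the ROI map and the example QP of 28 are assumptions made only for illustration.

```python
MAX_QP = 51  # largest quantisation parameter, used for background MBs


def select_mb_qp(roi_map, layer_qp):
    """Return a per-macroblock QP map for the high resolution layer.

    roi_map  : 2-D list of booleans, True where the MB lies inside the ROI
               (hypothetical representation of the ROImap).
    layer_qp : QP selected for the MBs inside the ROI.
    """
    return [[layer_qp if in_roi else MAX_QP for in_roi in row]
            for row in roi_map]


# Tiny 3x4 MB frame with a 1x2 ROI in the middle row.
roi_map = [
    [False, False, False, False],
    [False, True,  True,  False],
    [False, False, False, False],
]
for row in select_mb_qp(roi_map, layer_qp=28):
    print(row)
```

Only the encoder consults the map, which matches the statement that the ROI map itself is not written into the video stream.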

2.2 Set-to-Zero

The objective of this method is the same as the

previous one: to spend no bits in the MBs outside the

ROI and to increase the subjective quality of ROI in the

higher resolution layer. In the Set-to-Zero method, the

transform coefficients of residual blocks are set to zero

for those MBs outside the ROI. Thus, the encoder sets
the syntax element coded block pattern (CBP) to 0.
Figure 2 shows the Set-to-Zero functional diagram.

Figure 2: Set-to-Zero diagram.
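The Set-to-Zero rule can be sketched as follows. This is a simplified illustration, not the reference software: `simplified_cbp` is a hypothetical stand-in that sets one bit per block holding a non-zero coefficient, whereas the actual H.264 coded block pattern also covers chroma and follows the syntax of the standard.

```python
def simplified_cbp(blocks):
    """Bit i is set when block i holds at least one non-zero coefficient.

    Simplified stand-in for the H.264 coded block pattern (assumption).
    """
    cbp = 0
    for i, block in enumerate(blocks):
        if any(c != 0 for c in block):
            cbp |= 1 << i
    return cbp


def encode_mb_residual(blocks, in_roi):
    """Return (coefficient blocks, CBP) for one macroblock.

    Outside the ROI every transform coefficient is discarded regardless of
    its value, so the resulting CBP is 0. Inside the ROI the coefficients
    are passed on unchanged (a real encoder would quantise them next).
    """
    if not in_roi:
        blocks = [[0] * len(block) for block in blocks]
    return blocks, simplified_cbp(blocks)


mb = [[3, 0, -1, 0], [0, 0, 0, 0], [0, 2, 0, 0], [0, 0, 0, 0]]
print(encode_mb_residual(mb, in_roi=True))
print(encode_mb_residual(mb, in_roi=False))
```

Because the background branch never reaches the quantiser, the sketch also reflects why this method avoids quantisation for MBs outside the ROI.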

3 SIMULATION RESULTS

The performance of the two methods described in the

previous section was evaluated in regard to rate-

distortion and encoding complexity. Separate

experiments were carried out for Intra and Inter coding

modes. The proposed methods were implemented using

the JVT reference software, version 8.9, as the base
framework. The test sequence “Mobile” was used in
the experiments with two layers, QCIF@30fps (base
layer) and CIF@30fps (enhancement layer), and two ROIs
(ROI1, ROI2) with different sizes. ROI1 is a

192x144 pel image region covering the area of the

calendar numbers and ROI2 is the whole calendar, as

shown in Figure 3.
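As an illustration of how a rectangular ROI such as ROI1 maps onto macroblocks, the sketch below builds a boolean ROI map for a CIF frame (352x288 pels, i.e. 22x18 MBs of 16x16 pels). The top-left position (80, 64) is a hypothetical value chosen only for the example, since the exact location of ROI1 is not given in the text, and `build_roi_map` is an assumed helper name.

```python
MB_SIZE = 16  # H.264 macroblock size in pels


def build_roi_map(frame_w, frame_h, x0, y0, roi_w, roi_h):
    """Return a 2-D boolean map (rows of MBs), True for MBs touching the ROI."""
    mbs_x = frame_w // MB_SIZE
    mbs_y = frame_h // MB_SIZE
    mb_x0, mb_y0 = x0 // MB_SIZE, y0 // MB_SIZE
    # Exclusive end indices, rounded up so partially covered MBs are included.
    mb_x1 = (x0 + roi_w + MB_SIZE - 1) // MB_SIZE
    mb_y1 = (y0 + roi_h + MB_SIZE - 1) // MB_SIZE
    return [[mb_x0 <= mx < mb_x1 and mb_y0 <= my < mb_y1
             for mx in range(mbs_x)]
            for my in range(mbs_y)]


# CIF frame with a 192x144 pel ROI at an assumed position (80, 64).
roi_map = build_roi_map(352, 288, x0=80, y0=64, roi_w=192, roi_h=144)
roi_mbs = sum(sum(row) for row in roi_map)
total_mbs = len(roi_map) * len(roi_map[0])
print(roi_mbs, "of", total_mbs, "MBs are inside the ROI")
```

With an MB-aligned position, a 192x144 pel region covers 12x9 = 108 of the 396 MBs in a CIF frame, so roughly three quarters of the enhancement-layer MBs belong to the background.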

In the experiments the following settings were used
for the Intra test: two spatial layers (QCIF and CIF) at
30fps; NumberReferenceFrames 1; FastSearch; Loop
Filter on. The coding parameters were as follows: for
the base layer: CABAC; Basic QP 35; FRExt no; for
layer 1: CABAC; InterLayerPred on; FRExt on. For the
Inter tests the following settings were used: two spatial
layers (QCIF and CIF); 30 frames; NumberReferenceFrames 1;
FastSearch; Loop Filter on; MaxDelay 1200; GOPsize
16; IntraPeriod 16. The configurations of the base layer
and layer 1 are the same as in the Intra test.

Figure 3: ROI1 and ROI2 definition.

The simulations were performed on a PC with a
2.4 GHz processor and 1.0 GB of RAM. The

rate-distortion performance of both methods was

evaluated as well as the computational complexity

measured as the processing time per frame.

The bitrate shown in Figure 4 and Figure 5 is the
sum of the bitrates of the base layer and layer 1. The
various bitrates were obtained by using different QP values
in layer 1 while the QP of the base layer is kept constant (QP=35).

The ROI PSNR (i.e. the PSNR computed for the pixels

within the ROI) is shown in the figures for Intra and

Inter coding, respectively. For reference, the two

proposed methods’ results are compared with results

from an experiment where the higher layer is totally

encoded using the same QP without distinguishing the

ROI and the background. These “ground-truth” results

are labelled SVC-without_ROI.
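The ROI PSNR used in the figures restricts the usual PSNR computation to the pixels inside the ROI. A minimal sketch, assuming 8-bit luma samples (peak value 255) and a pixel-level boolean mask, is given below; the function name `roi_psnr` and the list-of-rows image representation are assumptions for illustration.

```python
import math


def roi_psnr(original, decoded, roi_mask, peak=255):
    """PSNR in dB computed only over pixels where roi_mask is True."""
    squared_error = 0
    count = 0
    for orig_row, dec_row, mask_row in zip(original, decoded, roi_mask):
        for o, d, inside in zip(orig_row, dec_row, mask_row):
            if inside:
                squared_error += (o - d) ** 2
                count += 1
    if count == 0:
        raise ValueError("ROI mask selects no pixels")
    if squared_error == 0:
        return math.inf  # identical ROI content
    mse = squared_error / count
    return 10 * math.log10(peak ** 2 / mse)


original = [[100, 100], [100, 100]]
decoded = [[100, 105], [90, 100]]
mask = [[True, True], [False, False]]  # only the first row is in the ROI
print(round(roi_psnr(original, decoded, mask), 2), "dB")
```

The error in the second row is ignored because it lies outside the mask, which is why coarse background coding does not lower the ROI PSNR.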

A. Intra Coding

The rate-distortion performance of the Intra case is
shown in Figure 4. The Set-to-Zero method is
compared with Qp51 and with SVC-without_ROI. The
encoding complexity is shown in Table 1 for both
ROIs. From the figures it is clear that the efficiency of
the Set-to-Zero method is consistently better for both
ROIs in the Intra case. In ROI1 this method produces a
PSNR about 2dB higher than the Qp51 method. As one
can see in the figures, the proposed methods achieve a
much higher overall quality than SVC-without_ROI.

For the lower bitrates in ROI1, the Set-to-Zero method
produces a PSNR about 6.5dB higher than
SVC-without_ROI and at higher bitrates the gain is about
13dB. For ROI2 the gains of Set-to-Zero are
smaller than in the case of ROI1: about 0.4dB-0.5dB
higher than Qp51 and 2.5dB-7.5dB higher than
SVC-without_ROI for low and high bitrates, respectively.
For the same PSNR, both the Qp51 method and
SVC-without_ROI produce more bits than Set-to-Zero for
encoding ROI1 and ROI2.

Figure 4: Intra case: Rate-Distortion. (a) ROI1-Numbers; (b) ROI2-Calendar.

Table 1: Processing time of encoding: (a) ROI1 (b) ROI2.

(a) ROI1

Set-to-Zero    Qp51          SVC-without_ROI
[ms/frame]     [ms/frame]    [ms/frame]
182.53         195.83        262.26
174.37         187.35        225.24
167.54         179.58        192.12

(b) ROI2

Set-to-Zero    Qp51          SVC-without_ROI
[ms/frame]     [ms/frame]    [ms/frame]
204.97         217.07        262.26
189.10         198.85        225.24
174.37         182.66        192.12

Table 1 shows the processing time of the two proposed

methods as well as SVC-without_ROI. From this table

one can conclude that the coding complexity of the Set-

to-Zero method is smaller than that of the other two

(Qp51 and SVC-without_ROI) for both ROIs. For

ROI1, the processing time is reduced by 12% to 30%

with Set-to-Zero compared to SVC-without_ROI and by

7% compared to the Qp51 method. In the case of ROI2,
the processing time of Set-to-Zero is reduced by 9% to
22% compared to SVC-without_ROI and by 5% compared
to Qp51. The lower complexity achieved by Set-to-Zero
is mainly due to the fact that quantisation is not
computed for the MBs outside the ROI, which

significantly reduces the number of computations.
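The percentages quoted for ROI1 can be checked directly from the values of Table 1(a). The sketch below assumes that the three columns of the table are Set-to-Zero, Qp51 and SVC-without_ROI, in that order; this ordering is inferred from the text rather than read from the table headers, and `reduction_percent` is a helper introduced only for this check.

```python
def reduction_percent(t_method, t_reference):
    """Relative processing-time saving of t_method with respect to t_reference."""
    return 100.0 * (t_reference - t_method) / t_reference


# Table 1(a), ROI1, Intra case, in ms/frame.
# Assumed column order: (Set-to-Zero, Qp51, SVC-without_ROI).
table1_roi1 = [
    (182.53, 195.83, 262.26),
    (174.37, 187.35, 225.24),
    (167.54, 179.58, 192.12),
]

for set_to_zero, qp51, svc in table1_roi1:
    vs_svc = reduction_percent(set_to_zero, svc)
    vs_qp51 = reduction_percent(set_to_zero, qp51)
    print(f"vs SVC-without_ROI: {vs_svc:.1f}%   vs Qp51: {vs_qp51:.1f}%")
```

Under that assumption the savings range from about 12.8% to 30.4% against SVC-without_ROI and stay close to 7% against Qp51, in line with the figures given in the text.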

B. Inter Coding

The performance of Inter coding is shown in Figure 5.
In this case, the efficiency of Set-to-Zero is closer to
Qp51. In ROI1 the gains of both proposed methods are
practically the same for low bitrates, while for higher
bitrates the Qp51 method produces gains of about 0.8dB
and 1.2dB compared with Set-to-Zero and
SVC-without_ROI, respectively. In ROI2, Set-to-Zero
yields better results than the other methods: it is
about 0.4dB better than Qp51 and nearly 2.6dB better
than SVC-without_ROI. Table 2 shows that the
processing time depends on the ROI dimension,
the QP and the coding method used. In this case, the
processing time is greater than in the Intra case.
However, as in the Intra case, the Set-to-Zero method
has a lower processing time than the other methods.

Figure 5: Inter case: Rate-Distortion. (a) ROI1-Numbers; (b) ROI2-Calendar.

Table 2: Processing time of encoding: (a) ROI1 (b) ROI2.

(a) ROI1

Set-to-Zero    Qp51          SVC-without_ROI
[ms/frame]     [ms/frame]    [ms/frame]
168000.72      169000.05     170000.31
168000.72      169000.03     169000.41
168000.71      168000.87     169000.40

(b) ROI2

Set-to-Zero    Qp51          SVC-without_ROI
[ms/frame]     [ms/frame]    [ms/frame]
168000.74      169000.05     170000.31
168000.73      169000.04     169000.41
168000.71      169000.03     169000.40

4 CONCLUSIONS

The performance of the ROI coding methods proposed

in this paper shows that spatially scalable ROIs can be

obtained at very good quality by using selective

encoding for each region in the higher resolution layer.

The results obtained also show that the Set-to-Zero

method is less computationally complex than Qp51,
which makes it a good candidate for software-based

implementations. By keeping the coded stream fully

compatible with the H.264/SVC standard, the proposed

methods are suitable for a wide range of applications

where only specific regions of a video sequence are

needed at higher spatial resolution (e.g., remote
surveillance, medical applications, etc.).

REFERENCES

T. C. Thang, T. M. Bae, Y. J. Jung, Y. M. Ro, J.-G. Kim, H. Choi, 04/2005, “Spatial Scalability of Multiple ROIs for Surveillance Video”, in ISO/IEC MPEG & ITU-T VCEG JVT-O037.

T. M. Bae, T. C. Thang, D. Y. Kim, Y. M. Ro, 01/2006, “Multiple ROI support in scalable video coding”, in Proc. SPIE Electronic Imaging, vol. 6074.

P. Lambert, W. De Neve, Y. Dhondt, R. Van de Walle, 2006, “Flexible Macroblock Ordering in H.264/AVC”, in Journal of Visual Communication and Image Representation, vol. 17.

S. Van Leuven, K. Van Schevensteen, T. Dams, P. Schelkens, 07/2006, “An Implementation of Multiple Region-Of-Interest Models in H.264/AVC”, Masters Thesis, Department of Industrial Sciences and Technology, University College of Antwerpen, Belgium.

ISO/IEC JTC1/SC29/WG11, 2005, “Applications and Requirements for Scalable Video Coding”, N6880.
