3D Dual-Tree Discrete Wavelet Transform Based
Multiple Description Video Coding
Jing Chen
1, 2
, Canhui Cai
1
and Li Li
1
1
School of Information Science and Engineering, Huaqiao University, Xiamen, China
2
School of Information Science and Engineering, Xiamen University, Xiamen, China
Keywords: Multiple Description Coding, Video Coding, Dual-Tree Discrete Wavelet Transform, Layer Coding.
Abstract: A 3D dual-tree discrete wavelet transform (DT-DWT) based multiple description video coding algorithm is
proposed to combat the transmitting error or packet loss due to Internet or wireless network channel failure.
Each description of the proposed multiple description coding scheme consists of a base layer and an
enhancement layer. First, the input image sequence is encoded by a standard H.264 encoder in low bit rate
to form the base layer, which is then duplicated to each description. Second, the difference between the
reconstructed base layer and the input image sequence is encoded by a 3D dual-tree wavelet encoder to
produce four coefficient trees. After noise-shaping, these four trees are partitioned into two groups,
individually forming enhancement layers of two descriptions. Since the 3D DT-DWT equips 28 directional
subbands, the enhancement layer can be coded without motion estimation. The plenty of directional
selectivity of DT-DWT solves the mismatch problem and improves the coding efficiency. If all descriptions
are available in the receiver, a high quality video can be reconstructed by a central decoder. If only one
description is received, a side decoder can be used to reconstruct the source with acceptable quality.
Simulation results have shown that the quality of reconstructed video by the proposed algorithm is superior
to that by the state-of-the-art multiple description video coding methods.
1 INTRODUCTION
Due to the effect of transmission error, packet loss
and over-delay problems caused by unreliable
networks, the quality of video transmitted by
Internet or wireless network is degraded, which
restrains the development of multimedia services. As
an effective error resilient coding technique,
Multiple Description Coding (MDC) has received a
lot of attention from researchers and has been
applied in variety of scenarios to insure a robust
video transmission (Goyal, 2001).
The MDC includes three steps. First, the source
data is coded to form multiple self-decodable bit
streams, named descriptions. Then, descriptions are
transmitted over diverse channels to the receiver. In
the receiver, there are two sorts of decoder, central
decoder and side decoder, with which, video can be
reconstructed by either high quality or acceptable
quality according to the number of correctly
received descriptions. When a network transmission
error occurs so that only one description is received,
the received description is decoded by the side
decoder. The quality is incrementally improved with
more received descriptions. When all descriptions
are received correctly, the highest fidelity of
reconstruction can be achieved by the central
decoder.
Since the first MDC image coding scheme,
multiple description scalar quantizer (MDSQ),
published by Vaishampayan in 1993 (Vaishampayan,
1993), many other MDC methods has been
developed, including multiple description scalar
quantizing coding, multiple description transform
coding (MDTC), multiple description subband
coding (MD-SPIHT), polyphase transform and
selected quantization multiple description coding
(PTSQ), Unequal protected MDC and layered based
MDC, etc.
Researches on multiple description video coding
(MDVC) emerged in 1999. The pioneer ones, such
as MDSQ based video coding (Vaishampayan,
1999) and MDTC based multiple description video
coding (Reibman, 2002), were direct extension of
the multiple description image coding method. The
problem with these methods is the reference
180
Chen J., Cai C. and Li L..
3D Dual-Tree Discrete Wavelet Transform Based Multiple Description Video Coding.
DOI: 10.5220/0005057701800185
In Proceedings of the 11th International Conference on Signal Processing and Multimedia Applications (SIGMAP-2014), pages 180-185
ISBN: 978-989-758-046-8
Copyright
c
2014 SCITEPRESS (Science and Technology Publications, Lda.)
mismatch between the transmitting side and the
receiving side, introduced by the errors of unreliable
network transmission. How to solve the mismatch
problem is the primary task for multiple description
video coding. To eliminate the mismatch, motion
estimation and compensation were individually
performed in each description (Tillier, 2007), which
lowered the coding efficiency and increased the
computational complexity.
As the emergence of H.264 coding standard,
several H.264 based multiple description video
coding method were proposed to improve the error
resilient ability of H.264. Bernadini and Durigon
proposed a pholyphase spatial subsampling multiple
description coding (PSS-MDC) (Bernardini, 2004),
which separated the source video into four
subsequences by spatial subsampling, and then
coded each subsequence by an H.264 encoder to
form four bitstreams, which were transmitted
through diverse channels. Even three of them were
lost, the method was able to insure an acceptable
reconstructed quality of the input video. Based on
PSS-MDC, Wei et. al. paired subsequences to form
two descriptions and proposed a neighboring pixel
prediction algorithm to reduce the redundancy of
PSS-MDC (Wei, 2006). Campana et. al. innovated
the MDSQ by mapping more zero coefficients to
improve the coding efficiency (Campana, 2008). A
slice optimal allocation for H.264 multiple
description video coding was proposed by Tillo
(Tillo, 2008). However, it was not suitable for low
bitrate transmission.
Stimulated by the thought of layered based MDC
scheme, we proposed an H.264 and dual-tree
discrete wavelet transform based multiple
description video coding. The base layer is produced
by feeding input into an H.264 encoder with low
bitrate and copied into each description. The
enhancement layer is formed by four trees of
wavelet transform coefficients outputed by the dual-
tree discrete wavelet transform. These coefficient
trees are paired into two sets and sent them into
separate description. Each description comprises a
base layer and an enhancement layer, which is
transmitted through diverse channel. The simulation
results have shown the efficiency and the error
resillient ability of the proposed method.
The rest of this paper is organized as follows.
H.264 and dual-tree discrete wavelet transform is
introduced in Section 2. Section 3 concentrates on
how to generate descriptions of the proposed
multiple description video coding scheme.
Simulation results and analysis are illustrated in
Section 4. Concluding remarks are given in Section
5.
2 DUAL-TREE DISCRETE
WAVELET TRANSFORM
2.1 Dual-tree Discrete Wavelet
Transform (DT-DWT)
To improve the directional selection and shift-
invariant property of the traditional discrete wavelet
transform (DWT), Nick Kingsbury proposed the
dual-tree complex wavelet transform (DT-CWT) in
1998. Its directional subband decomposition make
DT-CWT nearly shift-invariant and higher
directional selectivity. However, the DT-CWT is an
over-complete transform with plenty of redundancy
(2
n
:1 for n-dimensional signal). By analysis of the
real part and imaginary part of wavelet coefficients,
Selesnick found these two parts have the same
directional selectivity and either one could serve as a
wavelet transform to halve the number of
coefficients. In this light, Selesnick (Selesnick, 2005)
proposed the dual-tree discrete wavelet transform
(DT-DWT). Figure 1 shows the six high frequency
directional subbands of 2D DT-DWT.
Figure 1: The directional subbands of 2D DT-DWT.
2.2 Implementation of 3D Dual-tree
Discrete Wavelet Transform
3D dual-tree discrete wavelet transform can be
separately performed by four 3D discrete wavelet
transforms (Wang, 2007). Figure 2 shows one of the
transform, where
denotes the convolution
operation,
2
is down sampling by 2;
x
,
y
and
z
represent three directions of axis, horizontal, vertical
and time, respectively;
0
h
and
1
h
are low-pass filter
and high-pass filter, which form a Hilbert
transformed pair, and insure the perfect
reconstruction of the discrete wavelet transform. For
convenient,
0
hx
is used to denote the convolution
3DDual-TreeDiscreteWaveletTransformBasedMultipleDescriptionVideoCoding
181
of video source with low-pass filter
0
h
and
2
x
to
show the down sampling process in horizontal
direction; Many symbols are likewise defined in
Figure 2. As shown in Figure 2, there are one low
frequency subband (LLL) and seven high frequency
subbands (LLH LHL LHH HLL HLH
HHLHHH) in one 3D DWT. Since there are four
3D DWT in one 3D DT-DWT structure, 28 high
frequency subbands guarantee variety of directional
selectivity of 3D DT-DWT, as shown in Figure 3.
2
x
2
x
2
y
2
y
2
y
2
y
2
z
2
z
2
z
2
z
2
z
2
z
2
z
2
z
0
h
1
h
0
h
0
h
0
h
0
h
0
h
0
h
1
h
1
h
1
h
1
h
1
h
1
h
Figure 2: The filter band of 3D DT-DWT.
Figure 3: The 28 directional high frequency subbands of
3D DDWT.
2.3 The Sparse Representation of
Coefficients of 3D DT-DWT
As mentioned above, the number of 3D DT-DWT
coefficients is four times as much as that of the
traditional 3D DWT. Based on the observation that
most 3D DT-DWT coefficients with small values
have little contribution on reconstructed images,
Reeves and Kingsbury (Reeves, 2002) proposed a
noise shaping method to thin out the coefficients.
The so-called noise shaping in fact is an iterative
projection between the image domain and the
wavelet transform domain, during which only
significant coefficients are picked up. And then they
are modified to compensate the error incurred due to
the loss of insignificant coefficients. The rationality
and effectiveness of the sparse representation
resulted from noise shaping can be found in some
video coding schemes (Wang, 2007; Li, 2009).
3 3D DT-DWT BASED LAYERD
MULTIPLE DESCRIPTION
VIDEO CODING
3.1 The Proposed Layered Multiple
Description Video Coding Scheme
It is well known that the layered structure of video
coding can provide a convenient way to change its
bit rate for different bandwidth requisition of
heterogeneous clients and for dynamic network
congestion reduction. Consequently, the layered
MDC method can take the advantage of the layer
structure by providing common information and
private information for each description to form the
base layer and the enhancement layer respectively.
To avoid the mismatch, the inter-frame prediction in
the enhancement layer is undesired.
Figure 4 shows the proposed layered MDC
scheme, where the base layer of each description is
encoded by an H.264 encoder, the residual between
source video and the one reconstructed from base
layer is encoded by a 3D DT-DWT encoder to form
the enhancement layer. Since the 3D DT-DWT
equips 28 directional subbands, the enhancement
layer coding can be done without motion estimation.
In this way, the mismatch is eliminated and the
computational complexity of the enhancement layer
coding is reduced. Moreover, the proposed multiple
description video coding scheme can keep higher
coding efficiency.
Figure 4: The 3D DT-DWT layered MDC.
3.2 The Base Layer Coding
Since the base layer contains the common
information of all descriptions, to reduce the
redundancy of the MDC scheme, H.264 video
coding is adopted to provide the high coding
SIGMAP2014-InternationalConferenceonSignalProcessingandMultimediaApplications
182
efficiency. The coded bitstream is then copied to
both descriptions, to ensure the primary information
of video. The features of H.264, such as multi-
reference frame mode and 1/4 pixel precision of
motion estimation, guarantee the accuracy of its
inter-frame prediction. The precise intra-frame
prediction coding further improves its coding
efficiency.
3.3 The Enhancement Layer Coding
The study of 3D DT-DWT coefficient trees (Figure
3) shows the high directional correlation between
tree 1 and tree 3, tree 2 and tree 4, respectively.
Combining tree 1 and tree 4 to form the
enhancement layer of description 1, tree 2 and tree 3
to form the enhancement layer of description 2, even
one of descriptions is lost, the other one can
effectively provide the directional information.
Based on the above observation, the residual of
souce video and reconstructed one from the base
layer is encoded by a 3D DT-DWT encoder to form
four wavelet coefficient trees. The noise shaping
then applied to reduce the redundancy of 3D DT-
DWT. The resulted four 3D wavelet trees are coded
by 3D SPIHT (Kim, 2000) to form the enhancement
layers. These four tree bistreams are grouped to form
enhancement layers of two descriptons: tree 1 and
tree 4 are allotted to description 1, tree 2 and tree 4
are allotted to description 2. Since the directional
correlation of two descriptions conveys some
information of each other, even one description
losts, the other one can provide an acceptable quality
reconstruction.
3.4 Decoding Algorithm
If all descriptions are correctly received, the source
video can be perfectly decoded from the base layer
and enhancement layers of both descriptions by
using the central decoder. If only one description is
received, the side decoder is used instead. Firstly,
the base layer is reconstructed, and then two wavelet
trees in the correctly received description are used to
reconstruct two received wavelet coefficient trees
and two missed ones by exploiting the directional
correlation of four wavelet coefficient trees. The
base layer ensures the elementary quality of
reconstructed video, and the received enhancement
layer can effectively improve the quality of
reconstructed image based on the received residual
and directional information.
4 EXPERIMENTAL RESULTS
AND ANALYSIS
To verify the proposed method, lots of experiments
are conducted using different motion types of video
sequences (slow, moderate, fast) with different
formats (cif, qcif) and different frame rates.
The first set of experiments is to compare the
proposed method with other wavelet transform
based multiple description video codecs. As shown
in Figure 5 and Figure 6, the proposed method
outperforms the LMDVC (Chen, 2008) and the one
by Tiller (Tillier, 2007) in both central decoder and
side decoder with different bitrates, especially in the
high bitrate.
50 100 150 200 250 30
0
29
30
31
32
33
34
35
36
37
38
Kbps
PSNR(dB)
LMDVC central
Proposed central
50 100 150 200 250 300
26
28
30
32
34
36
38
Kbps
PSNR(dB)
LMDVC side
Proposed side
(a) central decoder (b) side decoder
Figure 5: The comparison of the proposed method with
LMDVC (Chen, 2008) on Foreman (qcif, 15fps).
0 200 400 600 800 100
0
26
28
30
32
34
36
38
40
42
Kbps
PSNR(dB)
Reference central
Proposed central
0 200 400 600 800 100
0
26
28
30
32
34
36
38
40
42
Kbps
PSNR(dB)
Reference side1
Reference side2
Proposed side1
Proposed side2
(a)central decoder (b) side decoder
Figure 6: The comparison of the proposed method with
Tillier’s (Tillier, 2007) on Foreman (qcif, 30fps).
The second set of experiments is to compare the
proposed method with other H.264 based multiple
description video codecs. Figure 7 shows
experimental results on different motion activity
types of video sequences. On slow (Miss-america)
and moderate motion (Foreman) video sequences,
shown in Figure 7(a) and Figure 7(b), the proposed
method outperforms the PSPT-MDC (Wei, 2006),
especially when only one description is available.
While tested on high speed motion (Football) video
3DDual-TreeDiscreteWaveletTransformBasedMultipleDescriptionVideoCoding
183
sequences, shown in Figure 7(c), 28 directions of 3D
DT-DWT subbands are still not good enough to
represent the scene, which lowered the performance
of central decoder compared with the H.264 based
MDC. However, when the bitrate is higher than
450K/s, the side reconstruction quality is still better
than PSPT-MDC, which proves the error resilient
ability of the proposed method.
0 200 400 600 800 1000 1200 1400 1600
35
36
37
38
39
40
41
Kbps
PSNR(dB)
PSPT-MDC central
Proposed central
0 200 400 600 800 1000 1200 1400 1600
35
36
37
38
39
40
41
Kbps
P
NR
dB
PSPT-MDC side
Proposed side
central decoder side decoder
(a) Miss-america (slow motion, cif, 30fps)
0 200 400 600 800 1000 1200 1400 160
0
27
28
29
30
31
32
33
34
35
36
37
Kbps
PSNR(dB)
PSPT-MDC central
Proposed central
0 200 400 600 800 1000 1200 1400 1600
27
28
29
30
31
32
33
34
35
36
37
Kbps
PSNR(dB)
PSPT-MDC side
Proposed side
central decoder side decoder
(b) Foreman (moderate motion, cif, 30fps)
0 200 400 600 800 1000 1200 1400 160
0
28
29
30
31
32
33
34
35
36
37
Kbps
PSNR(dB)
PSPT-MDC central
Proposed central
0 200 400 600 800 1000 1200 1400 1600
28
29
30
31
32
33
34
35
36
37
Kbps
PSNR(dB)
PSPT-MDC side
Proposed side
central decoder side decoder
(c) Football (fast motion, cif, 30fps)
Figure 7: The comparison of proposed method with PSPT-
MDC (Wei, 2006) on different sequences.
5 CONCLUSIONS
The proposed method takes the advantages of both
multiple description coding and the layer coding. To
form multiple description, the primary information
of video is encoded with H.264 and copied into both
descriptions to ensure the fundamental quality of the
reconstruction. The directional correlation between
descriptions is provided by 3D DT-DWT after noise
shaping and form the enhancement layer of two
descriptions. The high directional selectivity of 3D
DT-DWT saves the bitrate of motion estimation, and
reduces the computational complexity. The
experimental results have shown the effectiveness of
the proposed method.
ACKNOWLEDGEMENTS
This work is partially supported by the National
Natural Science Foundation of China (No.
61372107) and the Xiamen Key Science and
Technology Project Foundation under the Grant
3502Z20133024.
REFERENCES
V.K. Goyal, 2001. Multiple description coding:
Compression meets the network.
IEEE Signal
Processing Magazine
.
V.A. Vaishampayan, 1993. Design of multiple description
scalar quantizers.
IEEE
Transactions
on
Information
Theory
.
V.A. Vaishampayan, S. John, 1999. Balanced interframe
multiple description video compression. In
Proceeding
of IEEE International Conference on Image
Processing (ICIP).
R.A. Reibman, H. Jafarkhani, Y. Wang, M.T. Orchard,
and R. Puri, 2002. Multiple description video coding
using motion compensated temporal prediction.
IEEE
Transactions on Circuits and System for Video
Technology
.
C. Tillier, T. Petrisor, B. Pesquet-Popescu, 2007. A
motion-compensated overcomplete temporal
decomposition for multiple description scalable video
coding.
EURASIP Journal on Image and Video
Processing
.
R. Bernardini, M. Durigon, R. Rinaldo, L. Celetto, and A.
Vitali, 2004. Polyphase spatial subsampling multiple
description coding of video streams with h.264. In
Proceeding of IEEE International Conference on
Image Processing (ICIP)
.
Z. Wei, C. Cai, K. Ma, 2006. H.264-based multiple
description video coder and its DSP implementation.
In
Proceeding of IEEE International Conference on
Image Processing (ICIP)
.
O. Campana, R. Contiero, and G. A. Mian, 2008. An
H.264/AVC video coder based on a multiple
description scalar quantizer.
IEEE Transactions on
Circuits and Systems for Video Technology
.
T. Tillo, M. Grangetto, and G. Olmo, 2008. Redundant
slice optimal allocation for H.264 multiple description
SIGMAP2014-InternationalConferenceonSignalProcessingandMultimediaApplications
184
coding,
IEEE Transactions on Circuits and Systems
for Video Technology
.
I.W. Selesnick, R. G. Baraniuk, and N. G. Kingsbury,
2005. The dual-tree complex wavelet transform.
IEEE
Signal Processing Magazine
.
B. Wang, Y. Wang, I. Selesnick and A. Vetro, 2007.
Video coding using 3D dual-tree wavelet transform.
EURASIP Journal on Image and Video Processing
.
T.H. Reeves, N.G. Kingsbury, 2002. Overcomplete image
coding using iterative projection-based noise shaping.
In
Proceeding of IEEE International. Conference on
Image Proceeding (ICIP)
.
L. Li, C. Cai, 2009. Multiple description image coding
using dual-tree discrete wavelet transform. In
proceeding of International Symposium on Intelligent
Signal Processing and Communication Systems
(ISPACS)
.
J. Chen, C. Cai, 2008 . Layered multiple description video
coding, In
Proceeding of IEEE International
Conference on Signal Processing (ICSP)
.
B.J. Kim, Z. Xiong, and W.A. Pearlman, 2000. Low bit-
rate scalable video coding with 3-D set partitioning in
hierarchical trees (3-D SPIHT).
IEEE Transactions on
Circuits and Systems for Video Technology
.
3DDual-TreeDiscreteWaveletTransformBasedMultipleDescriptionVideoCoding
185