Reliable Stereoscopic Video Streaming

Considering Important Objects of the Scene

Ehsan Rahimi and Chris Joslin

Department of Systems and Computer Engineering, Carleton University, 1125 Colonel By Dr., Ottawa, ON, Canada

Keywords:

Stereoscopic Video, 3D/Multiview Video, Depth Map and Color Image, Multiple Description Coding, Error Prone Environment, Region of Interest, Pixel Variation, Coefficient of Variation.

Abstract:

In this paper, we introduce a new reliable method of stereoscopic video streaming based on a multiple description coding strategy. The proposed multiple description coding generates 3D video descriptions with regard to the interesting objects contained in the scene. To find these objects, we use two metrics derived from the second-order statistics of the depth map image, computed in a block-wise manner. Having detected the objects, the proposed algorithm generates the 3D video descriptions for the color video using a non-identical decimation method with respect to the identified objects. The objective test results verify that the proposed method performs better than both polyphase subsampling multiple description coding and our previous work using pixel variation.

1 INTRODUCTION

Errors in the received video caused by unreliable communication are a common problem in both wired and wireless networks. In wired networks, errors can occur due to packet loss, corruption, congestion, and large packet delay, whereas in wireless networks unreliable communication can stem from thermal noise and interference in the physical environment. Also, when dealing with immersive videos, the increased data traffic load consequently produces data congestion. Therefore, the serious packet failure problem needs to be addressed, since such errors in the delivered video diminish the quality of the viewing experience (Kazemi, 2012; Liu et al., 2015; Tillo and Olmo, 2007; Y. Yapc and Urhan, 2008; Ates et al., 2008; Wang and Liang, 2007; Wei et al., 2012). To avoid such errors, the encoder needs to use an error-resilient method of data transmission.

Generally, communication systems use three methods to cope with packet failure: Automatic Repeat reQuest (ARQ), Forward Error Correction (FEC), and Error Resilient Coding (ERC) (Kazemi, 2012). The first method, ARQ, requires a network with feedback capability and is therefore not suited to real-time or broadcast applications. The second method, FEC, is designed to cope with a specific amount of noise, which makes it impractical when the noise variance exceeds the design threshold. The third method, ERC, is the approach of choice in this paper because of its resilience against packet corruption and noise. This resilience is achieved through redundancy bits added to the data stream. Among the ERC methods, multiple description coding (MDC) is our method of choice due to its suitability for channels with large noise power. MDC mitigates packet failure because it creates multiple complementary and separately decodable descriptions.

Using MDC, a video stream is partitioned into several separately decodable descriptions, which are transmitted to the receiver. The receiver has two types of decoder, the side decoder and the central decoder, and chooses between them based on which descriptions arrive error free. With the MDC method, should an error occur in one description, it may be concealed using the other error-free descriptions.

This paper is organized as follows: a literature review of multiple description coding and how it can be applied to stereoscopic video is presented in Section 2. The proposed method is introduced in Section 3, and the test results are presented and discussed in Section 4. Finally, Section 5 reviews our achievements.

Rahimi, E. and Joslin, C.

Reliable Stereoscopic Video Streaming Considering Important Objects of the Scene.

DOI: 10.5220/0006616801350142

In Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2018) - Volume 4: VISAPP, pages 135-142

ISBN: 978-989-758-290-5


2 STATE OF THE ART

The MDC method is best known for its error robustness, which comes at the expense of compression ratio because it adds redundancy in the temporal, spatial, or frequency domain. With the temporal MDC method, usually two descriptions are produced in order to avoid a drop in coding efficiency. The coding efficiency drops when more than two descriptions are used because the distance between the frames assigned to each description increases, which makes motion prediction less effective (Liu et al., 2015; Chakareski et al., 2005). When the network is very noisy, a higher number of descriptions is required, so the temporal MDC method is no longer a suitable technique. The frequency MDC method partitions Discrete Cosine Transform (DCT) coefficients between video descriptions. Because the DCT provides independent components, the descriptions will be less dependent. To maintain the correlation between the descriptions, an extra transform such as the Lapped Orthogonal Transform (LOT) needs to be applied (Chung and Wang, 1999; Sun et al., 2009). Therefore, the complexity of frequency MDC methods is higher than that of both the spatial and temporal MDC methods. With the spatial MDC method, each video frame is partitioned into several lower-resolution subimages using the Polyphase SubSampling (PSS) algorithm (Shirani et al., 2001; Gallant et al., 2001; Kazemi, 2012). It is worth mentioning that a simple spatial MDC method offers no precise tool for adjusting the redundancy in order to control the side quality (Shirani et al., 2001; Gallant et al., 2001; Kazemi, 2012). This means that the redundancy cannot be increased to obtain the higher resilience needed to compensate for a higher noise level.

To apply the MDC method to 3D videos, the depth map image also needs to be partitioned into different descriptions. It is worth mentioning that the depth map image mainly contains depth information of the scene's objects. Because of the nature of real objects, the depth information of 3D scenes rarely contains high-frequency content. Consequently, the depth map image can be compressed effectively, which saves bandwidth and disk space (Fehn, 2004; Hewage, 2014). To improve compression, Karim et al. have shown that a downsampled version of the depth map image provides an adequate reconstruction of the 3D video in the receiver (Karim et al., 2008). They experimented with the spatial MDC method for 3D videos using the color plus depth map image representation, and also carried out experimental tests with a scalable multiple description coding approach, arriving at the same result. Therefore, it can be said that downsampling the depth map image does not cause considerable degradation in the quality of the reconstructed video. This is because the depth map image contains mostly low-frequency content or, more precisely, the depth values of adjacent pixels are similar. Consequently, the pixels neglected during downsampling can be predicted well from their neighbors. Liu et al. used the fact that pixels of real objects have similar depth values and introduced a texture block partitioning algorithm to perform their MDC algorithm for wireless multi-path streaming (Liu et al., 2015).

Although multiple description coding has been investigated thoroughly for 2D videos, more investigation is required to apply MDC to 3D video specifically. For 2D videos, MDC methods are classified according to the type of data that is divided between the descriptions: temporal, spatial, frequency, or compressed. For example, with a temporal MDC method using two descriptions, one description can carry the odd frames and the other description the even frames. With a spatial MDC algorithm, each video frame is partitioned into several lower-resolution subimages. With a frequency MDC method, the frequency components are divided between the descriptions. Each type of MDC method has its own advantages and disadvantages with regard to the particular application. The temporal MDC method is simple but unsuitable for a network with high packet failure because of its limited capability to increase data redundancy. Given the higher complexity of the frequency MDC method, the spatial MDC method can best accommodate a live HD video conference application over an error prone environment.
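As an illustration of the temporal case described above, the following minimal Python sketch splits a frame sequence into descriptions in round-robin order. The function name `temporal_descriptions` and the use of plain integers as stand-ins for frames are assumptions made for this example only; they are not taken from any codec or from the cited works.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def temporal_descriptions(frames: Sequence[Frame], count: int = 2) -> List[List[Frame]]:
    """Assign frames to `count` descriptions in round-robin order.

    With count == 2 and frames numbered from 1, the first description
    holds the odd frames and the second one the even frames.
    """
    if count < 1:
        raise ValueError("at least one description is required")
    return [list(frames[k::count]) for k in range(count)]


# Frames are represented by their 1-based index for illustration.
odd, even = temporal_descriptions([1, 2, 3, 4, 5, 6])
print(odd, even)
```

Inside one description, consecutive frames are `count` frames apart in the original sequence, which is why motion prediction becomes less effective as the number of temporal descriptions grows.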

3 PROPOSED METHOD

This section describes the proposed multiple description coding for 3D videos, which takes the region of interest (ROI) into account. To recognize which part of the frame is more important, that is, to extract the ROI map, a metric needs to be defined. To this end, two metrics (PV and CV) are defined, and the results for each metric are compared at the end. For the first metric (PV), we calculate the average of the absolute variations of the pixel values in the depth map image in a block-wise manner:

$$PV_i = \frac{1}{N_i} \sum_{j=1}^{N_i} \left| D_{ij} - \mu_i \right| \qquad (1)$$

where $\mu_i$ is the average of the depth values in block $i$ and $PV_i$ stands for the pixel variation of block $i$; $D_{ij}$ is the depth value of pixel $j$ in the $i$th block and $N_i$ is the total number of pixels in block $i$ (i.e., $j = 1, 2, \ldots, N_i$). Generally, $PV_i$ is a non-negative value that can vary from zero to infinity. A large PV indicates that block $i$ probably covers several objects or edges, and a very small PV indicates that block $i$ likely belongs to the far background or to a planar object, for example a wall. This is because the depth information of an object naturally contains low-frequency content.

VISAPP 2018 - International Conference on Computer Vision Theory and Applications

For the second metric, we define a new metric (CV) as the ratio of the Pixel Variation (PV) to the mean $\mu$, also known as the Coefficient of Variation (CoV):

$$CV_i = \frac{PV_i}{\mu_i}, \qquad (2)$$

where $CV_i$ is the CoV for block $i$ within a depth map image. As before, $\mu_i$ stands for the mean depth value of the pixels in block $i$, and $PV_i$ has already been defined in Equation (1). Similar to PV, the CV is also a non-negative value. When the CV of a block equals one, the depth values of that block have the same mean and variation (PV) values. It can be argued that blocks with large CV values probably cover several objects or edges, while blocks with very small CV values belong to the background of the video frame. Consequently, the latter are not the interesting part of the frame that the ROI extraction algorithm is looking for.
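To make the two metrics concrete, here is a minimal Python sketch of Equations (1) and (2) for a single block, given as a flat list of depth values. The function names are chosen for this example, and returning zero for an all-zero block is an assumption of the sketch, since the text does not say how a zero mean is handled.

```python
from typing import Sequence


def pixel_variation(depths: Sequence[float]) -> float:
    """Equation (1): mean absolute deviation of the depth values of one block."""
    if not depths:
        raise ValueError("a block must contain at least one pixel")
    mean = sum(depths) / len(depths)
    return sum(abs(d - mean) for d in depths) / len(depths)


def coefficient_of_variation(depths: Sequence[float]) -> float:
    """Equation (2): pixel variation divided by the block mean."""
    variation = pixel_variation(depths)
    mean = sum(depths) / len(depths)
    if mean == 0:
        # Assumption of this sketch: a block with zero mean depth is treated as flat.
        return 0.0
    return variation / mean


near = [10, 10, 30, 30]      # same spread, small mean depth
far = [190, 190, 210, 210]   # same spread, large mean depth
print(pixel_variation(near), coefficient_of_variation(near))
print(pixel_variation(far), coefficient_of_variation(far))
```

Both blocks have the same PV, but their CV values differ by a factor of ten because CV is normalized by the block mean.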

Figure 1 shows an overview of the proposed encoder. As can be seen in this figure, the first step of the proposed encoder is to determine which part of the frame is more important. An important requirement of this step is a low-complexity algorithm for identifying the interesting objects in the frame. The ROI extraction algorithm proposed in this paper uses the characteristics of the depth map image and extracts the ROI map using one of the two metrics defined above. In this algorithm, the ROI range is defined as the interval between $\sigma_{min}$ and $\sigma_{max}$. $\sigma_{min}$ is the threshold used to separate the very far background objects from the interesting objects, and $\sigma_{max}$ is the limit used to detect the edges of the interesting objects. Also, $N_{itr}^{Tot}$ is the total possible number of iterations that can be run by the hierarchical block division algorithm. The algorithm that identifies the objects runs in four major steps:

• Step 1: Create two empty lists ($L_1$ and $L_2$), and assign the entire depth map image as one block to $L_1$. Then start the first iteration as explained in step 2.

• Step 2: Check whether the algorithm has reached the limit of $N_{itr}^{Tot}$ iterations or whether all blocks in $L_1$ have PV or CV values smaller than $\sigma_{max}^{PV}$ or $\sigma_{max}^{CV}$, respectively. If yes, go to step 4. If not, go to step 3. Clearly, in the first iteration there is only one block in $L_1$, and its metric value is very likely greater than $\sigma_{max}$.

• Step 3: For every block in $L_1$ with a metric value greater than the threshold, divide the block into four equal-sized blocks and assign them to $L_2$. Any block with a metric value less than the threshold is assigned to $L_2$ without change. After all the blocks in $L_1$ have been checked, $L_1$ is updated with $L_2$ and $L_2$ is cleared. Then return to step 2.

• Step 4: All blocks in $L_1$ with metric values less than $\sigma_{min}$ are considered region I. Blocks with metric values within the ROI range are considered region II, and the remainder are region III.
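The four steps can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It uses the pixel variation of Equation (1) as the metric and represents a block as a hypothetical `(top, left, height, width)` tuple. The sketch also makes three assumptions that the text leaves open: a block whose metric equals a threshold is treated as "not greater", a block is only split while each child keeps at least two pixels, and odd-sized blocks are split into slightly unequal parts.

```python
from typing import Dict, List, Sequence, Tuple

Block = Tuple[int, int, int, int]  # (top, left, height, width), hypothetical layout
DepthMap = Sequence[Sequence[float]]


def pixel_variation(values: Sequence[float]) -> float:
    """Equation (1): mean absolute deviation of the values of one block."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def block_values(depth: DepthMap, block: Block) -> List[float]:
    """Collect the depth values covered by a block."""
    top, left, height, width = block
    return [depth[r][c] for r in range(top, top + height) for c in range(left, left + width)]


def split_in_four(block: Block) -> List[Block]:
    """Halve width and height; odd sizes give slightly unequal children."""
    top, left, height, width = block
    h2, w2 = height // 2, width // 2
    return [
        (top, left, h2, w2),
        (top, left + w2, h2, width - w2),
        (top + h2, left, height - h2, w2),
        (top + h2, left + w2, height - h2, width - w2),
    ]


def classify_regions(depth: DepthMap, sigma_min: float, sigma_max: float,
                     max_iterations: int) -> Dict[Block, int]:
    """Return a map from final blocks to region labels 1, 2 or 3."""
    current: List[Block] = [(0, 0, len(depth), len(depth[0]))]   # step 1
    for _ in range(max_iterations):                              # step 2: iteration limit
        following: List[Block] = []
        split_any = False
        for block in current:                                    # step 3
            _, _, height, width = block
            # Assumption: every child must keep at least two pixels.
            splittable = (height // 2) * (width // 2) >= 2
            if splittable and pixel_variation(block_values(depth, block)) > sigma_max:
                following.extend(split_in_four(block))
                split_any = True
            else:
                following.append(block)
        current = following
        if not split_any:                                        # step 2: nothing left to split
            break
    regions: Dict[Block, int] = {}
    for block in current:                                        # step 4
        value = pixel_variation(block_values(depth, block))
        regions[block] = 1 if value < sigma_min else 2 if value <= sigma_max else 3
    return regions


depth_map = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [10, 12, 10, 90],
    [10, 12, 10, 90],
]
print(classify_regions(depth_map, sigma_min=0.5, sigma_max=5.0, max_iterations=5))
```

In this toy depth map the two flat quadrants end up in region I, the gently varying quadrant in region II, and the quadrant containing a depth edge in region III.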

In the hierarchical block division algorithm, a block is partitioned into smaller blocks by dividing its width and height by a factor of 2 in each iteration. It is worth mentioning that $N_{itr}^{Tot}$ should be chosen such that the minimum block size is not smaller than 2 × 1 or 1 × 2 pixels. This is because both metrics used in this algorithm evaluate pixel variation, which requires at least two pixels to measure.

To achieve reliable video streaming, the proposed spatial MDC algorithm applies the Multiple Description Coding (MDC) strategy to 3D videos after the ROI extraction algorithm. To this end, four descriptions are created using Polyphase SubSampling (PSS). PSS-MDC is the basic low-complexity method that can be used in the spatial domain for reliable transmission in an error prone environment. Although the most important advantage of the PSS-MDC encoder is its simplicity, it lacks the capability to increase the redundancy in order to withstand errors in a strongly noisy environment. To fix this, the new spatial MDC algorithm enhances the pixel resolution for areas that are less predictable and for objects of interest that are more important to focus on.
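For reference, basic polyphase subsampling into four descriptions can be sketched in Python as below. The frame is a plain list of rows with even width and height, which is an assumption made to keep the example short, and both function names are chosen for illustration only.

```python
from typing import List

Image = List[List[int]]


def polyphase_descriptions(frame: Image) -> List[Image]:
    """Split a frame into four subimages, one per phase of a 2 x 2 grid.

    Description k = 2 * dy + dx keeps the pixels at rows dy, dy + 2, ...
    and columns dx, dx + 2, ...
    """
    return [[row[dx::2] for row in frame[dy::2]] for dy in (0, 1) for dx in (0, 1)]


def merge_descriptions(descriptions: List[Image]) -> Image:
    """Central decoding when all four descriptions arrive: interleave them again."""
    height = 2 * len(descriptions[0])
    width = 2 * len(descriptions[0][0])
    frame = [[0] * width for _ in range(height)]
    for k, description in enumerate(descriptions):
        dy, dx = divmod(k, 2)
        for r, row in enumerate(description):
            for c, value in enumerate(row):
                frame[2 * r + dy][2 * c + dx] = value
    return frame


frame = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
parts = polyphase_descriptions(frame)
print(parts[0])
```

Each description carries exactly one fourth of the pixels, so the four descriptions together contain no spatial redundancy. This is the limitation that the region-dependent resolution enhancement addresses.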

As can be seen in Figure 1, two different algorithms are applied to the color video and the depth map stream. For the depth map stream, the resolution of each description is enhanced according to its prediction difficulty. Since the metrics defined in this paper evaluate the variation between adjacent pixels, it can be said that the pixels of the depth map frame are clustered into regions I to III according to how difficult they are to predict. Region I, which includes pixels with very low variation, remains unchanged. The pixel resolution in region II is enhanced to one half for each description. Since region III contains pixels with large variations, predicting a missing pixel from its adjacent pixels is likely to lead to error. As a result, the pixel resolution of this region is increased to full resolution for each description.

Figure 1: Block diagram of the proposed method.

Since the region clustering is done using the depth map image rather than the color video frame, it cannot reflect the variation of pixel values in the color video frame. Therefore, the argument above no longer applies. One option for the color video is to apply the proposed ROI detection algorithm to the color video stream, so that the ROI map is extracted from the pixel variation found in the color video frame; the drawback is greater complexity due to the wide variety of colors that is naturally part of any scene. As a result, the hierarchical block division algorithm needs more time to identify the different regions in the frame. Another option is to use the ROI map extracted from the depth map image and to focus the enhancement of pixel resolution in the color video frame on region II, rather than on region III as is done for the depth map stream. Since the human eye is more sensitive to objects than to individual pixels, this option offers better performance with regard to subjective assessment. It can also improve the objective assessment, since the moving objects in the scene now receive more attention. Because video coding standards use Differential Pulse Code Modulation (DPCM), and the pixel values within an object in the color video frame are close to each other, the cost in compression ratio of increasing the resolution of the parts of a frame that contain the ROI can be compensated by the DPCM algorithm. Therefore, for the color video stream, regions II and III are enhanced to full and one half resolution, respectively. Region I remains at the same resolution as before (one fourth). This enhancement helps to fully recover the ROI when a description is missing, although at the expense of increased redundancy.
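The cost of this enhancement can be estimated with simple arithmetic. The Python sketch below computes the share of the frame's pixels carried by each description and the resulting redundancy over four descriptions, using the per-region resolutions stated above for the depth map and color streams. The area shares in the example (60%, 30%, 10%) are hypothetical values chosen for illustration, not measurements from the paper, and the effect of compression is ignored.

```python
from typing import Dict

# Fraction of a region's pixels kept in every description, as stated in the text.
DEPTH_RATES: Dict[int, float] = {1: 0.25, 2: 0.5, 3: 1.0}
COLOR_RATES: Dict[int, float] = {1: 0.25, 2: 1.0, 3: 0.5}


def description_share(area_share: Dict[int, float], rates: Dict[int, float]) -> float:
    """Fraction of the frame's pixels carried by a single description."""
    if abs(sum(area_share.values()) - 1.0) > 1e-9:
        raise ValueError("region area shares must sum to one")
    return sum(area_share[region] * rates[region] for region in rates)


def redundancy(area_share: Dict[int, float], rates: Dict[int, float],
               descriptions: int = 4) -> float:
    """Extra pixels sent over all descriptions, relative to one full frame."""
    return descriptions * description_share(area_share, rates) - 1.0


# Hypothetical split of the frame area between regions I, II and III.
shares = {1: 0.6, 2: 0.3, 3: 0.1}
print(description_share(shares, DEPTH_RATES), redundancy(shares, DEPTH_RATES))
print(description_share(shares, COLOR_RATES), redundancy(shares, COLOR_RATES))
```

When the whole frame falls into region I, the scheme reduces to plain PSS-MDC with zero pixel redundancy; any area assigned to regions II or III raises the redundancy in exchange for better side-decoder quality.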

4 SIMULATION RESULTS AND DISCUSSION

To assess the proposed algorithm, we carried out tests using two stereoscopic test video sequences in the DVD-Video PAL format (720 × 576), called "Interview" and "Orbi". Each video has 90 frames and the frame rate is 30 frames per second (fps). The chroma and depth subsampling format is 4:2:2:4 (the last 4 stands for the resolution of the depth map image); in other words, the total frame resolution is 1440 × 576. The new algorithm is implemented using the H.264/AVC reference software, JM 19.0 (Institut, 2015). For encoding with the JM software, an I frame is inserted every 16 frames and only P frames are used between I frames.


Figure 2: Comparison of the performance of extracting different regions (I-III) for the first frame of video "Interview". Panels: (a) Original video frame; (b) Region I (Ex. by PV); (d) Region II (Ex. by PV); (e) Region II (Ex. by CV); (f) Region III (Ex. by PV); (g) Region III (Ex. by CV).

In the remainder of this paper, we first investigate and compare the performance and complexity of the proposed algorithm using PV and CV. Then, we assess the performance of the new spatial MDC algorithm for streaming in a noisy environment. To simulate an error prone environment, we assume that the decoder receives only one of the four descriptions generated by the encoder.

Figure 2 shows the regions I to III identified using the PV and CV metrics. As can be seen, region II is depicted more accurately with the CV metric than with the PV metric. The same holds for region I. As can be seen in Figure 2d, some important pixels have not been detected as region II (ROI). Some pixels are also missed in region I (background) with PV, as shown in Figure 2b. Such inaccuracy in identifying the different regions with PV can be due to the fact that the pixel values of different blocks are in dissimilar ranges. Therefore, the pixel variation (PV) is not an appropriate metric for extracting regions I and II. To fix this problem, it is necessary to normalize the pixel variation metric (PV). Indeed, the CV metric is the normalized version of the pixel variation and works like a smoothing filter. Although the normalized pixel variation metric (CV) provides a considerable improvement in the extraction of regions I and II, the same performance is not achieved when the CV metric is used to detect region III (which stands for the edges). As can be seen in Figure 2, the detected edges shown in Figure 2g are not as clear as the detected edges shown in Figure 2f. This can be due to the smoothing effect brought about by the normalization in the CV metric. As blocks that contain edges are blocks with high-frequency content, a high-frequency filter like the pixel variation measurement (PV) is more beneficial for identifying the edges. Therefore, an optimum algorithm can extract the edges using the PV metric and then detect the important objects using the CV metric.

Table 1 shows the average number of blocks for different ranges of the PV and CV metrics. As can be seen in this table, about 55% of the depth map image for video Interview and 40% of the depth map image for video Orbi have PV values less than 1. In other words, more than one half of the depth map image for video Interview and more than one third for video Orbi have very close depth values. This is why decimating the depth map image does not affect its quality when it is reconstructed in the decoder. Table 1 also shows that about 95% of the depth map image for both test video sequences has PV values less than 3. The fact that about 95% of the depth map image has similar depth values, and therefore does not need to be sent at its original resolution, justifies why non-identical decimation is more advantageous than the identical decimation suggested by Karim et al. (Karim et al., 2008). Put differently, only about 5% of the depth map image needs to be encoded at the original resolution. The remaining 95% can be decimated to save bandwidth or storage.

Table 1: Number of blocks with different metric values after the hierarchical division algorithm. Column headings give the block size in pixels, with the block dimensions in parentheses; the last column gives the percent of blocks with a metric value in the specific range.

(a) Video "Interview".

| Metric | Range | 6 (2 × 3) | 24 (4 × 6) | 96 (8 × 12) | 384 (16 × 24) | 1536 (32 × 48) | 6144 (64 × 96) | 24576 (128 × 192) | 98304 (256 × 384) | Percent (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| PV | ≤ 1 | 662.78 | 371.67 | 172.54 | 75.80 | 22.28 | 17.82 | 0.68 | 0.00 | 55.68 |
| PV | 1 ∼ 3 | 1008.44 | 618.18 | 336.37 | 133.91 | 25.44 | 2.82 | 0.11 | 0.00 | 41.64 |
| PV | 3 ∼ 10 | 831.50 | 4.77 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.30 |
| PV | ≥ 10 | 898.74 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.37 |
| CV | ≤ 0.1 | 646.10 | 276.93 | 150.99 | 67.53 | 37.82 | 18.57 | 0.00 | 0.00 | 56.74 |
| CV | 0.1 ∼ 0.2 | 32.79 | 11.21 | 4.24 | 2.59 | 2.59 | 1.28 | 1.93 | 0.00 | 15.57 |
| CV | 0.2 ∼ 0.3 | 45.37 | 16.27 | 5.67 | 3.80 | 3.92 | 2.40 | 0.00 | 0.00 | 5.96 |
| CV | 0.3 ∼ 0.4 | 105.10 | 24.84 | 4.11 | 3.00 | 2.19 | 2.34 | 0.00 | 0.00 | 5.22 |
| CV | 0.4 ∼ 0.5 | 52.64 | 29.00 | 4.31 | 4.34 | 0.56 | 0.84 | 1.69 | 0.07 | 14.55 |
| CV | ≥ 0.5 | 1286.22 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.96 |

(b) Video "Orbi".

| Metric | Range | 6 (2 × 3) | 24 (4 × 6) | 96 (8 × 12) | 384 (16 × 24) | 1536 (32 × 48) | 6144 (64 × 96) | 24576 (128 × 192) | 98304 (256 × 384) | Percent (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| PV | ≤ 1 | 542.72 | 295.40 | 172.56 | 69.13 | 34.42 | 4.69 | 0.80 | 0.00 | 39.37 |
| PV | 1 ∼ 3 | 1680.86 | 752.81 | 331.84 | 108.19 | 39.29 | 6.48 | 0.74 | 0.00 | 55.95 |
| PV | 3 ∼ 10 | 2276.38 | 8.09 | 0.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.53 |
| PV | ≥ 10 | 753.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.15 |
| CV | ≤ 0.1 | 614.22 | 244.82 | 118.68 | 49.28 | 35.30 | 6.58 | 0.81 | 0.00 | 39.28 |
| CV | 0.1 ∼ 0.2 | 59.43 | 28.10 | 11.99 | 8.81 | 5.51 | 1.64 | 0.10 | 0.00 | 6.76 |
| CV | 0.2 ∼ 0.3 | 79.78 | 28.24 | 9.36 | 5.84 | 4.41 | 2.56 | 0.48 | 0.40 | 19.80 |
| CV | 0.3 ∼ 0.4 | 134.88 | 35.41 | 10.13 | 4.74 | 4.08 | 1.38 | 0.22 | 0.61 | 21.54 |
| CV | 0.4 ∼ 0.5 | 90.23 | 41.59 | 9.84 | 3.82 | 2.31 | 0.80 | 0.00 | 0.30 | 10.66 |
| CV | ≥ 0.5 | 1285.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.96 |

To investigate how robust the proposed MDC method is against errors, we assumed that only one description is available to the decoder and that the other three descriptions have been lost. To reconstruct the video, the decoder estimates each missing pixel value from the nearest available pixel value. Figure 3 and Figure 4 compare the PSNR and SSIM measurements of the reconstructed color video for video Interview using the basic Polyphase SubSampling MDC method (PSS-MDC), our previous MDC method presented in (Rahimi and Joslin, 2017), and the new proposed spatial MDC algorithm with the PV and CV metrics. Figure 5 and Figure 6 show the PSNR and SSIM assessments for video Orbi. As can be seen in Figure 3, for the reconstructed video Interview the new spatial MDC algorithm achieves an improvement of about 1 dB with the PV metric and 2 dB with the CV metric compared to our previous work presented in (Rahimi and Joslin, 2017). For video Orbi (see Figure 5), although no considerable improvement can be seen compared to our previous work, the new spatial MDC algorithm achieves more than 2 dB improvement over the PSS-MDC method. Regarding the SSIM assessment, the proposed algorithm provides an improvement of about 0.3 for both test videos in high-rate streaming compared to the PSS-MDC method. It should be mentioned that, since the human eye is more sensitive to objects than to individual pixels, a subjective assessment can better emphasize the improved performance of the proposed algorithm compared to the previous methods.
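The single-description test can be illustrated with a short Python sketch. It rebuilds a frame from one polyphase description by copying each received pixel into its 2 × 2 cell, which is one simple way to take the "nearest available pixel" and is an assumption of this example, and then measures the PSNR with the standard definition for 8-bit samples. The function names are illustrative, and the sketch covers only plain PSS without the region-dependent enhancement.

```python
import math
from typing import List

Image = List[List[float]]


def upsample_nearest(description: Image) -> Image:
    """Rebuild a full-size frame from one 2 x 2 polyphase description.

    Each received pixel is copied to the three missing positions of its
    2 x 2 cell (pixel replication).
    """
    frame: Image = []
    for row in description:
        wide = [value for value in row for _ in range(2)]
        frame.append(wide)
        frame.append(list(wide))
    return frame


def psnr(reference: Image, test: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


original = [[10, 12], [10, 12]]
received = [[10]]                     # only the phase (0, 0) description arrived
rebuilt = upsample_nearest(received)
print(rebuilt, psnr(original, rebuilt))
```

A flat region is rebuilt exactly, giving an infinite PSNR, whereas any variation inside a 2 × 2 cell shows up as reconstruction error. This is the effect the proposed method reduces by sending regions II and III at higher resolution.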

When the proposed algorithm is evaluated on the reconstructed depth map image, it shows an even better performance. As shown in Figure 7 and Figure 8 for video Interview and in Figure 9 and Figure 10 for video Orbi, the improvement achieved by the proposed algorithm is considerable. This can be due to the fact that the metrics PV and CV are calculated from the depth map image, and therefore blocks with larger PV and CV values can be considered the least predictable blocks in the depth map image. Focusing on these pixels in each description therefore results in a more accurate reconstruction in the decoder. In terms of the PSNR assessment, the proposed algorithm achieves an improvement of about 8 dB for video Interview and more than 10 dB for video Orbi. The high performance of the proposed algorithm in terms of the SSIM assessment is also more evident compared with the color video assessment. With regard to the SSIM assessment, the proposed algorithm outperforms the PSS-MDC method by more than 0.02.

Figure 3: PSNR assessment of color image for video Interview.


Figure 4: SSIM evaluation of color image for video Interview.

Figure 5: PSNR assessment of color image for video Orbi.

Figure 6: SSIM evaluation of color image for video Orbi.

5 CONCLUSION

Multimedia streaming is affected by packet failure in the network due to packet loss, packet corruption, and large packet delay. An appropriate solution against packet failure in an error prone environment is multiple description coding (MDC). With MDC, a video stream is partitioned into several separately decodable descriptions. If a description is lost during transmission, the decoder is able to estimate the lost description from the other error-free description(s). To improve the basic spatial partitioning and make it applicable to 3D videos, a non-identical decimation algorithm for stereoscopic videos has been provided in this paper. Our algorithm works based on the objects present in the scene and assigns more bandwidth to the region of interest. Since human eyes are more sensitive to objects than to individual pixels, the proposed algorithm can provide improved performance compared to the PSS-MDC method in terms of subjective assessment. The objective assessment results also confirm the improved performance achieved by the proposed spatial MDC algorithm. With regard to the depth map image, the proposed algorithm enhances the current basic decimation to a non-identical decimation. As shown earlier, most parts of the depth map have similar depth values, and therefore decimation in those parts can save bandwidth or storage without considerable quality degradation. However, for the parts of the frame with high variation in pixel values, it is recommended to keep the original resolution. Therefore, with the new algorithm, those parts of the depth map image that have large variations are encoded at the original resolution.

Figure 7: PSNR assessment of the depth map image for video Interview.

Figure 8: SSIM evaluation of the depth map image for video Interview.

Figure 9: PSNR assessment of the depth map image for video Orbi.

Figure 10: SSIM evaluation of the depth map image for video Orbi.

ACKNOWLEDGEMENTS

The authors would like to acknowledge that this research was supported by NSERC Strategic Project Grant: Hi-Fit: High Fidelity Telepresence over Best-Effort Networks.

REFERENCES

Ates, C., Urgun, Y., Demir, B., Urhan, O., and Erturk, S. (2008). Polyphase downsampling based multiple description image coding using optimal filtering with flexible redundancy insertion. In Signals and Electronic Systems, 2008. ICSES '08. International Conference on, pages 193–196.

Chakareski, J., Han, S., and Girod, B. (2005). Layered coding vs. multiple descriptions for video streaming over multiple paths. Multimedia Systems, 10(4):275–285.

Chung, D.-M. and Wang, Y. (1999). Multiple description image coding using signal decomposition and reconstruction based on lapped orthogonal transforms. Circuits and Systems for Video Technology, IEEE Transactions on, 9(6):895–908.

Fehn, C. (2004). Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV. SPIE: Stereoscopic Displays and Virtual Reality Systems, 5291:93–104.

Gallant, M., Shirani, S., and Kossentini, F. (2001). Standard-compliant multiple description video coding. In Image Processing, 2001. Proceedings. 2001 International Conference on, volume 1, pages 946–949.

Hewage, C. (2014). 3D Video Processing and Transmission Fundamentals. Chaminda Hewage and bookboon.com.

Institut, H.-H. (2015). H.264/AVC reference software.

Karim, H., Hewage, C., Worrall, S., and Kondoz, A. (2008). Scalable multiple description video coding for stereoscopic 3D. Consumer Electronics, IEEE Transactions on, 54(2):745–752.

Kazemi, M. (2012). Multiple description video coding based on base and enhancement layers of SVC and channel adaptive optimization. PhD thesis, Sharif University of Technology, Tehran, Iran.

Liu, Z., Cheung, G., Chakareski, J., and Ji, Y. (2015). Multiple description coding and recovery of free viewpoint video for wireless multi-path streaming. IEEE Journal of Selected Topics in Signal Processing, 9(1):151–164.

Rahimi, E. and Joslin, C. (2017). 3D video multiple description coding considering region of interest. In Accepted in 12th International Conference on Computer Vision Theory and Applications (VISAPP 2017).

Shirani, S., Gallant, M., and Kossentini, F. (2001). Multiple description image coding using pre- and post-processing. In Information Technology: Coding and Computing, 2001. Proceedings. International Conference on, pages 35–39.

Sun, G., Samarawickrama, U., Liang, J., Tian, C., Tu, C., and Tran, T. (2009). Multiple description coding with prediction compensation. Image Processing, IEEE Transactions on, 18(5):1037–1047.

Tillo, T. and Olmo, G. (2007). Data-dependent pre- and postprocessing multiple description coding of images. Image Processing, IEEE Transactions on, 16(5):1269–1280.

Wang, J. and Liang, J. (2007). H.264 intra frame coding and JPEG 2000-based predictive multiple description image coding. In Communications, Computers and Signal Processing, 2007. PacRim 2007. IEEE Pacific Rim Conference on, pages 569–572.

Wei, Z., Ma, K.-K., and Cai, C. (2012). Prediction-compensated polyphase multiple description image coding with adaptive redundancy control. Circuits and Systems for Video Technology, IEEE Transactions on, 22(3):465–478.

Y. Yapc, B. Demir, S. E. and Urhan, O. (2008). Down-sampling based multiple description image coding using optimal filtering. SPIE: Journal of Electronic Imaging, 17.
