Figure 12: Exemplary results of the view synthesis: (a) original camera view, (b) synthesis with source data, (c) voxel resampling, (d) mesh resampling, (e) and (f) temporal filtering with depth k = 2 for voxel resampling and mesh resampling, respectively.
Figure 12 shows synthesis results with the resampled data: voxel resampling (c) and mesh resampling (d) without temporal filtering, and the respective synthesis results with a temporal filtering depth of k = 2 in (e) and (f). While effects in the temporal domain are hard to notice in single frames, synthesis artifacts caused by depth map errors at object boundaries and revealed occlusions are visible in (b) and do not occur in (c) to (f). Overall, the quality of the synthesis results with enhanced depth maps does not vary significantly, with a negligible positive margin for the voxel-based approach with temporal filtering.
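To make the clustering-based filtering behind these enhanced depth maps concrete, the following is a minimal sketch of how depth candidates gathered over a temporal window (governed by the filtering depth k) could be reduced to a single consistent value. The function name, the gap-based grouping, and the tolerance parameter are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a clustering-based depth filter over a temporal
# window, assuming depth candidates for one surface sample have been
# collected from neighbouring frames (filtering depth k = 2 above).
# Names and the gap tolerance are illustrative assumptions.
import numpy as np

def filter_candidates(candidates: np.ndarray, tol: float = 0.05) -> float:
    """Group sorted depth candidates into clusters separated by gaps
    larger than tol and return the mean of the largest cluster, so
    outlier depths from single erroneous frames are discarded."""
    sorted_c = np.sort(candidates)
    # A new cluster starts wherever consecutive candidates differ by > tol.
    breaks = np.where(np.diff(sorted_c) > tol)[0] + 1
    clusters = np.split(sorted_c, breaks)
    largest = max(clusters, key=len)
    return float(largest.mean())

# Example: four consistent depths plus one outlier from a bad frame.
print(filter_candidates(np.array([2.01, 2.02, 1.99, 2.00, 3.40])))  # ~2.005
```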
4 SUMMARY AND CONCLUSIONS
This work presents algorithms to resample and filter point cloud data reconstructed from multiple cameras and multiple time instants. In an initial resampling stage, a voxel-based and a surface-mesh-based approach are presented to resample the point cloud data onto a common sampling grid. Subsequently, the resampled data undergoes a clustering-based filtering stage to remove depth estimation artifacts and achieve spatiotemporal consistency. The presented algorithms are evaluated in a view synthesis scenario. Results show that view synthesis with the enhanced depth maps produced by the algorithms leads to fewer artifacts than synthesis with the original source data. The difference in achieved quality between the voxel-based and the surface-mesh-based approach is negligible; given the computational complexity of surface mesh reconstruction, the voxel-based approach is the preferable solution for resampling.
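As an illustration of the favoured resampling step, the sketch below resamples a merged point cloud onto a common voxel grid by averaging all points that fall into the same cell. The function name, voxel size, and averaging strategy are assumptions for illustration rather than the authors' implementation.

```python
# Minimal sketch of voxel-based point cloud resampling, assuming the
# point cloud is given as an (N, 3) numpy array. Keeping the per-cell
# mean is one simple design choice; medians or per-attribute statistics
# would fit the same structure.
import numpy as np

def voxel_resample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Snap points to a common voxel grid and average all points that
    fall into the same voxel, yielding one representative per cell."""
    # Integer voxel coordinates of every point.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel via unique integer coordinate rows.
    _, inverse, counts = np.unique(
        voxel_idx, axis=0, return_inverse=True, return_counts=True)
    # Accumulate point coordinates per voxel and divide by occupancy.
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

# Example: merge two camera point clouds onto one sampling grid.
cloud = np.vstack([np.random.rand(1000, 3), np.random.rand(1000, 3)])
resampled = voxel_resample(cloud, voxel_size=0.01)
```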
Filtering in the temporal domain yields slight synthesis quality improvements when moving objects are confined to a limited region of the scene, as in the ballet data set. For data sets in which moving objects cover larger areas, such as the breakdancer data set, temporal filtering does not improve synthesis results compared to filtering across cameras alone. The presented motion masking excludes samples within a relatively wide image area around the actual moving object. Depth map artifacts in these areas are therefore not interpolated in the filtering stage and thus affect the synthesis.
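For illustration, the following is a hedged sketch of motion masking via frame differencing and morphological dilation; the paper's masking may instead rely on dense motion estimation, and the threshold and dilation radius here are assumed values. The dilation radius is precisely the knob that determines how wide an image area around the moving object is excluded from filtering.

```python
# Hedged sketch of motion masking, assuming 8-bit grayscale frames as
# numpy arrays. Threshold and dilation radius are illustrative; a
# smaller radius would tighten the mask around the moving object and
# leave more depth samples available to the interpolating filter.
import numpy as np
from scipy import ndimage

def motion_mask(prev_gray: np.ndarray, curr_gray: np.ndarray,
                thresh: float = 12.0, dilate_radius: int = 9) -> np.ndarray:
    """Return a boolean mask that is True where motion is assumed."""
    diff = np.abs(curr_gray.astype(np.float32) - prev_gray.astype(np.float32))
    moving = diff > thresh
    # Dilate so the mask safely covers object borders; the radius sets
    # how far beyond the actual object the exclusion zone extends.
    structure = np.ones((2 * dilate_radius + 1,) * 2, dtype=bool)
    return ndimage.binary_dilation(moving, structure=structure)
```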