Perspectively Correct Construction of Virtual Views
Christian Fuchs and Dietrich Paulus
Active Vision Group, University of Koblenz-Landau, Universitätsstraße 1, 56070 Koblenz, Germany
{fuchsc, paulus}@uni-koblenz.de
Keywords:
Virtual Views, Homography Deshadowing, Correct Perspective.
Abstract:
The computation of virtual camera views is a common requirement in the development of computer vision
appliances. We present a method for the perspectively correct computation of configurable virtual cameras
using depth data gained from stereo correspondences. It avoids unnatural warping of 3-D objects as caused by
homography-based approaches. Our method is tested using different stereo datasets.
1 INTRODUCTION
The installation position and orientation of optical
sensors is a crucial issue when planning a computer
vision application. A lot of different requirements
have to be taken into account in order to identify ap-
propriate views of the captured scene for the particu-
lar scenarios. Yet, it is not always possible to mount
a camera in the desired position, e.g. due to geomet-
ric limitations. Therefore, it is not possible to capture
the desired view directly. Common applications are top-view systems on vehicles of any kind, for example, which utilize cameras installed on the vehicle to compute virtual bird's eye views of the surroundings.
Conventional approaches use the images from sin-
gle cameras for the transformation into a virtual cam-
era view. As single camera setups of course do not
provide depth information for the captured scene, the
transformation to a virtual view is a difficult issue.
Most conventional approaches (Section 2) use a flat world assumption, which has several drawbacks in the resulting virtual camera's view. The main drawbacks are the imperfect perspective transformation and the unnatural warping of 3-D objects.
We address this matter and introduce a method for
creating virtual camera views using depth informa-
tion extracted from stereo images. The goal is to cre-
ate perspectively correct virtual camera views while
avoiding unwanted warping due to violated assump-
tions. Relevant related work is discussed in section 2. The main drawback of state-of-the-art techniques for virtual camera view computation is described in section 3. Sections 4 and 5 introduce the geometric setup and the mathematical techniques used. Test results are presented in section 6 and conclusions are drawn in section 7.
2 RELATED WORK
Several research topics have to be taken into account
when creating virtual camera views. Yet, the follow-
ing paragraphs can only give a short introduction into
the relevant aspects.
2.1 Perspective 2-D Warping
A commonly used approach for the computation of
virtual camera views is the utilization of perspec-
tive transformations as described by Vincent and Laganière (Vincent and Laganière, 2001) and Hartley and Zisserman (Hartley and Zisserman, 2003): Given a camera $C_1$ and a camera $C_2$, camera $C_1$ captures an image which shall be converted to the view described by camera $C_2$. Assuming $C_1$ looks at a plane on which all objects in the image are located, the plane in 3-D space can be described using a minimum of four image points $Q_{C_1} = \{q_i \in \mathbb{R}^2 \mid i \in \mathbb{N}^+\}$, $|Q_{C_1}| \geq 4$, on the image plane. The image plane of $C_1$ (respectively $C_2$) is thereby interpreted as a projective plane. Given corresponding image points $Q_{C_2} = \{q_i \in \mathbb{R}^2 \mid i \in \mathbb{N}^+\}$ with $|Q_{C_2}| = |Q_{C_1}|$ in image coordinates of $C_2$, a transformation from camera $C_1$ to camera $C_2$ can be formulated using a homography matrix $H_{C_1 C_2} \in \mathbb{P}^{2\times 2}$.
Assuming that the cameras have fixed lenses and a rigid affine transformation between their poses, matrix $H_{C_1 C_2}$ can be considered constant.
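To make this conventional plane-based warping concrete, the following is a minimal sketch using OpenCV; the point correspondences, file names, and output size are placeholder assumptions for illustration, not values from this work.

```python
# Minimal sketch of homography-based view warping (conventional approach,
# Section 2.1). Point correspondences and file names are placeholder values.
import cv2
import numpy as np

img_c1 = cv2.imread("camera1.png")  # image captured by camera C1 (assumed file)

# >= 4 corresponding points of the assumed ground plane, in pixel coordinates
q_c1 = np.float32([[100, 400], [500, 420], [520, 600], [80, 580]])  # in C1
q_c2 = np.float32([[150, 100], [550, 100], [550, 500], [150, 500]])  # in the virtual view

# Estimate the homography from the correspondences
H, _ = cv2.findHomography(q_c1, q_c2)

# Warp the whole image; every pixel is treated as if it lay on the plane
# (flat world assumption), which causes the shadowing effect of Section 3
virtual = cv2.warpPerspective(img_c1, H, (700, 600))
cv2.imwrite("virtual_homography.png", virtual)
```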
2.2 Camera Geometry
In order to describe a camera’s imaging properties
properly, camera models for the particular camera
types are needed. Tsai (Tsai, 1987) and Tsai and
Lenz (Lenz and Tsai, 1988) have published funda-
mental work on camera calibration techniques espe-
cially for the pinhole camera model. Their methods
can be used to estimate camera parameters. Based
upon their work, Zhang (Zhang, 2000) has released
an extended approach for camera calibration. Zhang’s
algorithm is widely used in computer vision. Of course, cameras following another camera model need adapted calibration techniques. Geyer and Dani-
ilidis (Geyer and Daniilidis, 2001) published a fun-
damental discussion of the geometry of catadioptric
cameras. Scaramuzza (Scaramuzza, 2008) presents
a solution for proper calibration of omnidirectional
cameras. This method is suitable for catadioptric and
fisheye cameras. The camera parameters estimated
using the methods mentioned are prerequisite for the
connection of camera images to the 3-D world.
2.3 Virtual Camera and Bird’s View
Computation
Various publications address the issue of computing a
virtual camera’s or a bird’s eye view from cameras at
different locations. A common approach for the computation of virtual camera views is the utilization of homographies (Section 2.1) for the warping process. The underlying flat world assumption is reasonable, e.g. for vehicles, as the street can be assumed to be a plane. However, considerable artifacts are created, e.g. for pedestrians on the street. Liu, Lin and Chen (Liu et al., 2008) use single
cameras mounted around a car and transform the im-
ages using homography matrices, interpolate and finally stitch the images to a top view image. Addition-
ally, they propose a virtual fisheye view as a bird’s eye
view. Thomas et al. (Thomas et al., 2011) stitch top
view images on a cost-efficient computation system.
Sato et al. (Sato et al., 2013) use fish-eye cameras
and homographies on spatio-temporal data, whereas
Li and Hai (Li and Hai, 2011) focus on the calibra-
tion of a multi-view bird's eye view. However, the
approaches incorporate heavy distortions for objects
which violate the flat world assumption.
Virtual camera views are discussed in the field of
image-based rendering also. Shum and Kang (Shum
and Kang, 2000) give a survey of different approaches
towards view interpolation. Laveau and Faugeras
(Laveau and Faugeras, 1994) propose view prediction
based on the fundamental matrix using two captured
images. Zinger, Do and De With (Zinger et al., 2010)
discuss a depth based rendering for 3-D-TV applica-
tions based on disparity maps. Vogt et al. (Vogt et al.,
2004) use light-fields to improve image quality in im-
age sequences. However, these publications discuss
the rendering of virtual camera views from camera
poses nearby the original camera’s views, e. g. light
positions shift and/or light rotation.
Up to the best of the author’s knowledge, no ap-
proach towards perspectively correct virtual cameras
with extensively shifted views was published, yet.
2.4 Stereo Vision
Using a multiple camera setup, it is possible to com-
pute depth information from images using multiple
view geometry. Hartley and Zisserman (Hartley and
Zisserman, 2003) summarize the principle behind this
approach. The basic idea behind it is to make use of
known geometric relations between calibrated cam-
eras and to match the image frames taken at a time.
By solving the point correspondence problem, key-
points in corresponding frames can be used together
with the camera calibration data in order to estimate
the 3-D position of an object relative to the cameras.
The precision of the depth estimates is primarily de-
pendent on the cameras, the baseline of the stereo
setup and the matching algorithms.
Concerning the matching of the image content,
two major approaches exist: On the one hand,
the matching of keypoints, computed using algo-
rithms like Scale-Invariant Feature Transform (SIFT)
(Lowe, 2004), Speeded Up Robust Features (SURF)
(Bay et al., 2006) or Oriented FAST and Rotated
BRIEF (ORB) (Rublee et al., 2011), is a possibility.
In general, image features of any kind can be used
as long as an assignment among the image views can
be established. An example for a feature based ap-
proach towards stereo correspondences was presented
by Grimson (Grimson, 1985). Horaud and Skordas
(Horaud and Skordas, 1989) group features in order
to find correspondences.
Approaches using features or keypoints for stereo matching usually result in sparse 3-D data with high-precision matches.
On the other hand, an approach incorporating
block matching constitutes the second main category.
Various algorithms and improvements have been developed and published so far. These approaches yield dense 3-D data. The semiglobal matching (SGM) approach by Hirschmüller (Hirschmüller, 2008) is based on mutual information and uses pixel-wise matching. It shows promising precision properties. The algorithm
has become popular and has already been adapted to
particular scenarios, for example for in-vehicle ap-
plications (Scharwachter et al., 2014; Hermann and
Klette, 2013). An optimized version of the SGM
was presented by Pantilie and Nedevschi (Pantilie and
Nedevschi, 2012). Einecke and Eggert (Einecke and
Eggert, 2015) follow a local correspondence approach in order to significantly reduce execution time while maintaining correspondence quality. A lot of publications concerning stereo reconstruction have already been presented. Therefore, this work does not focus on this issue. Several datasets for the evaluation of stereo reconstruction algorithms have been published with an appropriate ground truth.
The KITTI Stereo Benchmark (Geiger et al., 2012), for example, provides stereo datasets from road scenes. It is commonly used for benchmarking stereo correspondence algorithms. Pfeiffer, Gehrig
and Schneider (Pfeiffer et al., 2013) published the
Ground Truth Stixel Dataset, which contains anno-
tated stereo sequence datasets of road scenes.
The Middlebury Stereo Datasets (http://vision.middlebury.edu/stereo/data/) provide a col-
lection of stereo scenes. We use stereo images from
the newly released 2014 datasets which have been
presented by Scharstein et al. (Scharstein et al., 2014).
The datasets contain high resolution stereo images to-
gether with camera calibration files and ground truth
disparity maps.
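As an illustration of how dense disparities and 3-D points of the kind used in this work could be obtained, the sketch below uses OpenCV's StereoSGBM matcher (a variant of the SGM idea discussed above) and reprojects the disparities to 3-D. The intrinsics, baseline, and file names are placeholder assumptions in the style of a Middlebury calibration, not actual dataset values.

```python
# Sketch: dense disparity estimation with OpenCV's SGBM (an SGM variant)
# and reprojection of the disparities to 3-D points in the frame of C1.
# Intrinsics, baseline and file names are placeholder assumptions.
import cv2
import numpy as np

left = cv2.imread("im0.png", cv2.IMREAD_GRAYSCALE)   # left/reference image
right = cv2.imread("im1.png", cv2.IMREAD_GRAYSCALE)  # right image

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed point -> pixels

f, cx, cy = 3000.0, 1000.0, 750.0   # placeholder focal length / principal point [px]
baseline = 0.2                      # placeholder baseline [m]

# Reprojection matrix Q in the form expected by cv2.reprojectImageTo3D,
# yielding depth Z = f * baseline / disparity
Q = np.float32([[1, 0, 0, -cx],
                [0, 1, 0, -cy],
                [0, 0, 0,  f],
                [0, 0, 1.0 / baseline, 0]])

points_c1 = cv2.reprojectImageTo3D(disparity, Q)  # H x W x 3 points, set P_{C1}
valid = disparity > 0                             # mask of usable reconstructions
```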
3 HOMOGRAPHY SHADOWING
A commonly used approach for the computation of
virtual camera views is the utilization of perspective
transformations as described in section 2.1.
As introduced in section 1, the perspective transformation using homographies suffers from several drawbacks, because the assumption of a flat world is mostly violated in real world scenarios. When defining a virtual camera $V$, the corresponding point set has to be computed. A reasonable approach is to use a calibration pattern with known geometry here. This way, the points can be calculated for the desired view. Of course, the resulting homography matrix $H_{C_1 V}$ will rely on the flat world assumption for the defined plane, which usually conflicts with the 3-D objects visible in the image.
The main effect of the violated flat world assumption is an unnaturally warped image which suffers from an effect we denote as the Homography Shadowing Effect: the shape of the unnaturally warped object is caused by the violated plane assumption. The homography transformation incorporates the assumption that all pixels are on the same plane in 3-D space. The resulting effect on the image projection is similar to a shadow that occurs when a point light source is installed at the camera's position.
The shape of the warped object matches the shape of the resulting shadow that would be caused by the light source at the virtual camera's position.

Figure 1: Homography Shadowing Effect when using the yellow plane as the homography plane (camera $C_1$, homography plane $E$, virtual camera $V$).
The effect is illustrated in Figure 1: Camera $C_1$ captures the image as depicted on its image plane using a pinhole camera model. The yellow plane $E$ is the homography plane for the flat world assumption which is used in the conventional approach. The viewing rays of $C_1$ (blue) of course end at the object. As no depth information is available, the object will be mapped onto plane $E$ (flat world assumption) when applying the transformation. The view of virtual camera $V$ is computed using a homography transformation. Therefore, $V$ will see the object in the shape of its homography shadow (gray). An example of this effect is shown in Figure 4.c.
4 VIRTUAL CAMERA
GEOMETRY
In order to create virtual camera views, a definition
of the virtual camera is needed. The two approaches
followed are perspective and orthographic projection.
For both projections we assume the z-axis of the
Cartesian coordinate system to be the look direction
of the cameras. However, our method can be easily
extended to other camera projection models.
The perspective projection model follows the principles of a pinhole camera. It assumes all rays to pass through an infinitesimal hole at the camera's optical center. A camera's geometry and lenses form a field of view characterized by an intrinsic camera matrix $K \in \mathbb{P}^{2\times2}$ (Hartley and Zisserman, 2003; Lenz and Tsai, 1988; Zhang, 2000; Scaramuzza, 2008):
$$K = \begin{pmatrix} k_x & 0 & c_x \\ 0 & k_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \qquad (1)$$
with $(k_x, k_y)^T \in \mathbb{R}^2$ the camera constants in both x and y direction of the pixel grid and $(c_x, c_y)^T \in \mathbb{R}^2$ the camera's principal point. Matrix $K$ is used to describe the projection of 3-D points to the image plane (2-D). In case of a virtual camera, the imaging properties for the cameras have to be defined adequately using a camera configuration space. Figure 2 shows the geometric properties of a pinhole camera. The definition contains five parameters, which are $u, b, l, r, d \in \mathbb{R}$. It is reasonable to assume the camera's view frustum to be symmetric ($l = r$ and $u = b$). This assumption is incorporated in many camera calibration techniques as well (e.g. (Tsai, 1987; Lenz and Tsai, 1988)). The trigonometric relations for the camera opening angles $\alpha$ (horizontal) and $\beta$ (vertical) are:

$$\alpha = 2 \cdot \operatorname{atan}\!\left(r \cdot d^{-1}\right) \qquad \beta = 2 \cdot \operatorname{atan}\!\left(u \cdot d^{-1}\right) \qquad (2)$$
In combination with the virtual camera's target image resolution $(w_V, h_V)^T \in \mathbb{N}^2$, the imaging process of the virtual camera can be described. The aspect ratio of a pixel is assumed to be 1:1 (quadratic) in the following paragraphs.

The ratio between $w_V$ and $h_V$ is dependent on the ratio between $\alpha$ and $\beta$. Therefore, the configuration space can be narrowed down to:

$$\operatorname{conf}_p(V) = (w_V, h_V, \alpha)^T \qquad (3)$$
Using this camera definition, a perspective projection matrix $F_V \in \mathbb{P}^{2\times2}$ can be formulated with $F_V = F(\operatorname{conf}_p(V))$ as a diagonal matrix:

$$F_V = \begin{pmatrix} (\tan(0.5\alpha))^{-1} & 0 & 0 \\ 0 & w_V\,(h_V \tan(0.5\alpha))^{-1} & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (4)$$
This matrix is used for the perspective projection of the 3-D points into the virtual camera's image. An orthographic projection follows a simple idea of projection: the 3-D points are transformed onto the image plane using orthogonal projection (no perspective). The principle behind an orthographic projection is shown in Figure 3. In general, four parameters are needed for the definition of the camera (Figure 2): $u, b, l, r \in \mathbb{R}$. The same assumptions concerning the symmetric properties as above can be made. Assuming a target pixel aspect ratio of 1 (square), the camera is dependent on a scalar factor $s \in \mathbb{R}$. As no perspective parameter is included in an orthographic projection, the configuration is:

$$\operatorname{conf}_o(V) = (w_V, h_V, s)^T \qquad (5)$$

Figure 2: Parametrization of a pinhole camera (parameters $u$, $b$, $l$, $r$, $d$; opening angles $\alpha$, $\beta$).

Figure 3: Parametrization of an orthographic camera (parameters $u$, $b$, $l$, $r$, $d$).
The corresponding matrix for orthographic projection $O_V \in \mathbb{P}^{2\times2}$ can be defined using $O_V = O(\operatorname{conf}_o(V))$:

$$O_V = \begin{pmatrix} 2\,(s\,w_V)^{-1} & 0 & 0 \\ 0 & 2\,(s\,h_V)^{-1} & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (6)$$
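A compact sketch of how the projection matrices of the two camera models could be set up from the configuration spaces of Eqs. (3) and (5) is given below; function and variable names are illustrative assumptions, not part of a published implementation.

```python
# Sketch: projection matrices of a configured virtual camera following
# Eqs. (3)-(6). Function and variable names are illustrative only.
import numpy as np

def perspective_matrix(conf_p):
    """F_V for conf_p(V) = (w_V, h_V, alpha)^T, cf. Eq. (4)."""
    w_v, h_v, alpha = conf_p
    t = np.tan(0.5 * alpha)
    return np.diag([1.0 / t, w_v / (h_v * t), 1.0])

def orthographic_matrix(conf_o):
    """O_V for conf_o(V) = (w_V, h_V, s)^T, cf. Eq. (6)."""
    w_v, h_v, s = conf_o
    return np.diag([2.0 / (s * w_v), 2.0 / (s * h_v), 1.0])

# Example: virtual camera with 640x480 pixels and a 90 degree horizontal opening angle
F_V = perspective_matrix((640, 480, np.deg2rad(90.0)))
O_V = orthographic_matrix((640, 480, 0.01))  # scale s chosen arbitrarily for illustration
```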
5 VIRTUAL CAMERA
TRANSFORMATION
The position and orientation of an object in a 3-D space defined by a Cartesian coordinate system can be described using a translation and a rotation relative to the orthonormal bases of the coordinate system. The components can be combined to a so-called pose $a \in \mathbb{R}^3 \times \mathbb{H}$, $a = (t, \varphi)$, with $t \in \mathbb{R}^3$, $t = (t_x, t_y, t_z)^T$ the translation and $\varphi \in \mathbb{H}$, $\varphi = \varphi_w + i\,\varphi_x + j\,\varphi_y + k\,\varphi_z$ the rotation as a unit quaternion. The pose definition is used to describe a rigid body transform between two cameras $C_1$ and $C_2$.
Given a vector $b \in \mathbb{R}^n$, the vector transformed to homogeneous coordinates is represented by $\tilde{b} \in \mathbb{P}^n$. For readability reasons, we use this notation in the following paragraphs and perform some implicit conversions between vectors and their homogeneous representation.
Now let $C_1$ be a calibrated camera with disparities available. These disparities are used for the reprojection to 3-D space. The goal is to create a perspectively correct view of virtual camera $V$: Given a set of points $P_{C_1} = \{p_{C_1} \mid p_{C_1} \in \mathbb{R}^3\}$ in the coordinate system of $C_1$, the transition to the coordinate system of a virtual camera $V$, whereby pose $a$ is the transition from $C_1$ to $V$, is defined as a function $\Upsilon_a : \mathbb{R}^3 \to \mathbb{R}^3$:

$$P_V = \Upsilon_a(P_{C_1}) \qquad \Upsilon_a(P) = \{\upsilon_a(p) \mid p \in P\} \qquad (7)$$
with $\upsilon_a(p) : \mathbb{R}^3 \to \mathbb{R}^3$ the transformation of a single vector from $C_1$ to $V$. Let $\theta : \mathbb{R}^3 \times \mathbb{H} \to \mathbb{P}^{3\times3}$ compute the transformation matrix according to pose $a$. Function $\upsilon_a(p)$ is defined as:

$$\upsilon_a(p) = \theta(a) \cdot \tilde{p} \qquad (8)$$
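One possible realization of $\theta(a)$ and $\upsilon_a(p)$ is sketched below, turning a pose $a = (t, \varphi)$ into a homogeneous transformation matrix; the quaternion component ordering $(w, x, y, z)$ and the helper names are assumptions for illustration.

```python
# Sketch of theta(a) and upsilon_a(p), cf. Eq. (8): a pose (translation t,
# unit quaternion phi) applied to a single 3-D point. The quaternion
# component order (w, x, y, z) is an assumption.
import numpy as np

def theta(t, phi):
    w, x, y, z = phi  # unit quaternion components
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R     # rotation part
    T[:3, 3] = t      # translation part
    return T

def upsilon(t, phi, p):
    """Transform a single 3-D point p from the frame of C1 to the frame of V."""
    p_h = np.append(p, 1.0)           # implicit conversion to homogeneous coordinates
    return (theta(t, phi) @ p_h)[:3]
```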
The projection onto the image plane of camera $V$ is described by function $\Gamma_V : \mathbb{R}^3 \to \mathbb{R}^2$:

$$Q_V = \Gamma_V(P_V) \qquad \Gamma_V(P_V) = \{\gamma_V(p) \mid p \in P_V\} \qquad (9)$$
with $\gamma_V : \mathbb{R}^3 \to \mathbb{R}^2$ the projection function for camera $V$ for a single point $p$ and $Q_V = \{q_V \mid q_V \in \mathbb{R}^2\}$ a set of points on the image plane of $V$. Let $\lambda_V$ describe the projection matrix for $V$, which is dependent on the desired projection:

$$\gamma_V(p) = \lambda_V \cdot p \qquad (10)$$

$$\lambda_V = Z_V \cdot \begin{cases} F_V & \text{(perspective projection)} \\ O_V & \text{(orthographic projection)} \end{cases} \qquad (11)$$
with $Z_V \in \mathbb{P}^{2\times2}$ the transformation matrix to pixel coordinates with respect to the image resolution of $V$:

$$Z_V = \begin{pmatrix} 0.5\,w_V & 0 & 0.5\,w_V \\ 0 & 0.5\,h_V & 0.5\,h_V \\ 0 & 0 & 1 \end{pmatrix} \qquad (12)$$
In case of a perspective projection, $K_V = Z_V \cdot F_V$ holds. The resulting transform from the 3-D points $P_{C_1}$ to the view of $V$ is:

$$Q_V = \Gamma_V\!\left(\Upsilon_a(P_{C_1})\right) \qquad (13)$$
The steps above perform the transformation from one camera into the view of a virtual camera. However, it might be desirable to transform multiple source cameras to one virtual view. Our approach can be extended to work with multiple source cameras: Given a set of cameras $\{C_1, \ldots, C_n\}$ with $n \in \mathbb{N}^+$ and the corresponding set of poses $\{a_1, \ldots, a_n\}$, the view of virtual camera $V$ is defined as:

$$Q_V = \Gamma_V\!\left(\bigcup_{i=1}^{n} \Upsilon_{a_i}(P_{C_i})\right) \qquad (14)$$

Using the transformations described, the view of a virtual camera can be computed out of one or more cameras, when depth/disparity data is available.
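To tie the individual steps together, the sketch below renders the point set of one source camera into the image of a perspective virtual camera according to Eqs. (10)-(13). It reuses the theta() and perspective_matrix() helpers sketched above and, like the raw results in Section 6, performs no z-buffering or interpolation; all names are illustrative assumptions.

```python
# Sketch of the full transformation, Eqs. (10)-(13): project the points
# P_{C1} of one source camera into the image of virtual camera V.
# Reuses theta() and perspective_matrix() from the sketches above;
# no z-buffering or interpolation (raw result, cf. Section 6).
import numpy as np

def pixel_matrix(w_v, h_v):
    """Z_V, cf. Eq. (12): normalized image coordinates to pixel coordinates."""
    return np.array([[0.5 * w_v, 0.0,       0.5 * w_v],
                     [0.0,       0.5 * h_v, 0.5 * h_v],
                     [0.0,       0.0,       1.0]])

def render_virtual_view(points_c1, t, phi, conf_p):
    w_v, h_v, _ = conf_p
    lam = pixel_matrix(w_v, h_v) @ perspective_matrix(conf_p)  # lambda_V = Z_V * F_V
    T = theta(t, phi)                                          # pose a as a matrix

    image = np.zeros((h_v, w_v), dtype=np.uint8)
    for p in points_c1:                                        # p in P_{C1}
        p_v = (T @ np.append(p, 1.0))[:3]                      # upsilon_a(p), Eq. (8)
        if p_v[2] <= 0.0:                                      # point behind the camera
            continue
        q = lam @ (p_v / p_v[2])                               # gamma_V(p), Eqs. (10)-(11)
        u, v = int(round(q[0])), int(round(q[1]))
        if 0 <= u < w_v and 0 <= v < h_v:
            image[v, u] = 255                                  # mark the projected point
    return image
```

A call such as render_virtual_view(points_c1.reshape(-1, 3)[valid.ravel()], t, phi, (640, 480, np.deg2rad(90.0))) would, under the same assumptions, connect this sketch to the disparity reprojection sketched in Section 2.4.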
6 TEST RESULTS
The virtual camera transformation method proposed was tested using publicly available datasets from the Middlebury Stereo Datasets by Scharstein et al. (Scharstein et al., 2014). We show raw results which were computed using the method we propose. This means that no interpolation for missing parts of the virtual image is performed, as we regard this a post-processing matter which would interfere with the projection quality. In a first test step, the goal is to just alter the projection model of the images. An orthographic projection matrix was configured while the relative pose $a$ was set to the zero-pose (identity). Figure 4.b shows an example for the orthographic projection. In the second run, a virtual view is computed. As the datasets unfortunately do not contain the cameras' world positions, pose $a$ had to be estimated manually. Nevertheless, the projection properties are shown adequately.

Figure 4: Result for the dataset “playtable”: (a) original image (Scharstein et al., 2014), (b) orthographic virtual view, (c) homography top view (Hartley and Zisserman, 2003), (d) orthographic virtual top view.
The virtual top views use an orthographic projection and show the transformation to a bird's eye view perspective. Of course, both perspective and orthographic projections were investigated. An example result for an orthographic top view is given together with a classically computed homography top view in Figure 4. The configured virtual camera pose is shifted and rotated by 60°, which of course leads to a sparse pixel density in the areas hidden in the original view. In comparison to the heavily distorted homography top view, we consider the transformation as proposed in this publication to be superior. More results computed using different datasets – including shifted and rotated virtual perspective views – are shown in Figure 5.
7 CONCLUSION
We present a solution for the computation of virtual
camera views using stereo data.

Figure 5: Example transformation results for datasets from the Middlebury Stereo Datasets. Columns: original image (Scharstein et al., 2014), orthographic projection, virtual camera view. Dataset “pipes” – right: orthographic top view; dataset “recycle” – right: orthographic top view; dataset “piano” – right: perspective projection from a different camera pose; dataset “shelves” – right: perspective view from a different camera pose.

Using depth data gained from stereo correspondences, it is possible to
create perspectively correct images from other camera poses and configurations. The resulting images can be used in various appliances. The density of the pixels in the computed images is dependent on pose $a$, so that virtual cameras close to the original camera will lead to (optically) better results. The incorporation of adequate interpolation technologies, such as image inpainting, can improve the quality of the images. As stated in section 6, this is not the current goal of our method, but subject of our current research.
REFERENCES
Bay, H., Tuytelaars, T., and Gool, L. V. (2006). SURF :
Speeded Up Robust Features. In European Confer-
ence on Computer Vision, pages 404–417.
Einecke, N. and Eggert, J. (2015). A Multi-Block-Matching
Approach for Stereo. In Intelligent Vehicles Sympo-
sium, pages 585–592.
Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready
for Autonomous Driving? The KITTI Vision Bench-
mark Suite. In Computer Vision and Pattern Recogni-
tion, pages 3354–3361.
Geyer, C. and Daniilidis, K. (2001). Catadioptric projective
geometry. International Journal of Computer Vision,
45(3):223–243.
Grimson, W. E. (1985). Computational experiments with a
feature based stereo algorithm. Transactions on Pat-
tern Analysis and Machine Intelligence, 7(1):17–34.
Hartley, R. and Zisserman, A. (2003). Multiple View Geom-
etry in Computer Vision. Cambridge University Press,
New York, NY, USA, 2 edition.
Hermann, S. and Klette, R. (2013). Iterative semi-global
matching for robust driver assistance systems. In
Asian Conference on Computer Vision, pages 465–
478.
Hirschmüller, H. (2008). Stereo processing by semiglobal
matching and mutual information. Transactions
on Pattern Analysis and Machine Intelligence,
30(2):328–341.
Horaud, R. and Skordas, T. (1989). Stereo correspondence
through feature grouping and maximal cliques. Trans-
actions on Pattern Analysis and Machine Intelligence,
11(11):1168–1180.
Laveau, S. and Faugeras, O. (1994). 3-D scene representa-
tion as a collection of images. International Confer-
ence on Pattern Recognition, 1.
Lenz, R. K. and Tsai, R. Y. (1988). Techniques for cali-
bration of the scale factor and image center for high
accuracy 3-D machine vision metrology. Transac-
tions on Pattern Analysis and Machine Intelligence,
10(5):713–720.
Li, S. and Hai, Y. (2011). Easy calibration of a blind-spot-
free fisheye camera system using a scene of a parking
space. Transactions on Intelligent Transportation Sys-
tems, 12(1):232–242.
Liu, Y. C., Lin, K. Y., and Chen, Y. S. (2008). Bird’s-eye
view vision system for vehicle surrounding monitor-
ing. LNCS, 4931:207–218.
Lowe, D. G. (2004). Distinctive image features from scale-
invariant keypoints. International Journal of Com-
puter Vision, 60(2):91–110.
Pantilie, C. D. and Nedevschi, S. (2012). SORT-SGM: Sub-
pixel Optimized Real-Time Semiglobal Matching for
Intelligent Vehicles. Transactions on Vehicular Tech-
nology, 61(3):1032–1042.
Pfeiffer, D., Gehrig, S., and Schneider, N. (2013). Exploit-
ing the power of stereo confidences. Computer Vision
and Pattern Recognition, pages 297–304.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G.
(2011). ORB - an efficient alternative to SIFT or
SURF. In International Conference on Computer Vi-
sion, pages 2564–2571.
Sato, T., Moro, A., Sugahara, A., Tasaki, T., Yamashita,
A., and Asama, H. (2013). Spatio-temporal bird’s-eye
view images using multiple fish-eye cameras. Interna-
tional Symposium on System Integration, pages 753–
758.
Scaramuzza, D. (2008). Omnidirectional vision: from cal-
ibration to robot estimation. PhD thesis, Citeseer.
Scharstein, D., Hirschmüller, H., Kitajima, Y., Krath-
wohl, G., Nešić, N., Wang, X., and Westling,
P. (2014). High-Resolution Stereo Datasets with
Subpixel-Accurate Ground Truth. German Confer-
ence on Pattern Recognition, 1(2):31–42.
Scharwachter, T., Schuler, M., and Franke, U. (2014). Vi-
sual guard rail detection for advanced highway assis-
tance systems. Intelligent Vehicles Symposium, pages
900–905.
Shum, H.-Y. and Kang, S. B. (2000). A review of image-
based rendering techniques. Proc. SPIE Visual Com-
munications and Image Processing, pages 2–13.
Thomas, B., Chithambaran, R., Picard, Y., and Cougnard,
C. (2011). Development of a cost effective bird’s eye
view parking assistance system. Recent Advances in
Intelligent Computational Systems, pages 461–466.
Tsai, R. Y. (1987). A Versatile Camera Calibration Tech-
nique for High-Accuracy 3D Machine Vision Metrol-
ogy Using Off-the-Shelf TV Cameras and Lenses.
Journal on Robotics and Automation, 3(4):323–344.
Vincent, E. and Laganière, R. (2001). Detecting planar ho-
mographies in an image pair. International Sympo-
sium on Image and Signal Processing and Analysis,
0(2):182–187.
Vogt, F., Krüger, S., Schmidt, J., Paulus, D., Niemann, H.,
Hohenberger, W., and Schick, C. H. (2004). Light
fields for minimal invasive surgery using an endo-
scope positioning robot. Methods of information in
medicine, 43(4):403–408.
Zhang, Z. (2000). A flexible new technique for camera cal-
ibration. Transactions on Pattern Analysis and Ma-
chine Intelligence, 22(11):1330–1334.
Zinger, S., Do, L., and De With, P. H. N. (2010). Free-
viewpoint depth image based rendering. Journal of Vi-
sual Communication and Image Representation, 21(5-
6):533–541.