Light Field Rendering for Head Mounted Displays using Pixel
Reprojection
Anne Juhler Hansen, Jákup Klein and Martin Kraus
Department of Architecture, Design & Media Technology, Aalborg University, Rendsburggade 14, 9000 Aalborg, Denmark
Keywords:
Light Field, Pixel Reprojection, Shader Programming, Real-time Rendering.
Abstract:
Light field displays have advantages over traditional stereoscopic head mounted displays, for example, because
they can overcome the vergence-accommodation conflict. However, rendering light fields can be computationally heavy due to the number of images that have to be rendered. Since much of the information in the
different images is redundant, we use pixel reprojection from the corner cameras to compute the remaining
images in the light field. We compare the reprojected images with directly rendered images in a user test. In
most cases, the users were unable to distinguish the images. In extreme cases, the reprojection approach is not
capable of creating the light field. We conclude that pixel reprojection is a feasible method for rendering light
fields as far as quality of perspective and diffuse shading is concerned, but render time needs to be reduced to
make the method practical.
1 INTRODUCTION
Head mounted displays (HMDs) have evolved significantly during the last years, especially on the consumer market, e.g., Oculus Rift, HTC Vive, and Sony PlayStation VR. One of the shortcomings and challenges of traditional HMDs is the lack of 3-dimensional cues, in particular the parallax effect for small eye movements and correct eye accommodation. The vergence-accommodation conflict has been suspected of causing visual fatigue, eyestrain, diplopic vision, headaches, and other signs of simulation sickness (Hoffman et al., 2008).
In the future, it might be possible to eliminate visual discomfort and nausea, since a light field display can provide correct retinal blur, parallax and eye accommodation, which may balance out some of the conflicting cues experienced with traditional HMDs. A light field display allows an observer
to perceive a scene at different depths and angles by
placing a distance-adjusted array of microlenses in
front of a display.
When rendering for a light field display with mi-
crolenslets, several 2D subimages have to be rendered
from different views, as seen from an array of differ-
ent cameras. We hypothesize that we can compute all
views from only four rendered cameras using pixel
reprojection instead of rendering the whole array of
virtual cameras. This paper investigates the feasibil-
ity of this approach to create light field renderings,
and explores its benefits and shortcomings.
A head-mounted light field display has been built, and a user evaluation of the light field images has been conducted. The goal of the experiment is to find out whether users are able to perceive a difference between the light field images created with the two methods: rendering the full array of cameras, and pixel reprojection from the images of only four cameras.
Our contributions are:
• We propose a method to render light fields using pixel reprojection that reduces the number of virtual cameras.
• We evaluate whether or not test subjects are able to notice a difference in image quality with the use of a two-interval forced choice test.
2 RELATED WORK
2.1 Vergence-Accommodation Conflict
Vergence and accommodation are parameters that in-
fluence our perception of depth and focus. In real-
ity, the human ocular system will adapt when focus
is changed between different distances, such that the
point of interest remains binocularly fused.
Accommodation refers to the physical shape of
the lens of the eye, where the eye adapts optical power
to maintain a clear focused image. The vergence
mechanism continually adjusts the angle between the
two eyes such that features at the focus distance re-
main fused in the binocular vision.
A change in visual cues affects both systems: stereo disparity drives the eyes to converge or diverge, and retinal blur prompts an oculomotor accommodation adjustment. Further strengthening the argument that these systems are tightly coupled, Suryakumar et al. have shown that visual disparity in isolation elicits an accommodation response fully comparable to that of retinal blur (Suryakumar et al., 2007). However, in traditional stereo imaging
where the depth is fixed, vergence towards a differ-
ent distance elicits conflicting cues between the two
systems; this has been linked to discomfort (Shibata
et al., 2011), visual fatigue, and reduced visual per-
formance (Hoffman et al., 2008).
One of the consequent benefits of a light field dis-
play is that it allows natural accommodation and ver-
gence. Focusing at different distances simply deter-
mines which parts of the 2D image slices are focused
onto the retina.
2.2 The Light Field
A light field can be described as the amount of light
travelling in every direction through every point in
space (Levoy, 2006). Light can be interpreted as a
field because space is filled with an array of light rays
at various intensities. The 5D plenoptic function de-
scribes all light information visible from all viewing
positions, and can be explained as recording the in-
tensity of the light rays passing through the center of
a pupil placed at every possible x, y, and z in a 3-
dimensional volume, and at every angle θ and φ.
Since radiance does not change along a line un-
less it is blocked, the 5D plenoptic function can be
reduced to 4D in space free of occluders (Levoy and
Hanrahan, 1996). The 4D light field can explain the
total light intensity of each ray as a function of posi-
tion and direction.
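As a compact statement of this reduction, the following sketch uses the two-plane parameterization of Levoy and Hanrahan; the symbols are illustrative and not taken verbatim from the paper.

```latex
% 5D plenoptic function: radiance at a point (x, y, z) in direction (\theta, \phi).
% In occluder-free space radiance is constant along a ray, so a ray can instead be
% indexed by its intersections (u, v) and (s, t) with two parallel planes:
L_5(x, y, z, \theta, \phi) \;\longrightarrow\; L_4(u, v, s, t)
```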
2.3 Head-Mounted Light Field Displays
Head-Mounted Displays (HMDs) still struggle with weight and big, bulky optics, and most traditional HMDs do not account for the
vergence-accommodation conflict (Rolland and Hua,
2005). Since light fields consist of more informa-
tion than usual 2D images, light fields can improve
on some of the limitations of traditional fixed-focus
HMDs.
A light field can be optically reconstructed by
placing a distance-adjusted array of microlenses in
front of a display (see Figure 1). This is known as
a light field display. A light field display allows an
observer to integrate a correct 2D image of the light
field at different depths and angles in accordance with
the spatial and depth resolution that the light field con-
tains. In other words, the light field display allows an
observer to accommodate and converge his/her eyes
on a virtual object as if it were part of the real world.
The image seen through a light field display has focus
cues, where the convergence point is the point in fo-
cus, and the rest of the image appears blurred just like
in the real world.
Exploiting the benefits of microlenslet arrays in HMDs, Lanman and Luebke have shown that a light field display can be integrated into an HMD, which can both reduce the size of HMDs and potentially allow for much more immersive VR solutions compared to the fixed-focus displays used in most common HMDs (Lanman and Luebke, 2013).
2.4 Light Field Rendering
Light fields were introduced into computer graphics by Levoy and Hanrahan in 1996, who used image-based rendering to compute new views of a scene from pre-existing views without the need for scene geometry (Levoy and Hanrahan, 1996). The technique showed a real-time view of the
light field, where it was possible to see a scene with
correct perspective and shading, and with the option
of zooming in and out. When zooming in, the light
samples disperse throughout the array of 2D slices, so
the perceived image is constructed from pieces from
several elemental images.
Reducing complexity is highly desired when
working with light fields, and (re)construction of
overlapping views is a good place to start, since this
is where the light field contains a lot of redundant in-
formation. Much of the data is repetitive, especially
when looking at a scene placed at infinity, where all
subimages are created from parallel light rays. Instead
of creating a virtual camera or capturing an individual
subimage for each elemental image, we hypothesize
that pixel reprojection can be used to reduce the com-
putational effort.
Figure 1: An observer sees the light field by looking through a microlens array in front of a screen, where each lens covers
one subimage. Rays from different subimages enter the pupil of the observer, and the light field is experienced as one image,
where the light samples disperse throughout the array of subimages. When focus (vergence and accommodation) is changed,
the perceived image will be constructed from rays from other subimages.
2.5 Pixel Reprojection
Pixel reprojection is about reprojecting data (e.g.
color values) from one image to another. Geomet-
rically valid pixel reprojection techniques have been
studied by Kang, who stated that “if the depth value
at each pixel is known, then the change in location of
that pixel is constrained in a predictable way” (Kang,
1998). The traditional approach for generating virtual views of a scene is to render a 3D model, but with image-based rendering techniques new views can be created by reprojecting pixels from source images onto the target image. The method therefore does not need to process the geometry of the 3D models, and the cost of rendering is independent of the scene complexity.
Pixel reprojection can be used when rendering video, where data reprojection can exploit the natural temporal coherence between consecutive frames by caching expensive intermediate shading calculations performed at each frame and reusing this data when rendering subsequent frames (Sitthi-amorn et al., 2008). Pixel reprojection can therefore also be used as a tool to optimize shaders, since reusing data between consecutive frames can accelerate real-time shading (Nehab et al., 2007; Havran et al., 2003). In general, pixel reprojection is a useful technique in the field of computer graphics (Adelson and Hodges, 1995; Tawara et al., 2004).
3 IMPLEMENTATION AND
METHODS
Our approach is to render only the four corner cameras of the subimage array, and then use these four views to create all subimages of the light field. We implemented all the computations of the subimages of the light field with pixel reprojection, while maintaining correct perspective and diffuse shading, and investigated where shortcomings of the pixel reprojection occurred. The light field display was implemented with microlenslets in a setup inspired by Lanman and Luebke (2013).
3.1 Rendering the Light Field
Using the Unity engine, we render a virtual image for
every lenslet that is within the bounds of the microdis-
play, so the light field will be perceived as one holo-
graphic image with focus cues. Each subimage (or
elemental image) is rendered to a portion of the mi-
crodisplay; optimally 15mm × 8mm out of 15.36mm
× 8.64mm to utilise as much of the spatial resolution
as possible.
Since the perceived image is constructed from
pieces from several subimages, we need to render all
these subimages in an array corresponding to the di-
mensions of our microlens array. The secure and reli-
able solution would be to render 15 × 8 different vir-
tual cameras, where each camera has the same align-
ment as the lenslets. We refer to this as a light field
image created with virtual cameras. We consider this
the gold standard and compare our approach to this
method in tests. As previously mentioned, our ap-
proach is to render only the four corner cameras of
the subimage array, and then use pixel reprojection to
create the subimages in-between the four corner cam-
eras.
Our method can be outlined in five steps (an illustrative sketch of this flow follows the list):
1. First we render the four corner cameras to separate render textures. The depth is saved in the alpha channel.
2. Then a shader calculates the in-between images along the x-axis for the top and bottom rows.
3. The x-axis result is saved to a render texture, and from that another shader calculates the in-between images along the y-axis.
4. Again the result is saved to a render texture, and anti-aliasing is done by downsampling in a third shader.
5. Lastly, the image is scaled to fit the display output by a fourth shader.
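The listing below is an illustrative CPU-side sketch of this five-step flow; the paper implements it as Cg shaders in Unity, so the grid and subimage sizes, the helper names, and the blending placeholder inside reproject() are assumptions for illustration, not the authors' code.

```python
import numpy as np

GRID_W, GRID_H = 15, 8   # subimage grid matching the 15 x 8 microlens array
SUB = 96                 # assumed supersampled subimage size in pixels

def render_corner_with_depth(corner_id):
    """Step 1 stand-in: an RGBA subimage with eye-space depth in the alpha channel."""
    img = np.zeros((SUB, SUB, 4), dtype=np.float32)
    img[..., 3] = 1.0    # placeholder depth; corner_id is unused in this stand-in
    return img

def reproject(view_a, view_b, t):
    """Stand-in for the per-pixel reprojection shader of Section 3.2;
    t in [0, 1] is the camera position between the two source views."""
    return (1.0 - t) * view_a + t * view_b   # placeholder only, NOT the real reprojection

corners = {k: render_corner_with_depth(k) for k in ("TL", "TR", "BL", "BR")}

# Step 2: in-between images along the x-axis for the top and bottom rows.
top    = [reproject(corners["TL"], corners["TR"], i / (GRID_W - 1)) for i in range(GRID_W)]
bottom = [reproject(corners["BL"], corners["BR"], i / (GRID_W - 1)) for i in range(GRID_W)]

# Step 3: in-between images along the y-axis for every column.
grid = [[reproject(top[i], bottom[i], j / (GRID_H - 1)) for i in range(GRID_W)]
        for j in range(GRID_H)]

# Step 4: assemble the atlas and box-filter it down for anti-aliasing.
rows  = [np.concatenate([grid[j][i][..., :3] for i in range(GRID_W)], axis=1)
         for j in range(GRID_H)]
atlas = np.concatenate(rows, axis=0)
atlas = atlas.reshape(atlas.shape[0] // 2, 2, atlas.shape[1] // 2, 2, 3).mean(axis=(1, 3))

# Step 5, scaling the atlas to the exact display resolution, is omitted here.
print(atlas.shape)
```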
3.2 Pixel Reprojection
The subimages are computed by pixel reprojection, where the pixels from the corner images are copied to the corresponding place in each subimage. To achieve this, a pixel must be placed back into the 3D world and be "captured" by the computed subimage. Here the view space must be projected onto the image plane that will be displayed on the screen. The input pixel energy must be redistributed to the output pixel based on the exact overlap between these pixels.
The transformation goes back and forth between the projection plane (the generated 2D image) and the eye/camera space (the 3D scene with the camera as its center) (see Figure 2). Each in-between view has an individual position in world space and needs a transformation between these spaces in order to generate its subimage. If the projection plane for one camera and the transformation in relation to the other camera are known, then the pixels can be reprojected to the other camera. Finally, the view space is projected onto the 2D screen.
Our transformation depends on the x-coordinates on both the projection plane, x_p, and in eye space, x_e, as well as the depth of the projection plane n and the z-position z_e in eye space (see Equation 1):

\frac{x_p}{x_e} = \frac{n}{z_e}    (1)
The x-coordinate in eye space, x_e, is mapped to x_p, which is calculated by using the ratio of similar triangles.
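As a small illustration of Equation 1 and of the disparity it implies between two cameras separated along x, consider the sketch below; the near-plane distance, the point coordinates, and the 1 mm baseline are assumed example values.

```python
def project_x(x_e: float, z_e: float, n: float) -> float:
    """Map an eye-space x-coordinate to the projection plane at distance n,
    using the ratio of similar triangles: x_p / x_e = n / z_e (Equation 1)."""
    return n * x_e / z_e

def reprojected_x(x_e: float, z_e: float, n: float, baseline: float) -> float:
    """Projection-plane x of the same point seen from a camera shifted by `baseline` along x."""
    return project_x(x_e - baseline, z_e, n)

if __name__ == "__main__":
    n, baseline = 0.1, 0.001          # assumed near-plane distance and 1 mm camera spacing
    x_e, z_e = 0.02, 0.5              # assumed point in eye space of the first camera
    x_p = project_x(x_e, z_e, n)
    disparity = x_p - reprojected_x(x_e, z_e, n, baseline)
    print(x_p, disparity)             # disparity equals n * baseline / z_e
```

The disparity n·baseline/z_e shrinks towards zero as z_e approaches infinity, which is the property exploited in Section 3.4.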
We need the depth information of the scene to effectively interpolate between the images. Using perspective projection, the relation between z_e and the value that is stored in the depth buffer, z_n, is non-linear, with high precision at the near plane and little precision at the far plane.
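The paper sidesteps this non-linearity by writing eye-space depth into the alpha channel (next paragraph). For reference, a common way to recover eye-space depth from a standard [0,1] depth-buffer value, assuming an OpenGL-style projection (conventions differ between graphics APIs), is sketched below.

```python
def linearize_depth(d: float, near: float, far: float) -> float:
    """Convert a non-linear [0,1] depth-buffer value to eye-space depth z_e,
    assuming an OpenGL-style perspective projection (NDC z in [-1,1])."""
    z_ndc = 2.0 * d - 1.0
    return 2.0 * near * far / (far + near - z_ndc * (far - near))

print(linearize_depth(0.0, 0.1, 100.0))   # 0.1   (near plane)
print(linearize_depth(1.0, 0.1, 100.0))   # 100.0 (far plane)
```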
The transformation from the projection plane to
the eye space requires the depth from the eye space.
The depth is saved from the corner cameras into the
(unused) alpha channel. It was found through experi-
mentation that a 32 bit per channel texture was suffi-
cient to give accurate depth information.
Accurate depth is crucial for the pixel reprojection method to work. This also means that no anti-aliasing can be performed on the four corner cameras: while the smoother edges of an anti-aliased image are more aesthetically pleasing, they would no longer match the depth map, resulting in artifacts in the reprojected images. Anti-aliasing can still be achieved by downsampling after all pixel reprojection calculations are finished.
3.3 Filling the Gaps
There are cases where pixel reprojection does not yield a full image, but rather an image with gaps. This is because objects sometimes occlude other objects in such a way that, when the camera is reprojected, information is missing. The effect can be seen when a computed image is composed of pixels from two corner cameras, but because of the way the objects are placed in the scene, there are spots where the depth and colour are unknown. The size of the invisible "shadow" depends on the distance from the camera to the occluding object and on the distance between the camera that captures the scene and the pixel-reprojected camera.
The problem is that the required information is not available. One easy solution would be to add more cameras to the scene (e.g., 5 cameras, one in the center and one in each of the four corners, would eliminate many cases). When the information is not available, the holes must be filled with something that hides them as well as possible. One option is to take the pixels on the edge of a hole and use them to fill it in; this could, however, lead to visible pixel "streaks". Another solution is to use an average colour of the available information. In this project, the missing pixel values were created by reading the corner images at the corresponding subpixel positions and taking the mean of these values. The result is that the holes are filled with colours that are present in the
scene, and there are no pixel "streaks".

Figure 2: Pixel reprojection shown between two cameras (CamA and CamB). A pixel can be reprojected from a corner camera (CamA) to a pixel-reprojected camera (e.g., CamB). The cameras have coordinates in the 3-dimensional eye space, whereas the projection plane is 2-dimensional.
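A minimal sketch of this gap-filling strategy is given below, assuming the corner images and the corresponding sample positions are already known; the array layout and the nearest-pixel lookup are simplifications of the sub-pixel sampling described in Section 3.4.

```python
import numpy as np

def fill_hole(corner_images, positions):
    """Average the colours sampled from the corner images at the positions where
    the missing pixel would have been read; the mean hides holes without streaks."""
    samples = []
    for img, (row, col) in zip(corner_images, positions):
        r = int(np.clip(round(row), 0, img.shape[0] - 1))
        c = int(np.clip(round(col), 0, img.shape[1] - 1))
        samples.append(img[r, c, :3])
    return np.mean(samples, axis=0)

# Example with four dummy 8x8 corner images, all sampled at the same position.
corners = [np.random.rand(8, 8, 4).astype(np.float32) for _ in range(4)]
print(fill_hole(corners, [(3.4, 5.1)] * 4))
```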
3.4 Shader Programming
The pixel reprojection requires a number of calculations to be made for every pixel. Because of the large number of calculations and their repetitive nature, a shader is a good tool for the task; in this case a Cg shader was used in the Unity engine.
There are disadvantages to using a shader for the pixel reprojections. It would be easy to reproject a pixel with known depth and color from a pre-rendered camera to a reprojected camera. It is not as simple to find the correct reprojected pixel position in a subimage from the pre-rendered camera, since the depth is unknown for the reprojected camera.
Our method starts with a fragment shader that runs
in a loop over all possible pixels in the subimages
from the pre-rendered cameras. We then run a loop to
choose the best pixel match based on the pixel with
closest depth to the reprojected camera. All pixel
values in the reprojected camera start with an output
value of [0.0,0.0,0.0,2.0]. If the depth is smaller than
2.0 (which it will always be the first time), then the
color for the actual fragment pixel position is chosen
from the camera with the lowest pixel depth.
If the scene being captured by the pre-rendered
cameras is placed at infinity, then all subimages show
the same image. In this case any position on any re-
projected subimage should be filled with the colour
from a pre-rendered corner camera at the same posi-
tion. If, however, the scene contains elements placed
closer than infinity to the pre-rendered cameras, then
there is a difference between the position of those pix-
els on the projection plane of the pre-rendered camera
and the pixels on the projection plane of the repro-
jected camera.
This distance is greatest at the near clipping plane and zero at infinity (see Equation 1): when x_e is offset to a new camera position, the resulting position x_p depends on the depth z_e.
Knowing this and the two subimages’ positions
relative to each other, one only needs to check all pos-
sibilities. Given any reprojected pixel position, the
same position is checked in the pre-rendered camera
subimages, and then the neighbours are checked until
the maximum disparity is reached.
The variables that affect maximum disparity are
the distance to the closest object (or rather the cam-
eras’ near clipping plane), the distance between the
pre-rendered corner camera and the point where the
pixel reprojected camera is placed, and the cameras’
field of view. None of the reprojected pixels is necessarily a perfect match, but one of them should be the closest and within 0.5 pixels. This is the maximum distance from the center of a pixel to its edge; if the match is more than 0.5 pixels off, then the best pixel match has not been found.
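The following CPU sketch illustrates this candidate search for one axis, assuming the target (reprojected) camera lies in the +x direction from the source corner camera. The parameter names and the combined scale factor are assumptions; the 0.5 px criterion and the closest-depth preference follow the description above, and the authors' Cg shader instead initializes its output with a sentinel depth of 2.0.

```python
import numpy as np

def best_match_x(src_rgba, x_t, y, shift_scale, max_disp):
    """Search a pre-rendered corner image along x for the pixel that, reprojected
    with its own stored depth (alpha channel) via Equation 1, lands within 0.5 px
    of the target pixel x_t; among valid candidates the closest depth wins."""
    w = src_rgba.shape[1]
    best_colour, best_depth = np.zeros(3, dtype=np.float32), np.inf
    for d in range(max_disp + 1):
        x_s = x_t + d                       # candidates lie towards the source camera
        if x_s >= w:
            break
        r, g, b, z_e = src_rgba[y, x_s]
        if z_e <= 0.0:
            continue                        # no valid depth stored
        x_back = x_s - shift_scale / z_e    # where this candidate lands in the target view
        if abs(x_back - x_t) <= 0.5 and z_e < best_depth:
            best_colour, best_depth = np.array([r, g, b], dtype=np.float32), z_e
    return best_colour, best_depth

# shift_scale bundles n * baseline * pixels_per_unit (assumed example values below).
src = np.random.rand(8, 64, 4).astype(np.float32) + 0.1   # depths kept > 0
print(best_match_x(src, x_t=10, y=4, shift_scale=2.0, max_disp=6))
```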
It is important to note that the disparity is not one-dimensional but two-dimensional, since all cameras lie on a two-dimensional plane. Here the x- and y-axes can be calculated separately. If a "two step" approach
is used, then one axis is calculated first and saved to a texture. After this step the other axis is calculated, but it is worth mentioning that saving the render texture one more time than necessary implies extra computation time. If a "one step" approach is used, then the number of times that values have to be written to textures is reduced. However, one needs to consider the added calculation (and logic) necessary to find the correct pixel in one step. The two-step approach was used in this project for debugging purposes: errors were easily located because the different stages of the program could be inspected separately.
A sub-pixel correction is needed since the subim-
ages are a result of (up to) all four corner cameras.
One pixel from one of the corner cameras will be the
best match for any given subimage pixel, but this pre-
rendered subimage pixel will not necessarily match
up to the reprojected pixel being calculated. The off-
set is slightly different for each pre-rendered camera
due to their different corner positions. The closest
pixel match is up to 0.5 pixel off, and the offset can
be calculated, resulting in a position that is between
pixels in the pre-rendered subimage. Linear interpo-
lation is used on these pixels giving a color value for
the reprojected subimage pixel. In our setup this point
can be between two pixels (either on the x-axis or on
the y-axis).
The pixel values lend themselves quite well to interpolation, but this is not the case for the depth map. The depth map can easily be interpolated on surfaces; the edges of objects, however, become smudged if the difference between the neighboring pixel depths is large. An example would be a scene with an object placed well in front of a background at infinity: this results in three neighbouring pixels where one has the depth of the background, one has the depth of the object, and the middle pixel has a depth that is somewhere in between. The simple solution is not to interpolate the depth while interpolating the colour values. This results in an image where the pixel values and depth values do not match completely but are quite close to correct. The downside is that the pixel values are interpolated while the depth is not, meaning that the edge of an object can go beyond the edge in the depth map, effectively spilling colour onto the neighbouring objects.
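A sketch of this sub-pixel correction along one axis is given below: the colour is linearly interpolated between the two straddling pixels, while the depth is taken from the nearest pixel rather than interpolated, as described above. Function and parameter names are illustrative.

```python
import numpy as np

def subpixel_sample_x(src_rgba, x, y):
    """Linearly interpolate the colour of the two pixels straddling a non-integer
    x position; the depth is not interpolated but taken from the nearest pixel,
    to avoid smearing depth across object edges."""
    x0 = int(np.clip(np.floor(x), 0, src_rgba.shape[1] - 2))
    t = float(np.clip(x - x0, 0.0, 1.0))
    colour = (1.0 - t) * src_rgba[y, x0, :3] + t * src_rgba[y, x0 + 1, :3]
    depth = src_rgba[y, min(int(round(x)), src_rgba.shape[1] - 1), 3]
    return colour, depth

img = np.random.rand(8, 8, 4).astype(np.float32)
print(subpixel_sample_x(img, 2.3, 4))
```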
The computation of the pixel reprojection was split into five steps (see Section 3.1). The first step saves the depth to the alpha channel; in the second step the reprojected subimages along the x-axis between the pre-rendered corner images are computed (see Figure 3). The result is an image where the top and bottom rows are filled with reprojected subimages. The rest of the image is filled with the mean of the colour values from the corner cameras, since the mean value is a better guess for the pixel colors than missing the information completely. The next step computes the values from the two rows of subimages and thus fills the remainder of the image with reprojected subimages along the y-axis. Because the x-axis and y-axis are computed in two steps, special care had to be taken to avoid looking beyond the boundaries of the reprojected images. First we check that the position is within the image; after this step the sub-pixel correction is performed. This process utilises linear interpolation of two pixels, and if this interpolation is performed sufficiently close to the edge of the image, then a pixel that resides in the image would be interpolated with a pixel outside the image boundary. The problem was solved by clamping all textures: clamping means that the edge pixels are repeated beyond the boundaries of the image. The complete image was rendered at four times the resolution of the screen and then downsampled to achieve anti-aliasing. The last step after downsampling is scaling. The reason for this step is that the previous steps rely on each subimage having a resolution of a whole number of pixels. This is, however, not the case for our use, as the screen of the HMD has 83 1/3 pixels per millimeter.
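Two small helpers below sketch the clamped texture addressing and the box-filter downsampling used for anti-aliasing; the factor of 2 per axis is an assumption about how "four times the resolution" is distributed.

```python
import numpy as np

def sample_clamped(img, row, col):
    """Texture lookup with clamp-to-edge addressing: positions outside the image
    repeat the edge pixels, so interpolation near borders never reads garbage."""
    r = int(np.clip(row, 0, img.shape[0] - 1))
    c = int(np.clip(col, 0, img.shape[1] - 1))
    return img[r, c]

def downsample_box(img, factor=2):
    """Box-filter downsampling of a supersampled image for anti-aliasing."""
    h = img.shape[0] - img.shape[0] % factor
    w = img.shape[1] - img.shape[1] % factor
    c = img.shape[2]
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

big = np.random.rand(16, 16, 3).astype(np.float32)
print(sample_clamped(big, -3, 20).shape)   # (3,), clamped to a corner pixel
print(downsample_box(big).shape)           # (8, 8, 3)
```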
4 EXPERIMENT
When looking at the image difference, a complete pixel match is shown as black [0], and since the pixel differences are normalized, the image difference is in the range [0;1]. We can see that our method has a small image difference; the difference is largest around the edges of objects (see Figure 4) and/or when we have occlusion and data simply is not available. We can also see small pixel value differences in textures, but in general we have many black or dark pixels, and thereby a good pixel match.
4.1 User Test
This experiment aims to statistically determine whether subjects can discriminate between the images created with 120 virtual cameras (VC) in the Unity engine and the images created with our pixel reprojection method (PR). The 120-camera image was created by capturing the camera views to individual render textures and combining them into a larger render texture that fits 15 × 8 subimages. These were downsampled and scaled to the appropriate screen size. Essentially, this is the same method as with pixel reprojection, just without the reprojection step.
Figure 3: Pixel reprojection flowchart.
Five different scenes were tested in a total of 10 different image tests: 5 shown with the light field display and 5 single images (position [4;8] out of [8;15]) shown on a computer monitor.
The 5 different scenes were designed to test different rendering scenarios and how differences in geometry influence our rendering method. The scenes include various numbers of objects, shapes, sizes and textures (see Figure 5):
1. Scene with many objects occluding each other
2. Scene with few objects occluding each other
3. Scene with several objects occluding each other
4. Scene with curved texture
5. Scene with occlusion (objects 12 cm away from the camera).
Image 5 was intentionally designed to fail the test. The image was created to push the boundaries of the method, with the presumption that the test participants would be able to notice a difference between PR and VC. Based on work by Cunningham and Wallraven (2011), we wanted to avoid participant frustration, where participants start answering randomly because they are never sure whether their answers are correct.
4.2 Test Setup
The test took place at an experimentarium (science centre) targeted at young children accompanied by adults. Both adults and children took part in the test.
Results from 34 test participants are included in the experiment; some samples were removed because the test participants had poor eyesight. Since the objects in the scenes were placed between 12 cm and 6 m, test participants with nearsightedness were suitable, but participants with farsightedness would bias the results, since farsightedness does not allow participants to accommodate on objects that are close. Therefore most test participants had normal or corrected-to-normal vision; a few test participants were in the range -1.75 to +0.50 (glasses/contact lens strength) but did not have corrected-to-normal vision. All samples were independent, from 16 female and 18 male participants with ages ranging from 9 to 67 years (some participants refused to disclose their age).
4.3 Two-Interval Forced Choice Test
A forced choice test is one that requires the test par-
ticipants to identify a stimulus by choosing between a
finite number of alternatives. We chose the 2-interval
forced choice test where test participants must choose
one of two alternatives with no neutral alternatives
listed.
Figure 4: Example of the shortcomings of our pixel reprojection method; edges and textures can have a small pixel value difference. NOTICE: The contrast and brightness in "Image 1, Difference" has been drastically enhanced for printing.
The test participants were asked to solve several
matching-to-sample tasks, where the standard stimu-
lus (the sample or reference) is shown together with
two other stimuli (the comparison stimuli), and then
the test participants are requested to choose the com-
parison stimulus that most closely matches the refer-
ence.
The experiment was conducted as a delayed
matching-to-sample, where the test participants were
first shown a reference image, and then after the sam-
ple was removed two stimuli were presented sequen-
tially. The inter-stimulus interval (ISI), which is the
break between two stimuli, was 250 ms to help pre-
vent temporal integration and masking effects. The
time spent looking at each stimulus, the inter-trial in-
terval (ITI), was longer than the ISI (Cunningham and
Wallraven, 2011).
With two possible choices shown sequentially
this is referred to as a two-interval forced choice
(2-IFC) procedure. If the test subjects can do no
better than a random guess, then the test has been
passed, meaning that we can conclude that the test
participants experience no difference between VC
and PR.
The 2-IFC tasks are:
1. The reference image is shown.
2. Two visual stimuli are presented in random order
(reference stimulus and the two comparison stim-
uli can be revisited as many times as the test par-
ticipant desires).
3. The test participant chooses one of the two visual
stimuli.
This test is passed if the probability of test participants incorrectly identifying PR as VC is greater than 19% at a confidence level of 95%. This corresponds to the commonly used threshold of test participants guessing incorrectly at least 25% of the time in at least 100 trials, and complies with proper hypothesis testing, where the probability of incorrectly rejecting the null hypothesis is less than 5% (McKee et al., 1985).
The probability mass function for the number i of incorrect answers is (Borg et al., 2012):

f(i \mid n, p_{\mathrm{null}}) = \frac{n!}{i!\,(n-i)!}\, p_{\mathrm{null}}^{\,i}\, (1 - p_{\mathrm{null}})^{\,n-i}    (2)

where p_null is the probability of PR being incorrectly identified as VC, i is the number of incorrect answers, and n is the number of trials. From the probability mass function we can find the critical number i_c, which is the minimum number of test participants that need to incorrectly identify the PR image as the best match to the reference image (VC) (Borg et al., 2012):

i_c(n, p_{\mathrm{null}}) = \min\Bigl\{\, i \;\Big|\; \sum_{j=i}^{n} f(j; n, p_{\mathrm{null}}) < 0.05 \Bigr\}    (3)

With 34 test participants, the critical number is i_c = 11.
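A short check of Equation 3 with the paper's numbers (n = 34 participants, p_null = 0.19, 5% significance) reproduces the stated critical number; the helper name is illustrative.

```python
from math import comb

def critical_number(n: int, p_null: float, alpha: float = 0.05) -> int:
    """Smallest i such that P(X >= i) < alpha for X ~ Binomial(n, p_null) (Equation 3)."""
    def upper_tail(i: int) -> float:
        return sum(comb(n, j) * p_null**j * (1.0 - p_null)**(n - j) for j in range(i, n + 1))
    i = 0
    while upper_tail(i) >= alpha:
        i += 1
    return i

print(critical_number(34, 0.19))   # 11, matching the value used in the paper
```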
5 RESULTS AND ANALYSIS
The results from the experiment show that, with i_c = 11, at least 11 of the test participants have to choose our image, PR, as the match to the reference image, VC, in order for the test to be passed. With 10 different image tests (5 shown with the light field display and 5 on a computer monitor), we see that images 1-4 passed the test (see Figure 6).
When looking closely at the images (see Figure 5), we can find small mistakes in the PR images, and it is especially easy to notice the difference between VC and PR in image 5. Image 5 was designed to show the inadequacy of our method: holes in the pixel-reprojected images are created when occluded objects need to be shown on the screen. When this happens, no information is available and therefore a hole appears. We expected our test participants to notice the difference in PR and therefore to choose VC as the match to the reference image (VC).
Figure 5: Image samples from the center virtual camera of the five different scenes (position [4, 8] out of [8, 15]). When
looking closely we can see small mistakes in the PR images, and especially Image 5 shows a large difference between VC
and PR.
In our setup with 15 × 8 cameras (120 cameras in total) and an inter-camera distance matching our microlenslets (1 mm × 1 mm), we were pushing the boundaries of the method when objects were only 12 cm away from the cameras. Areas (holes) that are invisible to the corner cameras become larger when the objects are close to the camera, but small problems can occur at any distance. With higher disparity, the occlusion will also be more extreme.
With only 6 test participants choosing PR for image 5, the critical number i_c = 11 was not reached, and therefore the test failed. We can therefore conclude that our method is inadequate when missing information creates large holes that participants are able to notice, but for images 1-4 the test participants did not see a difference in image quality.
6 CONCLUSIONS
Our approach was to render only the four corner
cameras of the subimage array, and then compute
all subimages of the light field using pixel reprojec-
tion. We have implemented the pixel reprojection
method, while maintaining correct perspective and
diffuse shading, and investigated where shortcomings
of the method occur.
Four out of five images passed the test, meaning
that test participants were not able to notice a differ-
ence between the PR and VC images (image 5 was
deliberately designed to fail the test in order to find
the shortcomings of the pixel reprojection method).
The results held both for images rendered for the light field display and for images shown on a computer monitor.
The worst shortcoming of our pixel reprojection
method is gaps due to missing information. Since
our subimages are created only from the corner cam-
eras, our in-between views will have holes whenever
the corner cameras have invisible points, but imple-
menting an extra virtual camera for these cases can
reduce the problem. It is also worth noting that the
pixel reprojection method does not create good results
when rendering scenes with transparency or view-
dependent shading.
We can conclude that pixel reprojection can be used to lower the number of cameras needed to render the 4D light field.
7 FUTURE WORK
Future development would require higher resolution
displays, but we expect that our pixel reprojection
method is applicable to higher resolution images.
With a pixel offset error of maximum 0.5 px, the pixel
error percentage will only decrease with higher reso-
lution images.
We have shown that the pixel reprojection method creates acceptable images for light field renderings, but the method needs optimization before being applicable in real-time scenarios. The performance test showed that the framerate (approximately 5.35 fps) is far from usable and needs to be improved drastically.
Figure 6: When 11 or more test participants choose PR, we can conclude that the test participants can do no better than a random guess, and therefore that they do not see a difference between VC and PR.
REFERENCES
Adelson, S. J. and Hodges, L. F. (1995). Generating exact ray-traced animation frames by reprojection. IEEE Computer Graphics and Applications, 15(3):43–52.
Borg, M., Johansen, S. S., Thomsen, D. L., and Kraus, M. (2012). Practical implementation of a graphics Turing test. In International Symposium on Visual Computing, pages 305–313. Springer.
Cunningham, D. W. and Wallraven, C. (2011). Experimental Design: From User Studies to Psychophysics. CRC Press.
Havran, V., Damez, C., Myszkowski, K., and Seidel, H.-P. (2003). An efficient spatio-temporal architecture for animation rendering. In ACM SIGGRAPH 2003 Sketches & Applications, pages 1–1. ACM.
Hoffman, D. M., Girshick, A. R., Akeley, K., and Banks, M. S. (2008). Vergence–accommodation conflicts hinder visual performance and cause visual fatigue. Journal of Vision, 8(3):33.
Kang, S. B. (1998). Geometrically valid pixel reprojection methods for novel view synthesis. ISPRS Journal of Photogrammetry and Remote Sensing, 53(6):342–353.
Lanman, D. and Luebke, D. (2013). Near-eye light field displays. ACM Transactions on Graphics (TOG), 32(6):220.
Levoy, M. (2006). Light fields and computational imaging. IEEE Computer, 39(8):46–55.
Levoy, M. and Hanrahan, P. (1996). Light field rendering. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pages 31–42. ACM.
McKee, S. P., Klein, S. A., and Teller, D. Y. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37(4):286–298.
Nehab, D., Sander, P. V., Lawrence, J., Tatarchuk, N., and Isidoro, J. R. (2007). Accelerating real-time shading with reverse reprojection caching. In Graphics Hardware, volume 41, pages 61–62.
Rolland, J. and Hua, H. (2005). Head-mounted display systems. Encyclopedia of Optical Engineering, pages 1–13.
Shibata, T., Kim, J., Hoffman, D. M., and Banks, M. S. (2011). Visual discomfort with stereo displays: Effects of viewing distance and direction of vergence-accommodation conflict. In IS&T/SPIE Electronic Imaging, pages 78630P–78630P. International Society for Optics and Photonics.
Sitthi-amorn, P., Lawrence, J., Yang, L., Sander, P. V., Nehab, D., and Xi, J. (2008). Automated reprojection-based pixel shader optimization. ACM Transactions on Graphics (TOG), 27(5):127.
Suryakumar, R., Meyers, J. P., Irving, E. L., and Bobier, W. R. (2007). Vergence accommodation and monocular closed loop blur accommodation have similar dynamic characteristics. Vision Research, 47(3):327–337.
Tawara, T., Myszkowski, K., and Seidel, H.-P. (2004). Exploiting temporal coherence in final gathering for dynamic scenes. In Computer Graphics International, 2004. Proceedings, pages 110–119. IEEE.