AN AUGMENTED REALITY SYSTEM BASED ON LIGHT FIELDS
A. Quintana¹, R. Quirós¹, I. Remolar¹ and E. Camahort²
¹Institute of New Image Technologies, Universitat Jaume I, Castellón, Spain
²Dept. de Sistemas Informáticos y Computación, U. Politécnica de Valencia, Valencia, Spain
Keywords: Augmented Reality, Light Fields.
Abstract: The development of augmented reality systems that combine virtual elements with the real world is currently increasing. This paper presents an augmented reality system that renders the virtual elements from light-field models. These models allow us to obtain a higher level of photorealism than polygon-based representations, and they also keep the rendering time constant. The presented system has been implemented using the open-source library ARToolKit and a spherical light-field model based on the direction-and-point parameterization (DPP) with associated depth information. The system has been validated using different light fields, and its performance has been compared with that of a classic version of the ARToolKit library based on VRML files. The presented augmented reality system can be applied to the visual inspection of highly complex synthetic objects or of objects acquired from real images.
1 INTRODUCTION
Augmented Reality (AR) is a technique that extends real-world environments with virtual objects that appear in the user's view. AR allows 3-D virtual objects to be integrated into a 3-D real environment in real time. It is commonly applied in fields such as computer-assisted surgery, education or entertainment (Azuma, 1997). The main challenge of this technique is to achieve a low system response time while ensuring a good integration between the real and virtual elements. The real-virtual integration improves when highly realistic models are used; however, the high geometric complexity of such models prevents a satisfactory response time.
Recent advances in information and communication technologies have enabled the development of mobile devices with sophisticated features, such as the iPhone or some PDAs, usually equipped with a small camera. These devices can run augmented reality applications that superimpose very simple virtual objects on the real image, such as annotations, images and 3D objects modeled with very few polygons (Wagner and Schmalstieg, 2009).
As an alternative to geometry-based models, image-based rendering (IBR) techniques provide high realism with a constant rendering time, independent of the geometric or structural complexity of the represented virtual elements. The light field is an image-based modeling technique that represents objects as a 4D function derived from the plenoptic function (Gurrea, 2001). Models based on light fields allow us to render both synthetic and real objects with a high degree of realism in constant time, which makes them suitable for use in augmented reality.
Taking this modeling technique into account, this paper presents an augmented reality system based on an open-source library (ARToolKit) (ARToolKit, 2011) that uses the light field as the augmentation model. The article is organized as follows. It starts by reviewing the current state of augmented reality techniques and the light field model. Next, the implementation of the presented system is described. Finally, the obtained results are evaluated and future work extending this research is proposed.
2 PREVIOUS WORK
In this section we briefly review the current state of augmented reality techniques and the light field model. Comprehensive reviews have been published for both techniques (Azuma, 1997; Shum et al., 2003), so this section focuses on the aspects directly related to the work presented in this article.
2.1 Augmented Reality
The term Augmented Reality was developed in 1990
by Thomas Caudell to refer to a system used in Boe-
ing to assist the cable assembly of aircraft. In 1997
Azuma (Azuma, 1997) gave the first definition of aug-
mented reality as a system that:
combines reality with virtual elements,
is interactive, and
renders 3D objects.
One of the most important aspects to be considered in the design and implementation of an augmented reality system is the environment it is going to represent, since this determines the system's features and complexity. In the literature, systems have been developed that manage both indoor and outdoor environments. Indoor environments are much less restrictive, allowing the use of more powerful computers and fixed systems based on previous training of the environment. In outdoor environments, the user has to transport the entire system, which limits the available processing capacity. Moreover, it is impossible to prepare or control a hostile environment subject to extreme magnetic phenomena, light or weather changes, and other natural phenomena.
Depending on the environment that is represented and the intended application, an augmented reality system is composed of the following elements:
- A processing system.
- A visualization device.
- A tracking system.
The processing system can be a fixed or portable computer, a PDA or a mobile phone. The choice depends on the work environment and the processing capability required by the application.
The display device is the part of the system responsible for rendering the augmentation. The potential devices can be divided into two groups: those based on optical technologies and those based on video. In systems based on optical technologies, the user directly observes the real scene, overlaid with the synthetic image by means of optical combiners. These devices are usually mounted on the user's head, like virtual reality helmets, and they often present problems such as ghosting, eye strain or fatigue. Video-based systems, by contrast, combine a video sequence of the actual scene, captured with one or more cameras, with synthetic images using video mixing techniques. The advantages of this kind of system are its simplicity and cost, since it only requires a personal computer and some USB or FireWire video cameras. Furthermore, the availability of a real-world video sequence allows us to use tracking systems based on the detection of characteristic features using computer vision techniques.
Finally, the tracking system is in charge of estimating the position and orientation of the real-world view to augment. The type of system to be used is determined by the environment in which the application is going to operate and by the display system employed. In an optical-technology-based system, sensors have to be used to implement the tracking. In a video-based system, however, computer vision techniques can be applied to complement the use of sensors such as GPS receivers, magnetic sensors, and inertial sensors (accelerometers and gyroscopes), among others.
In recent years, some video-based augmented reality systems have been proposed that perform tracking by detecting characteristics or features of the image. Inserting markers at known positions in the real environment facilitates tracking, but requires prior preparation of the environment.
Since the initial proposals at the beginning of the 90's, augmented reality systems have been applied in fields such as medical visualization, repair, assembly and maintenance of machinery, action planning for robots, entertainment, education and construction, among others (Azuma, 1997). Most of these applications require very complex and expensive systems, and they usually operate in indoor environments. Alternatively, in recent years some applications have been implemented that use a personal computer or mobile device, a webcam and a set of low-cost sensors. Among them, we highlight the animated baseball cards for sale in the United States, and the Wikitude World Browser for the iPhone (Wikitude, 2011).
Many of the proposals for low-cost applications use open-source libraries such as ARToolKit (ARToolKit, 2011). ARToolKit includes a marker-based tracking system that allows us to show three-dimensional objects superimposed on the real image captured by a camera. Its main advantages are the availability of the code and the few device requirements (a computer and a video camera), which make it an ideal platform for developing augmented reality applications. Its main drawback is the need to prepare the work environment by placing markers, which makes it difficult to use outdoors. However, this library has been used in numerous indoor applications (Kwon and Park, 2005; Asai et al., 2004; Nischelwitzer et al., 2007), and in some outdoor applications combined with other tracking techniques (Guo et al., 2008).
In ARToolKit, the objects superimposed on the markers can be displayed using the OpenGL library or by loading a VRML model. In both cases, these representations are adapted for rendering on specialized graphics hardware. However, it should be noted that photorealistic rendering involves a considerable loss in rendering speed (Kang et al., 2000).
2.2 Light Fields
Image-based rendering (IBR) techniques have traditionally been proposed as an alternative to geometry-based rendering for generating images of both real and synthetic objects. These techniques are independent of the geometric complexity of the represented objects. Moreover, they are very efficient at rendering images of a scene from different viewpoints by combining samples of the available images.
Different approaches to this technique have been proposed, from point-based representations (Levoy and Whitted, 1985) to the most recent proposals, in which a large number of images are used to render different viewpoints, interpreting these images through a 4D function obtained from the plenoptic function (Adelson and Bergen, 1991).
Light fields were first introduced by Levoy and Hanrahan (Levoy and Hanrahan, 1996) and Gortler et al. (Gortler et al., 1996). The light field allows us to represent complex geometric objects by the radiance visible at a point in a given direction. From this data, it is possible to obtain a representation of the light flow along all lines passing through the viewpoint of a scene. This method allows us to synthesize new, non-existent images through a filtering process and the interpolation of the available images, ensuring a correct visual perspective.
The plenoptic function was initially defined in (Adelson and Bergen, 1991) as the intensity of the light rays passing through the center of the camera at any point (V_x, V_y, V_z), in all possible angles (θ, φ), for each wavelength λ and at every time t, as expressed in Equation 1:

P_7 = P(V_x, V_y, V_z, θ, φ, λ, t)    (1)
Adelson and Bergen (Adelson and Bergen, 1991) considered it an essential task to achieve a useful and compact description of the local properties of this function. This idea was echoed by Wong et al. (Wong et al., 1997), who introduced (θ, φ) as the light direction to enable control of the lighting. McMillan and Bishop (McMillan and Bishop, 1995) introduced the concept of full plenoptic modeling with a 5D function, considering a static environment (Equation 2):

P_5 = P(V_x, V_y, V_z, θ, φ)    (2)
This definition of the plenoptic function was reduced to a 4D function in (Levoy and Hanrahan, 1996) by considering an occlusion-free space, exploiting the fact that the radiance along a line does not vary unless the line is blocked. The analysis concludes with a space-oriented line function, parameterized by two planes at an arbitrary position. The formulation is shown in Equation 3,

P_4 = P(u, v, s, t)    (3)
where (u, v) and (s, t) are the coordinate systems of the foreground and background planes, respectively. This type of parameterization is characterized by introducing distortions in the representation of the light field. Other types of representations use an isotropic parameterization, resulting in a uniform light field (Camahort et al., 1998).
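To make the two-plane parameterization concrete, the following is a minimal sketch in C (not this paper's renderer) of how a single ray could be resampled from a light field stored as P(u, v, s, t). The plane depths and the lf_sample() accessor are illustrative assumptions; a full renderer would typically blend the sixteen nearest samples (quadrilinear interpolation) rather than fetch a single one.

/* Illustrative sketch: resampling one ray from a two-plane light field. */
typedef struct { float x, y, z; } Vec3;

#define UV_PLANE_Z 0.0f   /* assumed depth of the foreground (u,v) plane */
#define ST_PLANE_Z 1.0f   /* assumed depth of the background (s,t) plane */

/* Hypothetical accessor into the stored 4D sample array. */
extern float lf_sample(float u, float v, float s, float t);

float lf_lookup(Vec3 o, Vec3 d)   /* ray origin o, direction d (d.z != 0) */
{
    /* Parametric intersection of the ray with each plane. */
    float t_uv = (UV_PLANE_Z - o.z) / d.z;
    float t_st = (ST_PLANE_Z - o.z) / d.z;

    float u = o.x + t_uv * d.x, v = o.y + t_uv * d.y;
    float s = o.x + t_st * d.x, t = o.y + t_st * d.y;

    /* A full renderer would interpolate the 16 nearest samples;
       a single nearest fetch keeps the sketch short. */
    return lf_sample(u, v, s, t);
}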
Image-based rendering techniques offer simple acquisition and a very realistic representation of complex lighting conditions. Among their advantages, we can highlight their low rendering complexity, which depends only on the resolution of the images used. Moreover, we can apply image compression and simplification algorithms that are more efficient than those applicable to geometric data. Finally, there is the possibility of using pre-acquired images of both real and synthetic objects, or even a mixture of both (Levoy and Hanrahan, 1996).
Different systems have been designed to acquire a light field from a real object or scene (Liang et al., 2007). The light field model used in this work is based on a direction-and-point parameterization (DPP) with depth information associated with the light field radiance. This model can represent multiple objects with associated geometric information. The light field is represented as sampled radiance data of the lines that intersect the convex hull of the object. The implementation of the DPP parameterization is based on a quasi-uniform discretization of the set of directions in 3D Cartesian space that converts it to a 2D space, as sketched below. The use of depth information ensures higher image quality and low storage requirements (Escriva et al., 2006).
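As an illustration of the idea of converting the 3D direction space into a 2D space, the sketch below uses an octahedral mapping, one common quasi-uniform scheme. It stands in for, and is not necessarily identical to, the DPP discretization described in (Escriva et al., 2006).

/* Illustrative only: map a unit direction (dx, dy, dz) to 2D
   coordinates in [0,1]^2 with an octahedral mapping. */
#include <math.h>

static void octahedral_map(float dx, float dy, float dz,
                           float *u, float *v)
{
    float a = fabsf(dx) + fabsf(dy) + fabsf(dz);  /* L1 norm */
    float px = dx / a, py = dy / a;               /* project onto octahedron */
    if (dz < 0.0f) {                              /* fold the lower hemisphere */
        float ox = (1.0f - fabsf(py)) * (px >= 0.0f ? 1.0f : -1.0f);
        float oy = (1.0f - fabsf(px)) * (py >= 0.0f ? 1.0f : -1.0f);
        px = ox; py = oy;
    }
    *u = px * 0.5f + 0.5f;                        /* remap [-1,1] to [0,1] */
    *v = py * 0.5f + 0.5f;
}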
3 GOALS
The main goal of this work is to design an augmented reality system that renders a virtual object using a light-field model. The most important advantages of using light fields are:
- The visualization time is constant and does not depend on the object's geometric complexity.
- They allow the visualization of both synthetic and real objects.

Figure 1: Screenshots of the light-field representations used in our tests. (a) Gargoyle model. (b) Lion model. (c) Umma model.
Figure 2: Visual inspection of a light field by the manipulation of a marker.
These advantages led us to propose an augmented reality system capable of handling scenes with a high level of complexity and/or real objects. The developed system will be very useful in many applications related, for instance, to cultural heritage or virtual museum visualization. As a secondary goal, we propose the use of a low-cost system consisting of a laptop computer or a mobile device, a webcam, and an open-source library.
4 IMPLEMENTATION
In order to implement the proposed system, the ARToolKit library has been used as the augmentation tool, together with a DPP-based light-field model with associated depth information. ARToolKit's tracking system returns a transformation matrix with information about the real camera's position and orientation. Using this information, the virtual camera position can be established so that the virtual object maintains a proper alignment with the marker detected in the image.
Figure 3: Relationship between the coordinate systems of the camera and the marker (matrix M_T).
The system has been implemented on a DELL workstation with an Intel Xeon 2.8 GHz processor and 1 GB of memory. Regarding the graphics hardware, an NVIDIA GeForce 7800 GTX graphics card with 512 MB of memory has been used.
ARToolKit's tracking system runs frame by frame. For each frame, the markers visible in the image are detected. Then, the marker with the highest confidence is selected and a transformation matrix is generated. The transformation maps the camera coordinate system to the coordinate system of the selected marker. This transformation is stored as a 4x4 matrix (M_T) and returned to the application for processing (see Figure 3). The system uses the matrix M_T as the OpenGL modelview matrix to render the virtual objects.
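For illustration, the following is a minimal sketch of this per-frame loop built on the standard ARToolKit 2.x C API, in the style of its simpleTest example. The threshold (100), the marker width (80 mm) and the rendering call are placeholder assumptions, not the exact values used in our system.

/* Sketch of the per-frame marker tracking loop (ARToolKit 2.x API). */
#include <stdlib.h>
#include <AR/ar.h>
#include <AR/video.h>
#include <AR/gsub.h>

static int    patt_id;                      /* pattern loaded elsewhere with arLoadPatt() */
static double patt_width     = 80.0;        /* assumed marker side length (mm) */
static double patt_center[2] = { 0.0, 0.0 };
static double patt_trans[3][4];             /* the 3x4 transformation M_T of Equation 4 */

static void mainLoop(void)
{
    ARUint8      *dataPtr;
    ARMarkerInfo *marker_info;
    int           marker_num, i, best = -1;

    if ((dataPtr = (ARUint8 *)arVideoGetImage()) == NULL) return;

    /* Detect every marker visible in the current video frame. */
    if (arDetectMarker(dataPtr, 100, &marker_info, &marker_num) < 0) exit(1);
    arVideoCapNext();

    /* Select the detection with the highest confidence value (cf). */
    for (i = 0; i < marker_num; i++)
        if (marker_info[i].id == patt_id &&
            (best < 0 || marker_info[i].cf > marker_info[best].cf))
            best = i;
    if (best < 0) return;                   /* marker not found in this frame */

    /* Generate the camera-to-marker transformation matrix M_T. */
    arGetTransMat(&marker_info[best], patt_center, patt_width, patt_trans);

    /* Load M_T as the OpenGL modelview matrix and draw the augmentation. */
    {
        double gl_para[16];
        argConvGlpara(patt_trans, gl_para);
        glMatrixMode(GL_MODELVIEW);
        glLoadMatrixd(gl_para);
        /* ... light-field rendering would happen here ... */
    }
}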
In this way the objects are rendered using a synthetic camera that has been registered with the real camera. We can extract the camera coordinate system from the matrix M_T directly. Note that M_T is a coordinate system transformation: its first three rows contain the direction vectors Side, Up and Forward, and its fourth column is related to the viewing position. These relationships between M_T and the camera coordinate system are illustrated in the following equation:
        | S_X  S_Y  S_Z  T_X |
M_T  =  | U_X  U_Y  U_Z  T_Y |    (4)
        | F_X  F_Y  F_Z  T_Z |
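Reading the camera frame out of M_T is then a matter of copying the rows and the last column of the 3x4 array. The helper below is our own illustrative sketch, not an ARToolKit call.

/* Illustrative helper: extract the camera frame from M_T. */
static void extract_camera_frame(double MT[3][4],
                                 double side[3], double up[3],
                                 double forward[3], double trans[3])
{
    for (int j = 0; j < 3; j++) {
        side[j]    = MT[0][j];   /* first row:  (S_X, S_Y, S_Z) */
        up[j]      = MT[1][j];   /* second row: (U_X, U_Y, U_Z) */
        forward[j] = MT[2][j];   /* third row:  (F_X, F_Y, F_Z) */
        trans[j]   = MT[j][3];   /* fourth column: (T_X, T_Y, T_Z) */
    }
}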
Moreover, we extend this library by including a spherical light-field rendering algorithm, a version of the Lumigraph algorithm (Gortler et al., 1996). The system has been validated by testing different light fields and comparing the response time of the new algorithm with that of ARToolKit's original algorithm with VRML objects.
5 RESULTS AND CONCLUSIONS
The new algorithm has been validated using several synthetic light fields created in a previous research project. Figure 1 shows a capture of each one of them. The rendering of these light-field models is interactive: moving the markers makes it possible to visualize the objects from different viewpoints. Figure 2 shows the user's interaction with a light field and the ARToolKit markers.
As mentioned before, the time necessary to render a light field is independent of the geometry. It depends only on the number of images used to build the model and on their resolution. Table 1 lists some parameters of the models used and the time needed to visualize them (in seconds). The resolution of the images used to render the models was 256x256. It is important to highlight the results relating to the rendering time: all the models reach a frame rate between 10 and 15 frames per second, very close to real time, when the camera moves around the object. The high level of photorealism of the images and the constant visualization time, independent of the geometry, make it possible to use this system in the visual inspection of complex virtual models, or even in light fields of real objects.
Table 1: Light fields used in the tests.

Object      Polygons     Images    Time (s)
gargoyle    478,950      27,300    0.04281
lion        1,311,956    1,700     0.03297
umma        226,705      27,300    0.002969

Although the rendering cost of a light-field model is constant, if the object to represent is geometrically simple, we will get better rendering times with classical polygonal representations. In order to determine at which geometric complexity each kind of modeling should be used, some tests have been carried out. Our experiments compare the rendering time of a light-field representation with that obtained with a classical ARToolKit version that uses VRML as the representation format. The object to render is Brian Curless' Dragon, available in VRML format at the Stanford repository, and its light-field model, also acquired in a previous project. We evaluated the response time of our system by averaging the visualization time over every 100 process cycles (as sketched below) and comparing the result with the values returned by ARToolKit when different levels of detail (LoD) are rendered. Table 2 shows the geometric characteristics of the different levels of detail of the VRML model involved in the experiment.

Table 2: Characteristics of the different levels of detail.

LoD    Polygons       Vertices
1      871,414        437,645
2      1,742,828      875,290
3      3,485,656      1,750,580
4      6,971,312      3,501,160
5      13,942,624     7,002,320
6      27,885,248     14,004,640
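A minimal sketch of this measurement procedure is shown below. Here render_frame() is a hypothetical placeholder for one process cycle, and clock() (CPU time) stands in for whatever timer was actually used; a wall-clock timer could be substituted.

/* Sketch: average the visualization time over blocks of 100 cycles. */
#include <stdio.h>
#include <time.h>

extern void render_frame(void);   /* placeholder: one visualization cycle */

void measure(int total_cycles)
{
    double accum = 0.0;
    for (int cycle = 1; cycle <= total_cycles; cycle++) {
        clock_t start = clock();
        render_frame();
        accum += (double)(clock() - start) / CLOCKS_PER_SEC;

        if (cycle % 100 == 0) {            /* report every 100 cycles */
            printf("mean time: %.5f s\n", accum / 100.0);
            accum = 0.0;
        }
    }
}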
The light field of Brian Curless' Dragon was rendered from 27,300 images with a resolution of 256x256, obtained from 478,950 views. Table 3 compares the rendering times of all the models, and Figure 5 shows a comparative chart of these results. The blue line represents the visualization time of the VRML model as a function of the number of polygons. The red line represents the visualization time of the light-field model.
Table 3: Rendering time of the different levels of detail.

                              Time (s)
Based on light fields         0.04281
Based on VRML files
  Level of detail 1           0.03578
  Level of detail 2           0.03829
  Level of detail 3           0.04375
  Level of detail 4           0.05391
  Level of detail 5           0.07515
  Level of detail 6           0.12766

Figure 4: Comparison between both representations in the developed system. (a) Light-field modeling. (b) VRML modeling (LoD 6).
Figure 5: Graphical representation of the obtained results.

The visual aspect of the light-field representation in the developed augmented reality system is better than that given by the geometric representation, as Figure 4 shows. Moreover, since the ARToolKit library was used, the developed system guarantees an accurate tracking process, preserving a realistic sensation of an augmented environment from different points of view.
Analyzing the results shown in Table 3 and in Figure 5, it can be concluded that a light-field model is the more suitable representation for geometric models with more than 3 million triangles. However, if the model to represent has fewer polygons than this, it is better to render it using a geometric representation.
Finally, we can conclude that light-field models offer a higher visual quality than geometric representations, with a stable response time. This allows us to use complex synthetic objects, or objects acquired from real ones.
6 FUTURE WORK
In order to render light-field models of real objects, a large number of images from different viewpoints is needed. Nowadays, there exist many systems that capture spherical light fields, but they are limited by the size of the object to be represented, or are even restricted to acquiring the images in the lab, which is sometimes impossible. It is necessary to define unstructured acquisition techniques that enable the acquisition of an object with a hand-held camera.
Another problem is the large number of images needed to render current light-field models, which implies a high storage cost. For this reason, it is necessary to devise new representation techniques involving a smaller number of images, thus also improving the response time.
Finally, the use of markers restricts the system's capabilities to indoor spaces. Considering another tracking system, based on low-cost sensor technology and the detection of singular features, could be a better solution. These modifications would allow this kind of application to be used in outdoor spaces with mobile devices, depending on their graphics and processing capabilities.
ACKNOWLEDGEMENTS
This work was supported by the Spanish Ministry of Science and Technology (Project TIN2010-21089-C03-03) and FEDER funds, Bancaixa (Project P1.1B2010-08) and Generalitat Valenciana (Project PROMETEO/2010/028).
REFERENCES
Adelson, E. H. and Bergen, J. R. (1991). The plenoptic
function and the elements of early vision. In Landy,
M. S. and Movshon, A. J., editors, Computational
Models of Visual Processing, pages 3–20. MIT Press.
ARToolKit (2011). ARToolKit user manual, version 2.33.
Asai, K., Osawa, N., Sugimoto, Y. Y., and Kondo, K.
(2004). Operation-support system for transportable
earth station using augmented reality. In Masoodian,
M., Jones, S., and Rogers, B., editors, APCHI, volume
3101 of Lecture Notes in Computer Science, pages 9–
18. Springer.
Azuma, R. T. (1997). A survey of augmented reality.
Presence: Teleoperators and Virtual Environments,
6(4):355–385.
Camahort, E., Lerios, A., and Fussell, D. S. (1998). Uniformly sampled light fields. In Rendering Techniques, pages 117–130.
Escriva, M., Domingo, A., Abad, F., Vivo, R., and Cama-
hort, E. (2006). Modeling and rendering of dpp-based
light fields. Geometric Modeling and Imaging–New
Trends, pages 51–56.
Gortler, S. J., Grzeszczuk, R., Szeliski, R., and Cohen, M. F.
(1996). The lumigraph. In Proceedings of the 23rd
annual conference on Computer graphics and inter-
active techniques, SIGGRAPH ’96, pages 43–54.
Guo, Y., Du, Q., Luo, Y., Zhang, W., and Xu, L. (2008). Application of augmented reality GIS in architecture. In ISPRS08, pages 331–336.
Gurrea, E. C. (2001). 4D light-field modeling and rendering. PhD thesis, The University of Texas at Austin.
Kang, S. B., Szeliski, R., and Anandan, P. (2000). The
geometry-image representation tradeoff for rendering.
In Proceedings ICIP, pages 13–16.
Kwon, Y. and Park, J. (2005). Tangible tele-meeting sys-
tem with dv-arpn (augmented reality peripheral net-
work). In Computational science and its applications
ICCSA 2005. International conference, Singapore,
May 9–12, 2005. Proceedings, Part I, pages 913–920.
Levoy, M. and Hanrahan, P. (1996). Light field render-
ing. In Proceedings of the 23rd annual conference on
Computer graphics and interactive techniques, SIG-
GRAPH ’96, pages 31–42. ACM.
Levoy, M. and Whitted, T. (1985). The use of points as a display primitive. Technical Report 85-022, Computer Science Department, University of North Carolina at Chapel Hill.
Liang, C., Liu, G., and Chen, H. H. (2007). Light field
acquisition using programmable aperture camera. In
ICIP (5), pages 233–236.
McMillan, L. and Bishop, G. (1995). Plenoptic model-
ing: an image-based rendering system. In Proceedings
of the 22nd annual conference on Computer graph-
ics and interactive techniques, SIGGRAPH ’95, pages
39–46.
Nischelwitzer, A., Lenz, F., Searle, G., and Holzinger, A.
(2007). Some aspects of the development of low-cost
augmented reality learning environments as examples
for future interfaces in technology enhanced learning.
In Proceedings of the 4th international conference on
Universal access in human-computer interaction: ap-
plications and services, pages 728–737.
Shum, H., Kang, S., and Chan, S. (2003). Survey of image-
based representations and compression techniques.
IEEE Trans. Circuits Syst. Video Techn., 13(11):1020–
1037.
Wagner, D. and Schmalstieg, D. (2009). Making augmented
reality practical on mobile phones, part 2. IEEE Com-
put. Graph. Appl., 29:6–9.
Wikitude (2011). Wikitude, http://www.wikitude.com/.
Wong, T., Heng, P., Or, S., and Ng, W. (1997). Image-
based rendering with controllable illumination. In
Proceedings of the Eurographics Workshop on Ren-
dering Techniques ’97, pages 13–22.