LOW-RANK RADIOSITY USING SPARSE MATRICES

Eduardo Fern´andez

, Pablo Ezzatti

, Sergio Nesmachnow

and Gonzalo Besuievsky

Centro de C´alculo, Universidad de la Rep´ublica, Montevideo, Uruguay

Geometry and Graphics Group, Universitat de Girona, Girona, Spain

Keywords:

Radiosity, Real-time Global Illumination.

Abstract:

Radiosity methods are part of the global illumination techniques, which deal with the problem of generating

photorealistic images in 3D scenes with Lambertian surfaces. Low-rank radiosity is a O(nk) method, where

n is the number of polygons and k is the rank of the matrix used as a direct transport operator. This method

allows calculating, in real-time and with inﬁnite bounces, the illumination of a scene with static geometry

and dynamic lighting. In this paper we present a new methodology for low-rank radiosity calculation based

on the use of sparse matrices, which signiﬁcantly reduces the memory storage required and achieves speedup

improvements over the original low-rank method. Experimental analysis was performed in both traditional

computers and new graphics processing unit architectures.

1 INTRODUCTION

The radiosity method is a pioneering technique for

global illumination on scenes with Lambertian sur-

faces, which has been applied in many areas of design

and computer animation (Dutre et al., 2006).

Since the classic formulation of the radiosity tech-

nique in the decade of 1980, several iterative res-

olution methods have been proposed (Cohen et al.,

1993). These methods are not feasible for computing

realistic images in real-time (many bounces at 20 fps),

due to their iterative nature. After that, several other

methods have been proposed to achieve the real-time

computation of the global illumination, such as in-

stant radiosity (Keller, 1997), precomputed radiance

transfer (Sloan et al., 2002), radiance transport (Kon-

tkanen et al., 2006; Lehtinen et al., 2008) and photon

mapping (Wang et al., 2009), which have shown some

success using global illumination approximations.

By exploiting spatial coherence, the low-rank ra-

diosity method (LRR) (Fern´andez, 2009) allows

solving the real-time radiosity problem with inﬁnite

bounces, for scenes with static geometry and dynamic

lighting, using a low memory storage approach. The

original LRR implementation used full matrices, and

the promising idea of generating sparse matrices was

previously pointed out, but it was not developed.

This work presents a new approach for the utiliza-

tion of sparse matrices in the LRR method. Speciﬁc

implementations of algorithms to solve the radiosity

problem in both traditional computer architectures

(CPU) and modern graphic processing units (GPU)

are introduced. The experimental results show that

the GPU implementation of the LRR method based on

sparse matrices has an acceleration factor greater than

four, compared with the full matrix approach of the

LRR method. These promising results demonstrate

that our approach is an alternative for real-time radio-

sity applications. Comparing to other strategies, LRR

ensures always a full global illumination solution be-

ing capable of dealing efﬁciently with environments

with signiﬁcant indirect illumination contribution.

The rest of the article is organized as follows. Sec-

tion 2 introduces the radiosity problem and the pre-

vious work. The low-rank radiosity method is de-

scribed in Section 3. Section 4 introduces the utiliza-

tion of sparse matrices in LRR, and describes the im-

plementations of the proposed algorithms for sparse

low-rank radiosity in CPU and GPU. The experimen-

tal analysis is reported in Section 5. Finally, Section 6

summarizes the conclusions of the research and for-

mulates the main lines for future work.

2 THE RADIOSITY PROBLEM

This section starts by introducing the radiosity pro-

blem formulation. After that, the main proposals

found in the literature related with the pre-computed

radiosity and radiance transfer problem are reviewed.

260

Fernández E., Ezzatti P., Nesmachnow S. and Besuievsky G..

LOW-RANK RADIOSITY USING SPARSE MATRICES.

DOI: 10.5220/0003829402600267

In Proceedings of the International Conference on Computer Graphics Theory and Applications (GRAPP-2012), pages 260-267

ISBN: 978-989-8565-02-0

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

2.1 Problem Formulation

The radiosity problem is modeled by a Fredholm in-

tegral equation, the radiosity equation (1). In Equa-

tion 1, B(x) is the radiosity in the point x, E(x) is the

light emission in x, ρ(x) is the Lambertian diffuse re-

ﬂectivity of x, and G(x, x

′

) is a geometric factor that

determines how much the radiosity in x

′

contributes

to the radiosity of x (Cohen et al., 1993).

B(x) = E(x) + ρ(x)

B(x

′

)G(x,x

′

)dA

′

(1)

Any Fredholm equation can be transformed into a dis-

crete equation, which implies solving a linear system.

In the radiosity problem the linear system is expressed

by Equation 2, where I is the identity matrix with di-

mension n×n (n is the number of polygons), R is a

diagonal matrix that stores the reﬂectivity of the poly-

gons, F is the matrix with the form factors F

–which

indicate the fraction of light reﬂected by the polygon

i that arrives to the polygon j–, B is a vector with the

radiosity value of each patch, and E is the vector with

the emission of each patch.

(I− RF)B = E (2)

This discrete equation can be also formulated through

the use of a direct transport operator T, which ex-

presses the direct transmission of light between two

surfaces of the scene. Using T, the discrete radiosity

equation (Equation 2) is expressed as (I− T)B = E.

Moreover, the operator G = (I − T)

−1

, is a global

transport operator relating the emitted light with the

ﬁnal radiosity of the scene, B = GE. For scenes where

the geometry is ﬁxed and only the light output varies,

the operators T and G are constant.

2.2 Related Work

The main class of elements used in light transport

includes the polygons, the parametric surfaces (Co-

hen et al., 1993), and the discrete points (Fasshauer,

2006). In all these cases it is possible to build oper-

ators T and G that indicate the transport relationship

between any pair of elements.

The O(n

) memory required to store T has been

clearly posed as a problem in early radiosity papers

(Cohen et al., 1988). Many solutions have been pro-

posed, such as two-level hierarchy (Cohen et al.,

1986), hierarchicalstructures (Hanrahan et al., 1991),

and wavelets (Gortler et al., 1993). The memory sav-

ings are determined by the spatial coherence of the

scenes, where it is highly probable that neighboring

elements i, j have similar transport values with many

other elements k of the scene, that is, T

≈ T

. In this

case, these values are considered equal, then, many

transport values can be replaced by a single value.

This is the main idea behind the hierarchical approx-

imation H(T) of any T operator, where H(T) is usu-

ally stored in O(n) memory. The balance between

the error introduced by the method and the memory

saved is highly convenient for most scenes. In these

works the calculation of B is done by iterative me-

thods, avoiding the need of computing H(G).

Recently, other kinds of methods that include

BRDF surfaces and the calculation of H(G) have

been developed. In (Kontkanen et al., 2006), the

BRDF operator (T) is transformed into H(T) through

the use of wavelets, and H(G) is calculated using the

Neumann series H(G) = I+ H(T) + H(T)

+ . .., by

removing all values of H(T)

smaller than a certain

threshold. After calculating the global incident radi-

ance of a scene, the BRDF function is applied again to

ﬁnd an image from a particular point of view. Later, in

(Lehtinen et al., 2008) a meshless hierarchical repre-

sentation of the objects in the scene, built upon a hier-

archy of points, is performed. Each point consists of

a pair (position, normal) and has a radius of inﬂuence

related with the hierarchy level to which it belongs.

Using a hierarchical function basis, a sparse matrix

H(T) is generated by elimination of all terms below

a threshold, and a ﬁnal sparse operator H(G − I) is

created, using also Neumann series.

Other methods to achieve real-time global illu-

mination are based on instant radiosity techniques

(Keller, 1997) and its variants (Laine et al., 2007),

but these methods compute only a few bounces.

In contrast, the LRR method proposed in this work

allows to solve the radiosity problem with inﬁnite

bouncesand relatively small computational resources.

This result is based on the premise that when T is a

low-rank matrix, the operator G can be easily calcu-

lated, as it is explained in the next section.

3 THE LOW-RANK RADIOSITY

METHOD

It is very likely that the (RF) matrix included in Equa-

tion 2 has a low numeric rank. This is because each

row (RF)

is computed based on the scene view from

the element i, and most pairs of close patches have a

very similar view of the scene (Figure 1). Then (RF)

has many similar rows, which results in a reduction of

the numerical rank.

The rank reduction of (RF) matrices allows

approximating them by the product of two matrices

, both with dimension n×k (n≫k), without los-

ing relevant information about the scene.

LOW-RANK RADIOSITY USING SPARSE MATRICES

261

(a) Two close patches i

and i

(b) Hemicube view from i

. (c) Hemicube view from i

(d) Spatial coherence in the scene.

Figure 1: Close patches have similar views of the scene and

generate similar rows in (RF). In (d) the spatial coherence

of each patch is measured (darker means lower coherence).

3.1 Low-rank Approximation

The product U

generates a matrix with dimension

n×n and rank k. The memory space needed to store

and V

is O(nk), signiﬁcantly less than the O(n

)

required to store the matrix (RF), especially when

n≫k. This memory saving allows reducing the space

needed to store the information for large scenes.

In the LRR equation (3), (RF) is replaced by the

low-rank approximation U

. The unknown is not

longer B, but an approximation

B of the radiosity.



I− U



B = E (3)

The matrix



I− U



is easily invertible using the

Sherman-Morrison-Woodbury formula (Golub and

Loan, 1996). Equation 4 shows the expression for

B = E + U





− V



−1







(4)

In order to ﬁnd

B through Equation 4, O(nk

) opera-

tions are required and O(nk) memory is needed.

In scenes with static geometry and dynamic light-

ing (i.e., only the independent term E varies in Equa-

tions 2 to 4), part of the computation can be executed

before the real-time stage. Equation 4 is transformed

into 5, where the n×k matrices Y

, U

, and V

are cal-

culated once. Thus, the real-time stage only has com-

plexity O(nk). This is an auspicious result in order to

develop a new methodology for real-time radiosity.

B = E − Y





, (5)

= −U



− V



−1

Equation 5 can also be formulated as

B = GE, so

G = I − Y

is considered as the global operator

that manages the inﬁnite bounces of light in a single

operation. This global operator is computed without

the use of Neumann series, as in (Kontkanen et al.,

2006; Lehtinen et al., 2008), taking advantage of the

fact that (RF) is a low-rank matrix.

3.2 U

and V

Calculation

and V

can be computed using factorization tech-

niques, such as singular value decomposition (SVD

or PCA) (Golub and Loan, 1996), CUR factoriza-

tion (Goreinov et al., 1997), and discrete transfor-

mations based on Fourier and wavelets (Press et al.,

2007). SVD is the only method that produces good

approximations to (RF) for low values of k, but it has

an O(n

) complexity. The other methods mentioned

have lower complexity orders, but they also produce

larger errors in the (RF) approximation.

To overcome the weakness of the traditional fac-

torization techniques, the Low-Rank Radiosity (LRR)

method proposed in (Fern´andez, 2009) uses the con-

cept of spatial coherence (Sutherland et al., 1974).

This method generalizes the two-level hierarchy me-

thodology (Cohen et al., 1986), where two meshes

with different granularity levels are generated for the

scene (a coarse mesh with k patches and a ﬁne mesh

with n elements). Using these two meshes it is possi-

ble to build the n×k matrices U

and V

Both, the coarse and the ﬁne meshes, can be uni-

form or nonuniform. In this paper, we proposed a

nonuniform coarse mesh that is adapted to the spatial

coherence of the scene. In the resulting algorithm,

named 2 Meshes with Spatial Coherence (2MSC), the

ﬁne mesh is given, assuming that inside each element

there is spatial coherence, but the coarse mesh is built

taking care of the spatial coherence inside each patch.

The 2MSC produces a nonuniform mesh, with

smaller patches in those areas with low spatial cohe-

rence. Algorithm 1 shows a top-down procedure for

GRAPP 2012 - International Conference on Computer Graphics Theory and Applications

262

creating the coarse mesh from a few large patches,

using the 2MSC algorithm. An initial set of large

patches are inserted into a list L (see Figure 2 (a)). Ev-

ery patch is taken from L, one at a time. All patches

without spatial coherence (i.e., when the inner scene

views vary more than a speciﬁed threshold) are split

into several subpatches, which are inserted into L.

The algorithm stops when the list L is empty (i.e.,

when all patches have spatial coherence), or when all

those patches without spatial coherence are smaller

than a predeﬁned value (A

min

). The number of patches

(k) depends on the value of the parameter A

min

and

also depends on the procedure used to evaluate the

spatial coherence. Figure 2 (b, c) shows a coarse mesh

generated using the 2MSC algorithm and a ﬁne mesh.

The discretization of the coarse mesh shown in Fig-

ure 2 (b) agree with the colors of spatial coherence

shown in Figure 1 (d), i.e., the darker the color, the

more need to split the surface.

Algorithm 1: 2MSC algorithm.

1: Store patches p in a list L.

2: while (L6={}) do

3: Remove a patch p from L.

5: if ¬(spatial coherence inside p) ∧ (Area(p) > A

min

)

6: Divide p in subpatches and store them in L.

7: end.

8: end.

After the scene meshes are deﬁned, the matrices

and V

are built using the information from an as-

signment function P : e → p that gives, for each ele-

ment e in the ﬁne mesh, the corresponding patch p in

the coarse mesh. The assignment can be deﬁned by

inclusion (element e belongs to single patch p) or by

proximity (p is the closest patch to element e).

If P is deﬁned by inclusion and all elements be-

longing to any patch have equal area, the construction

of matrices U

and V

follows Equation 6. There, F

is the form factor between an element e of the scene

and the elements assigned to the patch p, and c

is the

number of elements assigned to p.

(e, p) = F

(e,P(e)) = 1 , V

(e, p) = 0 ∀p 6= P(e) (6)

Other methodologies can be applied to build the

matrices U

and V

. Besides using SVD factoriza-

tion, where the columns of both matrices are orthog-

onal, the columns of the matrix U

generated using

Equation 6 can also be orthogonalized using QR fac-

torization. After the QR factorization is performed,

a new matrix U

is generated and V

can be re-

placed by V

= (RF)

. These methodologies

have been implemented by the authors to generate U

and V

, and the experiments showed no conclusive

(a) Original coarse mesh.

(b) Coarse mesh after 2MSC.

Figure 2: Meshes used to compute U

and V

results about which is best method. In this paper, the

Equations in 6 were employed to build the matrices,

due to both the simplicity of building a low rank ap-

proximation of the (RF) matrices and the possibility

to build a sparse version of the V

matrix, as it is ex-

plained in the next section.

4 SPARSE MATRICES IN LRR

The 2MSC algorithm and the Equations 6 gene-

rate sparse matrices V

with dimension n×k, with

a unique non-zero entry in each row (corresponding

with the P(e) column). This section introduces an

O(k) storage methodology for the V

matrix, and an

implementation of the LRR method for both CPU and

GPU platforms.

LOW-RANK RADIOSITY USING SPARSE MATRICES

263

4.1 Compressing the Information in V

The rows of V

can be reordered, grouping them by

the column where the non-zero value is in (k dis-

joint groups are created, see Figure 3). Using ideas

from the compressed storage of sparse matrices (Bar-

rett et al., 1994), an index vector I can be deﬁned. I

indicates that the rows of V

with a one in the ﬁrst co-

lumn are placed from I

= 1 to I

−1, that the rows of

with a one in the second column are placed from I

to I

−1, and so on. The O(k) entries of I are enough

to deﬁne the matrix V

Figure 3: Construction of V

and I from the function P.

Once the vector I is built and the values in the vec-

tor E are reordered accordingly to the rows of V

, the

product V

E can be formulated as the sum in Equa-

tion 7. The i-th value of the vector V

E is the sum of

the values of E between the positions I

and I

i+1

− 1.





i+1

−1

∑

s=I

, ∀i,1 ≤ i ≤ k (7)

The previously presented results allow to conclude

that if V

is a sparse matrix with the stated proper-

ties, the matrix-vector product V

E can be computed

in O(n) operations (instead of the O(nk) operations

needed when V

is a full matrix).

4.2 Sparse LRR Implementation

In order to validate the theoretical results presented in

the previous subsection, two implementations of the

sparse LRR method were developed to solve the ra-

diosity problem with a sparse V

matrix generated by

Equations 6. Each implementation uses a different

hardware to perform the calculations: one uses a tra-

ditional CPU, and the other uses a GPU.

The LRR implementation in CPU uses the BLAS

library (Lawson et al., 1979), and Equation 7 is solved

sequentially. On the other side, the LRR implemen-

tation in GPU uses the CUBLAS implementation of

BLAS for CUDA (NVIDIA, 2008), and k threads are

used to update the vector





in parallel, each

thread computing

∑

i+1

−1

s=I

for a different i.

Both LRR implementations use the

xGEMV

func-

tion from the BLAS library to compute the matrix-

vector product Y

E). The product V

E is not

explicitly computed, instead, it is calculated using the

sum in Equation 7. The pseudocode of both imple-

mentations for a real-time processing of frames from

a graphic application is sketched in Algorithm 2.

Algorithm 2: Sparse LRR.

{Pre-processing: Compute Y

and I}

1: [ONLY CPU] Store Y

, I into RAM memory

2: [ONLY GPU] Send and store Y

, I into GPU memory

3: for each frame do {in real-time}

4: Receive E from a graphic application

5: [ONLY GPU] Send and store E into GPU memory

6: Compute X





,∀i,1 ≤ i ≤ k, with

∑

i+1

−1

s=I

E(s)

7: Apply xGEMV to compute

B = E −Y

8: [ONLY GPU] Send

B to RAM memory

9: Store

B into RAM memory

10: end

5 EXPERIMENTAL ANALYSIS

The experimentalanalysis compares the time required

to compute

B using the LRR method, based on full

and sparse matrices, as well as on CPU and GPU ar-

chitectures.

5.1 Execution Platforms

A ﬁrst analysis was performed in a Pentium dual-core

E5200 (2.50 GHz), 2 GB RAM, with a NVIDIA 9800

GTX+ 512 MB. A second analysis using only sparse

matrices and larger scenes was executed in a Tesla

Intel Xeon QuadCore E5530 (2.27 GHz), 8 GB RAM,

with a NVIDIA c1060 (240 cores at 1.3 GHz) 4 GB.

5.2 Test Scenes

A ﬁrst analysis was performed using a Cornell

box with several discretizations (n= {3456, 13824,

55296, 221184} and k= {216, 864, 3456}). For

the experiments in the Tesla platform, six scenes

with dimension (n×k) 1769472×216, 221184×3456,

442368×1728, 884736×864, 1769472×432, and

3538944×216 were created from the same Cornell

GRAPP 2012 - International Conference on Computer Graphics Theory and Applications

264

box scene. Finally, we perform tests on a two-ﬂoor

patio model with dynamic lighting changes.

5.3 Experimental Results

In the following tables, the row categorization puts

together all scenes with approximately equal mem-

ory consumption. Each category consumes four times

more memory than the previous one.

Table 1 shows the speedup values (ratio of execu-

tion times) obtained when computing V

E using the

sparse LRR implementations in GPU and CPU, and

the comparison with the full matrix implementation.

The best speedup values are shown in bold font.

The experimental results show that the speedup

values were up to 335.50 in CPU, and up to 37.47 in

GPU. The best speedup values were obtained for the

scene with dimension 13824×3456,taking advantage

of the large number of calculations in the sum and the

parallel computing strategy in

xGEMV

. Table 1 also in-

dicates that the sparse V

E product is faster in CPU.

Table 1: Speedup when computing V

dimension

fullCPU

sparseCPU

fullGPU

sparseGPU

sparseCPU

sparseGPU

fullCPU

sparseGPU

3456× 216 15.67 8.00 0.60 9.40

3456× 864 68.33 9.00 0.60 41.00

13824× 216 21.67 11.36 0.64 13.93

3456× 3456 75.75 18.33 0.67 50.50

13824× 864 92.11 13.21 0.64 59.21

55296× 216 17.42 11.20 0.64 11.20

13824× 3456 335.50 37.47 0.67 223.67

55296× 864 94.03 13.09 0.65 61.55

221184× 216 24.01 10.99 0.58 13.84

Table 2 shows the speedup values when compu-

ting

B. These results demonstrate that the sparse im-

plementation is about two times faster than the full

implementation in CPU. Regarding the GPU imple-

mentations, the speedup values were between 5.21

and 2.05, and the best results were obtained when

k = 216. Since 216 is not a multiple of 32, the GPU

computing capacities are not fully used (Barrachina

et al., 2008), and better speedup values shall be ex-

pected for other values of k. The computation of

B is

faster in GPU, despite that V

E is faster in CPU.

Table 3 summarizes the speed of the LRR imple-

mentations regarding the number of frames per sec-

ond (fps) that they are able to compute.

The results in Table 3 show that the minimum fps

rate in CPU for the sparse LRR implementation (23

fps) is twice the value of the full LRR implementa-

tion from the previous work. In GPU, the minimum

speed largely increased from 23 fps to 116 fps with

the sparse LRR implementation. For the small scene

Table 2: Speedup when computing

dimension

fullCPU

sparseCPU

fullGPU

sparseGPU

sparseCPU

sparseGPU

fullCPU

sparseGPU

3456× 216 1.95 4.50 4.40 8.60

3456× 864 1.69 2.38 6.48 10.93

13824× 216 1.91 5.21 5.50 10.53

3456× 3456 1.43 2.05 5.70 8.17

13824× 864 1.88 3.41 9.73 18.32

55296× 216 1.90 5.04 5.58 10.61

13824× 3456 1.74 3.59 11.34 19.67

55296× 864 1.95 3.62 10.95 21.34

221184× 216 2.05 5.04 5.08 10.43

Table 3: Speed (fps) of the LRR implementations.

dimension

CPU GPU

full sparse full sparse

3456× 216 814 1591 1556 7000

3456× 864 221 372 1014 2414

13824× 216 196 374 395 2059

3456× 3456 70 100 278 569

13824× 864 51 96 273 933

55296× 216 47 90 100 504

13824× 3456 14 24 76 275

55296× 864 12 23 70 252

221184× 216 11 23 23 116

Average 159 299 421 1569

with dimension 3456×216, a large fps rate of 7000

was achieved in GPU. Overall, the average speed val-

ues duplicate in CPU and quadruplicate in GPU, when

compared the speed of the full LRR implementation.

5.4 Real-time LRR for Larger Scenes

The computational platform with the NVIDIA 9800

GTX+ GPU does not allow to scale up for computing

the radiosity of scenes with a million polygons, due

to its memory limitations. In order to show how the

hardware evolution helps to solve the radiosity pro-

blem, six new larger scenes up to more than 3.5 mi-

llion elements were solved in a Tesla GPU platform.

Table 4 presents the speed results (in fps) when

using the Tesla platform to compute the radiosity

for the three largest instances computed with the

NVIDIA 9800 GTX+ GPU, and the speedup compa-

rison with the Table 3. This analysis compares the ca-

pability of computing the radiosity in real-time using

the new hardware. The fps rates in Table 4 shows

that a signiﬁcant speed improvement is obtained with

the Tesla platform: it allows performing the radiosity

computation in real-time, while speedup values up to

3.20 in CPU and 1.64 in GPU were achieved.

Table 5 summarizes the speed of the radiosity cal-

culation for six new larger scenes up to 3.5 million

elements in the Tesla platform. The scene with di-

LOW-RANK RADIOSITY USING SPARSE MATRICES

265

Table 4: Comparative speed (fps) in the Tesla platform.

dimension

speed (fps) speedup

CPU GPU CPU GPU

13824× 3456 74.07 297.87 3.06 1.09

55296× 864 73.61 301.72 3.20 1.20

221184× 216 72.92 190.74 3.18 1.64

mension 1769472×216 requires 8 times more mem-

ory than the largest scenarios in the previous subsec-

tion. Each of the other ﬁve scenes requires 16 times

more memory than the largest scenarios in the pre-

vious subsection.

Table 5: Speed (fps) for larger scenes in the Tesla platform.

dimension speed in CPU speed in GPU

1769472×216 8.15 24.05

221184× 3456 4.94 21.81

442368× 1728 4.94 20.59

884736× 864 4.55 18.67

1769472×432 4.10 15.45

3538944×216 4.00 11.48

The results in Table 5 demonstrate that the radio-

sity computation using the sparse LRR method can

be performed in real-time in GPU, while none of the

studied scenes can be computed in real-time in CPU.

The GPU implementation was able to achieve a rea-

sonable level of speed (around 20 fps) for scenes up

to 1.7 million elements. A signiﬁcant speed reduc-

tion was detected when solving the scene with more

than 3.5 million elements, suggesting that there is still

room to improve the proposed method in order to face

even more complex scenarios.

Figure 4 show a wireframe view and three

frames of an animated sequence for a patio build-

ing model, where the emission lights change

for each frame. The complete video can be seen at

www.ﬁng.edu.uy/inco/grupos/cecal/cg/LRRSM.wmv.

In the patio building (Figure 4), the dimension (n×k)

is equal to 24128×1364 and the frame rate achieved

is of 430 fps.

6 CONCLUSIONS AND FUTURE

WORK

This work introduces the utilization of sparse matri-

ces in the low-rank radiosity technique. Four imple-

mentations of the LRR method (full/sparse matrices

on CPU/GPU architectures) have been developed to

solve the radiosity problem, and several examples of

sparse LRR calculation have been implemented on a

Figure 4: Patio building model. A wireframe view, and

three different frames of an animated sequence.

powerful GPU platform for greater n×k scenarios.

The main contribution of this work consists in the

inclusion of a sparse matrix implementation into the

LRR method. By reordering and grouping the in-

formation, V

is transformed into an index vector I

which requires only O(k) memory to store the same

information. As a result, the averaged acceleration

factor rises to four in GPU architectures, due to the

implementation of V

E product with O(n) sums.

The proposed method was able to solve the ra-

diosity calculation in real-time for scenes with more

than 1.7 million elements in a Tesla GPU platform.

GRAPP 2012 - International Conference on Computer Graphics Theory and Applications

266

This result shows the degree in which the sparse LRR

method has a better performance than the full LRR

method.

Two main lines of future work can be directly in-

ferred from this paper: to further improve the efﬁ-

ciency of the proposed method, and to develop new

ways to build U

factorizations. Regarding the

ﬁrst line of work, the speed of the radiosity calcula-

tion shall be reduced when using a hybrid strategy of

simultaneous computation in CPU and GPU, taking

advantage of the fast computation of the product V

in CPU. Respecting the second line of work, further

research must be performed to know the relation be-

tween low-rank approximations of (RF) and the rel-

ative error of

B. Preliminary results have shown that

in many situations, SVD it is not the optimal factori-

zation for the radiosity problem, because other factor-

izations (such as those presented in this paper) gene-

rate radiosity results with lower relative error.

Finally, another area of future work consists into

face the problem of processing scenes with dynamic

geometry in order to extend the applicability of this

new technique.

ACKNOWLEDGEMENTS

This work was partially funded by Programa de De-

sarrollo de las Ciencias B´asicas, Uruguay, and by the

TIN2010-20590-C02-02 project from Ministerio de

Educaci´on y Ciencia, Spain.

REFERENCES

Barrachina, S., Castillo, M., Igual, F., Mayo, R., and

Quintana-Orti, E. (2008). Evaluation and tuning of

level 3 CUBLAS for graphics processors. In Work-

shop on Parallel and Distributed Scientiﬁc and Engi-

neering Computing, pages 1–8, Miami, EE.UU.

Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato,

J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C.,

and der Vorst, H. V. (1994). Templates for the Solu-

tion of Linear Systems: Building Blocks for Iterative

Methods, 2nd Edition. SIAM, Philadelphia, PA.

Cohen, M., Greenberg, D., Immel, D., and Brock, P. (1986).

An efﬁcient radiosity approach for realistic image syn-

thesis. IEEE Comput. Graph. Appl., 6:26–35.

Cohen, M., Wallace, J., and Hanrahan, P. (1993). Radiosity

and realistic image synthesis. Academic Press Profes-

sional, Inc., San Diego, CA, USA.

Cohen, M. F., Chen, S. E., Wallace, J. R., and Greenberg,

D. P. (1988). A progressive reﬁnement approach to

fast radiosity image generation. SIGGRAPH Comput.

Graph., 22:75–84.

Dutre, P., Bala, K., Bekaert, P., and Shirley, P. (2006). Ad-

vanced Global Illumination. A. K. Peters, Ltd.

Fasshauer, G. E. (2006). Meshfree methods. In Hand-

book of Theoretical and Computational Nanotechnol-

ogy. American Scientiﬁc Publishers, pages 33–97.

Fern´andez, E. (2009). Low-rank radiosity. In Proceedings

IV Iberoamerican Symposium in Computer Graphics,

pages 55–62, Isla Margarita, Venezuela.

Golub, G. and Loan, C. V. (1996). Matrix Computations.

The Johns Hopkins University Press.

Goreinov, S., Tyrtyshnikov, E., and Zamarashkin, N.

(1997). A theory of pseudoskeleton approximations.

Linear Algebra and its Applications, 261:1–21.

Gortler, S. J., Schr¨oder, P., Cohen, M. F., and Hanrahan, P.

(1993). Wavelet radiosity. In Proceedings of the 20th

annual conference on Computer graphics and inter-

active techniques, SIGGRAPH ’93, pages 221–230,

New York, NY, USA. ACM.

Hanrahan, P., Salzman, D., and Aupperle, L. (1991). A

rapid hierarchical radiosity algorithm. SIGGRAPH

Comput. Graph., 25:197–206.

Keller, A. (1997). Instant radiosity. In Proceedings of

the 24th annual conference on computer graphics and

interactive techniques, pages 49–56, New York, NY,

USA. ACM Press/Addison-Wesley Publishing Co.

Kontkanen, J., Turquin, E., Holzschuch, N., and Sillion, F.

(2006). Wavelet radiance transport for interactive in-

direct lighting. In Heidrich, W. and Akenine-M¨oller,

T., editors, Rendering Techniques 2006 (Eurographics

Symposium on Rendering). Eurographics.

Laine, S., Saransaari, H., Kontkanen, J., Lehtinen, J., and

Aila, T. (2007). Incremental instant radiosity for real-

time indirect illumination. In Proceedings of Euro-

graphics Symposium on Rendering 2007, pages 277–

286. Eurographics Association.

Lawson, C. L., Hanson, R. J., Kincaid, D. R., and Krogh,

F. T. (1979). Basic Linear Algebra Subprograms for

Fortran Usage. ACM Transactions on Mathematical

Software, 5(3):308–323.

Lehtinen, J., Zwicker, M., Turquin, E., Kontkanen, J., Du-

rand, F., Sillion, F. X., and Aila, T. (2008). A meshless

hierarchical representation for light transport. ACM

Trans. Graph., 27:37:1–37:9.

NVIDIA (2008). CUDA CUBLAS Library. NVIDIA Cor-

poration, Santa Clara, California.

Press, W., Teukolsky, S., Vetterling, W., and Flannery, B.

(2007). Numerical Recipes: The Art of Scientiﬁc Com-

puting. Cambridge University Press.

Sloan, P., Kautz, J., and Snyder, J. (2002). Precomputed

radiance transfer for real-time rendering in dynamic,

low-frequency lighting environments. ACM Transac-

tions on Graphics, 21:527–536.

Sutherland, I., Sproull, R., and Schumacker, R. (1974).

A characterization of ten hidden-surface algorithms.

ACM Computing Surveys, 6:1–55.

Wang, R., Wang, R., Zhou, K., Pan, M., and Bao, H.

(2009). An efﬁcient GPU-based approach for interac-

tive global illumination. In SIGGRAPH International

Conference and Exhibition on Computer Graphics

and Interactive Techniques, pages 1–8, New York,

NY, USA. ACM.

LOW-RANK RADIOSITY USING SPARSE MATRICES

267