USING PHOTOMETRIC STEREO TO REFINE THE GEOMETRY OF A 3D SURFACE MODEL

Zsolt Jankó

Computer and Automation Research Institute and Eötvös Loránd University, Budapest, Hungary

Keywords:

Surface reconstruction, Bumpiness recovery, Calibration–estimation problem, RANSAC.

Abstract:

In this paper we aim at refining the geometry of 3D models of real objects by adding surface bumpiness to them. 3D scanners are usually not accurate enough to measure fine details, such as surface roughness. Photometric stereo is an appropriate technique to recover bumpiness. We use a number of images taken from the same viewpoint under varying illumination and an initial sparse 3D mesh obtained by a 3D scanner. We assume the surface to be Lambertian, but the lighting properties are unknown. The novelty of our method is that the initial sparse 3D mesh is exploited to calibrate light sources and then to recover surface normals. The importance of refining the geometry of a bumpy surface is demonstrated by applying the method to synthetic and real data.

1 INTRODUCTION

Creating photorealistic 3D models of real objects is a challenging problem. Precise geometry is essential for a realistic appearance. 3D laser scanners are usually used to acquire the geometry of real objects; however, most of these scanners are not accurate enough to measure fine details, such as surface roughness. The realism of a model is significantly reduced when 3D roughness is not sufficiently represented, or when the lack of roughness is concealed by textures.

Photometric stereo is a popular field of computer vision aiming at recovering surface orientation from images. The essence of photometric methods is to calculate surface normals from the changes in image intensity as the incoming light is altered. The great advantage of these techniques is their accuracy, which lets us measure fine details, such as small bumps on the surface.

In this paper we present a novel method based on photometric stereo. As input the method takes a number of images taken from the same viewpoint under varying illumination and a sparse 3D mesh captured by a 3D scanner. We assume that the surface is Lambertian and that the size of the object is small relative to its distance to the light source. The camera is assumed to be calibrated, but lighting properties are unknown. The novelty of our method is that the initial sparse 3D mesh is exploited to calibrate light sources. The problem is decomposed into two parts: first, lighting properties are estimated using the given initial geometry; second, geometry is refined using the already calibrated light sources.

Related Work

The main goal of photometric stereo is to estimate surface orientation from a number of images, where the direction of incident illumination varies, but the camera is fixed. In the most general situation we have information neither about the camera and the light sources nor about surface reflectance. However, in this case surface orientation cannot be perfectly extracted. One needs to set appropriate assumptions about the environment or the surface to obtain precise results.

A general assumption is that camera and light sources are calibrated. When the surface is considered to be Lambertian (Woodham, 1978; Rushmeier et al., 1997), normals of surface points can be easily extracted by solving a linear system of equations. A surface with more complex, spatially-varying BRDF (Bidirectional Reflectance Distribution Function) makes recovery more difficult.

Jankó Z. (2007). USING PHOTOMETRIC STEREO TO REFINE THE GEOMETRY OF A 3D SURFACE MODEL. In Proceedings of the Second International Conference on Computer Vision Theory and Applications - IU/MTSV, pages 355-362. SciTePress.

In their papers (Lensch et al., 2001; Lensch et al., 2003) Lensch et al. also discuss the problem of BRDF acquisition from photographic data. Their work relates to ours as they require scanned geometry as input. They cluster material estimates over the surface, then use the known BRDF to refine surface normals. The main differences between their approach and ours are that we assume a Lambertian surface while they can handle spatially-varying BRDFs; on the other hand, they need to know the illumination properties exactly to recover BRDFs, while we use uncalibrated lights.

If illumination is unknown, one needs to set suitable constraints on surface reflectance and geometry. When Lambertian reflectance is assumed (Zhang et al., 2003; Yuille et al., 2003), a few constraints are sufficient, but assuming more complex or unknown reflection models (Yuille et al., 1999; Drbohlav and Šára, 2001; Lim et al., 2005) requires further restrictions on the geometry and the environment. Typical assumptions are: the radiances of light sources are constant or known; BRDF is constant over the whole surface; surface normals are integrable, i.e., they form a consistent vector field.

Note that most of the techniques mentioned above apply the integrability constraint that forces the recovered normals to form a continuous surface. In most cases this assumption is reasonable, but in our case, when a smoothed continuous surface is given and roughness is to be recovered, this constraint is not applicable.

This paper is structured as follows. First, we formalise the problem of surface bumpiness recovery, then the proposed method is discussed. Finally, the results of the method for synthetic and real data are presented.

2 BUMPINESS RECOVERY

2.1 Problem Formulation

Consider the input setup of $m$ images and a surface mesh of the same object. The images are supposed to have been taken from the same view with fixed and known camera parameters but under varying and unknown lighting conditions. The mesh consists of a set $P$ of 3D points and a set $N$ of corresponding normals. The surface is assumed to be Lambertian.

The goal is to refine the surface and determine local bumpiness by extracting a dense and accurate normal map. According to Lambert's cosine law, the intensity in surface point $X$ under the lighting condition labelled by $i \in [1..m]$ can be calculated as

$$I(X, i) = \frac{a(X)}{\pi}\, n(X)^{\top} \rho(i)\, l(X, i), \tag{1}$$

where $a(X)$ denotes the albedo, $n(X)$ the surface normal in surface point $X$, while $\rho(i)$ is the radiance and $l(X, i)$ the direction to the light source $i$ from point $X$. Vectors $n(X)$ and $l(X, i)$ are of unit length.
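As a minimal sketch of the Lambertian model in equation (1), the snippet below evaluates the predicted intensity for one surface point and one light. The function name and argument names are hypothetical. The $1/\pi$ normalisation is an assumption chosen to agree with the factor $\pi$ in the albedo formula of Section 2.5, and the clamp to zero for surfaces facing away from the light is an added convention that the text does not state.

```python
import math

def lambertian_intensity(albedo, normal, radiance, light_dir):
    """Intensity predicted by Lambert's cosine law for one point and one light.

    normal and light_dir are 3-vectors; they are normalised here so that
    callers need not pass unit vectors.
    """
    n_len = math.sqrt(sum(c * c for c in normal))
    l_len = math.sqrt(sum(c * c for c in light_dir))
    cos_theta = sum(n * l for n, l in zip(normal, light_dir)) / (n_len * l_len)
    # Assumption: points facing away from the light receive no light.
    cos_theta = max(cos_theta, 0.0)
    # Assumption: albedo / pi normalisation of the Lambertian reflectance.
    return albedo / math.pi * radiance * cos_theta

# Light aligned with the normal versus light grazing the surface.
frontal = lambertian_intensity(math.pi, (0.0, 0.0, 1.0), 2.0, (0.0, 0.0, 1.0))
grazing = lambertian_intensity(math.pi, (0.0, 0.0, 1.0), 2.0, (1.0, 0.0, 0.0))
```

The intensity depends only on the product of albedo, radiance and the cosine of the incidence angle, which is why albedo and radiance cannot be separated from a single image.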

We wish to solve equation (1) when $I(X, i)$ is known and we have a rough approximation for $n(X)$ if $X \in P$. Albedos and light source attributes are completely unknown. The goal is to estimate the positions and the radiances of the light sources, as well as to extract "pixel-dense" normal and albedo maps.

Let us modify the notation of (1) and extend it over the whole image domain. After eliminating surface points that are invisible from the camera, a one-to-one correspondence exists between the remaining 3D surface points and image points; thus the functions are modified to depend not on 3D points $X$, but on image points $u$. Furthermore, the functions of parameter $i$ are substituted by $m$ independent functions. For instance, $I_i$ denotes image $i$ taken under the lighting conditions labelled by $i$; it is a function of image points $u$. Equation (1) is then modified as

$$I_i(u) = \frac{a(u)}{\pi}\, n(u)^{\top} \rho_i\, l_i(u), \quad i \in [1..m], \tag{2}$$

where $u$ is in the domain $R_i$ of image $i$. Note that the fixed camera means that all images have the same domain, denoted by $R$. In practice, domain $R$ is restricted to contain only those image points whose corresponding 3D points are visible from all light sources. This domain can easily be determined from the input images.

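According to (2), the problem is posed as the minimisation of the following function:

$$\varphi(a, n, \rho_1 l_1, \ldots, \rho_m l_m) = \sum_{i=1}^{m} \sum_{u \in R} \left( I_i(u) - \frac{a(u)}{\pi}\, n(u)^{\top} \rho_i\, l_i(u) \right)^{2}. \tag{3}$$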

We decompose the problem into two steps: first, normals $n \in N$, which are only rough approximations of the real normals, are used to calibrate light sources; second, the calibrated light sources are used to create dense albedo and normal maps.

2.2 Calibration–Estimation Problem

In (3) the direction $l_i(u)$ of light source $i$ from surface point $X(u)$ is a function of $u$, which makes it more difficult to estimate the light source's position. Fortunately, in most cases the size of the object is small compared to the distance between the object and the light source, thus direction $l_i$ does not change significantly over the surface. Substituting $X$ by the centroid of the surface is therefore reasonable, which implies that direction $l_i$ is constant and no longer depends on $u$. Consequently, the problem of calibrating light sources using normals $n \in N$ can be defined as the minimisation of

$$\varphi'(a, \rho_1 l_1, \ldots, \rho_m l_m) = \sum_{i=1}^{m} \sum_{X(u) \in P} \left( I_i(u) - \frac{a(u)}{\pi}\, n(u)^{\top} \rho_i\, l_i \right)^{2}. \tag{4}$$

VISAPP 2007 - International Conference on Computer Vision Theory and Applications

Estimates of $\rho_i$ and $l_i$ can be found up to a constant linear transformation using the Singular Value Decomposition (SVD). For this, let us collect the intensities $I_i(u_j)$, $i \in [1..m]$, in the $k \times m$ matrix $D$, where $k = |P|$ is the number of points of the surface mesh and $u_j$, $j \in [1..k]$, is a series of image points for which $X(u_j) \in P$. Using the notations of (Koenderink and Van Doorn, 1997), estimates of point properties (such as albedo and normal vector) are collected in the $k \times 3$ matrix $E$, and calibration data of light sources are in the $m \times 3$ matrix $C$. Thus, the calibration–estimation problem defined in (4) is equivalent to the problem of factorising matrix $D$ into $E C^{\top}$, which also requires rank reduction, since the rank of $D$ is usually greater than 3.

As discussed in (Koenderink and Van Doorn, 1997; Yuille et al., 1999), SVD decomposes matrix $D$ in the form $D = U W V^{\top}$, where $W$ contains the singular values of $D$. If our model is correct, the rank of matrix $D$ must be 3, i.e., $W$ must have only 3 nonzero elements. Even though this usually does not hold, SVD is still guaranteed to provide the best least squares solution for factorisation and rank reduction. For mostly Lambertian objects the three largest singular values and the corresponding columns of $U$ and $V$ represent the Lambertian component of the reflectance function (Yuille et al., 1999). If $W'$ is the $3 \times 3$ diagonal matrix containing only the 3 largest singular values from $W$, and $U'$ ($V'$) is a $k \times 3$ ($m \times 3$) matrix with the columns of $U$ ($V$) corresponding to the 3 largest singular values, then $E = U' \sqrt{W'}$ and $C = V' \sqrt{W'}$. Now $D \approx E C^{\top}$.
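The rank-3 factorisation described above can be sketched with NumPy as follows. The function name `factorise_rank3` and the synthetic test data are hypothetical and serve only to illustrate the truncated SVD; the square root of the singular values is split evenly between the two factors, as in the text.

```python
import numpy as np

def factorise_rank3(D):
    """Return E (k x 3) and C (m x 3) such that E @ C.T is the best
    rank-3 least squares approximation of the k x m intensity matrix D."""
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    sqrt_w = np.sqrt(w[:3])           # square roots of the 3 largest singular values
    E = U[:, :3] * sqrt_w             # point properties, one row per mesh point
    C = Vt[:3, :].T * sqrt_w          # light calibration data, one row per image
    return E, C

# Synthetic example (hypothetical data): 50 mesh points, 6 images, mild noise.
rng = np.random.default_rng(0)
scaled_normals = rng.normal(size=(50, 3))
scaled_lights = rng.normal(size=(6, 3))
D = scaled_normals @ scaled_lights.T + 1e-3 * rng.normal(size=(50, 6))

E, C = factorise_rank3(D)
relative_residual = np.linalg.norm(D - E @ C.T) / np.linalg.norm(D)
```

The recovered `E` and `C` generally differ from the matrices used to generate `D` by an unknown invertible $3 \times 3$ transformation, which is the ambiguity the remaining steps of the method have to resolve.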

Although the above solution seems convenient, it may fail to be a desirable one: for any invertible $3 \times 3$ matrix $A$, the pair $EA$ and $CA^{-\top}$ is also a solution. One has to set appropriate constraints to restrict the number of possible solutions. One of the most frequently used constraints is the so-called integrability constraint (Yuille et al., 1999), where integrability conditions are used to ensure that surface normals form a consistent surface. It assumes a smooth surface without roughness. However, we already have a smooth surface approximating the real one, and bumpiness is what we wish to extract, thus the integrability constraint cannot be applied. Instead, other constraints are used as discussed below.
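The ambiguity can be checked in one line. The derivation below is added as a worked step and uses only the matrices $D$, $E$, $C$ and $A$ of this section:

```latex
\[
(EA)\,\bigl(CA^{-\top}\bigr)^{\top}
  = E\,A\,A^{-1}\,C^{\top}
  = E\,C^{\top}
  \approx D .
\]
```

Every invertible $A$ therefore reproduces the measured intensities equally well, so the nine entries of $A$ can only be determined from information beyond the images themselves.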

2.3 Support Function Based on Feasible Constraints

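Let us denote the $i$-th row of $C$ by $c_i$, the $j$-th row of $E$ by $e_j$, and the $q$-th column of $A$ by $a_q$. We are looking for an invertible matrix $A$, for which $e_j A = \frac{a(u_j)}{\pi}\, n(u_j)^{\top}$, $j \in [1..k]$, and $A^{-1} c_i^{\top} = \rho_i l_i$, $i \in [1..m]$. Normals $n(u_j)$, $j \in [1..k]$, are known; they are only initial approximations of the real surface normals, but they can be used to set a constraint on $A$, as $e_j A$ should be collinear with $n(u_j)^{\top}$. Formally, writing $n(u_j) = [n_x(u_j),\ n_y(u_j),\ n_z(u_j)]^{\top}$, we wish matrix $A$ to satisfy the equalities

$$n_y(u_j)\, e_j a_1 = n_x(u_j)\, e_j a_2, \qquad n_z(u_j)\, e_j a_2 = n_y(u_j)\, e_j a_3 .$$

This constraint can be expressed in the form of a homogeneous linear system of equations $Hx = 0$, where $x = [a_1^{\top}\ a_2^{\top}\ a_3^{\top}]^{\top}$ and

$$H = \begin{bmatrix}
n_y(u_1)\, e_1 & -n_x(u_1)\, e_1 & 0 \\
0 & n_z(u_1)\, e_1 & -n_y(u_1)\, e_1 \\
\vdots & \vdots & \vdots \\
n_y(u_k)\, e_k & -n_x(u_k)\, e_k & 0 \\
0 & n_z(u_k)\, e_k & -n_y(u_k)\, e_k
\end{bmatrix}.$$

Unfortunately, uniqueness of the solution is not guaranteed, since the dimension of the null space of the linear transformation $H$ can be greater than 1. In this case the best solution of the null space needs to be found by applying further constraints.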
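The collinearity constraint and the search for the null space of $H$ can be sketched as below. This is an illustrative variant, not necessarily the exact system used in the paper: it stacks all three components of the cross product $(e_j A) \times n(u_j) = 0$ for each point, which expresses the same collinearity requirement. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def collinearity_system(E, normals):
    """Stack (e_j A) x n_j = 0 for all j as H x = 0, with x = [a_1; a_2; a_3]
    holding the columns of A one after another."""
    zero = np.zeros(3)
    rows = []
    for e, (n1, n2, n3) in zip(E, normals):
        rows.append(np.concatenate([zero, n3 * e, -n2 * e]))
        rows.append(np.concatenate([-n3 * e, zero, n1 * e]))
        rows.append(np.concatenate([n2 * e, -n1 * e, zero]))
    return np.array(rows)

def null_space_basis(H, rel_tol=1e-8):
    """Orthonormal basis (as columns) of the numerical null space of H."""
    _, w, Vt = np.linalg.svd(H)
    rank = int(np.sum(w > rel_tol * w[0]))
    return Vt[rank:].T

def matrix_from_vector(x):
    """Rebuild A from x: each block of three entries is one column of A."""
    return x.reshape(3, 3).T

# Synthetic check (hypothetical data): E is generated from a known A_true.
rng = np.random.default_rng(1)
A_true = np.array([[2.0, 0.3, -0.1], [0.1, 1.5, 0.4], [-0.2, 0.2, 1.0]])
normals = rng.normal(size=(20, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
scaled_normals = rng.uniform(0.2, 1.0, size=(20, 1)) * normals
E = scaled_normals @ np.linalg.inv(A_true)

basis = null_space_basis(collinearity_system(E, normals))
A_est = matrix_from_vector(basis[:, 0])   # equals A_true up to scale and sign
```

With exact normals in general position the null space is one-dimensional, so `A_est` is determined up to scale and sign. With noisy mesh normals the null space has to be chosen numerically, which is where the further constraints of this section come in.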

Two further constraints are considered: first, albedos should be greater than zero; second, the angle between the light source's direction and the camera's given direction should not exceed a given threshold. The latter is reasonable since, if the angle were large, the surface would be mostly invisible. Thus, one needs to search through the null space of linear transformation $H$ for the element which best satisfies these two constraints.

Matrix $A$, and its vector form $x$, can be considered as a model fitted to the given data. One needs to define an appropriate support function that gives the goodness of a model.


The first constraint, namely that the albedo is greater than zero, holds if the angle between the two vectors $e_j A$ and $n(u_j)^{\top}$ is sufficiently small. Formally, we say that the $j$-th element of the estimates supports the model if $\angle(e_j A, n(u_j)^{\top}) \le 1.0^{\circ}$. We note that when calculating the angle one should take into account that $e_j A$ is usually not of unit length.

The number of supporting normals gives a true description of the goodness of the model, but it is useless without the second constraint. The light sources should be close in angle to the camera, otherwise most of the surface is invisible. We say that the $i$-th light source supports the model if the angle between $A^{-1} c_i^{\top}$ and the given vector towards the camera is below a threshold. Our experience shows that this threshold should be set between $30^{\circ}$ and $45^{\circ}$.

The support of a model is given by the number of supporting normals and supporting light sources. However, light sources should get higher priority, since losing a light source is more likely to yield an invalid model than losing a normal. Hence the support of a model $x$ is calculated as follows:

$$S(x) = q \cdot \mathrm{Nof}(\text{supporting lights}) + \mathrm{Nof}(\text{supporting normals}), \tag{5}$$

where Nof stands for "Number of", and $q = k/2$ is a reasonable choice.
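A minimal sketch of the support function (5) is given below. The function names, the argument layout and the toy data are hypothetical; the thresholds default to the 1.0° normal tolerance and a 45° light tolerance mentioned in the text.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors in degrees; the vectors need not be unit length."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

def support(A, E, C, normals, view_dir, normal_tol_deg=1.0, light_tol_deg=45.0):
    """Support of the model A: q * (supporting lights) + (supporting normals)."""
    k = len(E)
    q = k / 2
    lights = np.linalg.solve(A, C.T).T        # row i is (A^{-1} c_i^T)^T
    n_lights = sum(1 for l in lights if angle_deg(l, view_dir) <= light_tol_deg)
    # A small angle also excludes vectors pointing opposite to the mesh normal,
    # i.e. estimates with negative albedo.
    n_normals = sum(1 for e, n in zip(E, normals)
                    if angle_deg(e @ A, n) <= normal_tol_deg)
    return q * n_lights + n_normals

# Toy example (hypothetical data): 4 mesh points, 3 lights, A = identity.
normals = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0], [0.0, 0.6, 0.8]])
E = 0.5 * normals
E[3] *= -1.0                                   # one estimate with flipped sign
C = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 1.0], [1.0, 0.0, 0.0]])
score = support(np.eye(3), E, C, normals, view_dir=np.array([0.0, 0.0, 1.0]))
```

In the toy example two of the three lights lie within 45° of the viewing direction and three of the four estimates agree with the mesh normals, so the weight $q = k/2 = 2$ makes the lights contribute more than the normals.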

2.4 Applying RANSAC

The method described above is not robust. Experience shows that inaccuracy of the approximated normals and image noise lead to unreasonable results. Our method uses least squares and is therefore very sensitive to outliers appearing among the normals. We decided to apply the RANSAC (RANdom SAmple Consensus) algorithm of Fischler and Bolles (Fischler and Bolles, 1981), which is a general and efficient robust estimator.

The idea is the following: instead of using all normals from $N$, a subset $N_s \subset N$ is randomly selected and the method is executed only on $N_s$. This is repeated a number of times for various $N_s$. For each $N_s$ a consensus set $N_c$ is formed by the supporting normals. The consensus set defined by the best sample gives the inliers of $N$. Finally, the model is re-estimated using this consensus set. Informally, the technique uses the fact that if $|N_s|$ is small, and the selection is repeated sufficiently many times, then the probability that a subset without any outliers has been selected is high. We have chosen to set $|N_s|$ to 5, which is a little greater than the necessary 3, in order to avoid frequent under-determinedness. The number of iterations and the minimal size of an acceptable consensus set can be calculated from the estimated incidence rate of outliers, as discussed in (Hartley and Zisserman, 2000).
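The number of iterations is commonly derived from the requirement that, with a chosen confidence, at least one random sample is free of outliers. The sketch below implements this standard formula; the function name, the default confidence of 0.99 and the example outlier rate are assumptions for illustration and are not values reported in this paper.

```python
import math

def ransac_iterations(outlier_rate, sample_size=5, confidence=0.99):
    """Smallest number of samples such that, with the given confidence,
    at least one sample of `sample_size` elements contains no outlier."""
    if not 0.0 <= outlier_rate < 1.0:
        raise ValueError("outlier_rate must be in [0, 1)")
    clean_sample_prob = (1.0 - outlier_rate) ** sample_size
    if clean_sample_prob >= 1.0:
        return 1                       # no outliers: one sample suffices
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - clean_sample_prob))

# Example: half of the mesh normals assumed to be outliers.
iterations_for_5 = ransac_iterations(0.5, sample_size=5)
iterations_for_3 = ransac_iterations(0.5, sample_size=3)
```

The required number of iterations grows quickly with the sample size, which is why the sample is kept only slightly larger than the minimum.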

Notice that the above problem can be considered as clustering. Normals of $N$ approximate the real normals of the surface. We are given a smooth surface and want to recover the real bumpy surface. Most of the normals in $N$ are poor approximations (denote this subset by $N_{\mathrm{out}}$) and only a few of them approximate well (denote this subset by $N_{\mathrm{in}}$). However, correct normals are consistent, which helps us find their cluster. Consistency means that if the random sample $N_s$ is chosen to be a subset of $N_{\mathrm{in}}$, then the result is correct and satisfies the constraints. On the other hand, if $N_s$ contains one or more normals from $N_{\mathrm{out}}$, the outliers deteriorate the result.

This is also the reason why further ambiguities (including the Generalised Bas-Relief Ambiguity (Belhumeur et al., 1999)) are avoided, in contrast to other methods, e.g., (Yuille et al., 2003). It is known in photometric stereo that light sources can be precisely calibrated, without any ambiguities, if surface normals are known. We do not have accurate normals, but we have a set $N$ of normals that has a subset of correct normals, and this subset is determined by the method described above.

2.5 Extracting Normals

After calibrating the light sources, normals can be determined in each point $u$ as follows. Consider again the formula of the Lambertian reflection model: $I_i(u) = \frac{a(u)}{\pi}\, n(u)^{\top} \rho_i l_i$, $i \in [1..m]$. Due to the light source calibration, $\rho_i$ and $l_i$ are known, thus $a(u)$ and $n(u)$ can easily be calculated, separately for each image point $u$, by solving the over-determined linear system of equations $B y_u = b_u$, where

$$B = \begin{bmatrix} \rho_1 l_1^{\top} \\ \vdots \\ \rho_m l_m^{\top} \end{bmatrix}
\quad \text{and} \quad
b_u = \begin{bmatrix} I_1(u) \\ \vdots \\ I_m(u) \end{bmatrix}.$$

Since $n(u)$ is assumed to be of unit length, $a(u) = \pi \|y_u\|$ and $n(u) = y_u / \|y_u\|$.
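The per-pixel least squares step can be sketched with NumPy as follows, solving the system for all pixels at once. The function name, array layout and synthetic data are hypothetical; the rows of `scaled_lights` play the role of $\rho_i l_i^{\top}$, and the factor $\pi$ in the albedo follows the formula above.

```python
import numpy as np

def normals_and_albedo(images, scaled_lights):
    """images: (m, h, w) intensities; scaled_lights: (m, 3), row i = rho_i * l_i.
    Returns unit normals (h, w, 3) and albedos (h, w)."""
    m, h, w = images.shape
    b = images.reshape(m, h * w)                       # one column per pixel
    y, *_ = np.linalg.lstsq(scaled_lights, b, rcond=None)   # shape (3, h*w)
    lengths = np.linalg.norm(y, axis=0)
    safe = np.where(lengths > 0, lengths, 1.0)         # avoid division by zero
    normals = (y / safe).T.reshape(h, w, 3)
    albedo = (np.pi * lengths).reshape(h, w)
    return normals, albedo

# Synthetic 2 x 2 image stack (hypothetical data) rendered with albedo 0.8.
true_normals = np.array([[[0.0, 0.0, 1.0], [0.0, 0.6, 0.8]],
                         [[0.6, 0.0, 0.8], [0.0, -0.6, 0.8]]])
scaled_lights = 2.0 * np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 1.0], [-0.5, 0.0, 1.0],
                                [0.0, 0.5, 1.0], [0.0, -0.5, 1.0]])
images = (0.8 / np.pi) * np.einsum("ij,hwj->ihw", scaled_lights, true_normals)

normals, albedo = normals_and_albedo(images, scaled_lights)
```

Because each pixel is solved independently, no smoothness or integrability assumption enters at this stage, which is what allows fine bumps to be recovered.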

3 RESULTS

To demonstrate the efficiency, we have tested the method both on synthetic and real data. The synthetic dataset contains a 3D mesh of a bumpy sphere, a smoothed mesh without bumpiness, and five images of the bumpy sphere taken using five different light sources. For each image one of the light sources was turned on, while the others were turned off.


The method was run with the input of the smoothed mesh and the five images, yielding a dense normal map. The estimated normal map was then compared to the ground truth normal map: the angles between the corresponding normals were measured, in degrees, and the mean of the angles gave the magnitude of the error. Since the method is non-deterministic because of RANSAC, it was run 10 times, resulting in an error of $1.85^{\circ} \pm 0.1^{\circ}$, which is significantly smaller than the error of the normals of the input smoothed mesh, $7.55^{\circ}$. Note that the latter is also not too large. This is because the mean of the angles was considered: the surface had only a relatively small bumpy part, while larger parts of the surface were smooth. Obviously, normals of smooth parts were approximated precisely both before and after photometric estimation, and their small errors ($\approx 1^{\circ}$) prevented the mean from growing too large.
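The error measure used here, the mean angle between corresponding normals, can be sketched as below. The function name and the two-pixel example are hypothetical and only illustrate the computation.

```python
import numpy as np

def mean_angular_error_deg(estimated, reference):
    """Mean angle in degrees between corresponding normals of two normal maps.
    Both arrays have a last dimension of size 3; inputs need not be unit length."""
    est = np.asarray(estimated, dtype=float).reshape(-1, 3)
    ref = np.asarray(reference, dtype=float).reshape(-1, 3)
    cosines = np.sum(est * ref, axis=1) / (
        np.linalg.norm(est, axis=1) * np.linalg.norm(ref, axis=1))
    angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
    return float(angles.mean())

# Two-pixel example: one normal matches, the other is off by a right angle.
reference = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
estimated = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
error = mean_angular_error_deg(estimated, reference)
```

As the example shows, a few large deviations are averaged with many small ones, which is the effect described above for the mostly smooth sphere.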

To demonstrate the efficiency of the method more clearly, the pictures of the normal maps are also presented. Fig. 1 contains the ground truth and the estimated normal maps, as well as their difference before and after applying the method. Pixel intensities of the normal maps represent the deviation of the normal vectors from a fixed unit vector. These intensities are calculated as $255 - 3 n_{\alpha}$, where $n_{\alpha}$ is the angular deviation in degrees. Pixel intensities of the difference maps show the errors in angle: when the error is $n_{\varepsilon}$ degrees, pixel intensity is $128 + 2 n_{\varepsilon}$. The bumps are clearly visible in the difference map before using the method, but almost perfectly disappear after that.

We have tested the method on real datasets as well. The first dataset consists of the 3D mesh of a Plaquette and seven images of the object taken from the same viewpoint under varying illumination. To provide lighting, a simple table lamp was used and moved in space. Fig. 2 shows three of the input images, the input 3D model and the resulting normal map.

The second dataset consists of a Frog model and five input images (see Fig. 3). Although it is hard to evaluate the result based on the map presented, comparing it to the input images shows that the locations of the bumps are precise.

The third dataset (Fig. 4) contains a wooden object in the shape of a Bottle. The dataset consists of the 3D mesh and seven images of the object. The results of the method applied to the three datasets demonstrate that our technique is suitable for detecting even fine roughness.

4 CONCLUSION

In this paper we presented a novel method to refine the geometry of 3D models of real objects. The photometric stereo based method uses a number of input images taken from the same viewpoint under varying lighting, supplemented with an initial sparse 3D mesh. The surface reflectance is assumed to be Lambertian, but normals, albedos and lighting properties are unknown.

The method solves the problem in two steps. First, the initial surface mesh is used to calibrate light sources. This problem is traced back to the well-known calibration–estimation problem, and the solution is made robust by applying the RANSAC algorithm. Second, dense normal and albedo maps are extracted using the calibrated setup. The effectiveness of the method was demonstrated both on synthetic and real data.

ACKNOWLEDGEMENTS

The author thanks Dmitry Chetverikov and Levente Hajder for their comments on this paper. This work is supported by EU Network of Excellence MUSCLE (FP6-507752).

REFERENCES

Belhumeur, P. N., Kriegman, D. J., and Yuille, A. L. (1999). The bas-relief ambiguity. International Journal of Computer Vision, 35(1):33–44.

Drbohlav, O. and Šára, R. (2001). Unambiguous determination of shape from photometric stereo with unknown light sources. In Proc. 8th IEEE International Conference on Computer Vision, pages 581–586.

Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395.

Hartley, R. and Zisserman, A. (2000). Multiple View Geometry in Computer Vision. Cambridge University Press.

Koenderink, J. J. and Van Doorn, A. J. (1997). The generic bilinear calibration-estimation problem. International Journal of Computer Vision, 23(3):217–234.

Lensch, H. P., Kautz, J., Goesele, M., Heidrich, W., and Seidel, H.-P. (2001). Image-based reconstruction of spatially varying materials. In Proc. 12th Eurographics Workshop on Rendering Techniques, pages 103–114.

Lensch, H. P., Kautz, J., Goesele, M., Heidrich, W., and Seidel, H.-P. (2003). Image-based reconstruction of spatial appearance and geometric detail. ACM Transactions on Graphics, 22(2):234–257.


Lim, J., Ho, J., Yang, M.-H., and Kriegman, D. (2005). Passive photometric stereo from motion. In Proc. 10th IEEE International Conference on Computer Vision, pages 1635–1642.

Rushmeier, H., Taubin, G., and Guéziec, A. (1997). Applying shape from lighting variation to bump map capture. In Proc. 8th Eurographics Rendering Workshop, pages 35–44.

Woodham, R. J. (1978). Photometric stereo: A reflectance map technique for determining surface orientation from image intensity. In Image Understanding Systems and Industrial Applications, Proc. SPIE, volume 155, pages 136–143.

Yuille, A., Coughlan, J. M., and Konishi, S. (2003). KGBR viewpoint–lighting ambiguity. Journal of the Optical Society of America A, 20(1):24–31.

Yuille, A. L., Snow, D., Epstein, R., and Belhumeur, P. N. (1999). Determining generative models of objects under varying illumination: Shape and albedo from multiple images using SVD and integrability. International Journal of Computer Vision, 35(3):203–222.

Zhang, L., Curless, B., Hertzmann, A., and Seitz, S. M. (2003). Shape and motion under varying illumination: Unifying structure from motion, photometric stereo, and multi-view stereo. In Proc. 9th IEEE International Conference on Computer Vision, pages 618–625.
