USING PHOTOMETRIC STEREO TO REFINE THE GEOMETRY OF
A 3D SURFACE MODEL
Zsolt Jankó
Computer and Automation Research Institute and
Eötvös Loránd University, Budapest, Hungary
Keywords:
Surface reconstruction, Bumpiness recovery, Calibration–estimation problem, RANSAC.
Abstract:
In this paper we aim at refining the geometry of 3D models of real objects by adding surface bumpiness
to them. 3D scanners are usually not accurate enough to measure fine details, such as surface roughness.
Photometric stereo is an appropriate technique to recover bumpiness. We use a number of images taken from
the same viewpoint under varying illumination and an initial sparse 3D mesh obtained by a 3D scanner. We
assume the surface to be Lambertian, but the lighting properties are unknown. The novelty of our method is
that the initial sparse 3D mesh is exploited to calibrate light sources and then to recover surface normals. The
importance of refining the geometry of a bumpy surface is demonstrated by applying the method to synthetic
and real data.
1 INTRODUCTION
Creating photorealistic 3D models of real objects is a
challenging problem. For realistic appearance precise
geometry is essential. 3D laser scanners are usually
used to acquire the geometry of real objects; however,
most of these scanners are not accurate enough to
measure fine details, such as surface roughness. Re-
alistic appearance of a model is significantly reduced
when 3D roughness is not sufficiently represented, or
when the lack of the roughness is concealed by tex-
tures.
Photometric stereo is a popular field of computer
vision aiming at recovering surface orientation from
images. The essence of photometric methods is to
calculate surface normals from the way image intensity
changes as the incoming light is altered. The great
advantage of these techniques is their accuracy, which
lets us measure fine details such as small bumps on the
surface.
In this paper we present a novel method based on
photometric stereo. As input the method obtains a
number of images taken from the same viewpoint un-
der varying illumination and a sparse 3D mesh cap-
tured by a 3D scanner. We assume that the surface
is Lambertian and that the size of the object is small
relative to its distance to the light source. The camera
is assumed to be calibrated, but lighting properties are
unknown. The novelty of our method is that the initial
sparse 3D mesh is exploited to calibrate light sources.
The problem is decomposed into two parts: first, the
lighting properties are estimated using the given initial
geometry; second, the geometry is refined using the
calibrated light sources.
Related Work
The main goal of photometric stereo is to estimate
surface orientation from a number of images, where
the direction of incident illumination varies, but the
camera is fixed. In the most general situation we have
information neither about the camera and the light
sources nor about surface reflectance. However, in
this case surface orientation cannot be perfectly ex-
tracted. One needs to set appropriate assumptions
about the environment or the surface to obtain precise
results.
A general assumption is that camera and light
sources are calibrated. When the surface is con-
sidered to be Lambertian (Woodham, 1978; Rush-
meier et al., 1997), normals of surface points can be
easily extracted by solving a linear system of equations.
A surface with a more complex, spatially-varying
BRDF (Bidirectional Reflectance Distribution Function)
makes recovery more difficult.

Jankó Z. (2007).
USING PHOTOMETRIC STEREO TO REFINE THE GEOMETRY OF A 3D SURFACE MODEL.
In Proceedings of the Second International Conference on Computer Vision Theory and Applications - IU/MTSV, pages 355-362
Copyright © SciTePress

In their papers (Lensch et al., 2001; Lensch et al.,
2003) Lensch et al. also discuss the problem of BRDF
acquisition from photographic data. Their work re-
lates to ours as they require scanned geometry as in-
put. They cluster material estimates over the surface,
then use the known BRDF to refine surface normals.
The main differences between their approach and ours
are that we assume Lambertian surface while they can
handle spatially-varying BRDFs; on the other hand,
they need to exactly know illumination properties to
recover BRDFs, while we use uncalibrated lights.
If illumination is unknown, one needs to set suit-
able constraints on surface reflectance and geome-
try. When Lambertian reflectance is assumed (Zhang
et al., 2003; Yuille et al., 2003), a few constraints are
sufficient, but assuming more complex or unknown
reflection models (Yuille et al., 1999; Drbohlav and
Šára, 2001; Lim et al., 2005) requires further restrictions
on the geometry and the environment. Typical
assumptions are: the radiances of light sources are
constant or known; BRDF is constant over the whole
surface; surface normals are integrable, i.e., they form
a consistent vector field.
Note that most of the techniques mentioned above
apply the integrability constraint that forces the re-
covered normals to form a continuous surface. In
most cases this assumption is reasonable, but in our
case, when a smoothed continuous surface is given
and roughness is to be recovered, this constraint is not
applicable.
The structure of this paper is the following. First,
we formalise the problem of surface bumpiness recov-
ery, then the proposed method is discussed. Finally,
the results of the method for synthetic and real data
are presented.
2 BUMPINESS RECOVERY
2.1 Problem Formulation
Consider the input setup of m images and a surface
mesh of the same object. The images are supposed to
have been taken from the same view with fixed and
known camera parameters but under varying and un-
known lighting conditions. The mesh consists of a set
P of 3D points and a set N of corresponding normals.
The surface is assumed to be Lambertian.
The goal is to refine the surface and determine
local bumpiness by extracting a dense and accurate
normal map. According to Lambert's cosine law, the intensity at surface point $X$ under the lighting condition labelled by $i \in [1..m]$ can be calculated as

$$I(X, i) = \frac{1}{\pi}\, a_d(X)\, n(X)^T \rho(i)\, l(X, i), \tag{1}$$

where $a_d(X)$ denotes the albedo and $n(X)$ the surface normal at surface point $X$, while $\rho(i)$ is the radiance and $l(X, i)$ the direction to light source $i$ from point $X$. Vectors $n(X)$ and $l(X, i)$ are of unit length.
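As a minimal sketch of how equation (1) is evaluated for a single surface point, the following Python function computes the Lambertian intensity. The function name and the sample values are illustrative assumptions, not part of the original method; no clamping of negative values is applied, since the model is only used for points that are lit.

```python
import math

def lambertian_intensity(albedo, normal, radiance, light_dir):
    """Evaluate equation (1): I = (1/pi) * a_d * n^T (rho * l)."""
    # Normalise both vectors, since equation (1) assumes unit length.
    n_len = math.sqrt(sum(c * c for c in normal))
    l_len = math.sqrt(sum(c * c for c in light_dir))
    cos_theta = sum(n * l for n, l in zip(normal, light_dir)) / (n_len * l_len)
    return albedo * radiance * cos_theta / math.pi

# Hypothetical values: albedo 0.8, radiance 2.0, normal facing the z axis.
head_on = lambertian_intensity(0.8, (0.0, 0.0, 1.0), 2.0, (0.0, 0.0, 1.0))
tilted = lambertian_intensity(0.8, (0.0, 0.0, 1.0), 2.0, (0.0, 1.0, 1.0))
```

Tilting the light by 45 degrees scales the intensity by the cosine of that angle, which is the dependence on surface orientation that photometric stereo exploits.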
We wish to solve equation (1) when $I(X, i)$ is
known and we have a rough approximation for $n(X)$
if $X \in P$. Albedos and light source attributes are
completely unknown. The goal is to estimate the positions
and the radiances of the light sources, as well as to ex-
tract “pixel-dense” normal and albedo maps.
Let us modify the notations of (1) and extend it
over the whole image domain. Eliminating surface
points that are invisible from the camera, a one-to-one
correspondence exists between remaining 3D surface
points and image points, thus the functions are modi-
fied to depend not on 3D points X, but on image points
$u$. Furthermore, the functions of parameter $i$ are substituted by $m$ independent functions. For instance, $I_i$ denotes image $i$ taken under the lighting conditions labelled by $i$; it is a function of image points $u$. Equation (1) is then modified as

$$I_i(u) = \frac{1}{\pi}\, a_d(u)\, n(u)^T \rho_i\, l_i(u), \quad i \in [1..m], \tag{2}$$

where $u$ is in the domain $R_i$ of image $i$. Note that
the fixed camera means that all images have the same
domain, denoted by R. Practically, domain R is re-
stricted to contain only such image points that the
corresponding 3D points are visible from all light
sources. This domain can easily be determined from
the input images.
According to (2), the problem is posed as the minimisation of the following function:

$$\varphi(a_d, n, \rho_1, l_1, \ldots, \rho_m, l_m) = \sum_{i=1}^{m} \sum_{u \in R} \left( I_i(u) - \frac{1}{\pi}\, a_d(u)\, n(u)^T \rho_i\, l_i(u) \right)^2. \tag{3}$$
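The objective in (3) is a plain sum of squared residuals between the observed and the predicted Lambertian intensities. The sketch below evaluates it on a toy example; the data layout (lists indexed by light and by pixel) and all names are assumptions chosen for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def photometric_cost(images, albedos, normals, radiances, light_dirs):
    """Sum of squared residuals of equation (3).

    images[i][p]     : intensity of pixel p in image i
    albedos[p]       : albedo a_d at pixel p
    normals[p]       : unit normal at pixel p
    radiances[i]     : radiance rho_i of light i
    light_dirs[i][p] : unit direction l_i(u) at pixel p
    """
    total = 0.0
    for i, image in enumerate(images):
        for p, intensity in enumerate(image):
            predicted = albedos[p] * radiances[i] * dot(normals[p], light_dirs[i][p]) / math.pi
            total += (intensity - predicted) ** 2
    return total

# Hypothetical toy scene: two pixels, two lights.
albedos = [0.5, 0.9]
normals = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8)]
radiances = [1.0, 2.0]
light_dirs = [[(0.0, 0.0, 1.0)] * 2, [(0.6, 0.0, 0.8)] * 2]

# Images rendered exactly from the model give zero cost.
images = [[albedos[p] * radiances[i] * dot(normals[p], light_dirs[i][p]) / math.pi
           for p in range(2)] for i in range(2)]
cost_exact = photometric_cost(images, albedos, normals, radiances, light_dirs)

# Perturbing one observation by 0.1 raises the cost by 0.1 squared.
images[0][0] += 0.1
cost_perturbed = photometric_cost(images, albedos, normals, radiances, light_dirs)
```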
We decompose the problem into two steps: first, the
normals $n \in N$, which are only rough approximations
of the real normals, are used to calibrate the light sources;
second, the calibrated light sources are used to create
dense albedo and normal maps.
2.2 Calibration–Estimation Problem
VISAPP 2007 - International Conference on Computer Vision Theory and Applications

In (3) the direction $l_i(u)$ of light source $i$ from surface point $X(u)$ is a function of $u$, which makes it more difficult to estimate the light source's position. Fortunately, in most cases the size of the object is small compared to the distance between the object and the light source, thus direction $l$ does not change significantly over the surface. Substituting $X$ by the centroid of the surface is therefore reasonable, which implies that direction $l_i$ is constant and no longer depends on $u$. Consequently, the problem of calibrating light sources using normals $n \in N$ can be defined as the minimisation of

$$\varphi_0(a_d, \rho_1, l_1, \ldots, \rho_m, l_m) = \sum_{i=1}^{m} \sum_{X(u) \in P} \left( I_i(u) - \frac{1}{\pi}\, a_d(u)\, n(u)^T \rho_i\, l_i \right)^2. \tag{4}$$
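The centroid approximation can be checked numerically. The following sketch measures the largest angle between the per-point light direction and the direction taken from the centroid; the point coordinates and light positions are hypothetical and only illustrate how the deviation shrinks as the light moves away.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def unit_direction(src, dst):
    d = [dst[k] - src[k] for k in range(3)]
    length = math.sqrt(sum(c * c for c in d))
    return tuple(c / length for c in d)

def max_deviation_deg(points, light_pos):
    """Largest angle between per-point and centroid light directions."""
    l_c = unit_direction(centroid(points), light_pos)
    worst = 0.0
    for p in points:
        l_p = unit_direction(p, light_pos)
        cos_a = max(-1.0, min(1.0, sum(a * b for a, b in zip(l_c, l_p))))
        worst = max(worst, math.degrees(math.acos(cos_a)))
    return worst

# Hypothetical object about 0.2 units across, centred at the origin.
points = [(-0.1, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, -0.1, 0.0), (0.0, 0.1, 0.0)]
near = max_deviation_deg(points, (0.0, 0.0, 0.5))   # light close to the object
far = max_deviation_deg(points, (0.0, 0.0, 5.0))    # light ten times farther
```

With the light ten times farther away, the deviation drops from roughly 11 degrees to roughly 1 degree in this toy setup, which is the regime in which a single constant direction per light is a reasonable model.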
Estimates of $\rho_i$ and $l_i$ can be found up to a constant linear transformation using the Singular Value Decomposition (SVD). For this, let us collect the intensities $I_i(u_j)$, $i \in [1..m]$, in the $k \times m$ matrix $D$, where $k = |P|$ is the number of points of the surface mesh and $u_j$, $j \in [1..k]$, is a series of image points for which $X(u_j) \in P$. Using the notation of (Koenderink and Van Doorn, 1997), estimates of point properties (such as albedo and normal vector) are collected in the $k \times 3$ matrix $E$, and calibration data of light sources in the $m \times 3$ matrix $C$. Thus, the calibration–estimation problem defined in (4) is equivalent to the problem of factorising matrix $D$ into $E C^T$, which also requires rank reduction, since the rank of $D$ is usually greater than 3.
As discussed in (Koenderink and Van Doorn, 1997; Yuille et al., 1999), SVD decomposes matrix $D$ in the form $D = U W V^T$, where $W$ contains the singular values of $D$. If our model is correct, the rank of matrix $D$ must be 3, i.e., $W$ must have only 3 nonzero elements. Even if this usually does not hold, SVD is still guaranteed to provide the best least squares solution for factorisation and rank reduction. For mostly Lambertian objects the three largest singular values and the corresponding columns of $U$ and $V$ represent the Lambertian component of the reflectance function (Yuille et al., 1999). If $W_3$ is the $3 \times 3$ diagonal matrix containing only the 3 largest singular values from $W$, and $U_3$ ($V_3$) is a $k \times 3$ ($m \times 3$) matrix with the columns of $U$ ($V$) corresponding to the 3 largest singular values, then $E = U_3 \sqrt{W_3}$ and $C = V_3 \sqrt{W_3}$. Now $D \approx E C^T$.
Although the above solution seems convenient, it may fail to be a desirable one: for any invertible $3 \times 3$ matrix $A$, the pair $EA$ and $CA^{-T}$ is also a solution. One has to set appropriate constraints to restrict the number of possible solutions. One of the most frequently used constraints is the so-called integrability constraint (Yuille et al., 1999), where integrability conditions are used to ensure that surface normals form a consistent surface. It assumes a smooth surface without roughness. However, we already have a smooth surface approximating the real one, and bumpiness is what we wish to extract, thus the integrability constraint cannot be applied. Instead, other constraints are used, as discussed below.
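The ambiguity of the factorisation can be verified in one line. As a short check (with $A^{-T}$ denoting the inverse transpose of $A$), any invertible $A$ cancels out of the product, so the factorisation by itself cannot distinguish the transformed pair from the original one:

```latex
(EA)\,(CA^{-T})^{T} \;=\; E\,A\,A^{-1}\,C^{T} \;=\; E\,C^{T} \;\approx\; D .
```

Since $A$ has nine free entries, the rank-3 factorisation leaves a nine-parameter family of equally good solutions, which is why additional constraints are needed to single one out.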
2.3 Support Function Based on Feasible
Constraints
Let us denote the $i$-th row of $C$ by $c_i^T$, the $j$-th row of $E$ by $e_j^T$, and the $q$-th column of $A$ by $a_q$. We are looking for an invertible matrix $A$ for which $e_j^T A = \frac{1}{\pi} a_d(u_j)\, n(u_j)^T$, $j \in [1..k]$, and $A^{-1} c_i = \rho_i l_i$, $i \in [1..m]$. Normals $n(u_j)$, $j \in [1..k]$, are known; they are only initial approximations of the real surface normals, but they can be used to set a constraint on $A$, as $e_j^T A$ should be collinear with $n(u_j)^T$. Formally, with $n_q(u_j)$ denoting the $q$-th component of $n(u_j)$, we wish matrix $A$ to satisfy the equalities

$$\frac{e_j^T a_1}{n_1(u_j)} = \frac{e_j^T a_2}{n_2(u_j)} = \frac{e_j^T a_3}{n_3(u_j)} = \frac{1}{\pi}\, a_d(u_j).$$

This constraint can be expressed in the form of a homogeneous linear system of equations $Hx = 0$, where $x = [a_1^T, a_2^T, a_3^T]^T$ and

$$H = \begin{bmatrix}
\dfrac{e_1^T}{n_1(u_1)} & -\dfrac{e_1^T}{n_2(u_1)} & 0 \\
0 & \dfrac{e_1^T}{n_2(u_1)} & -\dfrac{e_1^T}{n_3(u_1)} \\
\dfrac{e_1^T}{n_1(u_1)} & 0 & -\dfrac{e_1^T}{n_3(u_1)} \\
\vdots & \vdots & \vdots \\
\dfrac{e_k^T}{n_1(u_k)} & -\dfrac{e_k^T}{n_2(u_k)} & 0 \\
0 & \dfrac{e_k^T}{n_2(u_k)} & -\dfrac{e_k^T}{n_3(u_k)} \\
\dfrac{e_k^T}{n_1(u_k)} & 0 & -\dfrac{e_k^T}{n_3(u_k)}
\end{bmatrix}.$$
Unfortunately, uniqueness of the solution is not guar-
anteed, since the dimension of the null space of the
linear transformation H can be greater than 1. In this
case the best solution of the null space needs to be
found by applying further constraints.
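To make the structure of the system concrete, the sketch below assembles $H$ from the rows of $E$ and the mesh normals, three rows of nine entries per mesh point, and checks that a matrix $A$ which maps each $e_j^T$ exactly onto its normal direction lies in the null space. The sign placement within each row, the helper names and the numeric values are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_H(E, normals):
    """Stack three 1x9 rows per mesh point from the collinearity constraint.

    E[j] is the row e_j^T and normals[j] the mesh normal n(u_j). All normal
    components must be nonzero because the constraint divides by them.
    """
    H = []
    zero = [0.0, 0.0, 0.0]
    for e, n in zip(E, normals):
        s1 = [c / n[0] for c in e]
        s2 = [c / n[1] for c in e]
        s3 = [c / n[2] for c in e]
        H.append(s1 + [-c for c in s2] + zero)   # e^T a_1/n_1 - e^T a_2/n_2 = 0
        H.append(zero + s2 + [-c for c in s3])   # e^T a_2/n_2 - e^T a_3/n_3 = 0
        H.append(s1 + zero + [-c for c in s3])   # e^T a_1/n_1 - e^T a_3/n_3 = 0
    return H

# Hypothetical 3x3 matrix A (listed by rows) and rows of E.
A = [[2.0, 0.5, 0.0],
     [0.0, 1.5, 0.3],
     [0.4, 0.0, 1.0]]
E = [[1.0, 0.2, 0.5], [0.3, 1.0, 0.8], [0.6, 0.4, 1.2]]

columns = [[A[r][q] for r in range(3)] for q in range(3)]   # a_1, a_2, a_3

# Normals chosen exactly collinear with e_j^T A, so A must satisfy H x = 0.
normals = []
for e in E:
    v = [dot(e, a_q) for a_q in columns]
    length = sum(c * c for c in v) ** 0.5
    normals.append([c / length for c in v])

H = build_H(E, normals)
x = columns[0] + columns[1] + columns[2]                    # x = [a_1^T, a_2^T, a_3^T]^T
residual = max(abs(dot(row, x)) for row in H)
```

With noisy normals the residual is no longer zero, and one would look for the directions of smallest singular value of $H$ instead of an exact null vector.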
Two further constraints are considered: first, albe-
dos should be greater than zero; second, the angle be-
tween the light source’s direction and the camera’s
given direction should not exceed a given threshold.
The latter is also reasonable since, if the angle were
large, the surface would be mostly invisible. Thus,
one needs to search through the null space of linear
transformation H for the element which best satisfies
these two constraints.
Matrix A—and its vector form x—can be consid-
ered as a model fitted to the given data. One needs to
define an appropriate support function that gives the
goodness of a model.
The first constraint, namely that the albedo is greater than zero, holds if the angle between the two vectors $e_j^T A$ and $n(u_j)^T$ is sufficiently small. Formally, we say that the $j$-th element of the estimates supports the model if $\angle(e_j^T A, n(u_j)^T) \le 1.0^\circ$. Note that when calculating the angle one should take into account that $e_j^T A$ is usually not of unit length.

The number of supporting normals describes the goodness of the model well, but it is useless without the second constraint. The light sources should be close in angle to the camera, otherwise most of the surface is invisible. We say that the $i$-th light source supports the model if the angle between $A^{-1} c_i$ and the given vector towards the camera is below a threshold. Our experience shows that this threshold should be set between $30^\circ$ and $45^\circ$.
The support of a model is given by the number
of supporting normals and supporting light sources.
However, light sources should get higher priority,
since loss of a light source incurs an invalid model
with larger probability than loss of a normal. Hence
the support of a model x is calculated as follows:
$$S(x) = q \cdot \mathrm{Nof}(\text{supporting lights}) + \mathrm{Nof}(\text{supporting normals}), \tag{5}$$
where Nof stands for "number of", and $q = k/2$ is a reasonable choice.
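The support function (5) can be sketched as follows. This is an illustrative Python version under stated assumptions: the function and parameter names are hypothetical, the 1 degree and 45 degree tolerances are taken from the thresholds mentioned in the text, and $A^{-1} c_i$ is computed with Cramer's rule to keep the example self-contained.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def angle_deg(a, b):
    """Angle in degrees between two nonzero vectors of arbitrary length."""
    cos_a = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(M, v):
    """Return M^{-1} v by Cramer's rule (M is a 3x3 list of rows)."""
    d = det3(M)
    return [det3([row[:q] + [v[r]] + row[q + 1:] for r, row in enumerate(M)]) / d
            for q in range(3)]

def support(A, E, C, normals, view_dir, normal_tol_deg=1.0, light_tol_deg=45.0):
    """Equation (5): weight * (supporting lights) + (supporting normals)."""
    weight = len(E) / 2.0                                   # q = k / 2
    columns = [[A[r][c] for r in range(3)] for c in range(3)]
    n_normals = sum(
        1 for e, n in zip(E, normals)
        if angle_deg([dot(e, a_q) for a_q in columns], n) <= normal_tol_deg)
    n_lights = sum(
        1 for c_i in C if angle_deg(solve3(A, c_i), view_dir) <= light_tol_deg)
    return weight * n_lights + n_normals

# Hypothetical toy model: A is the identity, so e_j^T A = e_j^T and A^{-1} c_i = c_i.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
normals = [[0.0, 0.0, 1.0]] * 3
E = [[0.0, 0.0, 0.3], [0.0, 0.0, 0.5], [0.2, 0.0, 0.2]]    # third row is 45 degrees off
C = [[0.0, 0.0, 2.0], [3.0, 0.0, 0.5]]                      # second light is about 80 degrees off
score = support(A, E, C, normals, view_dir=[0.0, 0.0, 1.0])
```

In this toy case two normals and one light support the model, so the score is $1.5 \cdot 1 + 2 = 3.5$; the weight makes losing a light cost as much as losing half of the normals.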
2.4 Applying RANSAC
The method described above is not robust. Experience
shows that inaccuracy of the approximated normals
and image noise lead to unreasonable results. Our
method uses least squares and is therefore very sensitive
to outliers appearing among the normals. We decided
to apply the RANSAC (RANdom SAmple Consensus)
algorithm of Fischler and Bolles (Fischler and
Bolles, 1981), which is a general and efficient robust
estimator.
The idea is the following: instead of using all normals from $N$, a subset $N_1 \subset N$ is randomly selected and the method is executed only on $N_1$. This is repeated a number of times for various $N_1$. For each $N_1$ a consensus set $N_c$ is formed by the supporting normals. The consensus set defined by the best sample gives the inliers of $N$. Finally, the model is re-estimated using this consensus set. Informally, the technique relies on the fact that if $|N_1|$ is small and the selection is repeated sufficiently many times, then the probability that a subset without any outliers has been selected is high. We have chosen to set $|N_1|$ to 5, which is a little greater than the necessary 3, in order to avoid frequent under-determinedness. The number of iterations and the minimal size of an acceptable consensus set can be calculated from the estimated incidence rate of outliers, as discussed in (Hartley and Zisserman, 2000).
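The number of iterations can be estimated with the standard RANSAC formula: if a fraction $\varepsilon$ of the data are outliers and each sample has $s$ elements, then $\log(1 - p) / \log(1 - (1 - \varepsilon)^s)$ samples are needed to draw at least one outlier-free sample with probability $p$. The sketch below applies it to the sample size of 5 used here; the outlier rates and the confidence of 0.99 are assumed values, not figures reported in this paper.

```python
import math

def ransac_iterations(outlier_rate, sample_size, confidence=0.99):
    """Samples needed so that, with the given confidence, at least one is outlier-free."""
    p_clean = (1.0 - outlier_rate) ** sample_size   # chance that one sample is all inliers
    if p_clean <= 0.0:
        raise ValueError("every sample would contain an outlier")
    if p_clean >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))

n_half = ransac_iterations(0.5, 5)    # half of the normals are outliers
n_heavy = ransac_iterations(0.8, 5)   # only one normal in five is reliable
```

The count grows quickly with the outlier rate, which matters in this setting because most normals of the smoothed mesh are expected to be inaccurate on bumpy regions.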
Notice that the above problem can be considered as clustering. Normals of $N$ approximate the real normals of the surface. We are given a smooth surface and want to recover the real bumpy surface. Most of the normals in $N$ are poor approximations (denote this subset by $N_w$) and only a few of them approximate well (denote this subset by $N_g$). However, correct normals are consistent, which helps us find their cluster. Consistency means that if $N_1$ is chosen to be a subset of $N_g$, then the result is correct and satisfies the constraints. On the other hand, if $N_1$ contains one or more normals from $N_w$, the outliers deteriorate the result.
This is also the reason why further ambiguities (including the Generalised Bas-Relief Ambiguity (Belhumeur et al., 1999)) are avoided, in contrast to other methods, e.g., (Yuille et al., 2003). It is known in photometric stereo that light sources can be precisely calibrated, without any ambiguities, if surface normals are known. We do not have accurate normals, but we have a set $N$ of normals that contains a subset of correct normals, and this subset is determined by the method described above.
2.5 Extracting Normals
After calibrating light sources, normals can be determined in each point $u$ as follows. Consider again the formula of the Lambertian reflection model: $I_i(u) = \frac{1}{\pi}\, a_d(u)\, n(u)^T \rho_i\, l_i$, $i \in [1..m]$. Due to the light source calibration, $\rho_i$ and $l_i$ are known, thus $a_d(u)$ and $n(u)$ can easily be calculated, separately for each image point $u$, by solving the over-determined linear system of equations $B y_u = b_u$, where

$$B = \begin{bmatrix} \rho_1 l_1^T \\ \vdots \\ \rho_m l_m^T \end{bmatrix} \quad \text{and} \quad b_u = \begin{bmatrix} I_1(u) \\ \vdots \\ I_m(u) \end{bmatrix}.$$

Since $n(u)$ is assumed to be of unit length, $a_d(u) = \pi \|y_u\|$ and $n(u) = \dfrac{y_u}{\|y_u\|}$.
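The per-pixel estimation step can be sketched as a small least-squares solve. The example below forms the normal equations $(B^T B)\, y_u = B^T b_u$ and solves the resulting $3 \times 3$ system with Cramer's rule, which is one possible way to solve the over-determined system and not necessarily the solver used by the author. The five lights, their radiances and the ground-truth values are hypothetical.

```python
import math

def det3(M):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(M, v):
    """Solve the 3x3 system M y = v by Cramer's rule."""
    d = det3(M)
    return [det3([row[:q] + [v[r]] + row[q + 1:] for r, row in enumerate(M)]) / d
            for q in range(3)]

def unit(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def recover_albedo_and_normal(radiances, light_dirs, intensities):
    """Least-squares solution of B y = b for one pixel; a_d = pi*|y|, n = y/|y|."""
    B = [[rho * c for c in l] for rho, l in zip(radiances, light_dirs)]   # rows rho_i * l_i^T
    BtB = [[sum(row[a] * row[b] for row in B) for b in range(3)] for a in range(3)]
    Btb = [sum(row[a] * value for row, value in zip(B, intensities)) for a in range(3)]
    y = solve3(BtB, Btb)
    norm = math.sqrt(sum(c * c for c in y))
    return math.pi * norm, [c / norm for c in y]

# Hypothetical calibrated setup with five lights.
radiances = [1.0, 0.8, 1.2, 0.9, 1.1]
light_dirs = [unit(v) for v in ([0, 0, 1], [1, 0, 2], [-1, 0, 2], [0, 1, 2], [0, -1, 2])]

# Render one pixel from a known albedo and normal, then recover both.
true_albedo, true_normal = 0.7, unit([0.1, -0.2, 1.0])
intensities = [true_albedo * rho * sum(n * c for n, c in zip(true_normal, l)) / math.pi
               for rho, l in zip(radiances, light_dirs)]
albedo, normal = recover_albedo_and_normal(radiances, light_dirs, intensities)
```

Because each pixel is solved independently, no smoothness or integrability is imposed on the result, which is what allows fine bumps to survive in the recovered normal map.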
3 RESULTS
To demonstrate the efficiency, we have tested the
method both on synthetic and real data. The syn-
thetic dataset contains a 3D mesh of a bumpy sphere,
a smoothed mesh without bumpiness, and five images
of the bumpy sphere taken using five different light
sources. For each image one of the light sources was
turned on, while the others were turned off.
The method was run with the input of the
smoothed mesh and the five images, yielding a dense
normal map. The estimated normal map was then
compared to the ground truth normal map: the an-
gles between the corresponding normals were mea-
sured, in degrees, and the mean of the angles gave
the magnitude of the error. Since the method is non-deterministic because of RANSAC, it was run 10 times, resulting in an error of $1.85^\circ \pm 0.1^\circ$, which is significantly smaller than the error of the normals of the input smoothed mesh, $7.55^\circ$. Note that the latter is not very large either. This is because the mean of the angles was considered: the surface had only a relatively small bumpy part, while larger parts of the surface were smooth. Obviously, normals of smooth parts were approximated precisely both before and after photometric estimation, and their small errors (on the order of $1^\circ$) prevented the mean from growing too large.
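The error measure used above, the mean angle in degrees between corresponding normals, can be sketched as follows. The function names and the two-normal toy input are illustrative assumptions.

```python
import math

def angle_deg(a, b):
    """Angle in degrees between two nonzero vectors."""
    dot_ab = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    cos_a = max(-1.0, min(1.0, dot_ab / (len_a * len_b)))
    return math.degrees(math.acos(cos_a))

def mean_angular_error(estimated, ground_truth):
    """Mean of the per-pixel angles between estimated and true normals."""
    angles = [angle_deg(e, g) for e, g in zip(estimated, ground_truth)]
    return sum(angles) / len(angles)

# Hypothetical two-pixel map: one normal is exact, the other is 90 degrees off.
truth = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
estimate = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
error = mean_angular_error(estimate, truth)
```

The toy example also shows why a mean can hide local errors: one badly wrong normal is averaged with an exact one, just as the small bumpy region of the sphere is averaged with its large smooth part.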
To demonstrate the efficiency of the method more
clearly, the pictures of the normal maps are also pre-
sented. Fig. 1 contains the ground truth and the es-
timated normal maps, as well as their difference be-
fore and after applying the method. Pixel intensities
of the normal maps represent the deviation of the normal vectors from a fixed unit vector. These intensities are calculated as $255 - 3 n_d$, where $n_d$ is the angular deviation in degrees. Pixel intensities of the difference maps show the errors in angle: when the error is $n_e$ degrees, pixel intensity is $128 + 2 n_e$. The bumps are clearly visible in the difference map before using the method, but almost completely disappear afterwards.
We have also tested the method on real datasets.
The first dataset consists of the 3D mesh of a
Plaquette and seven images of the object taken from
the same viewpoint under varying illumination. To
provide lighting, a simple table-lamp was used and
moved in space. Fig. 2 shows three of the input im-
ages, the input 3D model and the resulting normal
map.
The second dataset consists of a Frog model and
five input images. (See Fig. 3.) Although it is hard to
evaluate the result based on the map presented, com-
paring it to the input images shows that the locations
of the bumps are precise.
The third dataset (Fig. 4) contains a wooden object
in the shape of a Bottle. The dataset consists of the 3D
mesh and seven images of the object. The results
of the method applied to the three datasets demonstrate
that our technique is suitable for detecting even
fine roughness.
4 CONCLUSION
In this paper we presented a novel method to refine
the geometry of 3D models of real objects. The pho-
tometric stereo based method uses a number of input
images taken from the same viewpoint under vary-
ing lighting, supplemented with an initial sparse 3D
mesh. The surface reflectance is assumed to be Lam-
bertian, but normals, albedos and lighting properties
are unknown.
The method solves the problem in two steps. First,
the initial surface mesh is used to calibrate light
sources. This problem is traced back to the well
known calibration–estimation problem, and the so-
lution is robustified by applying the RANSAC algo-
rithm. Second, dense normal and albedo maps are ex-
tracted using the calibrated setup. The effectiveness
of the method was demonstrated both on synthetic and
real data.
ACKNOWLEDGEMENTS
The author thanks Dmitry Chetverikov and Levente
Hajder for their comments on this paper. This work
is supported by EU Network of Excellence MUSCLE
(FP6-507752).
REFERENCES
Belhumeur, P. N., Kriegman, D. J., and Yuille, A. L. (1999).
The bas-relief ambiguity. International Journal of
Computer Vision, 35(1):33–44.
Drbohlav, O. and Šára, R. (2001). Unambiguous determination
of shape from photometric stereo with unknown
light sources. In Proc. 8th IEEE International Conference
on Computer Vision, pages 581–586.
Fischler, M. A. and Bolles, R. C. (1981). Random sample
consensus: a paradigm for model fitting with appli-
cations to image analysis and automated cartography.
Communications of the ACM, 24(6):381–395.
Hartley, R. and Zisserman, A. (2000). Multiple View Geom-
etry in Computer Vision. Cambridge University Press.
Koenderink, J. J. and Van Doorn, A. J. (1997). The generic
bilinear calibration-estimation problem. International
Journal of Computer Vision, 23(3):217–234.
Lensch, H. P., Kautz, J., Goesele, M., Heidrich, W., and Seidel,
H.-P. (2001). Image-based reconstruction of spatially
varying materials. In Proc. 12th Eurographics
Workshop on Rendering Techniques, pages 103–114.
Lensch, H. P., Kautz, J., Goesele, M., Heidrich, W., and
Seidel, H.-P. (2003). Image-based reconstruction of
spatial appearance and geometric detail. ACM Trans-
actions on Graphics, 22(2):234–257.
Lim, J., Ho, J., Yang, M.-H., and Kriegman, D. (2005). Passive
photometric stereo from motion. In Proc. 10th
IEEE International Conference on Computer Vision,
pages 1635–1642.
Rushmeier, H., Taubin, G., and Guéziec, A. (1997). Applying
shape from lighting variation to bump map capture.
In Proc. 8th Eurographics Rendering Workshop,
pages 35–44.
Woodham, R. J. (1978). Photometric stereo: A reflectance
map technique for determining surface orientation
from image intensity. In Image Understanding Sys-
tems and Industrial Applications, Proc. SPIE, volume
155, pages 136–143.
Yuille, A., Coughlan, J. M., and Konishi, S. (2003). KGBR
viewpoint–lighting ambiguity. Journal of the Optical
Society of America A, 20(1):24–31.
Yuille, A. L., Snow, D., Epstein, R., and Belhumeur, P. N.
(1999). Determining generative models of objects un-
der varying illumination: Shape and albedo from mul-
tiple images using SVD and integrability. Interna-
tional Journal of Computer Vision, 35(3):203–222.
Zhang, L., Curless, B., Hertzmann, A., and Seitz, S. M.
(2003). Shape and motion under varying illumination:
Unifying structure from motion, photometric stereo,
and multi-view stereo. In Proc. 9th IEEE International
Conference on Computer Vision, pages 618–625.