Optimal Surface Normal from Affine Transformation
Barath Daniel (1,2), Jozsef Molnar (2) and Levente Hajder (1,2)
(1) Geometric Modelling and Computer Vision Laboratory, MTA SZTAKI, H-1111 Kende utca 13-17, Budapest, Hungary
(2) Department of Image Processing and Computer Graphics, Institute of Informatics,
University of Szeged, Árpád tér 2., H-6720 Szeged, Hungary
Keywords:
3D Reconstruction, Normal Estimation, Calibrated Stereo Images.
Abstract:
This paper deals with surface normal estimation from calibrated stereo images. We show how the affine
transformation between two projections defines the surface normal of a 3D planar patch. We give a formula
that describes the relationship between surface normals, camera projections, and affine transformations. This
formula is general since it holds for every kind of camera. We propose novel methods for estimating the normal
of a surface patch if the affine transformation between two perspective images is known. We show that the
normal vector can be optimally estimated if the projective depth of the patch is known. Other non-optimal
methods are also introduced for the problem. The proposed methods are tested both on synthesized data and
on images of real-world 3D objects.
1 INTRODUCTION
Although computer vision has been an intensively researched area in computer science for many
decades, several unsolved problems exist in the field. This paper proposes a novel optimal method
for estimating the normal vector of a planar surface patch if the affine transformation of the
patch between two calibrated (stereo) images is known.
The normal vector estimation problem itself can most accurately be solved by photometric stereo
(PS), which was introduced many decades ago (Woodham, 1978). The main drawback of this method is
that it can only be used in laboratories where lighting conditions are fully controlled. PS usually
assumes that the object to be reconstructed is illuminated by directional light source(s)
(Woodham, 1978), but point light sources can also be applied (Fodor et al., 2014).
Image-based normal vector estimation is usually carried out by decomposing the homography of the
tangent plane between stereo images (Faugeras and Lustman, 1988; Malis and Vargas, 2007).
Unfortunately, these methods are ambiguous, as was shown in several studies (e.g., Liu, 2012). To
the best of our knowledge, the problem of image-based normal vector estimation from an affine
transformation has not been solved yet. The first similar work was published in two papers by
Habbecke and Kobbelt (Habbecke and Kobbelt, 2006; Habbecke and Kobbelt, 2007). They estimate the
parameters of a flat patch based on photo-consistency. The plane is parameterized in 3D by the
implicit parameters of a general plane. (These are three real values, as the implicit parameters of
a 3D plane are defined up to an arbitrary scale.) Our method concentrates only on the estimation of
the spatial normal vector, since a point of the plane can be estimated in 3D by triangulation if
corresponding projections in the two patches are known (Hartley and Sturm, 1997; Hartley and
Zisserman, 2003).
The closest work to our study is that of Megyesi et al. (Z. Megyesi and D.Chetverikov, 2006). They
compute a dense 3D reconstruction using normal vectors. The normal vectors themselves are
calculated from the affine parameters between a rectified stereo image pair. For this reason, only
two parameters of the affine transformation have to be estimated. The drawback of this work is that
the rectification itself cannot be perfect due to noise and computational inaccuracy. The method
proposed here is more general, as it works on arbitrary stereo image pairs. The only restriction is
that the stereo images have to be taken by perspective cameras (or the non-perspective distortion
of the images has to be removed).
To the best of our knowledge, this is the first study that deals with surface normal computation
from affine transformations using calibrated stereo images. The main contribution of this paper is
twofold: (i) We show the general relationship among the surface normal vector, the affine
transformation, and the camera parameters. The formula proposed here is valid for every kind of
camera. (ii) Different surface normal estimators are proposed, including an optimal one that finds
the optimal normal vector in the least squares sense if the affine parameters are contaminated with
noise.
Daniel B., Molnar J. and Hajder L.
Optimal Surface Normal from Affine Transformation.
DOI: 10.5220/0005303703050316
In Proceedings of the 10th International Conference on Computer Vision Theory and Applications (VISAPP-2015), pages 305-316
ISBN: 978-989-758-091-8
Copyright © 2015 SCITEPRESS (Science and Technology Publications, Lda.)
2 GEOMETRIC BACKGROUND
Two projections of a 3D surface are given in a stereo image pair. The pixels in the neighborhood
of a selected image location form a so-called patch. The affine transformation between two
corresponding patches is assumed to be known. The goal of this study is to show how the surface
normal n can be estimated if the images are calibrated. The problem is visualized in Figure 1.
Figure 1: 3D patch perspectively projected to stereo images.
$$x = \Pi_x(X,Y,Z), \qquad y = \Pi_y(X,Y,Z)$$
The surface point $[X,Y,Z]^T$ is written in parametric form
$$X = X(u,v), \quad Y = Y(u,v), \quad Z = Z(u,v).$$
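As is well known in differential geometry (Kreyszig, 1991), the tangent vectors of the surface are
given by the partial derivatives of the spatial coordinates, while the surface normal is given by
the cross product of the tangent vectors:
$$S_u = \begin{bmatrix} \partial X(u,v)/\partial u \\ \partial Y(u,v)/\partial u \\ \partial Z(u,v)/\partial u \end{bmatrix}, \qquad S_v = \begin{bmatrix} \partial X(u,v)/\partial v \\ \partial Y(u,v)/\partial v \\ \partial Z(u,v)/\partial v \end{bmatrix}, \qquad n = S_u \times S_v$$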
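The 3D point $[X,Y,Z]^T$ and the tangent vectors $S_u$ and $S_v$ determine the tangent plane.
Locally, the surface can be approximated by its tangent plane. We assume that images of the object
are available. The projection of a surface point close to the given 3D location $[X,Y,Z]^T$ is then
approximated by the first-order Taylor series:
$$\begin{bmatrix} x + \Delta x \\ y + \Delta y \end{bmatrix} \approx \begin{bmatrix} \Pi_x(X,Y,Z) \\ \Pi_y(X,Y,Z) \end{bmatrix} + \begin{bmatrix} \dfrac{\partial \Pi_x(X,Y,Z)}{\partial u} & \dfrac{\partial \Pi_x(X,Y,Z)}{\partial v} \\ \dfrac{\partial \Pi_y(X,Y,Z)}{\partial u} & \dfrac{\partial \Pi_y(X,Y,Z)}{\partial v} \end{bmatrix} \begin{bmatrix} \Delta u \\ \Delta v \end{bmatrix}$$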
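Note that the partial derivatives of the projection functions give the affine transformation
between the 3D and 2D surface patches:
$$\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} \approx A \begin{bmatrix} \Delta u \\ \Delta v \end{bmatrix}, \qquad A = \begin{bmatrix} \dfrac{\partial \Pi_x(X,Y,Z)}{\partial u} & \dfrac{\partial \Pi_x(X,Y,Z)}{\partial v} \\ \dfrac{\partial \Pi_y(X,Y,Z)}{\partial u} & \dfrac{\partial \Pi_y(X,Y,Z)}{\partial v} \end{bmatrix}$$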
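The partial derivatives can be reformulated using the chain rule. For instance,
$$\frac{\partial \Pi_x(X,Y,Z)}{\partial u} = \frac{\partial \Pi_x(X,Y,Z)}{\partial X}\frac{\partial X}{\partial u} + \frac{\partial \Pi_x(X,Y,Z)}{\partial Y}\frac{\partial Y}{\partial u} + \frac{\partial \Pi_x(X,Y,Z)}{\partial Z}\frac{\partial Z}{\partial u} = \nabla\Pi_x^T S_u,$$
where $\nabla\Pi_x$ is the gradient vector of the projection function w.r.t. the spatial coordinates
$X$, $Y$, and $Z$ of the surface patch. Similarly,
$$\frac{\partial \Pi_x}{\partial v} = \nabla\Pi_x^T S_v, \qquad \frac{\partial \Pi_y}{\partial u} = \nabla\Pi_y^T S_u, \qquad \frac{\partial \Pi_y}{\partial v} = \nabla\Pi_y^T S_v.$$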
Therefore, the affine matrix can be written as
$$A = \begin{bmatrix} \nabla\Pi_x^T \\ \nabla\Pi_y^T \end{bmatrix} \begin{bmatrix} S_u & S_v \end{bmatrix}.$$
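As a hedged illustration of this factorization, the following minimal Python sketch evaluates $A$ as the product of the projection gradients and the tangent vectors. The toy camera $x = X/Z$, $y = Y/Z$, the helper names (`affine_from_gradients`, `toy_gradients`), and the patch values are assumptions made for this example only; they are not taken from the paper.

```python
def dot(a, b):
    """Dot product of two equally long sequences."""
    return sum(p * q for p, q in zip(a, b))

def affine_from_gradients(grad_x, grad_y, s_u, s_v):
    """A = [grad_x^T; grad_y^T] [S_u S_v], returned as a 2x2 nested list."""
    return [[dot(grad_x, s_u), dot(grad_x, s_v)],
            [dot(grad_y, s_u), dot(grad_y, s_v)]]

# Hypothetical toy camera (an assumption of this sketch): x = X / Z, y = Y / Z.
def project(p):
    X, Y, Z = p
    return (X / Z, Y / Z)

def toy_gradients(p):
    """Analytic gradients of the toy projection w.r.t. (X, Y, Z)."""
    X, Y, Z = p
    return (1.0 / Z, 0.0, -X / Z ** 2), (0.0, 1.0 / Z, -Y / Z ** 2)

# Planar patch S(u, v) = S0 + u * S_U + v * S_V (illustrative values).
S0 = (0.2, -0.1, 4.0)
S_U = (1.0, 0.0, 0.5)
S_V = (0.0, 1.0, -0.25)

GX, GY = toy_gradients(S0)
A = affine_from_gradients(GX, GY, S_U, S_V)
```

A finite-difference evaluation of `project` along `S_U` and `S_V` should reproduce the columns of `A` up to discretization error.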
In stereo vision, two images are given. The affine transformation between the image patches is
obtained by concatenating the inverse of the affine transformation $A_1$ (between the spatial patch
and the patch in image #1) and the affine transformation $A_2$ (between the 3D patch and that in
image #2). Formally, it can be written as
$$[\Delta x_2, \Delta y_2]^T = A_2 A_1^{-1} [\Delta x_1, \Delta y_1]^T$$
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
306
$A_2 A_1^{-1}$ is the affine transformation between the images. The inverse of the affine matrix
$A$ can be written as
$$A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} \nabla\Pi_y^T S_v & -\nabla\Pi_x^T S_v \\ -\nabla\Pi_y^T S_u & \nabla\Pi_x^T S_u \end{bmatrix}$$
where $\det(A) = \nabla\Pi_x^T S_u \, \nabla\Pi_y^T S_v - \nabla\Pi_x^T S_v \, \nabla\Pi_y^T S_u$.
After elementary manipulation, utilizing the fact that $S_v S_u^T - S_u S_v^T = [n]_\times$, the
affine transformation $A_2 A_1^{-1}$ can be written as
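$$A_2 A_1^{-1} = \frac{1}{(\nabla\Pi_x^1)^T [n]_\times \nabla\Pi_y^1} \begin{bmatrix} (\nabla\Pi_x^2)^T [n]_\times \nabla\Pi_y^1 & (\nabla\Pi_x^1)^T [n]_\times \nabla\Pi_x^2 \\ (\nabla\Pi_y^2)^T [n]_\times \nabla\Pi_y^1 & (\nabla\Pi_x^1)^T [n]_\times \nabla\Pi_y^2 \end{bmatrix}$$
where the upper indices 1 and 2 denote the first and second image, respectively. Note that the
scale of the normal is arbitrary since both the determinant and the matrix elements are multiplied
by the scale of $[n]_\times$.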
The expression $a^T [n]_\times b$ is also called the scalar triple product. Note that
$a^T [n]_\times b$ equals $n^T (b \times a)$. Therefore, the final equation of the affine
transformation is written as
$$\begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} = A_2 A_1^{-1} = \frac{1}{n^T w_5} \begin{bmatrix} n^T w_1 & n^T w_2 \\ n^T w_3 & n^T w_4 \end{bmatrix} \qquad (1)$$
where $w_1 = \nabla\Pi_y^1 \times \nabla\Pi_x^2$, $w_2 = \nabla\Pi_x^2 \times \nabla\Pi_x^1$,
$w_3 = \nabla\Pi_y^1 \times \nabla\Pi_y^2$, $w_4 = \nabla\Pi_y^2 \times \nabla\Pi_x^1$, and
$w_5 = \nabla\Pi_y^1 \times \nabla\Pi_x^1$. Equation 1 is an important formula: it shows the
relation between the surface normal and the projections of the surface to the stereo image pair. A
major advantage of this formula is that it is valid for every kind of camera, since only the two
projection functions must be known. We show below that the formula can be used to compute the
surface normal if the perspective parameters are calibrated.
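Equation 1 can be evaluated with a few cross and dot products. The following Python sketch is only an illustration under stated assumptions: the function names (`w_vectors`, `affine_from_normal`) and all numeric values are hypothetical, and the gradients are plain 3-tuples that would in practice come from the camera model.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def w_vectors(g1x, g1y, g2x, g2y):
    """w_1..w_5 of Eq. 1 from the projection gradients of images 1 and 2."""
    return (cross(g1y, g2x), cross(g2x, g1x),
            cross(g1y, g2y), cross(g2y, g1x),
            cross(g1y, g1x))

def affine_from_normal(n, w):
    """Return (a1, a2, a3, a4) of Eq. 1 for a normal n of any non-zero scale."""
    d = dot(n, w[4])
    return tuple(dot(n, wk) / d for wk in w[:4])

# Illustrative values (assumed): gradients in both images and patch tangents.
G1X, G1Y = (1.0, 0.1, -0.2), (0.0, 0.9, 0.3)
G2X, G2Y = (0.8, -0.1, 0.4), (0.2, 1.1, -0.1)
S_U, S_V = (1.0, 0.0, 0.5), (0.0, 1.0, -0.25)

N = cross(S_U, S_V)                       # n = S_u x S_v
A_EQ1 = affine_from_normal(N, w_vectors(G1X, G1Y, G2X, G2Y))
```

For these values, `A_EQ1` should agree with the product $A_2 A_1^{-1}$ computed directly from the gradients and tangent vectors, which is a convenient sanity check for an implementation.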
2.1 Pin-hole Camera Model
When the standard perspective camera model is used, the projection is written as
$$[x,y,1]^T = \frac{1}{s} P_{persp} [X,Y,Z,1]^T, \qquad (2)$$
where $[x,y]$ are the projected coordinates in image space, $s$ is the projective depth, and
$P_{persp}$ is the so-called projection matrix of size 3 × 4. Let us denote the rows of the
projection matrix by $p_1^T$, $p_2^T$, and $p_3^T$. The projection functions and their gradients
can be written as
$$\Pi_x = \frac{p_1^T [X,Y,Z,1]^T}{s}, \qquad \Pi_y = \frac{p_2^T [X,Y,Z,1]^T}{s}$$
$$\nabla\Pi_x = \frac{1}{s} \begin{bmatrix} P_{11} - xP_{31} \\ P_{12} - xP_{32} \\ P_{13} - xP_{33} \end{bmatrix}, \qquad \nabla\Pi_y = \frac{1}{s} \begin{bmatrix} P_{21} - yP_{31} \\ P_{22} - yP_{32} \\ P_{23} - yP_{33} \end{bmatrix}$$
where $P_{ij}$ is the element in the $i$th row and $j$th column of the projection matrix
$P_{persp}$. The projective depth is obtained as $s = p_3^T [X,Y,Z,1]^T$. The affine transformation
becomes
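$$\begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} = \frac{1}{\alpha \, n^T w_5} \begin{bmatrix} n^T w_1 & n^T w_2 \\ n^T w_3 & n^T w_4 \end{bmatrix} \qquad (3)$$
where $\alpha = s_1/s_2$ is the ratio of the projective depths in the first and second images, and
$w_1 = s_1 s_2 \nabla\Pi_y^1 \times \nabla\Pi_x^2$, $w_2 = s_1 s_2 \nabla\Pi_x^2 \times \nabla\Pi_x^1$,
$w_3 = s_1 s_2 \nabla\Pi_y^1 \times \nabla\Pi_y^2$, $w_4 = s_1 s_2 \nabla\Pi_y^2 \times \nabla\Pi_x^1$,
and $w_5 = s_2 s_2 \nabla\Pi_y^1 \times \nabla\Pi_x^1$.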
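The pin-hole gradients can be checked numerically. Differentiating $\Pi_x = p_1^T [X,Y,Z,1]^T / s$ with the quotient rule gives the components $(P_{1j} - x P_{3j})/s$, and analogously $(P_{2j} - y P_{3j})/s$ for $\Pi_y$. The sketch below implements exactly this; the function names and the projection matrix `P_DEMO` are illustrative assumptions, not values from the paper.

```python
def pinhole_project(P, X):
    """Project the 3D point X with a 3x4 matrix P; returns (x, y, s)."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, s = (sum(p * c for p, c in zip(row, Xh)) for row in P)
    return u / s, v / s, s

def pinhole_gradients(P, X):
    """Gradients of Pi_x and Pi_y w.r.t. (X, Y, Z) for a pin-hole camera."""
    x, y, s = pinhole_project(P, X)
    grad_x = tuple((P[0][j] - x * P[2][j]) / s for j in range(3))
    grad_y = tuple((P[1][j] - y * P[2][j]) / s for j in range(3))
    return grad_x, grad_y

# Hypothetical calibrated camera and 3D point (assumed for illustration).
P_DEMO = [[800.0, 0.0, 320.0, 10.0],
          [0.0, 800.0, 240.0, -5.0],
          [0.0, 0.0, 1.0, 2.0]]
X_DEMO = (0.3, -0.2, 5.0)

GRAD_X, GRAD_Y = pinhole_gradients(P_DEMO, X_DEMO)
```

If only the upper-left 3 × 3 block of `P` were known, the same expressions without the division by `s` would give the gradients up to the unknown scale $1/s$.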
An important remark is that if the projective depth $s_i$ is unknown, but the upper-left 3 × 3
submatrices of the projection matrices $P_1$ and $P_2$ are known, then the gradient vectors can be
calculated up to an unknown scale. (This scale is the multiplicative inverse of the projective
depth $s_i$.) Also note that the vectors $w_1, \dots, w_4$ are scaled by $s_1 s_2$, while $w_5$ is
scaled by $s_2 s_2$. Therefore, the normal vector is independent of the translation between the two
cameras, since the last columns of the projection matrices are the product of the camera intrinsic
parameters and the translation. For this reason, the following two cases must be distinguished:
1. Both projection matrices $P_1$ and $P_2$ are known. (In other words, the cameras are
calibrated.)
2. Only the upper-left 3 × 3 submatrices of the projection matrices are known. In this case, the
projective depth of the points is not known. However, the gradients can be computed up to a
scale, where this scale is the inverse of the projective depth.
Also note that the normal vector cannot be estimated if $w_5 = 0$. This can only happen if
$\nabla\Pi_x^1$ and $\nabla\Pi_y^1$ are parallel, which is not a realistic case, as it is only
possible if the first and second rows of the 3 × 3 submatrix of the projection matrix $P_{persp}$
are parallel. If the camera calibration is valid, this cannot occur. The problem itself is
numerically stable if the angles between the vectors $w_1, \dots, w_4$ are relatively large. In our
experiments, this holds for realistic reconstruction problems.
OptimalSurfaceNormalfromAffineTransformation
307
3 NORMAL VECTOR
ESTIMATION
This section presents different normal vector estimators. The first one is very fast and simple;
more sophisticated and accurate methods are introduced afterwards.
3.1 Fast Normal Estimation (FNE)
The base matrix equation (Eq. 3) consists of four elements. If two of them are selected and
divided by each other, the common denominator $\alpha n^T w_5$ cancels and an equation is obtained.
If the same procedure is repeated for the remaining two matrix elements, a second equation is
obtained similarly. For instance, the two elements of the first row and the two elements of the
second row give the equations
$$\frac{w_1^T n}{w_2^T n} = \frac{a_1}{a_2} \qquad (4)$$
$$\frac{w_3^T n}{w_4^T n} = \frac{a_3}{a_4} \qquad (5)$$
These equations can be trivially rearranged as
$$\left(a_2 w_1^T - a_1 w_2^T\right) n = 0$$
$$\left(a_4 w_3^T - a_3 w_4^T\right) n = 0$$
The normal vector n is perpendicular to both $a_2 w_1 - a_1 w_2$ and $a_4 w_3 - a_3 w_4$.
Therefore, the normal can be obtained as the cross product of these vectors:
$$n = \left(a_2 w_1 - a_1 w_2\right) \times \left(a_4 w_3 - a_3 w_4\right). \qquad (6)$$
Of course, the obtained vector n should be normalized to unit length. A very nice property of this
normal estimation is that it is independent of the common scale of the vectors $w_1, \dots, w_4$.
Therefore, the projective depths of the patch are not required to estimate the normal, because they
influence only the length of n.
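A minimal Python sketch of the fast estimator of Eq. 6 follows. The function name `fast_normal` and the synthetic vectors are assumptions for illustration; the affine parameters are generated noise-free from a known normal so that the estimate can be compared with it. Note that the sign of the recovered normal is not determined by Eq. 6.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def fast_normal(a, w):
    """FNE (Eq. 6): a = (a1, a2, a3, a4), w = (w1, w2, w3, w4)."""
    a1, a2, a3, a4 = a
    w1, w2, w3, w4 = w
    u = tuple(a2 * p - a1 * q for p, q in zip(w1, w2))   # a2*w1 - a1*w2
    v = tuple(a4 * p - a3 * q for p, q in zip(w3, w4))   # a4*w3 - a3*w4
    n = cross(u, v)
    length = math.sqrt(dot(n, n))
    return tuple(c / length for c in n)

# Synthetic, noise-free data (assumed values): a_k = n^T w_k / n^T w_5.
W_DEMO = [(1.0, 0.0, 2.0), (0.0, 1.0, 1.0), (2.0, 1.0, 0.0),
          (1.0, -1.0, 3.0), (1.0, 1.0, 1.0)]
N_TRUE = (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)
A_DEMO = tuple(dot(N_TRUE, wk) / dot(N_TRUE, W_DEMO[4]) for wk in W_DEMO[:4])

N_FNE = fast_normal(A_DEMO, W_DEMO[:4])
```

Because only $w_1, \dots, w_4$ enter the computation, multiplying all of them by a common factor leaves `N_FNE` unchanged, which mirrors the depth independence stated above.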
Note that the equation pair (Eqs. 4 and 5) can be selected in two other ways. In those cases, the
normal vector is given by
$$n = \left(a_3 w_1 - a_1 w_3\right) \times \left(a_4 w_2 - a_2 w_4\right) \qquad (7)$$
or
$$n = \left(a_4 w_1 - a_1 w_4\right) \times \left(a_3 w_2 - a_2 w_3\right) \qquad (8)$$
3.2 Optimal Normal Estimation with
Known Projective Depth (OPT)
The aim of the optimal method is to minimize the error in the base matrix equation (Eq. 3).
Formally, the estimation can be written as the minimization of the Frobenius norm of the residual
of Equation 3 with respect to the normal n. This is equivalent to
$$\arg\min_{n} \sum_{k=1}^{4} \left( \frac{n^T w_k}{n^T w_5} - a_k \right)^2 \qquad (9)$$
This estimates the normal vector in the least squares sense, assuming that the affine parameters
are contaminated with noise. (This assumption is valid since the affine parameters are estimated as
described briefly in Sec. 4.2, and this estimation cannot be perfect since the images themselves
contain noise.) The optimal solution is given in the first part of the Appendix with α = 1.
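To make the objective of Eq. 9 concrete, the sketch below only evaluates the cost for a candidate normal; it does not implement the closed-form minimization of the Appendix. The function name `opt_cost` and the synthetic data are assumptions for illustration.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def opt_cost(n, a, w):
    """Cost of Eq. 9: sum_k (n^T w_k / n^T w_5 - a_k)^2, with w = (w1..w5)."""
    d = dot(n, w[4])
    return sum((dot(n, w[k]) / d - a[k]) ** 2 for k in range(4))

# Synthetic, noise-free data (assumed values).
W_DEMO = [(1.0, 0.0, 2.0), (0.0, 1.0, 1.0), (2.0, 1.0, 0.0),
          (1.0, -1.0, 3.0), (1.0, 1.0, 1.0)]
N_TRUE = (1.0, 2.0, 2.0)
A_DEMO = [dot(N_TRUE, wk) / dot(N_TRUE, W_DEMO[4]) for wk in W_DEMO[:4]]

COST_TRUE = opt_cost(N_TRUE, A_DEMO, W_DEMO)               # zero for exact data
COST_SCALED = opt_cost((5.0, 10.0, 10.0), A_DEMO, W_DEMO)  # scale of n is irrelevant
COST_OTHER = opt_cost((1.0, 0.0, 0.0), A_DEMO, W_DEMO)     # a wrong normal costs more
```

The cost is invariant to the scale of `n`, which is why an additional constraint on the normal is needed when the minimum is computed.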
3.3 Normal Estimation with Unknown
Projective Depths using Alternation
(ALT)
If the projective depth is unknown, then the base optimization problem (Eq. 9) cannot be applied
since the parameter $\alpha = s_1/s_2$ is not known. The cost function defined in Eq. 9 has to be
modified as
$$\arg\min_{n} \sum_{k=1}^{4} \left( \frac{n^T w_k}{\alpha \, n^T w_5} - a_k \right)^2 \qquad (10)$$
Unfortunately, this problem cannot be optimally
solved to the best of our knowledge. We propose here
an alternating-like approach which is overviewed in
Alg. 1. The alternation has two steps:
1. EstimateAlpha: For a fixed normal, the residuals of the cost function (Eq. 10) are linear with
respect to $1/\alpha$, since the fitting problem can be written as $A \frac{1}{\alpha} = b$, where
$A = \left[ \frac{n^T w_1}{n^T w_5}, \dots, \frac{n^T w_4}{n^T w_5} \right]^T$ and
$b = [a_1, \dots, a_4]^T$. An overdetermined linear system can be solved optimally in the least
squares sense. In this case, the solution is obtained as
$$\frac{1}{\alpha} = \frac{n^T w_5}{\sum_j \left(n^T w_j\right)^2} \sum_j n^T w_j \, a_j \qquad (11)$$
2. EstimateNormal: The normal vector estimation is very similar to the optimal method described
above; the only difference is that the parameter α appears in the denominators. The method
described in the Appendix can solve this subproblem optimally.
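The EstimateAlpha step has a closed form (Eq. 11) and can be sketched in a few lines of Python. The function name `estimate_alpha` and the synthetic data are assumptions for illustration; the EstimateNormal step is not shown here because it relies on the Appendix.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def estimate_alpha(n, a, w):
    """Least-squares alpha for a fixed normal n (Eq. 11), with w = (w1..w5)."""
    d = dot(n, w[4])
    c = [dot(n, w[k]) / d for k in range(4)]        # c_k = n^T w_k / n^T w_5
    inv_alpha = sum(ck * ak for ck, ak in zip(c, a)) / sum(ck * ck for ck in c)
    return 1.0 / inv_alpha

# Synthetic, noise-free data generated from Eq. 3 with a known alpha (assumed).
W_DEMO = [(1.0, 0.0, 2.0), (0.0, 1.0, 1.0), (2.0, 1.0, 0.0),
          (1.0, -1.0, 3.0), (1.0, 1.0, 1.0)]
N_TRUE = (1.0, 2.0, 2.0)
ALPHA_TRUE = 1.3
A_DEMO = [dot(N_TRUE, wk) / (ALPHA_TRUE * dot(N_TRUE, W_DEMO[4]))
          for wk in W_DEMO[:4]]

ALPHA_EST = estimate_alpha(N_TRUE, A_DEMO, W_DEMO)
```

With exact data and the true normal, `ALPHA_EST` reproduces `ALPHA_TRUE`; with noisy affine parameters it is the least-squares fit for the given normal.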
The alternation requires initial values for the parameters n and α to be optimized. We propose to
use the linear method described later in Sec. 3.4.2 in order to compute the initial values. The
alternation converges to the closest (local) minimum since it optimizes a non-negative cost
function and each step decreases (or at least does not increase) the cost. Unfortunately, we could
not prove theoretically that the global optimum is reached in this way; however, in our experience,
the method usually improves the initial normal n.
Algorithm 1: Alternation for Normal Estimation (ALT).
n, α ← Parameter Initialization by LNE-UPD
repeat
    α ← EstimateAlpha(n, w_1, ..., w_5)
    n ← EstimateNormal(α, w_1, ..., w_5)
until convergence
3.4 Linear Normal Estimation (LNE)
The base matrix equation (Eq. 3) is nonlinear. The elements can be linearized if they are
multiplied by their common denominator $\alpha w_5^T n$. Then a cost function can be formed for the
elements as
$$\arg\min_{n} \sum_{k=1}^{4} \left( n^T w_k - \alpha \, a_k \, n^T w_5 \right)^2 \qquad (12)$$
This is a common trick, and the solution will not be optimal if this modification is carried out.
However, the problem becomes linear, and it can be solved easily (Björck, 1996).
3.4.1 Linear Normal Estimation for Known
Projective Depth (LNE-KPD)
If the projective depth is known, then α = 1 and the problem can be rewritten as an overdetermined
homogeneous linear equation system $An = 0$ subject to $n^T n = 1$, where
$$A = \begin{bmatrix} w_1^T - a_1 w_5^T \\ w_2^T - a_2 w_5^T \\ w_3^T - a_3 w_5^T \\ w_4^T - a_4 w_5^T \end{bmatrix} \qquad (13)$$
The optimal solution of this system is the eigenvector of the matrix $A^T A$ corresponding to the
smallest eigenvalue (Björck, 1996).
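The LNE-KPD solution can be sketched with NumPy, which is assumed to be available here. Each row of the coefficient matrix is $w_k^T - a_k w_5^T$, which follows from Eq. 12 with α = 1. The function name `lne_kpd` and the synthetic data are illustrative assumptions; `numpy.linalg.eigh` returns eigenvalues in ascending order, so the first eigenvector column is the one needed.

```python
import numpy as np

def lne_kpd(a, w):
    """Linear normal estimate for known depth: a has 4 entries, w has rows w1..w5."""
    w = np.asarray(w, dtype=float)        # shape (5, 3)
    a = np.asarray(a, dtype=float)        # shape (4,)
    A = w[:4] - a[:, None] * w[4]         # rows w_k^T - a_k * w_5^T
    _, eigvecs = np.linalg.eigh(A.T @ A)  # symmetric 3x3 eigenproblem
    return eigvecs[:, 0]                  # unit eigenvector of the smallest eigenvalue

# Synthetic, noise-free data (assumed values): a_k = n^T w_k / n^T w_5.
W_DEMO = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0],
                   [1.0, -1.0, 3.0], [1.0, 1.0, 1.0]])
N_TRUE = np.array([1.0, 2.0, 2.0]) / 3.0
A_DEMO = (W_DEMO[:4] @ N_TRUE) / (W_DEMO[4] @ N_TRUE)

N_KPD = lne_kpd(A_DEMO, W_DEMO)
```

The sign of the returned eigenvector is arbitrary, so a visibility constraint (for example, the normal facing the camera) would be needed to pick the orientation; that choice is outside this sketch.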
3.4.2 Linear Normal Estimation for Unknown
Projective Depth (LNE-UPD)
If the projective depth is unknown, then the function to be optimized in Eq. 12 gives an
overdetermined homogeneous linear system $Bb = 0$, similarly to the previous case (Sec. 3.4.1), but
the matrix of coefficients $B$ and the vector $b$ differ as follows:
$$B = \begin{bmatrix} w_1^T & -a_{11} \\ w_2^T & -a_{12} \\ w_3^T & -a_{21} \\ w_4^T & -a_{22} \end{bmatrix}, \qquad b = \begin{bmatrix} n \\ \alpha w_5^T n \end{bmatrix}$$
where $a_{11}, a_{12}, a_{21}, a_{22}$ denote the affine parameters $a_1, \dots, a_4$. The solution
is given by the eigenvector of the matrix $B^T B$ corresponding to the smallest eigenvalue (Björck,
1996). If this vector is denoted by $\hat{b}$, then the estimate of the normal vector n is given by
the first three coordinates of $\hat{b}$, but this vector should be normalized in order to fulfill
the $n^T n = 1$ constraint. Once n is known, the parameter $\alpha = s_1/s_2$ can also be computed
from the fourth coordinate of $\hat{b}$.
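A hedged NumPy sketch of LNE-UPD follows (NumPy is assumed to be available). The rows of the coefficient matrix are taken as $[w_k^T, -a_k]$, which is what Eq. 12 yields for the unknown vector $b = [n; \alpha w_5^T n]$. The function name `lne_upd` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def lne_upd(a, w):
    """Linear estimate of (n, alpha) for unknown depth; w has rows w1..w5."""
    w = np.asarray(w, dtype=float)             # shape (5, 3)
    a = np.asarray(a, dtype=float)             # shape (4,)
    B = np.hstack([w[:4], -a[:, None]])        # rows [w_k^T, -a_k]
    _, eigvecs = np.linalg.eigh(B.T @ B)       # eigenvalues in ascending order
    b = eigvecs[:, 0]                          # b ~ [n; alpha * w_5^T n]
    alpha = b[3] / (w[4] @ b[:3])              # the scale and sign of b cancel
    n = b[:3] / np.linalg.norm(b[:3])
    return n, alpha

# Synthetic, noise-free data generated from Eq. 3 with a known alpha (assumed).
W_DEMO = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0],
                   [1.0, -1.0, 3.0], [1.0, 1.0, 1.0]])
N_TRUE = np.array([1.0, 2.0, 2.0]) / 3.0
ALPHA_TRUE = 1.3
A_DEMO = (W_DEMO[:4] @ N_TRUE) / (ALPHA_TRUE * (W_DEMO[4] @ N_TRUE))

N_UPD, ALPHA_UPD = lne_upd(A_DEMO, W_DEMO)
```

The pair returned here is the kind of initial value that the alternation of Sec. 3.3 expects.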
4 EXPERIMENTAL RESULTS
The proposed normal vector estimators have been
tested both on synthesized data and real world images.
4.1 Test on Synthesized Data
During synthesized tests, our main goal was to gener-
ate different normal vectors with corresponding affine
parameters. For this reason, a stereo image pair was
first generated represented by their 3 × 4 projection
matrices. Then a 3D sphere was generated as well
sampled by spherical coordinates. The normal vector
of the locations on the sphere can easily be calculated
as it is the direction pointing from the sphere center to
the current surface points. The synthetic sphere with
the ground truth normals is visualized in Fig. 2. 72
different patches sampled by spherical coordinates are
used in order to compare the methods.
Figure 2: Sphere with normal vectors for synthesized test.
The affine parameters between the stereo images were calculated as follows: (i) The tangent plane
of the sphere was determined first; (ii) then it was projected to the stereo images. (iii) The
projections of the plane determine two homographies with respect to the 3D tangent plane. (iv) The
homography between the two images was obtained by concatenating the two 3D → 2D homographies.
(v) The affine transformation is the first-order approximation of the 2D → 2D homography at the
given location.
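Step (v) amounts to taking the Jacobian of the homography at the patch location. A minimal Python sketch under stated assumptions is given below: the homography `H_DEMO` and the point are made-up values, and the function names are hypothetical.

```python
def apply_homography(H, x, y):
    """Map (x, y) by the 3x3 homography H; returns (x', y', s)."""
    s = H[2][0] * x + H[2][1] * y + H[2][2]
    xp = (H[0][0] * x + H[0][1] * y + H[0][2]) / s
    yp = (H[1][0] * x + H[1][1] * y + H[1][2]) / s
    return xp, yp, s

def affine_from_homography(H, x, y):
    """First-order (affine) approximation of H at (x, y) as a 2x2 nested list."""
    xp, yp, s = apply_homography(H, x, y)
    return [[(H[0][0] - H[2][0] * xp) / s, (H[0][1] - H[2][1] * xp) / s],
            [(H[1][0] - H[2][0] * yp) / s, (H[1][1] - H[2][1] * yp) / s]]

# Hypothetical homography and image location (assumed for illustration).
H_DEMO = [[1.1, 0.05, 3.0],
          [-0.02, 0.95, -2.0],
          [1e-4, 2e-4, 1.0]]
A_HOM = affine_from_homography(H_DEMO, 100.0, 50.0)
```

The entries follow from the quotient rule applied to the two rational functions that define the homography mapping.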
The error values are defined as the average of the angular error between the estimated and ground
truth normal vectors. We have tested all the methods described in this study. In each test case, 72
patches of the sphere were generated, and the tests were repeated 50 times. Thus, the average error
values come from 72 · 50 = 3600 runs of the competing methods. Two test cases were simulated:
zero-mean Gaussian noise was added (i) to the affine parameters and (ii) to the elements of the
projection matrices.
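The error measure can be sketched as follows. This is only one plausible reading of "angle error": the angle between the two vectors in degrees, without any special treatment of the sign ambiguity of the normal. The function names are assumptions.

```python
import math

def angular_error_deg(n_est, n_gt):
    """Angle in degrees between an estimated and a ground truth normal."""
    dot = sum(p * q for p, q in zip(n_est, n_gt))
    len_est = math.sqrt(sum(p * p for p in n_est))
    len_gt = math.sqrt(sum(q * q for q in n_gt))
    c = max(-1.0, min(1.0, dot / (len_est * len_gt)))  # guard against rounding
    return math.degrees(math.acos(c))

def mean_angular_error_deg(pairs):
    """Average angular error over (estimated, ground truth) pairs."""
    errors = [angular_error_deg(e, g) for e, g in pairs]
    return sum(errors) / len(errors)

MEAN_ERR = mean_angular_error_deg([((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
                                   ((0.0, 0.0, 2.0), (0.0, 0.0, 5.0))])
```

In the setting above, the list passed to `mean_angular_error_deg` would hold the 3600 estimate and ground truth pairs of one test case.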
Note that all the synthesized tests have been implemented in Octave (see footnote 1).
4.1.1 Test with Contaminated Affine Parameters
We have compared the efficiency of the fast (FNE), alternation (ALT) and optimal (OPT) normal
estimators. As visualized in Fig. 3, the optimal estimator (OPT) outperforms the others, since it
optimally estimates the normal vector in the least squares sense. The fast method is the least
accurate one, as the other two methods (OPT and ALT) are significantly more sophisticated.
Figure 3: Comparison of methods when the affine parameters are contaminated.
We have also compared the linear methods to the corresponding non-linear ones. Namely, the linear
method with unknown projective depth (LIN-UPD, i.e., LNE-UPD of Sec. 3.4.2) is compared to the
alternation (Fig. 5), and the linear method with known projective depth (LIN-KPD, i.e., LNE-KPD of
Sec. 3.4.1) to the optimal one (Fig. 6). The differences are significant only if the optimal (OPT)
method is used instead of its linear version.
We have also examined the expected values and the spreads of the five proposed methods. The
expected value of the length of the difference between the ground truth and estimated vectors is
close to zero, as expected. Therefore, the estimators are consistent. Their spreads are listed in
Table 1. It shows that the optimal method has the lowest spread, as expected, while FNE has the
highest one. It is interesting that the linear method with known projective depths gives
significantly better results than the methods with unknown depths (LIN-UPD and ALT).
Footnote 1: www.octave.org.
Figure 4: Comparison of methods when the projective parameters are contaminated.
Table 1: Spread of error vector lengths.
Method   FNE    LIN-UPD   ALT     LIN-KPD   OPT
Spread   0.55   0.449     0.433   0.352     0.2919
4.1.2 Test with Contaminated Projection
Parameters
Another interesting experiment is when the elements of the projection matrices are contaminated
with noise. We have examined the same test cases as in Sec. 4.1.1.
When the base FNE, ALT and OPT methods are compared, the dominance of the optimal method (OPT) is
even more obvious. The performance of the fast (FNE) and alternation (ALT) methods is closer to
each other than in the previous test case (Sec. 4.1.1). The accuracy of the result is also the best
(highest) for the optimal method. It seems that the other two methods (FNE and ALT) are very
sensitive to noise, since the corresponding charts (Fig. 4) contain many peaks.
Figure 5: Comparison of linear and corresponding nonlin-
ear methods.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
310
Figure 7: Comparison of normal vector estimators. HOM: normal from homography decomposition AFF: normal from affine
parameters by proposed OPT method. Left: transformation estimated from 4 points, Center: 6 points, Right: 8 points.
Figure 6: Comparison of linear and corresponding nonlin-
ear methods.
To conclude the synthetic tests, the optimal method is the best solution if the projective depth
of the 3D point is known. If it is not, the alternation method is the most accurate one, but its
advantage over its linear version is very small. The alternation itself is an iterative algorithm,
and it can sometimes be very slow. Therefore, we propose the LIN-UPD method for time-critical
applications, while ALT is the best choice for offline algorithms when the projective depths are
unknown.
4.1.3 Normal Vector from Affine Parameters
Versus Homography
The mainstream solution for computing the normal
vector from two patches is to estimate the homogra-
phy between the patches (Malis and Vargas, 2007).
It has eight degrees of freedom, and it can be decom-
posed into the pose (3 DoF), the location (3 DoF), and
the normal (2 DoF) of the plane.
We compare the accuracy of the homography-
estimated normal vector with our optimal (OPT) esti-
mator. The synthesized data is given by sampling the
surface of a sphere similarly to the synthesized tests
above. However, the homography and the affine trans-
formation are both estimated from projected points:
points are generated randomly on the 3D tangent
plane (close to the location on sphere surface), and
these points are projected to the image pair. Then
noise is added to the projected coordinates in image
space. The homography and the affine transformation are estimated using the corresponding points
in the image pair. The estimation of the affine transformation is easier, since affine parameter
estimation is a linear problem. We estimate the homography via numerical optimization; the initial
parameters are computed by the DLT (Direct Linear Transformation) algorithm (Hartley and Zisserman,
2003). Note that at least 4 points are required to estimate the homography, while 3 points are
sufficient to compute the affine transformation.
The results of the comparison are visualized in Fig. 7. The transformations are estimated from the
same point correspondences. The number of corresponding points is 4, 6, and 8, as seen in Fig. 7.
The methods are denoted by 'HOM' (normals from estimated homography) and 'AFF' (normals from affine
transformation). It is interesting that AFF gives better results when the noise is low. This is
especially true for the P = 4 case (left plot in Fig. 7), because the homography is determined
exactly by the given 4 points, while the affine transformation is overdetermined. When the number
of points grows (center and right plots in Fig. 7), the normal from the homography estimation
becomes better than that from the affine transformation, since the projected coordinates are
obtained via perspective projection, and the homography represents the correct transformation
between corresponding planes in image space.
4.2 Test on Real Image Pairs
Real Tests on Calibrated Images. The proposed optimal normal estimator has also been tested on
real data. In order to use the normal estimator, the projection matrices have to be known. We have
downloaded building images and reconstruction data with camera parameters from the web page of the
Visual Geometry Group at Oxford University (see footnote 2).
Footnote 2: http://www.robots.ox.ac.uk/~vgg/data/
Figure 8: Stereo image pair with estimated normals (Library sequence).
Figure 10: Estimated normals on sequence House (left) and Corridor (right).
The Oxford data sets contain point correspondences, but instead of the original data we have used
the ASIFT method of Yu and Morel (Yu and Morel, 2011) for this purpose. The affine transformations
for the pairs are computed as follows. (1) Two patches around the corresponding locations are
cropped from the images. Their size ranges from 60 × 60 to 80 × 80 depending on the test sequence.
(2) Then the ASIFT method (Yu and Morel, 2011) is applied again to the patch pair, yielding point
correspondences between the patches. Estimating the affine transformation is an affine 2D
registration problem based on point correspondences. It is easy to solve: since the problem is
linear w.r.t. the affine parameters, they can be obtained optimally (Björck, 1996) even if the
problem is overdetermined. Note that the affine estimator should be robust since ASIFT can give
false correspondences. We used a RANSAC-like (Fischler and Bolles, 1981) algorithm to discard the
outliers.
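The linear affine registration step can be sketched as an ordinary least-squares fit. The sketch below is not the authors' implementation: it omits the RANSAC-like outlier rejection, assumes the model $[x_2, y_2]^T = A [x_1, y_1]^T + t$, and solves the 3 × 3 normal equations with Cramer's rule. All names and point values are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 nested list."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(M, r):
    """Solve the 3x3 system M p = r with Cramer's rule."""
    d = det3(M)
    sol = []
    for j in range(3):
        Mj = [[r[i] if c == j else M[i][c] for c in range(3)] for i in range(3)]
        sol.append(det3(Mj) / d)
    return sol

def fit_affine(src, dst):
    """Least-squares affine map from at least 3 non-collinear correspondences."""
    rows = [(x, y, 1.0) for x, y in src]
    N = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for c in range(2):                       # one linear problem per output coordinate
        rhs = [sum(r[i] * d[c] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(solve3(N, rhs))
    A = [[params[0][0], params[0][1]], [params[1][0], params[1][1]]]
    t = (params[0][2], params[1][2])
    return A, t

# Noise-free demo correspondences generated from a known affine map (assumed).
SRC = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
DST = [(1.1 * x - 0.2 * y + 5.0, 0.3 * x + 0.9 * y - 3.0) for x, y in SRC]
A_FIT, T_FIT = fit_affine(SRC, DST)
```

Only the 2 × 2 part `A_FIT` enters the normal estimators; the translation `T_FIT` is a by-product of the registration.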
The proposed optimal method was applied to the computed affine transformations of the Library,
House, and Corridor pairs, as seen in Figs. 8 and 10. The normals are visualized by white rods. The
quality of a normal evidently depends on the baseline of the stereo images. The last stereo pair
(Corridor) has a shorter baseline than the other two, and the accuracy of its estimated normals (on
the floor and wall) is lower.
We were able to reconstruct the 3D surface from the estimated positions and corresponding normals
using the Marching Cubes (APSS) filter of MeshLab (see footnote 3). It is visualized in Fig. 9.
Footnote 3: www.meshlab.org
Figure 9: Reconstructed 3D model.
Normal from Real Planar Surfaces. The proposed normal vector estimator (OPT) is also tested on
images of buildings, as pictured in Fig. 11. These objects mainly consist of planar walls, and they
can be matched by homography-based pairing methods, especially when the images are rectified
(Tanacs et al., 2014). The homography itself can be decomposed (Faugeras and Lustman, 1988) if the
cameras are calibrated, and then the plane normal is obtained together with the camera extrinsic
parameters. However, the decomposition is ambiguous, as discussed in (Liu, 2012), and two realistic
normal vectors can be obtained.
Figure 11: Estimated normals on planar surfaces.
We reconstructed the plane normal via the affine transformation. The affine parameters can easily
be calculated from the homography, as shown in (Molnár et al., 2014). The cameras are calibrated
via point-based 3D reconstruction by bundle adjustment (B. Triggs and P. McLauchlan and R. I.
Hartley and A. Fitzgibbon, 2000). Then the normals are computed by the proposed optimal method. We
have tested the OPT algorithm on five different stereo pairs, as visualized in Fig. 11. They are
short-baseline stereo images. The resulting normal vectors and points of the planes are drawn on
the input images. The corresponding points on the wall surfaces are denoted by small dots; the
normals are drawn both inside and outside the plane. In these tests, the proposed method proved
robust and yielded visually accurate surface normals.
5 CONCLUSION AND FUTURE
WORK
Novel normal estimators have been proposed that can estimate the normal of a surface patch if two
perspective images of the patch are given and the affine transformation of the projected patch
between the images is known. One of the proposed methods is optimal: if only the elements of the
affine transformation are contaminated with noise, the proposed method (OPT) provides the optimal
estimate in the least squares sense. It can be applied if the perspective cameras are fully
calibrated.
Normal estimation is very sensitive to the noise in the affine transformations. In the future, we
plan to improve the affine transformation estimation in order to get more realistic results. We
will also develop novel reconstruction methods that use both point correspondences and estimated
normals in order to obtain a more realistic 3D reconstruction of real-world 3D objects.
ACKNOWLEDGEMENT
This research was supported by the EU and the State of Hungary, co-financed by the European Social
Fund through project FuturICT.hu (grant no.: TÁMOP-4.2.2.C-11/1/KONV-2012-0013).
REFERENCES
B. Triggs and P. McLauchlan and R. I. Hartley and A. Fitzgibbon (2000). Bundle Adjustment - A
Modern Synthesis. In Triggs, W., Zisserman, A., and Szeliski, R., editors, Vision Algorithms:
Theory and Practice, LNCS, pages 298–375. Springer Verlag.
Björck, Å. (1996). Numerical Methods for Least Squares Problems. SIAM.
Faugeras, O. and Lustman, F. (1988). Motion and struc-
ture from motion in a piecewise planar environment.
Technical Report RR-0856, INRIA.
Fischler, M. and Bolles, R. (1981). RANdom SAmpling
Consensus: a paradigm for model fitting with appli-
cation to image analysis and automated cartography.
Commun. Assoc. Comp. Mach., 24:358–367.
Fodor, B., Kazó, C., Zsolt, J., and Hajder, L. (2014). Normal map recovery using bundle
adjustment. IET Computer Vision, 8:66–75.
Habbecke, M. and Kobbelt, L. (2006). Iterative multi-view plane fitting. In VMV06, pages 73–80.
Habbecke, M. and Kobbelt, L. (2007). A surface-growing
approach to multi-view stereo reconstruction. In
CVPR.
Hartley, R. I. and Sturm, P. (1997). Triangulation. Computer
Vision and Image Understanding: CVIU, 68(2):146–
157.
Hartley, R. I. and Zisserman, A. (2003). Multiple View Ge-
ometry in Computer Vision. Cambridge University
Press.
Kreyszig, E. (1991). Differential geometry. Dover Publica-
tions.
Liu, H. (2012). Deeper Understanding on Solution Ambigu-
ity in Estimating 3D Motion Parameters by Homogra-
phy Decomposition and its Improvement. PhD thesis,
University of Fukui.
Malis, E. and Vargas, M. (2007). Deeper understanding of
the homography decomposition for vision-based con-
trol. Technical Report RR-6303, INRIA.
Molnár, J., Huang, R., and Kato, Z. (2014). 3D reconstruction of planar surface patches: A direct
solution. ACCV Big Data in 3D Vision Workshop.
Tanacs, A., Majdik, A., Molnar, J., Rai, A., and Kato, Z.
(2014). Establishing correspondences between planar
image patches. In International Conference on Dig-
ital Image Computing: Techniques and Applications
(DICTA).
Woodham, R. J. (1978). Photometric stereo: A reflectance
map technique for determining surface orientation
from image intensity. In Image Understanding
Systems and Industrial Applications, Proc. SPIE, volume
155, pages 136–143.
Yu, G. and Morel, J.-M. (2011). ASIFT: An Algorithm for
Fully Affine Invariant Comparison. Image Processing
On Line, 2011.
Megyesi, Z., Kós, G., and Chetverikov, D. (2006). Dense 3D
reconstruction from images by normal aided matching.
Machine Graphics and Vision, 15:3–28.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
314
APPENDIX
Algorithm for Optimal Normal Estimation. The
task is to minimize the cost function defined in Eq. 10
with respect to the normal vector $\mathbf{n}$. The scale of the
vector is arbitrary; only the direction of the normal is
required. Problems of this kind are typically solved
using Lagrange multipliers, but that approach is
impractical here because the derivatives become very
complicated. Instead, we fix the scale with a different
constraint: let the sum of the coordinates be 1. Thus,
$\mathbf{n}$ is written as $\mathbf{n} = [n_x, n_y, 1 - n_x - n_y]^T$, so that
$\mathbf{n}^T \mathbf{w}_i = \mathbf{m}^T \mathbf{q}_i + w_{i,z}$ with the notation below.
Eq. 10 can be reformulated as

$$\arg\min_{\mathbf{m}} \sum_{k=1}^{4} \left( \frac{\mathbf{m}^T \mathbf{q}_k + w_{k,z}}{\alpha \mathbf{m}^T \mathbf{q}_5 + \alpha w_{5,z}} - a_k \right)^2,$$

where $\mathbf{m} = [n_x, n_y]^T$ and
$\mathbf{q}_i = [w_{i,x} - w_{i,z}, \; w_{i,y} - w_{i,z}]^T$.
(Indices x, y, and z denote the first, second, and third
coordinates of vectors, respectively.)
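The reparameterization can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes that the vectors w_1, ..., w_5, the values a_1, ..., a_4 and the scalar α of Eq. 10 are already available as NumPy arrays, and the function names `reduced_terms`, `cost` and `normal_from_m` are hypothetical.

```python
import numpy as np

def reduced_terms(w):
    """Split the rows w_1..w_5 of a 5x3 array into q_i and w_{i,z}."""
    w = np.asarray(w, dtype=float)
    q = w[:, :2] - w[:, 2:3]   # q_i = [w_ix - w_iz, w_iy - w_iz]
    wz = w[:, 2]               # third coordinates w_{i,z}
    return q, wz

def cost(m, w, a, alpha):
    """Reformulated cost for m = [n_x, n_y] (illustrative helper)."""
    q, wz = reduced_terms(w)
    num = q[:4] @ m + wz[:4]              # m^T q_k + w_{k,z}, k = 1..4
    den = alpha * (q[4] @ m + wz[4])      # alpha m^T q_5 + alpha w_{5,z}
    return float(np.sum((num / den - np.asarray(a, dtype=float)) ** 2))

def normal_from_m(m):
    """Recover n = [n_x, n_y, 1 - n_x - n_y] from m."""
    return np.array([m[0], m[1], 1.0 - m[0] - m[1]])
```

Because the third coordinate is eliminated, the minimization runs over two unknowns only; normals whose coordinates sum to zero cannot be represented in this form.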
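The minima/maxima can be obtained by taking
the derivative with respect to vector $\mathbf{m}$:

$$2 \sum_{k=1}^{4} \beta_k \gamma_k = \mathbf{0},$$

where

$$\beta_k = \frac{\mathbf{m}^T \mathbf{q}_k + w_{k,z}}{\alpha \mathbf{m}^T \mathbf{q}_5 + \alpha w_{5,z}} - a_k,$$

$$\gamma_k = \frac{\alpha \left[ (\mathbf{m}^T \mathbf{q}_5 + w_{5,z}) \mathbf{q}_k - (\mathbf{m}^T \mathbf{q}_k + w_{k,z}) \mathbf{q}_5 \right]}{(\alpha \mathbf{m}^T \mathbf{q}_5 + \alpha w_{5,z})^2}.$$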
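The derivative above can be cross-checked against finite differences. The sketch below is illustrative only: it assumes the reduced quantities are stored as a 5x2 array `q` (rows q_1..q_5) and a length-5 array `wz` (values w_{i,z}), and all function names are hypothetical.

```python
import numpy as np

def cost(m, q, wz, a, alpha):
    """Reformulated cost: sum_k (u_k / v - a_k)^2."""
    u = q[:4] @ m + wz[:4]              # m^T q_k + w_{k,z}, k = 1..4
    v = alpha * (q[4] @ m + wz[4])      # alpha m^T q_5 + alpha w_{5,z}
    return float(np.sum((u / v - a) ** 2))

def gradient(m, q, wz, a, alpha):
    """Analytic gradient 2 * sum_k beta_k * gamma_k."""
    s5 = q[4] @ m + wz[4]               # m^T q_5 + w_{5,z}
    sk = q[:4] @ m + wz[:4]             # m^T q_k + w_{k,z}
    beta = sk / (alpha * s5) - a        # scalar residuals, shape (4,)
    gamma = alpha * (s5 * q[:4] - sk[:, None] * q[4]) / (alpha * s5) ** 2
    return 2.0 * (beta[:, None] * gamma).sum(axis=0)

def numeric_gradient(m, q, wz, a, alpha, h=1e-6):
    """Central finite differences, used only as a cross-check."""
    g = np.zeros(2)
    for i in range(2):
        step = np.zeros(2)
        step[i] = h
        g[i] = (cost(m + step, q, wz, a, alpha)
                - cost(m - step, q, wz, a, alpha)) / (2.0 * h)
    return g
```

Agreement between `gradient` and `numeric_gradient` at a few test points is a cheap safeguard against sign errors in the quotient rule.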
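After multiplying by the common denominator of the
fractions, which is the same for every $k$, the condition
becomes $\sum_{k=1}^{4} \delta_k \kappa_k = \mathbf{0}$, where

$$\delta_k = \mathbf{m}^T \mathbf{q}_k + w_{k,z} - a_k \alpha \mathbf{m}^T \mathbf{q}_5 - a_k \alpha w_{5,z},$$

$$\kappa_k = (\mathbf{m}^T \mathbf{q}_5 + w_{5,z}) \mathbf{q}_k - (\mathbf{m}^T \mathbf{q}_k + w_{k,z}) \mathbf{q}_5.$$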
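It can be simplified as $\sum_{k=1}^{4} e^1_k e^2_k = \mathbf{0}$, where

$$e^1_k = \mathbf{m}^T (\mathbf{q}_k - a_k \alpha \mathbf{q}_5) + (w_{k,z} - a_k \alpha w_{5,z}),$$

$$e^2_k = (\mathbf{m}^T \mathbf{q}_5) \mathbf{q}_k - (\mathbf{m}^T \mathbf{q}_k) \mathbf{q}_5 + w_{5,z} \mathbf{q}_k - w_{k,z} \mathbf{q}_5.$$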
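This is a two-dimensional vector equation:

$$\sum_{k=1}^{4} r_k \begin{bmatrix} \mathbf{m}^T (\mathbf{q}_5 q_{k,x} - \mathbf{q}_k q_{5,x}) + w_{5,z} q_{k,x} - w_{k,z} q_{5,x} \\ \mathbf{m}^T (\mathbf{q}_5 q_{k,y} - \mathbf{q}_k q_{5,y}) + w_{5,z} q_{k,y} - w_{k,z} q_{5,y} \end{bmatrix} = \mathbf{0},$$

where $r_k = \mathbf{m}^T (\mathbf{q}_k - a_k \alpha \mathbf{q}_5) + (w_{k,z} - a_k \alpha w_{5,z})$.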
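By introducing the $\mathbf{m} = [x, y]^T$ notation, the vector
equation is modified as follows:

$$\sum_{k=1}^{4} (\Omega_k x + \Psi_k y + \Gamma_k) \begin{bmatrix} \Omega^1_k x + \Psi^1_k y + \Gamma^1_k \\ \Omega^2_k x + \Psi^2_k y + \Gamma^2_k \end{bmatrix} = \mathbf{0},$$

where

$$\begin{aligned}
\Omega_k &= q_{k,x} - \alpha q_{5,x} a_k, & \Psi_k &= q_{k,y} - \alpha q_{5,y} a_k, & \Gamma_k &= w_{k,z} - a_k \alpha w_{5,z}, \\
\Omega^1_k &= 0, & \Psi^1_k &= q_{5,y} q_{k,x} - q_{k,y} q_{5,x}, & \Gamma^1_k &= w_{5,z} q_{k,x} - w_{k,z} q_{5,x}, \\
\Omega^2_k &= q_{5,x} q_{k,y} - q_{k,x} q_{5,y}, & \Psi^2_k &= 0, & \Gamma^2_k &= w_{5,z} q_{k,y} - w_{k,z} q_{5,y}.
\end{aligned}$$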
The rows of the vector equation give two special
quadratic curves. Their implicit equations are

$$\sum_{k=1}^{4} A^l_k x^2 + \sum_{k=1}^{4} B^l_k y^2 + \sum_{k=1}^{4} C^l_k xy + \sum_{k=1}^{4} D^l_k x + \sum_{k=1}^{4} E^l_k y + \sum_{k=1}^{4} F^l_k = 0,$$

where $A^l_k = \Omega_k \Omega^l_k$, $B^l_k = \Psi_k \Psi^l_k$,
$C^l_k = \Omega^l_k \Psi_k + \Psi^l_k \Omega_k$,
$D^l_k = \Omega^l_k \Gamma_k + \Gamma^l_k \Omega_k$,
$E^l_k = \Psi^l_k \Gamma_k + \Gamma^l_k \Psi_k$ and
$F^l_k = \Gamma_k \Gamma^l_k$, $l \in \{1, 2\}$.
They are special because $A^1_k = 0$ and $B^2_k = 0$.
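The coefficients of the two curves follow mechanically from the reduced data. The sketch below is one possible arrangement, not the authors' code: it assumes `q` is a 5x2 array with rows q_1..q_5, `wz` holds the values w_{i,z}, and `a`, `alpha` are the quantities of Eq. 10. The names `conic_coefficients`, `om`, `ps` and `ga` (the x coefficient, y coefficient and constant of each linear factor) are hypothetical.

```python
import numpy as np

def conic_coefficients(q, wz, a, alpha):
    """Return a 2x6 array; row l-1 holds (A, B, C, D, E, F) of curve l,
    each already summed over k = 1..4."""
    qk, q5 = q[:4], q[4]
    wk, w5 = wz[:4], wz[4]
    # Scalar factor of the vector equation: om*x + ps*y + ga
    om = qk[:, 0] - alpha * q5[0] * a
    ps = qk[:, 1] - alpha * q5[1] * a
    ga = wk - a * alpha * w5
    # Vector factor, one row per curve l = 1, 2
    om_l = np.stack([np.zeros(4), q5[0] * qk[:, 1] - qk[:, 0] * q5[1]])
    ps_l = np.stack([q5[1] * qk[:, 0] - qk[:, 1] * q5[0], np.zeros(4)])
    ga_l = np.stack([w5 * qk[:, 0] - wk * q5[0], w5 * qk[:, 1] - wk * q5[1]])
    # Expand the product of the two linear factors and sum over k
    A = (om * om_l).sum(axis=1)
    B = (ps * ps_l).sum(axis=1)
    C = (om_l * ps + ps_l * om).sum(axis=1)
    D = (om_l * ga + ga_l * om).sum(axis=1)
    E = (ps_l * ga + ga_l * ps).sum(axis=1)
    F = (ga * ga_l).sum(axis=1)
    return np.stack([A, B, C, D, E, F], axis=1)
```

Row 0 has a vanishing x^2 coefficient and row 1 a vanishing y^2 coefficient, which is the special structure noted in the text.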
The solution of the optimal method is therefore given
by the intersection of two quadratic curves. Denoting
the summed coefficients by $B_1 = \sum_{k=1}^{4} B^1_k$,
$A_2 = \sum_{k=1}^{4} A^2_k$, and so on, the curves are

$$B_1 y^2 + C_1 xy + D_1 x + E_1 y + F_1 = 0,$$

$$A_2 x^2 + C_2 xy + D_2 x + E_2 y + F_2 = 0.$$

Parameter $y$ can be obtained from the latter equation
as

$$y = -\frac{A_2 x^2 + D_2 x + F_2}{C_2 x + E_2}.$$
Substituting $y$ into the first equation, the following
expression is obtained:

$$B_1 \left( \frac{A_2 x^2 + D_2 x + F_2}{C_2 x + E_2} \right)^2 - C_1 x \frac{A_2 x^2 + D_2 x + F_2}{C_2 x + E_2} + D_1 x - E_1 \frac{A_2 x^2 + D_2 x + F_2}{C_2 x + E_2} + F_1 = 0.$$
If both sides are multiplied by $(C_2 x + E_2)^2$, then
the equation becomes

$$\begin{aligned}
& B_1 (A_2 x^2 + D_2 x + F_2)^2 - C_1 x (A_2 x^2 + D_2 x + F_2)(C_2 x + E_2) \\
& + D_1 x (C_2 x + E_2)^2 - E_1 (A_2 x^2 + D_2 x + F_2)(C_2 x + E_2) \\
& + F_1 (C_2 x + E_2)^2 = 0.
\end{aligned}$$
This is a fourth-order polynomial in $x$ whose
coefficients are as follows:

$$\begin{aligned}
x^4 &: \; B_1 A_2^2 - C_1 A_2 C_2 \\
x^3 &: \; 2 B_1 A_2 D_2 - C_1 A_2 E_2 - C_1 D_2 C_2 + D_1 C_2^2 - E_1 A_2 C_2 \\
x^2 &: \; B_1 D_2^2 + 2 B_1 A_2 F_2 - C_1 D_2 E_2 - C_1 F_2 C_2 + 2 D_1 C_2 E_2 - E_1 A_2 E_2 - E_1 D_2 C_2 + F_1 C_2^2 \\
x^1 &: \; 2 B_1 D_2 F_2 - C_1 F_2 E_2 + D_1 E_2^2 - E_1 D_2 E_2 - E_1 F_2 C_2 + 2 F_1 C_2 E_2 \\
x^0 &: \; B_1 F_2^2 - E_1 F_2 E_2 + F_1 E_2^2
\end{aligned}$$
OptimalSurfaceNormalfromAffineTransformation
315
Note that the case $C_2 x + E_2 = 0$ must also be
considered, since the expression for $y$ is undefined
there. (In this case the second equation is independent
of $y$.)
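The elimination can be carried out numerically with a general polynomial root finder. The following sketch is an illustration under stated assumptions rather than the authors' implementation: the curve coefficients are passed as tuples `(B1, C1, D1, E1, F1)` and `(A2, C2, D2, E2, F2)`, the roots are computed with `numpy.roots`, and roots at which C_2 x + E_2 vanishes are simply skipped because that case needs the separate treatment remarked above. The function names and the tolerance `tol` are hypothetical.

```python
import numpy as np

def quartic_coefficients(c1, c2):
    """Coefficients of the quartic in x, highest power first."""
    B1, C1, D1, E1, F1 = c1
    A2, C2, D2, E2, F2 = c2
    return np.array([
        B1 * A2**2 - C1 * A2 * C2,
        2 * B1 * A2 * D2 - C1 * A2 * E2 - C1 * D2 * C2
        + D1 * C2**2 - E1 * A2 * C2,
        B1 * D2**2 + 2 * B1 * A2 * F2 - C1 * D2 * E2 - C1 * F2 * C2
        + 2 * D1 * C2 * E2 - E1 * A2 * E2 - E1 * D2 * C2 + F1 * C2**2,
        2 * B1 * D2 * F2 - C1 * F2 * E2 + D1 * E2**2
        - E1 * D2 * E2 - E1 * F2 * C2 + 2 * F1 * C2 * E2,
        B1 * F2**2 - E1 * F2 * E2 + F1 * E2**2,
    ])

def intersections(c1, c2, tol=1e-8):
    """Real intersection points (x, y) of the two curves."""
    A2, C2, D2, E2, F2 = c2
    points = []
    for root in np.roots(quartic_coefficients(c1, c2)):
        if abs(root.imag) > tol:
            continue                      # keep real roots only
        x = float(root.real)
        den = C2 * x + E2
        if abs(den) < tol:
            continue                      # degenerate case, not handled here
        y = -(A2 * x * x + D2 * x + F2) / den
        points.append((x, y))
    return points
```

Each returned pair (x, y) is a candidate for m = [n_x, n_y]; a quartic has at most four real roots, so at most four candidates arise from this branch.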
Figure 12: Quadratic curves.
An example of two quadratic curves with three
real intersections is visualized in Fig. 12. (The
parameters of the curves are $B_1 = 1.9055$, $C_1 = 2.2632$,
$D_1 = 2.8577$, $E_1 = 9.4392$, $F_1 = 7.7081$, and
$A_2 = 2.2632$, $C_2 = 1.9055$, $D_2 = 4.2074$, $E_2 = 2.3903$,
$F_2 = 1.1190$.)
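When several real intersections exist, each one is only a stationary point of the cost, so a selection step is needed. The appendix does not spell this step out; one plausible choice, assumed here, is to evaluate the reformulated cost at every candidate and keep the smallest. In the sketch below the names `cost` and `select_normal` are hypothetical, `q` and `wz` hold the reduced quantities q_1..q_5 and w_{i,z}, and the final normalization to unit length is a convenience that the text does not require.

```python
import numpy as np

def cost(m, q, wz, a, alpha):
    """Reformulated cost evaluated at m = [x, y]."""
    num = q[:4] @ m + wz[:4]             # m^T q_k + w_{k,z}, k = 1..4
    den = alpha * (q[4] @ m + wz[4])     # alpha m^T q_5 + alpha w_{5,z}
    return float(np.sum((num / den - a) ** 2))

def select_normal(candidates, q, wz, a, alpha):
    """Pick the candidate (x, y) with the lowest cost and return the
    corresponding normal, scaled to unit length (assumed convention)."""
    best = min(candidates,
               key=lambda p: cost(np.asarray(p, dtype=float), q, wz, a, alpha))
    x, y = best
    n = np.array([x, y, 1.0 - x - y])    # undo the sum-to-one constraint
    return n / np.linalg.norm(n)
```

Since only the direction of the normal matters, rescaling the sum-to-one vector does not change the result.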
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
316