tion.
Most approaches assume the presence of edges or other local features. Furthermore, object symmetries and shape ambiguities are not considered. Our method, in contrast, also works with untextured, transparent, and shiny objects, which may additionally have smooth surface geometry that is not easily approximated by polyhedral models. In our work, we rely only on object contour information. To overcome the problem
of contour symmetry and ambiguous views, we pro-
pose a multi-mirror system in combination with a sin-
gle camera and light source. The result is a cheap,
perfectly synchronized multiview system, which can
be self-calibrated from any known reference object.
Furthermore, our pose estimation procedure is insen-
sitive to local minima, because an exhaustive search
over the space of object contours is performed.
2 CATADIOPTRIC GEOMETRY
A central perspective camera is described by a 3 × 4 projection matrix P, computed from the camera calibration matrix K, which contains the five intrinsic camera parameters, the rotation R, and the translation t:

P = K[R|t].  (1)
A 3D plane is defined as

n^T x + d = 0,  (2)

with normal vector n and distance d to the origin. Reflections in 3D space are Euclidean transformations that additionally reverse orientation. Algebraically, a reflection is given by a 4 × 4 matrix D, which reflects points x, planes Π, and cameras P over a reflection plane:

x' = D^T x,   Π' = D^{-1} Π,   P' = P D^T.  (3)
Catadioptric systems place mirrors in a camera's field of view to capture an object from more than one viewpoint. Catadioptric stereo also has geometric and radiometric advantages: geometrically, the number of camera parameters is reduced; radiometrically, light sources are replicated by the mirror reflections. The relation between the real camera and its virtual reflection is given by the mirror reflection matrix D, which is determined by the mirror normal n, the camera-mirror distance d, and the camera coordinate frame origin e_c (Gluckman and Nayar, 2001):
D = \begin{bmatrix} I - 2nn^T & e_c - 2dn \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}.  (4)
A virtual camera is computed from P_{real} as

P_{virtual} = P_{real} D^T.  (5)
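As an illustration of Eqs. (3)-(5), the following sketch (ours, not part of the original system; plain numpy, with assumed intrinsics in the example) constructs D from n, d, and e_c and derives the virtual camera:

import numpy as np

def reflection_matrix(n, d, e_c):
    # 4x4 mirror reflection matrix D of Eq. (4): Householder block
    # I - 2nn^T and translation part e_c - 2dn.
    n = np.asarray(n, dtype=float) / np.linalg.norm(n)
    D = np.eye(4)
    D[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)
    D[:3, 3] = np.asarray(e_c, dtype=float) - 2.0 * d * n
    return D

def virtual_camera(P_real, D):
    # Eq. (5): P_virtual = P_real D^T.
    return P_real @ D.T

# Example with assumed intrinsics; the camera sits at the world origin
# (Sec. 3.1), so P_real = K [I | 0].
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P_real = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
D = reflection_matrix(n=[0.0, 0.0, 1.0], d=0.5, e_c=[0.0, 0.0, 0.0])
P_virtual = virtual_camera(P_real, D)

# Sanity check of Eq. (3): reflecting a point twice returns it
# (D is involutory here because e_c is at the origin).
x = np.array([0.1, 0.2, 0.3, 1.0])
assert np.allclose(D.T @ (D.T @ x), x)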
3 POSE ESTIMATION
Pose estimation is based on matching measured con-
tours from all mirror views against a set of pre-
generated synthetic contours. Using a known 3D model and the camera-mirror geometry obtained from calibration, the object is rendered in different poses.
From each rendered image, contours are extracted
and added to a database. The set of all rendered im-
ages covers the space of possible object orientations
in front of the camera, sampled at discrete intervals. Pose estimation is thus reduced to an exhaustive search within this database. The classification result is guaranteed to be globally optimal with respect to the discretized space of orientations.
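A minimal sketch of this search (our illustration; the fixed-length contour descriptor and the list-based database are assumptions, since the text does not prescribe a specific representation):

import numpy as np

# Hypothetical offline database: for each discretely sampled orientation,
# the model is rendered into the real view and every mirror view (the
# mirror geometry is known from calibration), and one fixed-length
# contour descriptor per view is stored.
database = []   # list of (pose, [descriptor_per_view, ...])

def match_pose(measured, database):
    # Exhaustive search over the pose database (Sec. 3).
    # measured: one contour descriptor per view from the live image.
    # Returns the orientation whose rendered contours best match all
    # views; the result is globally optimal over the discretized
    # orientation space.
    best_pose, best_cost = None, float("inf")
    for pose, expected in database:
        # Summing over views lets unambiguous mirror views resolve
        # symmetries that a single silhouette could not.
        cost = sum(np.linalg.norm(m - e)
                   for m, e in zip(measured, expected))
        if cost < best_cost:
            best_cost, best_pose = cost, pose
    return best_pose, best_cost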
3.1 System Calibration
We assume the camera projection center to be located at the world coordinate origin. Hence, the camera rotation R is the identity matrix and the translation t is zero. Intrinsic calibration is performed as proposed by Zhang (Zhang, 1999). The mirror calibration procedure used within our approach is based on the work of Hu et al. (Hu et al., 2005), in which first the mirror plane normal n is estimated, followed by the camera-mirror distance d. For the computation of n, two pairs of corresponding points between the real view and each mirror view are required. These correspondences are obtained from the object convex hulls in the real and mirror views: there are exactly two lines tangent to both convex hulls, called limitation lines, and each provides a pair of corresponding points. Furthermore, their intersection is the vanishing point vp of n, which coincides with the epipole e of the virtual camera. The mirror normal n is computed by evaluating the direction of the viewing ray through e in the image:
n = (n_x, n_y, n_z)^T = K^{-1} e.  (6)
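The following sketch (ours; a brute-force tangent search over hull vertices, assuming the hulls are given as (N, 2) point arrays, e.g. from cv2.convexHull) illustrates how the limitation lines yield the epipole and, via Eq. (6), the normal:

import numpy as np

def common_tangents(hull_a, hull_b):
    # Brute-force search for the two 'limitation lines': lines through
    # one vertex of each hull with all points of both hulls on one side.
    # Near-duplicate lines from collinear vertices are merged naively.
    pts = np.vstack([hull_a, hull_b])
    tangents = []
    for p in hull_a:
        for q in hull_b:
            l = np.cross(np.append(p, 1.0), np.append(q, 1.0))
            l = l / np.linalg.norm(l[:2])      # normalize for stable tests
            s = pts @ l[:2] + l[2]             # signed point-line distances
            if np.all(s >= -1e-6) or np.all(s <= 1e-6):
                if not any(np.allclose(l, t) or np.allclose(l, -t)
                           for t in tangents):
                    tangents.append(l)
    return tangents

def mirror_normal(hull_real, hull_mirror, K):
    # Epipole e = intersection of the two limitation lines (homogeneous
    # cross product), then Eq. (6): n = K^-1 e.
    l1, l2 = common_tangents(hull_real, hull_mirror)
    e = np.cross(l1, l2)
    n = np.linalg.inv(K) @ (e / e[2])
    return n / np.linalg.norm(n)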
The camera-mirror distance d is computed with knowledge of a single object point (x, y) and its mirrored correspondence (x', y'):
d = \frac{\Delta u \, z_0}{2(u' n_z - n_x)},  (7)
where (u, v) are normalized image coordinates and Δu = u' − u. The nominal distance z_0 between the camera center and the 3D world point is set to 1, which results in a system calibration up to an unknown scale factor (Hu et al., 2005).
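In code, Eq. (7) is a one-liner (our sketch; u and u' are assumed to be the x components of the K^{-1}-normalized pixel correspondences, and n the unit normal from Eq. (6)):

def mirror_distance(u, u_prime, n, z0=1.0):
    # Eq. (7): d = (Delta u * z0) / (2 (u' n_z - n_x)); z0 = 1 fixes the
    # overall scale, so the calibration is known only up to that factor.
    return ((u_prime - u) * z0) / (2.0 * (u_prime * n[2] - n[0]))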
In the case of multiple mirrors, these correspondences cannot be uniquely determined over all views. Hence, an additional point correspondence is established by evaluating the centroid of a