STRUCTURAL ICP ALGORITHM FOR POSE ESTIMATION BASED ON LOCAL FEATURES

Marco A. Chavarria and Gerald Sommer

Cognitive Systems Group, Christian-Albrechts-University of Kiel, D-24098 Kiel, Germany

Keywords:

Pose estimation, ICP algorithm, monogenic signal.

Abstract:

In this paper we present a new variant of the ICP (iterative closest point) algorithm for finding correspondences between image and model points. This new variant uses structural information from the model points and from contour segments detected in images to find better-conditioned correspondence sets, which are then used to compute the 3D pose. A local representation of 3D free-form contours is used to obtain the structural information in 3D space and in the image plane. Furthermore, the local structure of free-form contours is combined with orientation and phase as local features obtained from the monogenic signal. With this combination, we achieve a more robust correspondence search. Our approach was tested on synthetic and real data to compare its convergence and performance against the classical ICP approach.

1 INTRODUCTION

Many current applications in robotics and computer vision deal with objects modelled by, e.g., 3D free-form contours. Such models are widely used for problems like monocular and binocular pose estimation and object recognition, among others. The more information is available about the nature of these entities, the better the chances of solving the correspondence problem efficiently and robustly. With respect to contour models, the simplest and most common representation in the literature uses parametric functions (Zhang, 1994). Active contour models, also known as "snakes", are widely used for motion tracking and stereo matching (Kass et al., 1987).

Recently, geometric algebra (Sommer, 2001) has been introduced in computer vision as a problem-adaptive algebraic language for modelling geometric problems. It turned out that conformal geometric algebra (CGA) is especially useful because of its ability to handle stratified geometric spaces (Rosenhahn and Sommer, 2005a). The basic geometric entities (e.g., points, lines and planes) can be embedded in the conformal space, see (Rosenhahn and Sommer, 2005a). Rigid body motion also has a linear representation (called a motor) with respect to all geometric entities derived from spheres. In (Rosenhahn et al., 2004), sets of coupled twists are used to model free-form contours and surfaces in the framework of conformal geometric algebra. In further work (Rosenhahn and Sommer, 2005b), the pose estimation constraints (point-line, point-plane and line-plane) were also formulated in that algebra. We propose a new local representation of free-form contours that allows local structural information to be extracted and that can also be embedded in CGA. It is therefore also compatible with these pose estimation constraints.

Finding correspondences is one of the most challenging problems in computer vision applications. Two points correspond to each other if a similarity criterion is fulfilled. The most common and simplest approach is the ICP algorithm (Besl and McKay, 1992). Zhang (Zhang, 1994) uses a modified ICP algorithm to deal with the occlusion problem. ICP algorithms combined with different metrics are also used, for example point-point (Benjemaa and Schmitt, 1997) and point-line (Dorai et al., 1997). Chen and Medioni (Chen and Medioni, 1992) use the sum of squared distances between scene and model points in their ICP variant. An extension of this work was made by Dorai and Jain (Dorai et al., 1997), where an optimal uniform weighting of points is used. A comparison of variants of the ICP algorithm is presented in (Rusinkiewicz and Levoy, 2001), where the different variants are applied to align artificially generated 3D meshes. The methods cited above assume that the scene is almost aligned with the model (tracking assumption). Since these variants use only point information as a feature, the algorithms can be optimized for real-time applications. Other methods combine the ICP algorithm with other image processing approaches, such as optical flow (Rosenhahn et al., 2006) or a bounded Hough transform (Shang et al., 2005). These methods appear to be robust, but they are very time-consuming and therefore not suitable for real-time applications. All methods based on point information have to rely on the tracking assumption in order to perform efficiently. For the ICP variants combined with complex image processing approaches, the tracking assumption can be partly relaxed in some cases.

A. Chavarria M. and Sommer G. (2007). STRUCTURAL ICP ALGORITHM FOR POSE ESTIMATION BASED ON LOCAL FEATURES. In Proceedings of the Second International Conference on Computer Vision Theory and Applications - IU/MTSV, pages 341-346. SciTePress

The basic variant of the ICP algorithm finds corresponding point pairs (image-model) by measuring the minimal Euclidean distance. In this case, point coordinates can be considered as a local feature. One important question when analyzing local features is how "local" the feature should actually be. The minimal entity that can be described is a point, and its only feature is its position in 3D space. A single point does not give much information about the object in general. From two neighboring points the local orientation can be derived, and three neighboring points are enough to obtain the local curvature. As the neighborhood is increased, more features can be extracted and, therefore, more information about the nature of the object.
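For reference, the classical correspondence step of ICP described above (each model point paired with the image point at minimal Euclidean distance) can be sketched in a few lines. This is a minimal brute-force sketch under our own assumptions: points are plain coordinate tuples, there is no outlier rejection and no spatial index. The function name `closest_point_pairs` is hypothetical and not taken from any of the cited works.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def closest_point_pairs(model: Sequence[Point],
                        image: Sequence[Point]) -> List[Tuple[int, int, float]]:
    """Pair every model point with the nearest image point (brute force).

    Returns (model index, image index, distance) triples; the image index
    is -1 and the distance is infinite when no image points are given.
    """
    pairs = []
    for i, m in enumerate(model):
        best_j, best_d = -1, math.inf
        for j, p in enumerate(image):
            d = math.dist(m, p)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        pairs.append((i, best_j, best_d))
    return pairs
```

For example, `closest_point_pairs([(0, 0), (5, 5)], [(1, 0), (4, 5), (10, 10)])` pairs model point 0 with image point 0 and model point 1 with image point 1, both at distance 1.0. In a full ICP loop this step alternates with a pose update until the error stops decreasing.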

In this paper we present a new structural variant of the ICP algorithm, which integrates local features (from model and image) and the structural phase information delivered by the monogenic signal (Felsberg and Sommer, 2001). One advantage of our ICP variant is that it can be applied directly to free-form contours and that it remains robust when the tracking assumption is violated. A local 3D contour representation is used to extract a feature set for contour segments, such as concavity, convexity and straightness. Our ICP variant reaches a compromise between computational cost and robustness against violations of the tracking assumption.

For image feature extraction we use the monogenic scale-space approach presented by Felsberg and Sommer (Felsberg and Sommer, 2004), which is briefly described in Section 2. In Section 3 we introduce the local representation of 3D contours based on local motors. The basic feature set is obtained from a single motor and the extended set from contour segments. The structural ICP algorithm is introduced in Section 4. Finally, in Section 5, experiments on synthetic and real data are presented to validate the efficiency and robustness of our algorithm.

2 IMAGE FEATURES IN SCALE-SPACE

The monogenic scale-space representation and phase-based image processing techniques were introduced in (Felsberg and Sommer, 2004). If p(x;s) and q(x;s) are the filter responses of an image convolved with the Poisson and conjugate Poisson kernels, respectively, the local amplitude a(x;s) and the local phase vector r(x;s) are obtained for a scale s as shown in equation (1).

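$$a(\mathbf{x};s) = \sqrt{|\mathbf{q}(\mathbf{x};s)|^2 + |p(\mathbf{x};s)|^2}, \qquad \mathbf{r}(\mathbf{x};s) = \frac{\mathbf{q}(\mathbf{x};s)}{|\mathbf{q}(\mathbf{x};s)|}\arctan\left(\frac{|\mathbf{q}(\mathbf{x};s)|}{p(\mathbf{x};s)}\right) \qquad (1)$$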

The local amplitude is related to the local energy of the signal. The local orientation and the local phase are combined in the local phase vector. The local phase gives information about the local symmetry of the signal, and the local orientation gives the orientation of the highest signal variance. Hence, for an edge point we choose as local features the orientation and the phase angles in the x and y directions, $F = \{\varphi, \|r_x\|, \|r_y\|\}$.
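Given the filter responses at a single pixel, equation (1) reduces to a few lines of arithmetic. The sketch below assumes that the response p and the two components (q1, q2) of the vector-valued response q have already been computed at scale s; the Poisson filtering itself is not shown. It uses `atan2(|q|, p)` in place of `arctan(|q|/p)`, which coincides with it for p > 0 and stays defined for p ≤ 0. Taking the orientation as the direction of q is our assumption (the usual monogenic-signal convention), and the function name `monogenic_features` is hypothetical.

```python
import math
from typing import Tuple

def monogenic_features(p: float, q1: float,
                       q2: float) -> Tuple[float, Tuple[float, float], float]:
    """Local amplitude a, phase vector r and orientation at one pixel, eq. (1).

    p      -- response to the Poisson kernel, p(x; s)
    q1, q2 -- components of the response to the conjugate Poisson kernel, q(x; s)
    """
    q_norm = math.hypot(q1, q2)                   # |q(x; s)|
    amplitude = math.sqrt(q_norm ** 2 + p ** 2)   # a = sqrt(|q|^2 + |p|^2)
    if q_norm == 0.0:
        # Direction of q undefined: return a zero phase vector (our convention)
        return amplitude, (0.0, 0.0), 0.0
    phase = math.atan2(q_norm, p)                 # equals arctan(|q| / p) for p > 0
    r = (q1 / q_norm * phase, q2 / q_norm * phase)  # r = q / |q| * phase
    orientation = math.atan2(q2, q1)              # assumed: direction of q
    return amplitude, r, orientation
```

For instance, with p = 3 and q = (4, 0) the amplitude is 5, the phase vector points along x with length arctan(4/3), and the orientation is 0.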

Once the local amplitude and phase have been obtained for a scale factor s, a contour search algorithm based on the local amplitude and orientation is applied to extract the contour segments. By changing the scale factor, low-contrast edges can also be detected.

3 LOCAL CONTOUR REPRESENTATION

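The idea of the local representation is to construct a motor that approximates a contour segment. A motor is parameterized by a rotation axis and an angle. This is illustrated in Figure 1. A plane, parameterized by its normal $\mathbf{n}$ and its distance $d$ to the origin, is constructed from the 3D points $\mathbf{x}_{i-1}$, $\mathbf{x}_i$ and $\mathbf{x}_{i+1}$. In that plane, a local coordinate system is defined by

$$\mathbf{i}_1 = \frac{\mathbf{x}_i - \mathbf{x}_{i-1}}{\|\mathbf{x}_i - \mathbf{x}_{i-1}\|}, \qquad \mathbf{i}_3 = \mathbf{n}, \qquad \mathbf{i}_2 = \mathbf{i}_3 \times \mathbf{i}_1 \qquad (2)$$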
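The local frame of equation (2) can be sketched with plain 3-tuples. Two details are our own assumptions, since the text does not fix them: the plane normal n is taken as the normalized cross product of the two contour differences (so its sign follows the traversal direction), and the in-plane 2D coordinates are measured from x_{i-1}. The names `local_frame`, `to_local_2d` and the small vector helpers are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a: Vec3) -> Vec3:
    n = math.sqrt(dot(a, a))
    if n == 0.0:
        raise ValueError("zero-length vector (collinear or repeated points)")
    return (a[0] / n, a[1] / n, a[2] / n)

def local_frame(x_prev: Vec3, x_i: Vec3, x_next: Vec3) -> Tuple[Vec3, Vec3, Vec3, float]:
    """Basis i1, i2, i3 of eq. (2) and the plane distance d to the origin."""
    n = normalize(cross(sub(x_i, x_prev), sub(x_next, x_i)))  # plane normal
    d = dot(n, x_prev)                  # signed distance of the plane to the origin
    i1 = normalize(sub(x_i, x_prev))
    i3 = n
    i2 = cross(i3, i1)                  # unit length, since i3 and i1 are orthonormal
    return i1, i2, i3, d

def to_local_2d(x: Vec3, origin: Vec3, i1: Vec3, i2: Vec3) -> Tuple[float, float]:
    """Coordinates of x in the plane spanned by i1 and i2."""
    v = sub(x, origin)
    return (dot(v, i1), dot(v, i2))
```

Collinear points raise a `ValueError`, because they do not define a plane; in the terms of this section they correspond to a straight segment.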

Figure 1: Local motor for a 3D contour (left). Local coordinate system (middle) needed to get the circle parameters of the motor and the structural features (right).

To find the rotation axis of the motor, we need to calculate the center of the circle through the three points. To make the computations easier, the problem is translated from 3D to 2D, i.e., to the plane defined by the basis vectors $\mathbf{i}_1$ and $\mathbf{i}_2$ (see the right picture of Figure 1). The center $\mathbf{c}$ of the circle and the radius vector $\mathbf{r}$ are easily calculated in 2D. Then the coordinates of the center of the circle in 3D are recovered. Thus, the rotation axis of the motor in 3D is obtained from the center $\mathbf{c}$ and the normal vector $\mathbf{n}$. The rotation angle $\theta$ is the angle subtended at $\mathbf{c}$ by the arc from $\mathbf{x}_{i-1}$ to $\mathbf{x}_{i+1}$. Finally, the orientation vector $\mathbf{o}_i$ is defined as the vector orthogonal to the radius vector $\mathbf{r}$.
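Once the three points are expressed in 2D, the circle parameters of the motor follow from the circumcircle formula. The sketch below is one possible way to do this under our own assumptions: the radius vector is taken at x_i, the rotation angle is accumulated as the signed angles from x_{i-1} to x_i and from x_i to x_{i+1} (so it follows the arc through x_i), and the orientation vector is flipped to point along the direction of traversal. Collinear points, which have no finite circle, are reported as `None`. The function name `motor_params_2d` is hypothetical.

```python
import math
from typing import Optional, Tuple

P2 = Tuple[float, float]

def motor_params_2d(p0: P2, p1: P2, p2: P2) -> Optional[Tuple[P2, float, float, P2]]:
    """Circle through p0 = x_{i-1}, p1 = x_i, p2 = x_{i+1} (2D local coordinates).

    Returns (center, radius, theta, o), or None when the points are collinear.
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    den = 2.0 * (x0 * (y1 - y2) + x1 * (y2 - y0) + x2 * (y0 - y1))
    if abs(den) < 1e-12:
        return None  # collinear: no finite circle, i.e. a straight segment
    s0, s1, s2 = x0 * x0 + y0 * y0, x1 * x1 + y1 * y1, x2 * x2 + y2 * y2
    ux = (s0 * (y1 - y2) + s1 * (y2 - y0) + s2 * (y0 - y1)) / den
    uy = (s0 * (x2 - x1) + s1 * (x0 - x2) + s2 * (x1 - x0)) / den

    def rel(x: float, y: float) -> P2:
        return (x - ux, y - uy)  # vector from the center to a point

    def signed_angle(a: P2, b: P2) -> float:
        return math.atan2(a[0] * b[1] - a[1] * b[0], a[0] * b[0] + a[1] * b[1])

    r0, r1, r2 = rel(x0, y0), rel(x1, y1), rel(x2, y2)
    radius = math.hypot(*r1)
    # Rotation angle from x_{i-1} to x_{i+1} along the arc through x_i
    theta = signed_angle(r0, r1) + signed_angle(r1, r2)
    # Unit vector orthogonal to the radius vector r = x_i - c ...
    o = (-r1[1] / radius, r1[0] / radius)
    # ... flipped so that it points from x_{i-1} towards x_{i+1}
    if o[0] * (x2 - x0) + o[1] * (y2 - y0) < 0:
        o = (-o[0], -o[1])
    return (ux, uy), radius, theta, o
```

If the 2D coordinates (u, v) are measured from x_{i-1} along i_1 and i_2, the 3D center is recovered as x_{i-1} + u i_1 + v i_2, and the rotation axis passes through it along n.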

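For every point of the 3D contour, the local curvature vector and the bending angle are calculated by

$$\mathbf{k}_i = (\mathbf{x}_i - \mathbf{x}_{i-1}) \times (\mathbf{x}_{i+1} - \mathbf{x}_i), \qquad \beta_i = \arccos\left(\frac{(\mathbf{x}_i - \mathbf{x}_{i-1}) \cdot (\mathbf{x}_{i+1} - \mathbf{x}_i)}{\|\mathbf{x}_i - \mathbf{x}_{i-1}\|\,\|\mathbf{x}_{i+1} - \mathbf{x}_i\|}\right), \qquad (3)$$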

where the points $\mathbf{x}_{i-1}$, $\mathbf{x}_i$ and $\mathbf{x}_{i+1}$ are expressed in the local coordinate system. In this case, the component of the resulting curvature vector $\mathbf{k}_i$ along the normal of the local plane changes its sign depending on whether the point is concave or convex. When this scalar component is negative, the point is considered locally convex; otherwise, it is considered locally concave. If the bending angle $\beta_i$ is close to zero, the point is considered part of a straight line.
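For points already expressed in 2D (local plane coordinates or image coordinates), equation (3) and the classification rule above reduce to a 2D cross product and dot product. In this sketch, under our own assumptions, the returned `k` is the component of the curvature vector normal to the plane, the arccos argument is clamped against round-off, and the straightness threshold `t` is a free parameter, since the text does not give its value. The sign of `k`, and hence the convex/concave label, flips if the contour is traversed in the opposite direction. The function names are hypothetical.

```python
import math
from typing import Tuple

P2 = Tuple[float, float]

def curvature_and_bending(x_prev: P2, x_i: P2, x_next: P2) -> Tuple[float, float]:
    """Eq. (3) for three distinct points given in 2D coordinates.

    Both difference vectors lie in the plane, so their cross product points
    along the plane normal; its scalar component is returned as k.
    """
    u = (x_i[0] - x_prev[0], x_i[1] - x_prev[1])   # x_i - x_{i-1}
    v = (x_next[0] - x_i[0], x_next[1] - x_i[1])   # x_{i+1} - x_i
    k = u[0] * v[1] - u[1] * v[0]
    cos_beta = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    beta = math.acos(max(-1.0, min(1.0, cos_beta)))  # clamp against round-off
    return k, beta

def classify(k: float, beta: float, t: float) -> str:
    """Straight if beta < t; otherwise convex for k < 0 and concave else."""
    if beta < t:
        return "straight"
    return "convex" if k < 0 else "concave"
```

Repeated consecutive points give zero-length difference vectors and must be removed beforehand, otherwise the division fails.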

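An extended feature set yields more robust features, especially in the image plane, where noise is present and contours are extracted digitally. In this case, the features are not obtained from a single point only: the neighborhood of the point is extended to larger segments in order to take average feature values, as shown in equation (4),

$$\mathbf{k}_i = \frac{1}{n}\sum_{j=1}^{n} \mathbf{v}_{1j} \times \mathbf{v}_{2j}, \qquad \beta_i = \frac{1}{n}\sum_{j=1}^{n} \arccos\left(\frac{\mathbf{v}_{1j} \cdot \mathbf{v}_{2j}}{\|\mathbf{v}_{1j}\|\,\|\mathbf{v}_{2j}\|}\right), \qquad (4)$$

where $\mathbf{v}_{1j} = \mathbf{x}_i - \mathbf{x}_{i-j}$ and $\mathbf{v}_{2j} = \mathbf{x}_{i+j} - \mathbf{x}_i$.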

Taking the point $\mathbf{x}_i$ as a reference, motors are constructed iteratively with the adjacent points. The contour segment is then defined by the points $\{\mathbf{x}_{i-n}, \ldots, \mathbf{x}_{i-1}, \mathbf{x}_{i+1}, \ldots, \mathbf{x}_{i+n}\}$, and the features of $\mathbf{x}_i$ correspond to the structure of this neighborhood.
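Equation (4) averages the quantities of equation (3) over n neighbors on each side of x_i. The sketch below, for an ordered 2D contour, returns the mean normal component of the curvature vectors and the mean bending angle. It assumes, on our part, that repeated consecutive points have been removed (zero-length vectors would divide by zero), and it rejects neighborhoods that run past the ends of the contour. The function name `averaged_features` is hypothetical.

```python
import math
from typing import Sequence, Tuple

P2 = Tuple[float, float]

def averaged_features(points: Sequence[P2], i: int, n: int) -> Tuple[float, float]:
    """Eq. (4): mean curvature (normal component) and mean bending angle of
    x_i over the neighbors x_{i-n}, ..., x_{i+n} of an ordered 2D contour."""
    if n < 1 or i - n < 0 or i + n >= len(points):
        raise ValueError("neighborhood exceeds the contour")
    xi, yi = points[i]
    k_sum, beta_sum = 0.0, 0.0
    for j in range(1, n + 1):
        v1 = (xi - points[i - j][0], yi - points[i - j][1])  # x_i - x_{i-j}
        v2 = (points[i + j][0] - xi, points[i + j][1] - yi)  # x_{i+j} - x_i
        k_sum += v1[0] * v2[1] - v1[1] * v2[0]
        c = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        beta_sum += math.acos(max(-1.0, min(1.0, c)))  # clamp against round-off
    return k_sum / n, beta_sum / n
```

For the V-shaped contour (-2, 2), (-1, 1), (0, 0), (1, 1), (2, 2) with i = 2 and n = 2, both neighbor pairs meet at a right angle, so the mean bending angle is π/2 and the mean normal component is (2 + 8) / 2 = 5.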

3.1 3D and 2D Contour Features

We define the following structural features for a 3D point $\mathbf{x}_i$:

$$F_{\mathbf{x}_i} = \{\mathbf{o}_i, \mathbf{k}_i, \beta_i\}, \qquad (5)$$

where $\mathbf{o}_i$ is the local orientation vector at the point $\mathbf{x}_i$, $\mathbf{k}_i$ is the curvature vector and $\beta_i$ is the bending angle. To obtain the corresponding 2D features, the contour model points are projected onto the image plane (see Figure 2), motors are constructed, and the features are calculated as described in the previous section, using the corresponding points in image coordinates $\mathbf{x}'_{i-1}$, $\mathbf{x}'_i$ and $\mathbf{x}'_{i+1}$. This yields the normalized orientation vector $\mathbf{o}'_i$ and its corresponding orientation angle $\alpha$.
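The text does not specify the camera model used to project the contour points. Purely as an illustration, the sketch below assumes an ideal pinhole camera with focal length `f` and principal point (`cx`, `cy`), with points given in camera coordinates; the projected points can then go through the same motor and feature computations as in 3D. The orientation angle α is taken here as the polar angle of the 2D orientation vector o'. Both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def project_pinhole(points: Sequence[Tuple[float, float, float]], f: float,
                    cx: float = 0.0, cy: float = 0.0) -> List[Tuple[float, float]]:
    """Project 3D points given in camera coordinates onto the image plane
    (ideal pinhole model, an assumption not taken from the text)."""
    projected = []
    for X, Y, Z in points:
        if Z <= 0.0:
            raise ValueError("point is not in front of the camera")
        projected.append((f * X / Z + cx, f * Y / Z + cy))
    return projected

def orientation_angle(o: Tuple[float, float]) -> float:
    """Polar angle (radians) of a 2D orientation vector o'."""
    return math.atan2(o[1], o[0])
```

For example, the point (1, 2, 4) projects to (0.5, 1.0) with f = 2 and the principal point at the origin.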

The concept of phase in the image plane delivers information about the local structure of the image, derived from the monogenic signal. In the case of edges, the phase encodes a transition from one gray value to another in the x and y directions. For 3D contours, it is not possible to compute phase information directly in that sense. Nevertheless, it is possible to assign to a projected 3D contour point a feature value that represents such a transition. We call this feature the transition index. Figure 2 shows the idea of the transitions $t_x$ and $t_y$ for a point. The transition index takes the values +1 or −1 (equivalent to the phase responses $\|r_x\|$ and $\|r_y\|$), depending on the orientation of the vector $\mathbf{o}'_i$. Thus, for a projected 3D contour point we obtain as features the orientation and the transition indices in the x and y directions, $F_{con} = \{\alpha, t_x, t_y\}$.

Figure 2: Example of motor construction and the transition index in the image plane (left). Transition index of a projected model contour and phase response of the monogenic signal (right). The lines show the corresponding pairs of image and model.

4 STRUCTURAL ICP VARIANT

Our ICP variant combines error metrics with image feature constraints. In the image plane, we thus have the feature set $F_{2Dm} = \{\alpha, t_x, t_y, \mathbf{k}_{2Dm}, \beta_{2Dm}\}$ for projected model segments and the feature set $F_{2Dp} = \{\varphi, \|r_x\|, \|r_y\|, \mathbf{k}_{2Dp}, \beta_{2Dp}\}$ for detected contour segments. Two points (image and model) form a correspondence pair if the structural constraints are met. The phase-transition index constraint is defined as

$$C_t = \begin{cases} 1 & \text{if } \|r_x\| = t_x \wedge \|r_y\| = t_y \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

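Here and in the following, the symbol ∧ denotes the logical "and" operation. The straightness constraint is defined from the local bending angles $\beta_{2Dm}$ and $\beta_{2Dp}$:

$$C_s = \begin{cases} 1 & \text{if } \beta_{2Dm} < t \wedge \beta_{2Dp} < t \\ 0 & \text{otherwise} \end{cases}, \qquad (7)$$

where $t$ is a threshold value. Finally, the concavity-convexity constraint is defined from the signs of the components $x_n$ and $y_n$ of the curvature vectors $\mathbf{k}_{2Dm}$ and $\mathbf{k}_{2Dp}$ normal to the image plane, so that it only applies when the pair is not classified as straight by (7):

$$C_c = \begin{cases} 1 & \text{if } \operatorname{sign}(x_n) = \operatorname{sign}(y_n) \wedge C_s = 0 \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$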
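The constraints of equations (6), (7) and (8), written here as C_t, C_s and C_c, can be combined into a correspondence search as sketched below. How the three indicators are combined is our assumption, based on the description of Figure 3: a pair is accepted if C_t = 1 and the local structure agrees, i.e. C_s = 1 (both straight) or C_c = 1 (same convexity), and among the accepted image points the nearest one is chosen. The phase values are assumed to be already quantized to ±1, as the comparison in (6) requires, and `k` stands for the component of the curvature vector normal to the image plane. The orientation features are not used by these constraints and are omitted. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass
class ModelFeature:
    """Projected model point with (part of) the features F_2Dm."""
    pos: Tuple[float, float]
    t: Tuple[int, int]      # transition indices (t_x, t_y), each +1 or -1
    k: float                # normal component of k_2Dm
    beta: float             # bending angle beta_2Dm

@dataclass
class ImageFeature:
    """Detected contour point with (part of) the features F_2Dp."""
    pos: Tuple[float, float]
    r: Tuple[int, int]      # phase responses (||r_x||, ||r_y||), quantized to +1/-1
    k: float                # normal component of k_2Dp
    beta: float             # bending angle beta_2Dp

def sign(v: float) -> int:
    return (v > 0) - (v < 0)

def c_t(m: ModelFeature, p: ImageFeature) -> int:
    """Phase-transition index constraint, eq. (6)."""
    return int(m.t == p.r)

def c_s(m: ModelFeature, p: ImageFeature, t: float) -> int:
    """Straightness constraint, eq. (7)."""
    return int(m.beta < t and p.beta < t)

def c_c(m: ModelFeature, p: ImageFeature, t: float) -> int:
    """Concavity-convexity constraint, eq. (8)."""
    return int(sign(m.k) == sign(p.k) and c_s(m, p, t) == 0)

def structural_match(m: ModelFeature, image: Sequence[ImageFeature],
                     t: float) -> Optional[int]:
    """Index of the nearest image point with the same local structure, or None."""
    best, best_d = None, math.inf
    for j, p in enumerate(image):
        if c_t(m, p) and (c_s(m, p, t) or c_c(m, p, t)):
            d = math.dist(m.pos, p.pos)
            if d < best_d:
                best, best_d = j, d
    return best
```

Unlike the classical closest-point step, an image point that is nearer but has a different transition index or the opposite convexity is skipped, which is the behavior illustrated in Figure 3.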

Figure 3: Example of correspondence pairs for normal (left)

and structural (right) ICP variants.

Figure 3 shows the idea of ICP combined with structural constraints (straight, concave or convex). The left picture shows the case where only the minimal distance is considered, the right one the structural variant. As can be seen, for a point on the bottom curve, its corresponding point on the upper curve is the nearest point with the same local structure. The same holds for ICP combined with the phase-transition index constraint; see the left picture of Figure 2.

5 EXPERIMENTS

For our experiments we used 3D planar contour models (see Figure 4): the "cactus" and "puzzle" models, which are rich in structure, and the "mouse" model, which has less structure. In the first experiment, we compare the convergence behavior of a normal ICP algorithm and our structural ICP variant. The initial position of the model is known; the model is then translated and rotated to its actual position and projected onto the image plane to generate an artificial image. On this artificial image, the corresponding contour segments and the local features are extracted. Then the pose is calculated and compared with the ground truth. For these experiments, relatively large displacements were applied to the model in order to test the robustness against violations of the tracking assumption.

Figure 4: The object is translated in all directions in the plane. For every translation the pose is calculated and compared with the ground truth (left). Different models used in the experiments (right): cactus, puzzle and mouse models.

Figure 5: Convergence sequence for normal (top row) and structural (bottom row) ICP variants applied to the cactus model.

In the image sequence of Figure 5, we compare the convergence behavior of a normal ICP algorithm against our structural variant when the tracking assumption is not met. In such cases, the pose estimation algorithm with the normal ICP variant does not converge to the actual model position. A direct comparison of the convergence behavior can be seen in the first row of Figure 6. Two different pose estimation algorithms were tested with our ICP variant: the 2D-3D algorithm (Rosenhahn and Sommer, 2005b) and the projective one (Araujo et al., 1998). In both cases, the structural ICP variant needs fewer iterations to converge.

The normal variants of the ICP algorithm consider only the Euclidean distance as a correspondence constraint, possibly with an error weighting factor or a different search strategy. As a consequence, many badly conditioned correspondences are found in the first iterations, so that convergence is slower or, in some cases, the algorithm does not converge at all. The structural variant additionally considers the constraints of equations (6), (7) and (8). This increases the probability of finding better-conditioned correspondences and therefore increases the convergence rate of the algorithm.

A second experiment was made to test the robustness of our algorithm against violations of the tracking assumption. In this case, the model was rotated around its z axis by 0 to 50 degrees. As can be seen in the second row of Figure 6, with the structural ICP algorithm the pose error remains minimal for rotations of up to 30 degrees for the 2D-3D algorithm and up to 40 degrees for the projective one. This shows that our structural ICP variant allows larger model rotations than the normal ICP variants. The robustness of the structural ICP algorithm against violations of the tracking assumption depends on the nature of the object and its contour: for contours that are rich in structural information, larger rotations and translations are allowed.

[Figure 6 plots: absolute error (mm) of Normal ICP and Structural ICP versus iteration (first row) and versus rotation angle in degrees (second row), for the 2D-3D algorithm (left) and the projective algorithm (right).]

Figure 6: First row, convergence behavior comparisons of the normal and structural ICP variants applied to the 2D-3D pose estimation algorithm (left) and the projective algorithm (right). Second row, robustness against rotations for the 2D-3D (left) and the projective algorithms (right).

The next experiment was made to test the magnitude and direction of the maximal translations allowed by the structural ICP algorithm. In this case, as can be seen in Figure 4, the object model was translated in all directions within the plane in which it is defined. For every position, the pose was calculated with the structural ICP algorithm and the projective pose estimation (Araujo et al., 1998).

The results for the cactus, puzzle and mouse models are shown in Figure 7. These figures show the convergence regions of the algorithm when translations are applied. For the cactus, the algorithm is more sensitive to translations in the y direction, which corresponds to the major axis direction (see Figure 4), while relatively large translations are allowed in the x direction (minor axis direction). The same effect can be seen for the puzzle model. The figures show that for certain positions the correspondence search is better conditioned. As the translation increases, the probability of finding badly conditioned correspondences also increases, and with it the pose error. The puzzle and cactus models are complex objects with enough structure to deal with relatively large translations. The bottom figure shows the result for the mouse model, which does not have much structural information. Therefore, as can be seen in Figure 7, the error increases considerably for large translations.

[Figure 7 plots: "Pose error for the translation case", pose error (mm) as a function of translation in X and Y (mm).]

Figure 7: Pose error for the translation case for the cactus model (top left), for the puzzle model (top right) and for the mouse model (bottom).

Finally, we applied our algorithm to image sequences of a real scenario. The algorithm was tested on a Linux-based system with a 3 GHz Intel Pentium 4 processor. Some examples of the test sequences are shown in Figure 8. The left column shows the initial position of our model and the right column the pose result obtained with our structural ICP variant and the projective pose estimation algorithm. For every image, the monogenic signal response was computed and a contour search algorithm based on the local orientation and phase information was applied to detect the edge segments; the structural features were then calculated from these detected contour points. The average computing time per frame for the whole image processing module was 225 milliseconds. Due to the relatively large displacement of the object, more iteration steps are needed for the algorithm to converge, which increases the computational time. For these sequences, the average computation time (image processing plus pose estimation) was 2.65 seconds.

6 CONCLUSIONS AND FUTURE WORK

A new variant of the ICP algorithm for pose estimation of 3D free-form contours based on local structural model and image features was presented. The experimental tests showed that our structural ICP algorithm performs efficiently for richly structured objects, even for large translations and rotations between scene and object model. The experiments show that our ICP algorithm combined with the projective pose estimation approach can handle larger object displacements. This indicates that the feature constraints used to search correspondences and the pose estimation constraints involved in the minimization problem are better conditioned in the image plane. Although our approach does not meet the requirements of real-time applications (Rusinkiewicz and Levoy, 2001), the computation times reported for the test sequences are a good tradeoff, considering that the dependence on the tracking assumption has been significantly reduced. A natural extension of our approach is to consider the pose estimation of non-planar free-form contours and surfaces, and to combine local and global structural features (from model and image) in order to develop an approach capable of dealing with even larger translations and rotations.

Figure 8: Initial position (left images) and estimated pose (right images) for the cactus, puzzle and mouse models.

REFERENCES

Araujo, H., Carceroni, R., and Brown, C. (1998). A fully projective formulation to improve the accuracy of Lowe's pose-estimation algorithm. Comput. Vis. Image Underst., 70(2):227–238.

Benjemaa, R. and Schmitt, F. (1997). Fast global registration of 3D sampled surfaces using a multi-z-buffer technique. In NRC '97: Proceedings of the International Conference on Recent Advances in 3-D Digital Imaging and Modeling, page 113, Washington, DC, USA. IEEE Computer Society.

Besl, P. and McKay, N. (1992). A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256.

Chen, Y. and Medioni, G. (1992). Object modelling by registration of multiple range images. Image Vision Comput., 10(3):145–155.

Dorai, C., Weng, J., and Jain, A. (1997). Optimal registration of object views using range data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(10):1131–1138.

Felsberg, M. and Sommer, G. (2001). The monogenic signal. IEEE Transactions on Signal Processing, 49(12):3136–3144.

Felsberg, M. and Sommer, G. (2004). The monogenic scale-space: A unifying approach to phase-based image processing in scale-space. J. Math. Imaging Vis., 21(1):5–26.

Kass, M., Witkin, A., and Terzopoulos, D. (1987). Snakes: Active contour models. International Journal of Computer Vision, 1(4):321–331.

Rosenhahn, B., Brox, T., Cremers, D., and Seidel, H. (2006). A comparison of shape matching methods for contour based pose estimation. In 11th International Workshop on Combinatorial Image Analysis (IWCIA). Berlin, Germany, LNCS Springer-Verlag.

Rosenhahn, B., Perwass, C., and Sommer, G. (2004). Free-form pose estimation by using twist representations. Algorithmica, 38:91–113.

Rosenhahn, B. and Sommer, G. (2005a). Pose estimation in conformal geometric algebra, part I: The stratification of mathematical spaces. Journal of Mathematical Imaging and Vision, 22:27–48.

Rosenhahn, B. and Sommer, G. (2005b). Pose estimation in conformal geometric algebra, part II: Real-time pose estimation using extended feature concepts. Journal of Mathematical Imaging and Vision, 22:49–70.

Rusinkiewicz, S. and Levoy, M. (2001). Efficient variants of the ICP algorithm. In Proceedings of the Third Intl. Conf. on 3D Digital Imaging and Modeling, pages 145–152, Quebec City, Canada.

Shang, L., Jasiobedzki, P., and Greenspan, M. (2005). Discrete pose space estimation to improve ICP-based tracking. In 3DIM '05: Proceedings of the Fifth International Conference on 3-D Digital Imaging and Modeling, pages 523–530, Washington, DC, USA. IEEE Computer Society.

Sommer, G., editor (2001). Geometric Computing with Clifford Algebras. Springer-Verlag, Heidelberg.

Zhang, Z. (1994). Iterative point matching for registration of free-form curves and surfaces. Int. J. Comput. Vision, 13(2):119–152.