In this study the focus is on the situation where
a vehicle is moving on a planar surface. The vision
input is predominantly distributed in one dimension,
allowing the rotation and translation to be determined
by an analysis of the image point set covariance
structure. Errors in the visual input are discarded
recursively based on a priori motion estimates from a
Kalman Filter (KF) in which GPS is fused with the
vision estimates. The proposed algorithm is evaluated
against ground truth in a realistic outdoor experimental
setup.
2 MATERIAL AND METHOD
2.1 Visual Motion Estimation
The problem of feature-based visual motion estimation
is to determine the rotation R and translation T between
corresponding points in two or more successive image
pairs, with Q_c at the current and Q_p at the previous
position:

Q_c = Q_p R + T    (1)
In this study a new method for visual motion estimation
based on feature matching is introduced. The method
addresses the situation where the corresponding points
between two successive stereo image pairs form a point
set that is predominantly distributed in one dimension.
This situation is likely to occur when computer vision
is used for outdoor navigation in open field conditions,
where detectable and reliable features are often
distributed along the horizon or along boundaries in
the landscape.
The method is based on an analysis of the point
set's covariance structure. If the points are
predominantly distributed along a line, their
covariance matrix will have one large eigenvalue; if
the points are distributed in a planar surface, it will
have two large eigenvalues; and if the points are well
distributed in space, it will have three nearly even
eigenvalues. In the last case the methods of (Matthies,
1989) or (Stephen Se and Little, 2005) may be used.
However, in the case where the corresponding points are
not well distributed, these methods are ill-posed and
become numerically unstable.
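The eigenvalue test described above can be sketched as follows; the function name and the relative threshold are illustrative choices, not part of the original method:

```python
import numpy as np

def distribution_rank(points, ratio=0.1):
    """Classify the spatial distribution of a point set by the
    eigenvalue spectrum of its covariance matrix.

    Returns 1 if the points lie mostly along a line, 2 if mostly
    in a plane, 3 if they are well distributed in space. The
    `ratio` threshold is an assumed, illustrative cut-off.
    """
    pts = np.asarray(points, dtype=float)
    cov = np.cov(pts.T)                               # d x d covariance
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order
    # Count eigenvalues that are significant relative to the largest.
    return int(np.sum(eigvals > ratio * eigvals[0]))
```

A point set along the horizon would yield rank 1, in which case the method proposed here applies; rank 3 indicates the well-conditioned case handled by the cited methods.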
2.1.1 The 2D Case
In this study we only consider the situation where
a robot is moving on a planar surface, and hence its
relative movement is restricted to two dimensions.
Let there be n corresponding points c_i = (x_i, y_i)
in Q_c and p_i = (x_i, y_i) in Q_p between two
successive stereo pairs. The covariance matrices
Γ_{c,p} of Q_c and Q_p are then determined.
The eigenvalues λ_{c,p} and eigenvectors ν_{c,p} of
Γ_{c,p} are obtained and sorted in descending order
according to the eigenvalues. The eigenvector belonging
to the largest eigenvalue corresponds to the line that
fits the point set with the lowest variance, i.e. the
line obtained by orthogonal regression (Jackson, 1991).
As we have corresponding points in the two sets, the
rotation between the lines corresponds to the rotation
of the robot due to its movement. The rotation θ
between the two lines is easily determined by:

θ = arccos(ν_c · ν_p)    (2)
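A minimal sketch of Eq. (2) using NumPy; the sign alignment of the eigenvectors is an added assumption, since eigenvectors are defined only up to sign and arccos alone yields the unsigned angle between the lines:

```python
import numpy as np

def rotation_between_point_sets(q_c, q_p):
    """Estimate the planar rotation between two corresponding 2D
    point sets from the principal eigenvectors of their covariance
    matrices, as in Eq. (2). A sketch, not the paper's code."""
    def principal_axis(pts):
        cov = np.cov(np.asarray(pts, dtype=float).T)
        eigvals, eigvecs = np.linalg.eigh(cov)
        return eigvecs[:, np.argmax(eigvals)]   # unit-length eigenvector
    v_c = principal_axis(q_c)
    v_p = principal_axis(q_p)
    # Eigenvectors are defined up to sign; align them before Eq. (2).
    if np.dot(v_c, v_p) < 0:
        v_p = -v_p
    return np.arccos(np.clip(np.dot(v_c, v_p), -1.0, 1.0))
```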
The point set from the previous position Q_p may now
be counter-rotated so that it becomes aligned with the
robot's local coordinate system at the current position.
To account for the uncertainty in the 3D reconstruction,
the stereo error is modeled according to the method
introduced in (Matthies and Shafer, 1987). As a result,
the translation is calculated as a weighted mean using

w_i = (det(κ_i) + det(ψ_i))^(-1)    (3)
as the weight for point i, where κ_i and ψ_i are the
covariances of the two points c_i and p_i due to the
stereo error model. The final estimate of the
translation is then:

T = (1 / Σ_{i=1}^{n} w_i) Σ_{i=1}^{n} w_i (c_i − p_i)    (4)

where p_i is rotated according to θ.

The covariance estimate for the translation is found
as the pooled covariance of κ and ψ, where ψ is rotated
according to θ.
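The weighted translation of Eqs. (3) and (4) can be sketched as follows; the per-point covariances κ_i and ψ_i are assumed given (the stereo error model of Matthies and Shafer, 1987 is not reproduced here), and the sign convention of the counter-rotation is an assumption:

```python
import numpy as np

def weighted_translation(q_c, q_p, kappa, psi, theta):
    """Weighted-mean translation estimate, Eqs. (3)-(4).
    q_c, q_p: (n, 2) corresponding points; kappa, psi: per-point
    2x2 stereo-error covariances for q_c and q_p. A sketch under
    the stated assumptions."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    q_p_rot = np.asarray(q_p, dtype=float) @ R.T  # counter-rotate previous set
    # Eq. (3): weight each point by the inverse summed covariance determinant.
    w = np.array([1.0 / (np.linalg.det(k) + np.linalg.det(p))
                  for k, p in zip(kappa, psi)])
    d = np.asarray(q_c, dtype=float) - q_p_rot
    # Eq. (4): weighted mean of the point-wise displacements.
    return (w[:, None] * d).sum(axis=0) / w.sum()
```

Points with large stereo-error covariances (typically distant features) thus contribute less to the translation estimate.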
2.1.2 The 3D Case
In the three-dimensional case the proposed method is
not as applicable. In this situation it would be
necessary to estimate the yaw, pitch, and roll angles
of the line in space, which is clearly difficult for a
degenerate point set.
2.1.3 Feature Detection, Description, and
Matching
For feature detection, description, and matching, the
method introduced by (Brown et al., 2005) is used with
minor modifications. First the input image is
incrementally smoothed with a Gaussian kernel. Next
an image pyramid is constructed by downsampling the
image at the scale just above the current one, as
illustrated in Figure 1. Extrema locations are found by