Uncertainty Hypervolume in Point Feature-Based Visual Odometry
InJun Mun 1,a and Sukhan Lee 2,*,b
Intelligent Systems Research Institute, Department of Artificial Intelligence,
Sungkyunkwan University, Suwon 16419, South Korea
Keywords: Localization, Visual Odometry, Essential Matrix, Optimal Feature Selection, Uncertainty Hypervolume,
Bucketing.
Abstract: Visual odometry based on point feature matching is well established. In particular, methods based on
the essential, fundamental, and homography matrices are widely used. It is known that the
accuracy of visual odometry is affected by the choice of matched feature point pairs. However, no
mathematically rigorous formula relating the choice of feature point pairs to the uncertainty involved in visual
odometry is available. Instead, point selection heuristics based on feature point distribution, combined with
RANSAC-based refinement, are mostly adopted to ensure accuracy. In this paper, we present the “Uncertainty
Hypervolume” as a rigorous mathematical formula that relates the selected feature point pairs to the
uncertainty of visual odometry. The uncertainty hypervolume associated with selected feature point pairs
provides a precise metric for evaluating the selected feature point pairs and the resulting visual odometry.
This metric is useful in practice not only for selecting the best feature point pairs but also for identifying poor
feature point pairs among those available for visual odometry. Furthermore, it accurately identifies the uncertainty in visual
odometry, which helps better manage the performance of visual odometry applications.
1 INTRODUCTION
Visual odometry (
Nistér et al., 2004)
is a fundamental
technique in computer vision and robotics that
facilitates estimating camera movement by analyzing
sequential images. The accuracy of pose estimation,
a critical aspect of visual odometry, depends not only
on the chosen pose estimation method but also
significantly on the quality of the selected
corresponding feature points. Poddar et al. (2019)
conducted a
comprehensive review of feature selection strategies
for visual odometry, outlining key steps including
feature detection, description, inlier/outlier detection,
feature distribution, and consideration of feature
quality. This multi-step process emphasizes the
intricate relationship between the accuracy of pose
estimation and the characteristics of the selected
feature points.
In feature selection, the initial removal of outliers
is paramount, as mismatched feature pairs can lead to
erroneous pose estimation. To solve this issue, the
pioneering work of Fischler and Bolles (Fischler et al., 1981)
introduced the random sample consensus (RANSAC)
algorithm, which uses geometric constraints to
remove outliers from the feature set effectively.
However, the effort to improve accuracy in pose
estimation extends beyond the methodological level.
Researchers have recognized that the distribution and
uniformity of corresponding feature points in space
also play a critical role in determining visual
odometry performance (Cvišić et al., 2015). As
pointed out in Poddar's review, traditional feature
selection methods often result in a non-uniform
distribution of feature points across the image. As a
result, clusters of closely spaced feature points can
lead to suboptimal pose estimation results.
a https://orcid.org/0009-0009-8524-0719
b https://orcid.org/0000-0002-1281-6889
* Corresponding author
Mun, I. and Lee, S.
Uncertainty Hypervolume in Point Feature-Based Visual Odometry.
DOI: 10.5220/0013019300003822
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 21st International Conference on Informatics in Control, Automation and Robotics (ICINCO 2024) - Volume 2, pages 290-299
ISBN: 978-989-758-717-7; ISSN: 2184-2809
Proceedings Copyright © 2024 by SCITEPRESS Science and Technology Publications, Lda.
To overcome this limitation and achieve
improved accuracy, the bucketing technique
(Zhang et al., 1995; Kitt et al., 2010) has emerged.
It attempts to achieve uniformity in the distribution of
corresponding feature points. By partitioning the
image into a grid of M × M buckets and selecting only
a small number of features from each bucket, the
bucketing technique ensures a well-distributed
selection of features. In particular, this uniform
distribution of features has the potential to improve
both the accuracy and computational efficiency of
pose estimation.
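As an illustration of the bucketing idea described above, the following is a minimal sketch rather than the implementation used in the cited works. The function name `bucket_features`, the grid size `m`, the per-bucket limit `per_bucket`, and the random choice inside each bucket are all assumptions made for this example.

```python
import random
from collections import defaultdict

def bucket_features(points, width, height, m=4, per_bucket=2, seed=0):
    """Keep at most `per_bucket` points from each cell of an m x m grid."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        # Clamp so that points lying exactly on the right/bottom border stay in the grid.
        col = min(int(x * m / width), m - 1)
        row = min(int(y * m / height), m - 1)
        buckets[(row, col)].append(idx)
    selected = []
    for cell in sorted(buckets):
        members = buckets[cell]
        rng.shuffle(members)          # hypothetical policy: random pick inside a bucket
        selected.extend(members[:per_bucket])
    return sorted(selected)

# Usage: 200 points clustered in one corner plus 50 points spread over a 640x480 image.
rng = random.Random(1)
cluster = [(rng.uniform(0, 80), rng.uniform(0, 60)) for _ in range(200)]
spread = [(rng.uniform(0, 640), rng.uniform(0, 480)) for _ in range(50)]
pts = cluster + spread
kept = bucket_features(pts, 640, 480, m=4, per_bucket=2)
```

However dense the corner cluster is, it contributes at most `per_bucket` indices, so the retained points are spread over the occupied cells.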
The approaches to feature selection mentioned above
aim to select dependable features by
removing outliers or by considering the feature
point distribution. While these methods have proven
effective in enhancing the accuracy of visual odometry (VO), they do
not ensure optimality in a mathematically formal
sense, that is, they do not guarantee the minimum
uncertainty of VO.
Recently, the Orthogonality Index
(Nguyen and Lee, 2019) has been introduced to
derive optimal feature selection analytically. This
approach selects features through a well-defined
mathematical criterion instead
of random selection. It increases the
orthogonality index of the individual equations and
applies constraints to the computation to reduce
uncertainty when estimating the Essential, Fundamental,
or Homography matrices associated with visual
odometry. However, while the Orthogonality Index
provides a mathematical method for optimal feature
selection, it does not account for the uncertainty in
feature points due to measurement or other noise.
This issue must be addressed because it significantly
impacts VO estimation. Therefore, a method that
reflects these factors is necessary to ensure optimal
feature selection.
To this end, our study capitalizes on insights
gained from simulation experiments, which have
shown that the measurement error variance and the
spatial distribution of the extracted feature points
significantly affect pose estimation. We propose a
novel approach that incorporates both of these
factors.
Our approach can be summarized as follows. If
the matched feature point pairs are well matched with
minimal measurement error and are uniformly
distributed throughout the image, the estimated
essential matrix is expected to be close to the ground
truth (GT) essential matrix. However, due to the
uncertainty of the matched feature point pairs used to
estimate the essential matrix and the error of the
equations generated from them, the estimated
essential matrix is stochastically distributed
around the GT. We
experimentally demonstrate that the degree of
dispersion depends on the magnitude of the
uncertainty in estimating the essential matrix. We
also found that the spatial distribution formed by the
matched feature point pairs should be
taken into account when selecting them,
and we present a novel "Uncertainty
Hypervolume" approach that takes both factors into account.
We quantify the dispersion of the estimated essential
matrix around the GT essential matrix with a
hypervolume. Through experiments, we show a
significant correlation between the hypervolume and the
error of the pose derived from the essential matrix.
Based on these results, we propose a mathematically
well-structured, Uncertainty Hypervolume-based
approach for selecting feature point pairs to obtain the
optimal solution.
In the following sections, we detail our
methodology, experimental setup, and results,
culminating in a comprehensive analysis of the
interplay between feature selection, spatial
distribution, and pose estimation accuracy.
2 PROBLEM DEFINITION AND
APPROACH
2.1 Preliminary
Figure 1: Epipolar geometry. A 3D point $P$ is projected
onto the normalized image plane of each camera at $p$ and $q$.
The points $e$ and $e'$, where the line connecting the two
camera origins meets the image planes, are called epipoles,
and the straight lines $l$ and $l'$ connecting the projection
points and the epipoles are called epipolar lines (epilines).
In epipolar geometry (Deriche et al., 1994), given a
point $P$ in space, cameras $C_1$ and $C_2$ view the point $P$
from two different perspectives. The point $P$ is then
projected onto the normalized image plane of each
camera $C_1$ and $C_2$ as $p$ and $q$ ($p$ and $q$ are
homogeneous normalized image coordinates). It is
known that there is always a 3x3 essential matrix $E$
(Nistér, 2004) between the projected points $p$ and
$q$ that satisfies the epipolar constraint $p^\top E q = 0$.
This essential matrix obeys the following
constraints.
$$\det(E) = 0 \tag{1}$$
$$2 E E^\top E - \operatorname{tr}(E E^\top)\, E = 0 \tag{2}$$
The second expression is a matrix constraint that
gives nine equations for the elements of 𝑬. However,
only two of them are algebraically independent. Thus,
with the two essential matrix constraints mentioned
above, we can determine the essential matrix with
only five corresponding point pairs (Deriche et al.,
1994).
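The two constraints can be checked numerically. The sketch below assumes the common parameterization $E = [t]_\times R$ (the text does not fix a convention at this point) and uses small hand-written 3x3 helpers; all function names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def skew(t):
    """Cross-product matrix [t]_x."""
    x, y, z = t
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def trace_residual(e):
    """Return 2 E E^T E - tr(E E^T) E, the left-hand side of constraint (2)."""
    eet = matmul(e, transpose(e))
    tr = eet[0][0] + eet[1][1] + eet[2][2]
    eete = matmul(eet, e)
    return [[2.0 * eete[i][j] - tr * e[i][j] for j in range(3)] for i in range(3)]

# Assumed example motion: translation (0.3, -0.2, 1.0), rotation of 0.4 rad about z.
E = matmul(skew([0.3, -0.2, 1.0]), rot_z(0.4))
det_E = det3(E)                                                    # constraint (1)
max_residual = max(abs(v) for row in trace_residual(E) for v in row)  # constraint (2)
```

Both `det_E` and `max_residual` vanish up to floating-point rounding for a matrix built this way, whereas a generic 3x3 matrix violates at least one of the two constraints.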
Once $E$ is determined, the rotation matrix $R$ and
the translation vector $t$ can be obtained by performing
a Singular Value Decomposition (SVD) of $E$.
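The SVD step can be sketched as follows, using the standard textbook decomposition that yields four candidate $(R, t)$ pairs. This is an illustration under the assumption $E = [t]_\times R$, not the authors' code. It relies on NumPy, and the helper names `skew`, `rot`, and `decompose_essential` are hypothetical. Selecting the physically valid candidate (points in front of both cameras) is omitted.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ x equals np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def rot(axis, angle):
    """Rotation about a coordinate axis (0 = x, 1 = y, 2 = z)."""
    c, s = np.cos(angle), np.sin(angle)
    i, j = [(1, 2), (2, 0), (0, 1)][axis]
    R = np.eye(3)
    R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c
    return R

def decompose_essential(E):
    """Return the four (R, t) candidates encoded by an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations; a sign flip only replaces E by -E, which encodes the same motion.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]  # unit translation direction, known only up to sign
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Usage with an assumed ground-truth motion.
R_true = rot(2, 0.3) @ rot(0, -0.2)
t_true = np.array([0.6, 0.0, 0.8])
candidates = decompose_essential(skew(t_true) @ R_true)
```

Because the scale of $t$ cannot be recovered from $E$, only the unit direction is returned, which is consistent with the five degrees of freedom discussed in the text.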
Although $R$ has 3 degrees of freedom and $t$ has 3
degrees of freedom, the essential matrix, regarded as a
projective element, has only 5 degrees of
freedom because the overall scale factor is removed. Therefore, we
can estimate $E$ with five pairs of corresponding
feature points together with the essential matrix constraints.
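The epipolar constraint $p^\top E q = 0$ can be
expressed simply as follows.
$$v^\top \tilde{E} = 0 \tag{3}$$
where, writing $p = (p_1, p_2, p_3)^\top$ and $q = (q_1, q_2, q_3)^\top$,
$$v = [\,p_1 q_1,\ p_1 q_2,\ p_1 q_3,\ p_2 q_1,\ p_2 q_2,\ p_2 q_3,\ p_3 q_1,\ p_3 q_2,\ p_3 q_3\,]^\top \tag{4}$$
and $\tilde{E}$ is the vector of the elements of $E$,
$$\tilde{E} = [\,E_{11},\ E_{12},\ E_{13},\ E_{21},\ E_{22},\ E_{23},\ E_{31},\ E_{32},\ E_{33}\,]^\top \tag{5}$$
$E$ can be determined from five pairs of
corresponding feature points $p$ and $q$, which define the
following 5x9 matrix equation:
$$A \tilde{E} = 0 \tag{6}$$
where
$$A = [\,v_1,\ v_2,\ v_3,\ v_4,\ v_5\,]^\top \tag{7}$$
and $v_i$ is the vector (4) built from the $i$-th pair.
Then, VO between the two camera frames can be
derived from $E$ obtained by (1), (2), and (6).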
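To make the vectorized constraint concrete, the sketch below builds the coefficient vector of a pair as the products of the components of $p$ and $q$ and stacks five of them into a 5x9 matrix. The row-major ordering of the products is an assumption chosen so that the dot product with the row-major flattening of $E$ reproduces $p^\top E q$. Function names and the sample values are hypothetical.

```python
def coefficient_vector(p, q):
    """Row-major products p_i * q_j, so that v . flatten(E) equals p^T E q."""
    return [pi * qj for pi in p for qj in q]

def flatten(E):
    """Row-major 9-vector of the entries of a 3x3 matrix."""
    return [E[i][j] for i in range(3) for j in range(3)]

def epipolar_residual(p, E, q):
    """Direct evaluation of p^T E q."""
    return sum(p[i] * E[i][j] * q[j] for i in range(3) for j in range(3))

def design_matrix(pairs):
    """Stack one coefficient vector per correspondence: five pairs give a 5 x 9 matrix."""
    return [coefficient_vector(p, q) for p, q in pairs]

# Usage with arbitrary numbers (E here is not a valid essential matrix; only the
# algebraic identity between the two forms of the constraint is illustrated).
E = [[0.1, -0.4, 0.3], [0.5, 0.2, -0.7], [-0.6, 0.8, 0.05]]
pairs = [((0.1 * k, -0.05 * k, 1.0), (0.2 - 0.1 * k, 0.03 * k, 1.0)) for k in range(1, 6)]
A = design_matrix(pairs)
```

Each row of `A` is one linear equation in the nine entries of $E$; the remaining degrees of freedom are fixed by the nonlinear constraints (1) and (2).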
2.2 Problem Definition
The accuracy of the estimated essential matrix
determines the accuracy of the transformation
between the two cameras, which is
equivalent to the performance of visual odometry. In
VO, we use pairs of corresponding feature points
matched across the two image planes to compute the essential
matrix. The accuracy
of the estimated essential matrix is therefore expected to increase if the
pairs of corresponding feature points used in the computation are of good
quality and evenly distributed in the image plane.
In this paper, we investigate
how the accuracy of VO is affected by selecting
well-distributed and high-quality corresponding feature
points when estimating the essential matrix. We
propose a new approach to feature point selection
using a quantitative metric called "Uncertainty
Hypervolume".
2.3 Approach
Corresponding pairs of feature points $(p_i, q_i)$ have
uncertainties due to measurement error, matching
error, noise, etc. In the epipolar constraint (3),
the solution $E$ lies in the space perpendicular to $v_i$.
The uncertainty of $v_i$ leads to uncertainty in $E$,
so the estimated $E$ is stochastically distributed
around the GT. The
solution subspace formed around the GT changes in
size as a function of the error associated with $v_i$:
the larger the uncertainty of the corresponding pair of
feature points, the more widely spread the
solution subspace becomes. In other words,
the volume of the solution subspace
represents the uncertainty. From now on, we
refer to the size of the solution subspace in higher
dimensions as the "Hypervolume". Our goal is to
choose $(p_i, q_i)$ for $v_i$ such that this
hypervolume is minimized (i.e., such
that the uncertainty of the solution is small). To
this end, we explore how the uncertainty
of a corresponding pair of feature points affects the
uncertainty of the solution subspace and, more
specifically, define and explain the concept of
a hypervolume.
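Consider the simplest, two-dimensional form of the
problem. (The solution for the essential matrix that we want
to find is high-dimensional, with 9 dimensions, so
we extend the concepts from lower to higher
dimensions.)
$$\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \tag{8}$$
Let $A = \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix}$, $X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, and $c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$, where $X$ is the
solution we want to find. Knowing $A$ and $c$, there
is only one solution ($a_1 \neq a_2$, $b_1 \neq b_2$, $|A| \neq 0$).
However, if there is
uncertainty due to errors in $A$, the system becomes
(9).
$$\begin{bmatrix} a_1 \pm \Delta a_1 & b_1 \pm \Delta b_1 \\ a_2 \pm \Delta a_2 & b_2 \pm \Delta b_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \tag{9}$$
$$A' = \begin{bmatrix} a_1 \pm \Delta a_1 & b_1 \pm \Delta b_1 \\ a_2 \pm \Delta a_2 & b_2 \pm \Delta b_2 \end{bmatrix} \tag{10}$$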
Figure 2: (Left) Graph plotted when both
linear equations have a negative slope. The intercepts vary
with $\pm\Delta a_1$, $\pm\Delta b_1$, $\pm\Delta a_2$, and $\pm\Delta b_2$, which represent the variation due
to uncertainty. (Right) Graph plotted when one linear equation has a
positive slope and the other has a negative slope. There are
four possible cases of $A'_i$ $(i = 1, \dots, 4)$;
the corresponding lines intersect at four vertices,
and the trapezoid they enclose is the hypervolume.
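The magnitudes of the errors in the elements of $A$ are denoted by $\Delta a_1$,
$\Delta b_1$, $\Delta a_2$, and $\Delta b_2$. The
$\pm$ sign indicates that each error can take any value within
the corresponding range, which represents the uncertainty.
Fig. 2 shows the graph of (9). In the graph,
the solid lines are the original straight
lines before the error is added, and the
dashed lines are the straight lines given by
$A'_i$ $(i = 1, \dots, 4)$ when the error
is at its maximum, with $A'_i = A + \Delta A_i$ and
$$\Delta A_1 = \begin{bmatrix} \Delta a_1 & \Delta b_1 \\ \Delta a_2 & \Delta b_2 \end{bmatrix}, \quad \Delta A_2 = \begin{bmatrix} \Delta a_1 & \Delta b_1 \\ -\Delta a_2 & -\Delta b_2 \end{bmatrix},$$
$$\Delta A_3 = \begin{bmatrix} -\Delta a_1 & -\Delta b_1 \\ \Delta a_2 & \Delta b_2 \end{bmatrix}, \quad \Delta A_4 = \begin{bmatrix} -\Delta a_1 & -\Delta b_1 \\ -\Delta a_2 & -\Delta b_2 \end{bmatrix} \tag{11}$$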
The solution of the original, error-free system of equations
is the intersection of the
two solid lines and is
unique. However, the solution of the
system with the uncertainty given by the
error lies, probabilistically, within a trapezoid
whose vertices are the intersections
of the dashed lines shown in the graph. The larger the
error, the larger this trapezoid, i.e., the
greater the uncertainty of the solution. The
"Uncertainty Hypervolume" introduced earlier in this
section corresponds to the area of the trapezoid in
this problem. Thus, by calculating the size of the
hypervolume, we can represent the
uncertainty quantitatively. The hypervolume
is that of an $n$-dimensional hypercube whose dimension depends on the
problem to be solved; equation (9) is
two-dimensional, so the hypervolume is
the area of a two-dimensional trapezoid.
The number of vertices that form
the boundary of the shape is $2^n$, where $n$ is the
number of unknown variables. In the next higher
dimension, we can think of the
hypervolume as the volume of a distorted three-dimensional cube
with eight vertices (one dimension:
line; two dimensions: face; three dimensions: cube; ...;
$n$ dimensions: $n$-hypercube).
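The two-dimensional case can be reproduced numerically. The following sketch solves the four corner systems of (9), in which each row of $A$ is shifted as a whole by $+$ or $-$ its error, and measures the area of the convex hull of the four solutions. The function names and the example values are assumptions made for illustration.

```python
from itertools import product

def solve_2x2(a1, b1, a2, b2, c1, c2):
    """Cramer's rule for [[a1, b1], [a2, b2]] [x1, x2]^T = [c1, c2]^T."""
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula (returns 0.0 for fewer than three vertices)."""
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                   for i in range(n))) / 2.0

def uncertainty_area(A, c, dA):
    """Area spanned by the solutions of the four corner systems."""
    (a1, b1), (a2, b2) = A
    (da1, db1), (da2, db2) = dA
    corners = [solve_2x2(a1 + s1 * da1, b1 + s1 * db1,
                         a2 + s2 * da2, b2 + s2 * db2, c[0], c[1])
               for s1, s2 in product((1, -1), repeat=2)]
    return polygon_area(convex_hull(corners))

# Usage: the same system with small and large coefficient errors.
A, c = ((1.0, 1.0), (1.0, -1.0)), (2.0, 0.0)
small = uncertainty_area(A, c, ((0.01, 0.01), (0.01, 0.01)))
large = uncertainty_area(A, c, ((0.05, 0.05), (0.05, 0.05)))
```

With these example values the first pair of perturbed lines is parallel, so the region is indeed a trapezoid, and its area grows as the coefficient errors grow.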
Figure 3: The reference essential matrix $E$ is computed with five
pairs of corresponding feature points. $E_1, \cdots, E_{32}$, obtained
by (13), form the vertices of a hypercube: they are the
estimates obtained at maximum error and bound the region in which the
solution is probabilistically likely to lie. Its
hypervolume reflects the uncertainty, and the goal is to find
the corresponding feature point set that minimizes it.
Let us return to our original problem and extend the
concept of hypervolume defined in a low-dimensional
space to a higher-dimensional domain.
The essential matrix we want to find is a 3x3 matrix
with 9 elements, as shown in (5). The solution $E$
lies somewhere in the 9-dimensional space and
must be singular, as required by (1). However, due to the uncertainty
caused by the errors of $v_i$, i.e., of the corresponding feature
point pairs, the estimated solution $E$ is
stochastically distributed around the ground truth $E$.
In the 5-point algorithm that we use for VO
estimation, the solution can be obtained using only 5
pairs of corresponding feature points thanks to the
additional constraints (1) and (2). Therefore, when the
error range in equation (6) is expressed in the manner of (9)
to represent the uncertainty, $2^5 = 32$ sets
$v'_i$ $(i = 1, \dots, 32)$ are generated, as in (13). Using each of
them as input to the 5-point
algorithm yields $E_i$ $(i = 1, \dots, 32)$. This means
that the manifold formed by the two constraints
projects the solution subspace from the 9-dimensional
space onto a lower, 5-dimensional one.
Therefore, the $E_i$ $(i = 1, \dots, 32)$ calculated in this
way are stochastically distributed around the
ground truth $E$ and form the outermost vertices of the
solution subspace. Fig. 3 shows the projection of $E$ from
9-dimensional space into 3-dimensional space (not an
exact projection, but an approximation for illustration
purposes). The uncertainty due to multiple error
factors is represented by a 5-dimensional hypercube
with 32 vertices, and the solution probabilistically
lies inside the hypercube. The volume of the
hypercube is the hypervolume, and its size represents
the degree of uncertainty. We minimize the
uncertainty of VO estimation by selecting a set of
feature points with the minimum hypervolume.
2.4 Hypervolume Calculation Using
Qhull Algorithm
In the previous section, we detailed the procedure for
acquiring the 32 vertices $E_i$ $(i = 1, \dots, 32)$ that constitute
the hypervolume, and explained its significance. We also
proposed that the size of the hypervolume
corresponds to the level of uncertainty, which can therefore be
quantified by computing it. We used the
MATLAB function convexhulln to obtain the
hypervolume from the 32 vertices $E_i$ that form its
outermost layer. The function is based on the Qhull
algorithm, which operates in the following manner.
a. Point Sorting: First, the data points are arranged
in a suitable order. Because data points can
be randomly distributed in higher-dimensional
spaces, establishing an order makes the
subsequent steps more efficient. Sorted data helps
in calculating the convex hull.
b. Centrum Location: The center position
of the data is calculated. It is used to clip points based on the
center position and to calculate the
convex hull quickly. The center position can be related to
the mean or median of the high-dimensional data.
c. Point Clipping: The center position is used to
clip the data points and remove
unnecessary ones. This reduces unnecessary
calculations and optimizes memory usage. In
high-dimensional data, many points may not
contribute to the formation of the convex hull.
d. Convex Hull of Clipped Points: The
convex hull of the clipped points is calculated. This forms most
of the final convex hull, and
calculating the convex hull of the clipped data is
efficient.
e. Return Results: Finally, the Qhull algorithm
returns the calculated convex hull. The result is
a convex polygon or, in a high-dimensional
space, a convex polytope
enclosing the given data points.
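For readers without MATLAB, the hull volume can also be obtained through Qhull from Python. The sketch below assumes that NumPy and SciPy are available and uses `scipy.spatial.ConvexHull`, whose `volume` attribute plays the role of the volume output of `convexhulln`. The 32 corners of a 5-dimensional unit hypercube stand in for the 32 vertices of the text, and how the actual essential matrix estimates would be embedded in 5 dimensions is left open. The wrapper name `hypervolume` is hypothetical.

```python
from itertools import product

import numpy as np
from scipy.spatial import ConvexHull

def hypervolume(vertices):
    """Volume of the convex hull of an (n_points, n_dims) array, computed by Qhull."""
    return ConvexHull(np.asarray(vertices, dtype=float)).volume

# Stand-in data: the 2**5 = 32 corners of the unit hypercube in 5-D.
unit_cube = np.array(list(product((0.0, 1.0), repeat=5)))
vol_unit = hypervolume(unit_cube)
vol_half = hypervolume(0.5 * unit_cube)
```

Halving every edge of the hypercube divides the volume by $2^5$, which illustrates how strongly the measure reacts to the spread of the vertices. Qhull raises an error for degenerate input, such as points confined to a lower-dimensional flat, so a practical implementation would need to guard against that case.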
Through these geometric and
computational techniques, the Qhull algorithm
effectively computes the convex hull of
high-dimensional data. The optimal feature
selection based on the proposed hypervolume method
is then described as follows:
Algorithm 1: Hypervolume-based optimal feature selection.
Data: $L$ correspondence feature point sets
Result: five-point sets with the lowest hypervolume
Step 1. Randomly select a set of five feature point pairs,
$\{(p_i, q_i),\ i = 1, \dots, 5\}$, from the $L$ feature pairs
detected from the images of the two camera views
subject to VO. It is worth noting that a bucketing
approach can be incorporated into this step to
improve the initial selection of the five feature point
pairs.
Step 2. Generate the corresponding coefficient
vectors, $\{v_i,\ i = 1, \cdots, 5\}$. Then, compute the
reference essential matrix $E$ using the five-point algorithm
with $\{v_i,\ i = 1, \cdots, 5\}$ as input.
Step 3. Generate 32 sets $v'_i = (p'_i, q'_i)$, $i = 1, \dots, 32$,
taking into account the error of the
corresponding feature points in $\{v_i,\ i = 1, \cdots, 5\}$.
From these, the 32 vertex essential matrices are
computed. Then, compute the hypervolume of the
hypercube composed of these 32 vertices using the
Convex Hull algorithm.
Step 4. Select feature sets with hypervolume values
less than a threshold $K$. Estimate VO using the best
set of selected feature points.
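$$p' = [\,p_1 \pm e_{p_1},\ p_2 \pm e_{p_2},\ p_3 \pm e_{p_3},\ p_4 \pm e_{p_4},\ p_5 \pm e_{p_5}\,]$$
$$q' = [\,q_1 \pm e_{q_1},\ q_2 \pm e_{q_2},\ q_3 \pm e_{q_3},\ q_4 \pm e_{q_4},\ q_5 \pm e_{q_5}\,]$$
$$v'_i = (p'_i, q'_i), \quad i = 1, \dots, 32 \tag{12}$$
Here each $p_j$ and $q_j$ is an image point with two coordinates (ten entries in total for $p'$ and for $q'$), and $e_{p_j}$ and $e_{q_j}$ are the corresponding errors.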
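The enumeration of the 32 perturbed sets can be sketched as follows. The sketch assumes that each of the five pairs is shifted as a whole by either $+e$ or $-e$, with the same sign for $p_j$ and $q_j$ and the same scalar error on both image coordinates; the text does not spell out these details, so they are labeled assumptions, as are the function name and the sample values.

```python
from itertools import product

def perturbed_sets(pairs, errors):
    """Return the 2**5 = 32 corner sets; pair j is shifted by +e_j or -e_j as a whole.

    pairs  : list of ((px, py), (qx, qy)) image points
    errors : list of (e_p, e_q) scalar error magnitudes, one tuple per pair
    """
    corners = []
    for signs in product((1, -1), repeat=len(pairs)):
        corner = []
        for s, ((px, py), (qx, qy)), (ep, eq) in zip(signs, pairs, errors):
            # Assumption: same sign for p and q, same error on both coordinates.
            corner.append(((px + s * ep, py + s * ep), (qx + s * eq, qy + s * eq)))
        corners.append(corner)
    return corners

# Usage with five hypothetical correspondences and a uniform error of 0.01.
pairs = [((0.1 * k, 0.2 * k), (0.1 * k + 0.05, 0.2 * k - 0.02)) for k in range(1, 6)]
errors = [(0.01, 0.01)] * 5
corner_sets = perturbed_sets(pairs, errors)
```

Each of the 32 corner sets would then be passed to a five-point solver to obtain one vertex $E_i$; the solver itself is outside the scope of this sketch.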
In Step 1 of Algorithm 1, partitioning the image
into grids using the bucketing technique and
extracting feature points from each grid region
improved performance in terms of both running
time and estimation error. With the bucketing
method, a solution is found more efficiently
over multiple iterations. This can be
explained by the fact that, as discussed in Section 2.3, we
empirically verified that the pose estimation error is
minimized when a well-spread distribution of feature
points is used as input. In the following, we showcase
our proposed approach and evaluate its practical
effectiveness.
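$$p'_1 = [\,p_1 + e_{p_1},\ p_2 + e_{p_2},\ p_3 + e_{p_3},\ p_4 + e_{p_4},\ p_5 + e_{p_5}\,]$$
$$p'_2 = [\,p_1 - e_{p_1},\ p_2 + e_{p_2},\ p_3 + e_{p_3},\ p_4 + e_{p_4},\ p_5 + e_{p_5}\,]$$
$$\vdots$$
$$p'_{32} = [\,p_1 - e_{p_1},\ p_2 - e_{p_2},\ p_3 - e_{p_3},\ p_4 - e_{p_4},\ p_5 - e_{p_5}\,] \tag{13}$$
$$q'_1 = [\,q_1 + e_{q_1},\ q_2 + e_{q_2},\ q_3 + e_{q_3},\ q_4 + e_{q_4},\ q_5 + e_{q_5}\,]$$
$$q'_2 = [\,q_1 - e_{q_1},\ q_2 + e_{q_2},\ q_3 + e_{q_3},\ q_4 + e_{q_4},\ q_5 + e_{q_5}\,]$$
$$\vdots$$
$$q'_{32} = [\,q_1 - e_{q_1},\ q_2 - e_{q_2},\ q_3 - e_{q_3},\ q_4 - e_{q_4},\ q_5 - e_{q_5}\,]$$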
3 EXPERIMENTS
In this section, we assess the effectiveness of the
proposed method using real data. We first give
an overview of the
experimental setting. All experiments were conducted
in MATLAB with parallel computation on a computer
equipped with an Intel Core i5-9400F CPU operating
at 2.9 GHz. The evaluation used the
freiburg1_desk sequence of the public TUM RGB-D
dataset. In this evaluation, we examine the
relationship between the estimation accuracy of the
essential matrix and the hypervolume using real
data. The input points used in this evaluation were
extracted as ORB feature points (Rublee et al., 2011)
using the detectORBFeatures function of the
Computer Vision Toolbox, and the matched feature
points used to estimate the essential matrix were
obtained using the matchFeatures function.
3.1 Real Data
To evaluate our proposed approach on real-world
images, we used the TUM RGB-D dataset. This dataset
provides RGB and depth images together with
ground truth trajectory data for evaluating visual
odometry and visual SLAM systems. All data are recorded at the
full frame rate of 30 Hz, and the camera sensor, a
Microsoft Kinect, has a resolution of 640x480.
In the following experiments, we demonstrate the effectiveness of the
proposed hypervolume-based optimal feature selection.
Figure 4: (Top) Graph showing rotation error and
correlation when hypervolume is small. (Bottom)
Distribution of selected 5 pairs of corresponding feature
points with small rotation error among cases with small
hypervolume.
3.2 The Effect of Hypervolume Based
Optimal Feature Selection
After completing Step 4 of Algorithm 1, we analyzed
the correlation between the hypervolume and the error,
with respect to the ground truth, of the rotation
matrix calculated from the essential matrix. The data were
accumulated and analyzed over 100 iterations. Fig.
4 (Top) is a graph showing the relationship between
rotation error and hypervolume. To obtain the rotation error,
the rotation matrix derived from the essential matrix estimated by the five-point
algorithm was compared with the rotation between the two
images obtained from the ground truth trajectory
provided by the TUM RGB-D dataset. The x-axis of
the graph is an index ordered by the size
of the hypervolume, and the y-axis is the rotation
error. The general trend is that the larger the
hypervolume, the larger the rotation error. First, consider
the 10 samples with the smallest hypervolume.
Among them, No. 1 and No. 2 are the cases with the
smallest rotation error, and the selected matching
feature point set is shown in Fig. 4 (Bottom).
Figure 5: (Top) Graph showing rotation error and
correlation when hypervolume is large. (Bottom)
Distribution of selected 5 pairs of corresponding feature
points in the case with large rotation error among the cases
with large hypervolume.
We can see that the five pairs of corresponding
feature points are evenly distributed over the
image. This agrees with what we found experimentally
in Section 3. The second case is when the
hypervolume is large and the rotation error is the
largest. As shown in Fig. 5, the five pairs of
corresponding feature points have a clustered
distribution. This confirms that the rotation error
correlates with the hypervolume size and supports the
validity of our proposed approach.
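The text does not state which rotation error metric was used. One common choice, shown here purely as an assumption, is the angle of the relative rotation between the estimated and the ground truth rotation matrices. The function names are hypothetical.

```python
import math

def rot_z(angle):
    """Rotation about the z axis, used only to build test inputs."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_error_deg(R_est, R_gt):
    """Angle of R_est^T R_gt in degrees (0 when the two rotations agree)."""
    # trace(R_est^T R_gt) equals the element-wise (Frobenius) inner product.
    trace = sum(R_est[k][i] * R_gt[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding errors
    return math.degrees(math.acos(cos_angle))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
err = rotation_error_deg(rot_z(math.radians(10.0)), identity)
```

Plotting such an error against the hypervolume of each sampled five-point set is one way to examine the trend reported in Fig. 4 and Fig. 5.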
3.3 Threshold for Hypervolume
Selection
When estimating the essential matrix using our
proposed method, we iterate multiple times and
select a set of 5 pairs of corresponding feature points
whose hypervolume is less than a threshold $K$.
The criterion for selecting $K$ depends on the
number of matching points and was found through
experimentation. In our case, if the number of
matching points is 150 or less, we select a set of 5
pairs of corresponding feature points when the
log-scaled hypervolume is less than -32,
and if the number of matching points is more than
150, we set $K = -36$.
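The threshold rule above can be written as a small helper. The two threshold values are those reported in the text; the use of the natural logarithm is an assumption, since the text only mentions a log scale, and the function names are hypothetical.

```python
import math

def log_hypervolume_threshold(num_matches):
    """Threshold K on the log-scaled hypervolume, as reported for the TUM experiment."""
    return -32.0 if num_matches <= 150 else -36.0

def accept(hypervolume, num_matches):
    """True when a five-point set passes the threshold test."""
    # Assumption: natural logarithm; a different base would shift the scale.
    if hypervolume <= 0.0:
        return False
    return math.log(hypervolume) < log_hypervolume_threshold(num_matches)
```

For example, a hypervolume of 1e-15 has a natural logarithm of about -34.5, so it is accepted when there are at most 150 matches and rejected otherwise.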
3.4 Comparison with RANSAC
Algorithm Using KITTI Odometry
Benchmark Dataset
In this section, we evaluate the performance of the
proposed method by comparing its Absolute Pose
Error (APE) with that of the commonly used RANSAC
algorithm on the KITTI Odometry Benchmark
dataset (Geiger et al., 2012). The experimental
environment is the same as in the previous section, where the
correlation between the Uncertainty Hypervolume and
pose estimation was evaluated using the TUM dataset. VO
was computed with mono_vo, feature points were
extracted with the FAST
algorithm, feature points were matched
with a KLT tracker, and motion was estimated with the 5-point
algorithm. The scale factor
was taken from the ground truth
provided by the KITTI dataset. The optimal feature point set
is selected with the
proposed Uncertainty Hypervolume method as
described in Algorithm 2.
Unlike Algorithm 1, Algorithm 2 does not rely on a fixed
threshold: during the VO process, a threshold may leave no
feature point set selected for some frame, in which case
pose estimation cannot be performed. Therefore,
in the process of selecting the optimal set of feature
points, as in Algorithm 2, the Uncertainty
Hypervolume index values calculated for
several randomly drawn sets are sorted, and the sets in
the lowest 10% are selected as input
for VO estimation. In addition, in this dataset
experiment, an Uncertainty Hypervolume
index computed from the rotation matrices,
converted to roll, pitch, and yaw, and using the size of their
deviation as the indicator, performed better than the Uncertainty
Hypervolume index of Algorithm 1.
The following are the results of evaluating the KITTI
Odometry Benchmark 00, 02, 03, 04, and 05
sequences.
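The rank-based selection can be sketched in a few lines. The helper below keeps the candidates whose score lies in the lowest 10%, always retaining at least one so that pose estimation never stalls. The function name, the `(score, candidate)` tuple layout, and the rounding rule are assumptions made for this example.

```python
def lowest_fraction(scored_sets, percent=10):
    """Return the (score, candidate) entries in the lowest `percent`, best first."""
    ranked = sorted(scored_sets, key=lambda item: item[0])
    # Integer ceiling of len * percent / 100, with a minimum of one candidate.
    keep = max(1, (len(ranked) * percent + 99) // 100)
    return ranked[:keep]

# Usage: 50 hypothetical candidates identified by an index k, with distinct scores.
scored = [(0.5 * ((7 * k) % 50), k) for k in range(50)]
best = lowest_fraction(scored)
```

In contrast to a fixed threshold, this rule always returns at least one candidate set regardless of how large the scores are.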
The results for sequence 00 are shown in Fig.
6. Although both results have large errors due to drift
and scale factor estimation errors, the RMSE and mean
of the APE are lower for the proposed method than for
RANSAC using all feature points. Because VO is highly
sensitive to rotation, when calculating the Uncertainty
Hypervolume it is better to convert each essential matrix to a
rotation matrix, convert the rotation matrix to roll, pitch, and yaw,
and use the difference from the reference
rotation matrix as the Uncertainty Hypervolume. In this
simplified Uncertainty Hypervolume
calculation, the index obtained from the rotation matrix
is used instead of the one obtained from the essential matrix.
On sequence 02, the same method as for sequence 00 also shows
better results. For sequence 03, the proposed
method shows a better RMSE, but
RANSAC has a slight advantage in the mean. Finally, the
graphs show that the proposed method
clearly performs well on both sequences 04 and 05.
(a) Sequence 00 (b) Sequence 02 (c) Sequence 03 (d) Sequence 04 (e) Sequence 05
Figure 6: APE results on the KITTI Odometry Benchmark dataset (sequences 00, 02, 03, 04, 05). Each sequence includes a graph of the
translation-part APE with its RMSE, median, mean, and standard deviation, and a graph of the error mapped onto the trajectory. The left is the result of the proposed
Uncertainty Hypervolume method, and the right is that of the RANSAC method. Overall, the proposed
method has a smaller error than RANSAC. Performance comparisons are summarized in Table 1.
Table 1: Comparison of the Uncertainty Hypervolume
method with the RANSAC method (APE in meters).

          Uncertainty Hypervolume    RANSAC
Sequence  RMSE      Mean             RMSE      Mean
00        100.66    89.07            126.85    102.07
02        133.74    109.84           220.29    194.16
03        19.73     18.35            19.98     18.27
04        3.54      3.31             8.19      7.81
05        68.93     55.66            91.53     74.92
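To read Table 1 in relative terms, the values can be converted into percentage reductions with respect to RANSAC. The sketch below only re-expresses the numbers printed in the table; the variable and function names are hypothetical.

```python
# (sequence, proposed RMSE, proposed mean, RANSAC RMSE, RANSAC mean), in meters, from Table 1
TABLE_1 = [
    ("00", 100.66, 89.07, 126.85, 102.07),
    ("02", 133.74, 109.84, 220.29, 194.16),
    ("03", 19.73, 18.35, 19.98, 18.27),
    ("04", 3.54, 3.31, 8.19, 7.81),
    ("05", 68.93, 55.66, 91.53, 74.92),
]

def relative_reduction(proposed, baseline):
    """Percentage by which `proposed` is lower than `baseline` (negative if it is higher)."""
    return 100.0 * (baseline - proposed) / baseline

# Maps each sequence to (RMSE reduction, mean reduction) in percent.
reductions = {
    seq: (relative_reduction(p_rmse, r_rmse), relative_reduction(p_mean, r_mean))
    for seq, p_rmse, p_mean, r_rmse, r_mean in TABLE_1
}
```

A negative entry marks a case in which RANSAC has the lower error, as for the mean APE of sequence 03.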
4 CONCLUSIONS
This study proposes the selection of optimal feature
points based on the Uncertainty Hypervolume, a new
approach to estimating the essential matrix for
visual odometry. Previous
research and our simulation experiments showed
that uncertainty due to various errors in the selected
corresponding feature point pairs affects pose
estimation, and that better VO performance can be
obtained if the selected feature point pairs are well
distributed in space, without clumping or
forming lines. Based on this, the uncertainty in VO
estimation is quantified by the Uncertainty Hypervolume,
a new index that considers both the error of
the selected corresponding feature point pairs and the
distribution they form. Using real data, we
confirmed that selecting a feature point set with a
small index value results in a smaller error with respect to the
ground truth. The proposed method can work
effectively even when many features are
extracted with large errors due to low visibility or bad
weather conditions. It can provide robustness in VO
because only high-quality features are collected
and used for VO estimation. Future work includes
optimizing the algorithm to improve
computational efficiency while maintaining the
performance advantage of the proposed approach
over conventional methods that rely on inlier
feature points and RANSAC.
Algorithm 2: Uncertainty hypervolume-based optimal
feature selection for visual odometry.
Data: $L$ correspondence feature point sets
Result: A set of five feature point pairs from the
bottom 10% with low Uncertainty Hypervolume
Step 1. Randomly select $K$ sets of five feature point
pairs, $\{(p_i, q_i),\ i = 1, \dots, 5\}$, from the feature pairs
detected from the images of the two camera views
subject to VO. Incorporating a bucketing approach
into this step to improve the initial selection of the
five feature pairs provides a way to select
well-distributed feature sets with few attempts.
Step 2. Generate the corresponding coefficient
vectors, $\{v_i,\ i = 1, \cdots, 5\}$. Then, compute the
essential matrix using the five-point algorithm with
$\{v_i,\ i = 1, \cdots, 5\}$ as input.
Step 3. Generate 32 sets $v'_i = (p'_i, q'_i)$, $i = 1, \cdots, 32$,
taking into account the error of the
corresponding feature points in $\{v_i,\ i = 1, \cdots, 5\}$.
From these, the 32 vertex essential matrices are
computed. The rotation matrix is then estimated
from each vertex essential matrix
and converted to roll, pitch,
and yaw. The difference between these and the
reference rotation, obtained from the reference
essential matrix, is taken as the uncertainty
hypervolume.
Step 4. Select a set of five feature point pairs from
the bottom 10% with low Uncertainty
Hypervolume. The VO is estimated using the best
set of five selected feature point pairs.
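Step 3 of Algorithm 2 can be illustrated as follows. The roll, pitch, and yaw convention ($R = R_z(\text{yaw})\,R_y(\text{pitch})\,R_x(\text{roll})$) and the way the 32 differences are aggregated into one number (here, the largest Euclidean deviation) are assumptions, since the text does not specify them. All function names are hypothetical.

```python
import math

def rot_z(angle):
    """Rotation about the z axis, used only to build test inputs."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_to_rpy(R):
    """Roll, pitch, yaw in radians for R = Rz(yaw) Ry(pitch) Rx(roll), away from gimbal lock."""
    pitch = math.asin(max(-1.0, min(1.0, -R[2][0])))
    roll = math.atan2(R[2][1], R[2][2])
    yaw = math.atan2(R[1][0], R[0][0])
    return roll, pitch, yaw

def angle_diff(a, b):
    """Difference a - b wrapped to the interval [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi

def rpy_spread(vertex_rotations, reference_rotation):
    """Hypothetical aggregate: largest Euclidean RPY deviation from the reference."""
    ref = rotation_to_rpy(reference_rotation)
    worst = 0.0
    for R in vertex_rotations:
        dev = [angle_diff(a, b) for a, b in zip(rotation_to_rpy(R), ref)]
        worst = max(worst, math.sqrt(sum(d * d for d in dev)))
    return worst

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
spread = rpy_spread([rot_z(0.01), rot_z(-0.02)], identity)
```

Other aggregates, such as the sum or the mean of the deviations, would fit the description equally well; the ranking in Step 4 only requires that smaller values indicate tighter agreement with the reference rotation.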
ACKNOWLEDGMENTS
This research was funded, in part, by the “Intelligent
Manufacturing Solution under Edge-Brain Framework”
project (Grant 2022-0-00067 and IITP-2022-0-00187),
in part by the Artificial Intelligence Graduate
School Program (Grant No. 2019-0-00421), and by the ICT
Consilience Program (IITP-2020-0-01821) of the
Institute of Information and Communication
Technology Planning & Evaluation (IITP), sponsored
by the Korean Ministry of Science and Information
Technology (MSIT).
REFERENCES
D. Nistér, O. Naroditsky, and J. Bergen, "Visual odometry," in
Proceedings of the 2004 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR 2004), vol. 1,
pp. I-I, 2004.
S. Poddar, R. Kottath, and V. Karar, "Motion estimation
made easy: Evolution and trends in visual odometry,"
in Recent Advances in Computer Vision, pp. 305-331.
Springer, Cham, 2019.
M. A. Fischler and R. C. Bolles, "Random sample
consensus: a paradigm for model fitting with applications
to image analysis and automated cartography,"
Communications of the ACM, vol. 24, no. 6, pp. 381-395, 1981.
I. Cvišić and I. Petrović, "Stereo odometry based on
careful feature selection and tracking," in Proc. Eur.
Conf. Mobile Robots (ECMR), Sep. 2015.
Z. Zhang, R. Deriche, O. Faugeras, and Q.-T. Luong, "A
robust technique for matching two uncalibrated images
through the recovery of the unknown epipolar
geometry," Artificial Intelligence, vol. 78, no. 1-2, pp.
87-119, 1995.
B. Kitt, A. Geiger, and H. Lategahn, "Visual odometry based on
stereo image sequences with RANSAC-based outlier
rejection scheme," in Intelligent Vehicles
Symposium, 2010, pp. 486-492.
H. H. Nguyen and S. Lee, "Orthogonality index based
optimal feature selection for visual odometry," IEEE
Access, vol. 7, pp. 62284-62299, 2019.
R. Deriche, Z. Zhang, Q.-T. Luong, and O. Faugeras, "Robust
recovery of the epipolar geometry for an uncalibrated
stereo rig," in Proceedings of the Third European Conference
on Computer Vision, vol. I, Stockholm, Sweden, pp. 567-576,
1994.
D. Nistér, "An efficient solution to the five-point
relative pose problem," IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 26,
no. 6, June 2004.
E. Rublee, V. Rabaud, K. Konolige, and G. Bradski,
"ORB: An efficient alternative to SIFT or SURF," in
2011 International Conference on Computer Vision,
pp. 2564-2571, 2011.
A. Geiger, P. Lenz, and R. Urtasun, "Are we ready
for autonomous driving? The KITTI vision benchmark
suite," in 2012 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 3354-3361, 2012.