SIMULTANEOUS LOCALIZATION AND MAPPING IN
UNMODIFIED ENVIRONMENTS USING STEREO VISION
A. Gil, O. Reinoso, C. Fernández, M. A. Vicente
Universidad Miguel Hernández de Elche
Avda. de la Universidad s/n, 03202 Elche (Alicante)
A. Rottmann, O. Martínez Mozos
University of Freiburg
Department of Computer Science
D-79110 Freiburg
Keywords:
SLAM, Stereo Vision, Visual landmarks, Data Association.
Abstract:
In this paper we describe an approach that builds three-dimensional maps using visual landmarks extracted from images of an unmodified environment, and propose a solution to the Simultaneous Localization and Mapping (SLAM) problem for autonomous mobile robots based on these landmarks. The map is represented by a set of three-dimensional landmarks referred to a global reference frame, each of which carries a visual descriptor that partially differentiates it from the others. Significant points extracted from stereo images are used as natural landmarks; in particular, we employ SIFT features found in the environment. We estimate both the map and the path of the robot using a Rao-Blackwellized particle filter, so the problem is decomposed into two parts: one estimation over robot paths using a particle filter, and N independent estimations over landmark positions, each conditioned on the path estimate. We actively track visual landmarks in a local neighbourhood and select only the most stable ones: a visual feature is integrated into the filter only after it has been observed in a significant number of frames. By this procedure, the total number of landmarks in the map is reduced compared to prior approaches. Because each landmark is tracked, we obtain different examples that represent the same natural landmark, and we exploit this fact to improve data association. Finally, efficient resampling techniques have been applied, which reduces the number of particles needed and avoids the particle depletion problem.
1 INTRODUCTION
Building an accurate map of a given environment is
one of the hardest tasks for a mobile robot. It is inherently difficult, since noise in the estimation of the robot's pose leads to errors in the estimation of the map and vice versa. Recently, many authors have considered
the problem of Simultaneous Localization and Map-
ping (SLAM). The aim of SLAM is to build a map of
an unknown environment and simultaneously localise
the robot with respect to this map. Here we consider
this problem using a Rao-Blackwellized Particle Fil-
ter (RBPF).
Most work on SLAM so far has focussed on build-
ing 2D maps of environments using range sensors
such as SONAR and laser (Wijk and Christensen,
2000), (Thrun, 2001). Recently, Rao-Blackwellized particle filters have been used as an effective means of solving the SLAM problem using occupancy grid
maps (Stachniss et al., 2004). In this approach, each
particle constructs its own map based on the obser-
vations and the trajectory for that particle. Parti-
cles which best explain current observations are given
higher weights and re-sampled.
While mapping a particular environment, one must be aware of two key problems:
- Odometry is subject to cumulative drift.
- Landmarks can be ambiguous.
Recently, some authors have concentrated on building three-dimensional maps using visual information extracted from cameras. In this scenario, the map is represented by a set of three-dimensional landmarks related to a global reference frame. Stereo systems are typically less expensive than laser sensors and are able to provide 3D information about the scene directly. In (Little et al., 2001) and (Little et al., 2002) stereo vision is used to track 3D visual landmarks extracted from the environment. In this work, SIFT features are used as visual landmarks. SIFT (Scale Invariant Feature Transform) features were initially introduced by D. Lowe and used in object recognition tasks (Lowe, 1999). During exploration, the robot extracts SIFT features from stereo images and calculates relative measurements to them.
Landmarks are then integrated into the map, each associated with an Extended Kalman Filter. However, this approach does not correctly manage the uncertainty associated with robot motion, since only one hypothesis over the pose of the robot is maintained. Consequently, it may fail in the presence of large odometric errors (e.g. while closing a loop). In (Miró et al., 2005) a Kalman filter is used to estimate an augmented state constituted by the robot pose and N landmark positions (Dissanayake et al., 2001). SIFT features are also used to manage the data association among visual landmarks. However, since only one hypothesis is maintained over the robot pose, the method would fail in the presence of incorrect data associations. In addition, in the presence of a significant number of landmarks the method would be computationally expensive.
The work presented in (Sim et al., 2005) uses SIFT
features as significant points in space and tracks them
over time. It uses a Rao-Blackwellized particle filter
to estimate both the map and the path of the robot.
The robot movement is here estimated from stereo
ego-motion (Little et al., 2001), providing a corrected
odometry that simplifies the SLAM problem, since no
large odometric errors are introduced.
The contribution of this paper is twofold. First, we present a new mechanism to deal with the data association problem in the case of different landmarks with similar appearance, a situation that occurs in most environments. Second, our approach actively tracks landmarks prior to their integration in the map; as a result, only the most stable landmarks are incorporated in the map. Using this approach, our map typically consists of a reduced number of landmarks compared to those of (Little et al., 2002) and (Sim et al., 2005), for comparable map sizes. In addition, we have applied effective resampling techniques, as described in (Stachniss et al., 2005), which reduces the number of particles needed to construct the map and thus the computational time.
The remainder of the paper is structured as follows.
Section 2 deals with visual landmarks and their utility
in SLAM. Section 3 explains the basics of the Rao-
Blackwellized particle filter. Next, section 4 presents
our solution to the data association problem in the
context of visual landmarks. In section 5 we present
our experimental results. Finally, section 6 sums up
the most important conclusions and proposes future
extensions.
2 FEATURE-BASED METHODS:
VISUAL FEATURES
Previous work in map building has revolved around two topics: occupancy or certainty grids, and feature-based methods. Feature-based methods work by locating features in the environment, estimating their position, and then using them as known landmarks. Different kinds of features have been used to create a map of the environment. In our work, we use visual landmarks as features to build the map. In particular, we use SIFT (Scale Invariant Feature Transform) features, which were developed for image feature generation and used initially in object recognition applications (see (Lowe, 2004) and (Lowe, 1999) for some examples). Key locations are selected at maxima and minima of a difference-of-Gaussian function applied in scale space. The features are invariant to image translation, scaling and rotation, and partially invariant to illumination changes and affine or 3D projection. They are computed by building an image pyramid with resampling between each level. SIFT locations extracted by this procedure may be understood as significant points in space that are highly distinctive and can thus be found from a range of robot poses. In addition, each SIFT location is assigned a descriptor that characterizes the landmark. This process enables the same points in space to be recognized from different viewpoints, which occurs while the robot moves around its workplace, thus providing information for the localization process. SIFT features have been used in robotic applications, showing their suitability for localization and SLAM tasks (Little et al., 2001), (Little et al., 2002), (Sim et al., 2005). Figure 1 shows visual features extracted from an image. These SIFT locations are used as landmarks in the map. The size of the arrow on each feature is proportional to the SIFT scale.
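As an illustration of how such features can be obtained in practice, the following minimal sketch uses the SIFT implementation available in OpenCV; the image path and the number of printed keypoints are placeholders and are not taken from the paper.

```python
# Minimal sketch: extracting SIFT keypoints and descriptors with OpenCV.
# The image path and parameters are illustrative, not those used by the authors.
import cv2

image = cv2.imread("left_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

sift = cv2.SIFT_create()                       # difference-of-Gaussian detector + descriptor
keypoints, descriptors = sift.detectAndCompute(image, None)

# Each keypoint stores location, scale and orientation; each descriptor is a
# 128-dimensional vector computed from local image gradients.
for kp, desc in zip(keypoints[:5], descriptors[:5]):
    print(kp.pt, kp.size, kp.angle, desc.shape)
```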
3 RAO-BLACKWELLIZED SLAM
We estimate the map and the path of the robot using a Rao-Blackwellized particle filter. Using the nomenclature of this filter, we denote by $s_t$ the robot pose at time $t$. The robot path up to time $t$ is denoted $s^t = \{s_1, s_2, \ldots, s_t\}$, the set of observations made by the robot up to time $t$ is denoted $z^t = \{z_1, z_2, \ldots, z_t\}$, and the set of actions $u^t = \{u_1, u_2, \ldots, u_t\}$. Therefore, the SLAM problem can be formulated as that of determining the location of all landmarks in the map $\Theta$ and the robot poses $s^t$ from a set of measurements $z^t$ and robot actions $u^t$. The map is composed of a set of different landmarks $\Theta = \{\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_N\}$. We denote by $c^t$ the correspondences between observations and landmarks in the map, which result from the data association process.
Figure 1: SIFT locations extracted from a typical image. The size of the arrow is proportional to the SIFT scale.
In consequence, the SLAM problem can be formulated as the estimation of the following posterior over robot paths and the map:

$$p(s^t, \Theta \mid z^t, u^t, c^t) \qquad (1)$$
The map $\Theta$ is represented by a collection of $N$ landmarks. Each landmark is described as $\theta_k = \{\mu_k, \Sigma_k, d_k\}$, where $\mu_k = (X^g_k, Y^g_k, Z^g_k)$ is a vector describing the position of the landmark with respect to a global reference frame $O_g$, with associated covariance matrix $\Sigma_k$. In addition, each landmark $\theta_k$ is associated with a SIFT descriptor $d_k$ that partially differentiates it from the others. This map representation is compact and has been used to effectively localize a robot in unmodified environments (Gil et al., 2005). Figure 2 shows an example of a map represented by $N$ landmarks in a section of the environment.
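A minimal sketch of how such a landmark-based map could be represented in code is given below; the class and field names are illustrative and are not taken from the paper.

```python
# Illustrative data structures for the map described above (names are hypothetical).
from dataclasses import dataclass
import numpy as np

@dataclass
class Landmark:
    mu: np.ndarray          # 3D position (X, Y, Z) in the global reference frame
    sigma: np.ndarray       # 3x3 covariance matrix of the position estimate
    descriptor: np.ndarray  # 128-dimensional SIFT descriptor

# The map Theta is simply a collection of N such landmarks.
theta = [
    Landmark(mu=np.array([1.2, -0.5, 0.8]),
             sigma=0.01 * np.eye(3),
             descriptor=np.zeros(128)),
]
```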
Figure 2: Example of the map created by a set of N 3D landmarks in a section of the environment in which the robot is present (axes in metres).
While exploring a particular environment, the robot needs to determine whether a particular observation $z_{t,k}$ corresponds to a previously mapped landmark or to a new one. For the moment, we consider this correspondence as known. Given that, at time $t$, the map is formed by $N$ landmarks, the correspondence is represented by $c_t = \{c_{t,1}, c_{t,2}, \ldots, c_{t,B}\}$, where $c_{t,i} \in \{1, \ldots, N\}$. In consequence, at time $t$ the observation $z_{t,k}$ corresponds to the landmark $c_{t,k}$ in the map. When no correspondence is found we denote it as $c_{t,i} = N + 1$, indicating that it is a new landmark.

Figure 3: Stereo correspondences using SIFT features. The epipolar constraint is used to find correspondences across images.
3.1 Stereo SIFT
Given two images $I^L_t$ and $I^R_t$ captured with a stereo head at time $t$, we extract natural landmarks which correspond to points in three-dimensional space. Each point is accompanied by its SIFT descriptor and then matched across images. The matching procedure is constrained by the well-known epipolar geometry of the stereo rig. In addition, a comparison between SIFT descriptors is used to avoid false correspondences. As a result, at time $t$ we obtain a set of $B$ observations denoted by $z_t = \{z_{t,1}, z_{t,2}, \ldots, z_{t,B}\}$. A particular observation is $z_{t,k} = (v_{t,k}, d_{t,k})$, where $v_{t,k} = (X_r, Y_r, Z_r)$ is a three-dimensional vector expressed in the left camera reference frame and $d_{t,k}$ is the SIFT descriptor associated with that point. Figure 3 shows two stereo images of the environment together with the correspondences found across them. After stereo correspondence, a 3D reconstruction of the points is obtained, yielding $B$ measurements $v_{t,k} = (X_r, Y_r, Z_r)$ relative to the left camera reference frame.
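The sketch below illustrates one way this stereo observation step could be implemented: SIFT descriptors are matched between the left and right images, candidate matches are filtered with an approximate epipolar constraint for a rectified rig, and the surviving pairs are triangulated from their disparity. The camera parameters and thresholds are assumptions for illustration only, not values from the paper.

```python
# Illustrative stereo SIFT observation step (rectified stereo pair assumed).
# Focal length, baseline and thresholds below are hypothetical values.
import cv2
import numpy as np

FOCAL_PX = 400.0       # focal length in pixels (assumed)
BASELINE_M = 0.12      # stereo baseline in metres (assumed)
MAX_ROW_DIFF = 2.0     # epipolar tolerance in pixels for a rectified rig
MAX_DESC_DIST = 200.0  # descriptor distance threshold (assumed)

def stereo_observations(img_left, img_right):
    sift = cv2.SIFT_create()
    kpl, dl = sift.detectAndCompute(img_left, None)
    kpr, dr = sift.detectAndCompute(img_right, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    observations = []
    for m in matcher.match(dl, dr):
        pl, pr = kpl[m.queryIdx].pt, kpr[m.trainIdx].pt
        # Epipolar constraint for rectified images: matching points share the row.
        if abs(pl[1] - pr[1]) > MAX_ROW_DIFF or m.distance > MAX_DESC_DIST:
            continue
        disparity = pl[0] - pr[0]
        if disparity <= 0:
            continue
        # Triangulate relative to the left camera reference frame.
        Z = FOCAL_PX * BASELINE_M / disparity
        X = (pl[0] - img_left.shape[1] / 2.0) * Z / FOCAL_PX
        Y = (pl[1] - img_left.shape[0] / 2.0) * Z / FOCAL_PX
        observations.append((np.array([X, Y, Z]), dl[m.queryIdx]))
    return observations  # list of (v_tk, d_tk) pairs
```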
3.2 Particle Filter Estimation
The conditional independence property of the SLAM problem implies that the posterior (1) can be factored as (Montemerlo et al., 2002):

$$p(s^t, \Theta \mid z^t, u^t, c^t) = p(s^t \mid z^t, u^t, c^t) \prod_{k=1}^{N} p(\theta_k \mid s^t, z^t, u^t, c^t) \qquad (2)$$
This equation states that the full SLAM posterior is decomposed into two parts: one estimator over robot paths, and $N$ independent estimators over landmark positions, each conditioned on the path estimate. This factorization was first presented by Murphy in 1999 (Murphy, 1999). We approximate $p(s^t \mid z^t, u^t, c^t)$ using a set of $M$ particles, each particle having $N$ independent landmark estimators (implemented as EKFs), one for each landmark in the map. Each particle is thus defined as:

$$S_t^{[m]} = \{s^{t,[m]}, \mu_{t,1}^{[m]}, \Sigma_{t,1}^{[m]}, \ldots, \mu_{t,N}^{[m]}, \Sigma_{t,N}^{[m]}\} \qquad (3)$$
where $\mu_{t,i}^{[m]}$ is the best estimate at time $t$ of the position of landmark $\theta_i$ based on the path of particle $m$, and $\Sigma_{t,i}^{[m]}$ is its associated covariance matrix. The particle set $S_t = \{S_t^{[1]}, S_t^{[2]}, \ldots, S_t^{[M]}\}$ is calculated incrementally from the set $S_{t-1}$ at time $t-1$ and the robot control $u_t$. Thus, each particle is sampled from a proposal distribution $s_t^{[m]} \sim p(s_t \mid s_{t-1}, u_t)$. Next, following the approach of (Montemerlo et al., 2002), each particle is assigned a weight according to:

$$\omega_{t,i}^{[m]} = \frac{1}{\sqrt{|2\pi Z_{c_{t,i}}|}} \exp\left\{-\frac{1}{2} a\right\} \qquad (4)$$

where

$$a = (v_{t,i} - \hat{v}_{t,c_{t,i}})^T \, [Z_{c_{t,i}}]^{-1} \, (v_{t,i} - \hat{v}_{t,c_{t,i}})$$
Here $v_{t,i}$ is the current measurement and $\hat{v}_{t,c_{t,i}}$ is the predicted measurement for the landmark $c_{t,i}$ based on the pose $s_t^{[m]}$. The matrix $Z_{c_{t,i}}$ is the covariance matrix associated with the innovation $(v_{t,i} - \hat{v}_{t,c_{t,i}})$. Note that we implicitly assume that each measurement $v_{t,i}$ has been assigned to the landmark $c_{t,i}$ of the map. This problem is, in general, hard to solve, since similar-looking landmarks may exist; in section 4 we describe our approach to this problem. When $B$ observations from different landmarks exist at time $t$, we calculate the total weight assigned to the particle as:

$$\omega_t^{[m]} = \prod_{i=1}^{B} \omega_{t,i}^{[m]} \qquad (5)$$
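A simplified sketch of this weighting step is shown below, assuming the predicted measurement and innovation covariance for each observation have already been computed by the per-landmark EKFs; function and variable names are illustrative, not taken from the paper.

```python
# Illustrative computation of a particle weight from equations (4) and (5).
# v_list / v_hat_list / Z_list are assumed to come from the per-landmark EKFs.
import numpy as np

def observation_weight(v, v_hat, Z):
    """Likelihood of one observation given its predicted measurement (eq. 4)."""
    innovation = v - v_hat
    norm = 1.0 / np.sqrt(np.linalg.det(2.0 * np.pi * Z))
    a = innovation.T @ np.linalg.inv(Z) @ innovation
    return norm * np.exp(-0.5 * a)

def particle_weight(v_list, v_hat_list, Z_list):
    """Total particle weight as the product of per-observation weights (eq. 5)."""
    w = 1.0
    for v, v_hat, Z in zip(v_list, v_hat_list, Z_list):
        w *= observation_weight(v, v_hat, Z)
    return w
```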
3.3 Efficient Resampling
To account for the difference between the proposal and the target distribution, each particle is drawn with replacement with probability proportional to its importance weight. During resampling, particles with a low weight are normally replaced by others with a higher weight. It is a well-known problem that the resampling step may delete good particles from the set and cause particle depletion. In order to avoid this problem we follow an approach similar to (Stachniss et al., 2004). Thus we calculate the number of effective particles $N_{\mathit{eff}}$ as:

$$N_{\mathit{eff}} = \frac{1}{\sum_{i=1}^{M} \left(\omega_t^{[i]}\right)^2} \qquad (6)$$

where the weights $\omega_t^{[i]}$ are normalized. We resample each time $N_{\mathit{eff}}$ drops below a predefined threshold (set to $M/2$ in our application). Using this approach, we have verified that the number of particles needed to achieve good results is reduced.
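The following sketch shows how this resampling criterion could be applied in code, assuming a list of unnormalized particle weights; the simple resampling helper is illustrative and not part of the paper.

```python
# Illustrative selective resampling based on the effective number of particles (eq. 6).
import numpy as np

def effective_particles(weights):
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize the importance weights
    return 1.0 / np.sum(w ** 2)

def maybe_resample(particles, weights):
    """Resample (with replacement) only when N_eff drops below M/2."""
    M = len(particles)
    if effective_particles(weights) >= M / 2.0:
        return particles, weights        # keep the current particle set
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    idx = np.random.choice(M, size=M, replace=True, p=w)
    resampled = [particles[i] for i in idx]
    return resampled, [1.0 / M] * M      # reset weights after resampling
```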
4 DATA ASSOCIATION
While the robot explores the environment it must decide whether an observation $z_{t,k} = (v_{t,k}, d_{t,k})$ corresponds to a previously mapped landmark or to a different one. We use the associated SIFT descriptor $d_{t,k}$ to improve the data association process. Each SIFT descriptor is a 128-dimensional vector computed from the image gradients in a local neighbourhood of the interest point. Experimental results in object recognition applications have shown that this description is robust against changes in scale, viewpoint and illumination (Lowe, 2004). In the approaches of (Little et al., 2001), (Little et al., 2002) and (Sim et al., 2005), data association is based on the squared Euclidean distance between descriptors. In consequence, given two SIFT descriptors $d_i$ and $d_j$, the following distance function is computed:

$$E = (d_i - d_j)(d_i - d_j)^T \qquad (7)$$

Then, the landmark of the map that minimizes the distance $E$ is chosen. Whenever the distance $E$ is below a certain threshold, the two landmarks are considered to be the same; a new landmark is created whenever the distance $E$ exceeds this threshold. When the same point is viewed from slightly different viewpoints and distances, the values in its SIFT descriptor remain almost unchanged. However, when the same point is viewed from significantly different viewpoints (e.g. 30 degrees apart), the difference in the descriptor is remarkable.
Figure 4: Figures (a)-(c) show different SIFT points tracked along different images with variations in scale and orientation.
Table 1: Comparison of correct and incorrect matches using the Euclidean distance and the Mahalanobis distance.

                         Correct matches (%)   Incorrect matches (%)
  Euclidean distance            83.85                 16.15
  Mahalanobis distance          94.04                  5.96
In the presence of similar-looking landmarks, this causes a significant number of false correspondences, as can be seen in Table 1.
We propose a different method to deal with data association in the context of SIFT features. We address the problem from a pattern classification point of view: we consider the problem of assigning a pattern $d_j$ to a class $C_i$, where each class $C_i$ models a landmark, and we treat different views of the same visual landmark as different patterns belonging to $C_i$. Whenever a landmark is found, it is tracked along $p$ frames and its descriptors $d_1, d_2, \ldots, d_p$ are stored. Then, for each landmark $C_i$ we compute a mean descriptor $\bar{d}_i$ and estimate a covariance matrix $S_i$, assuming the elements of the SIFT descriptor to be independent. Based on these data we compute the Mahalanobis distance:
$$L = (\bar{d}_i - \bar{d}_j) \, S_i^{-1} \, (\bar{d}_i - \bar{d}_j)^T \qquad (8)$$
where $S_i$ is the diagonal covariance matrix associated with the class $C_i$ with mean value $\bar{d}_i$. We compute the distance $L$ for all the landmarks in the map of each particle and assign the correspondence to the landmark that minimizes $L$. If this minimum distance exceeds a predefined threshold, we consider the observation a new landmark.
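A minimal sketch of this data association rule is given below, under the assumption that each mapped landmark stores the mean descriptor and per-element variances estimated from its tracked views; the names and the threshold value are illustrative only.

```python
# Illustrative Mahalanobis-based data association over SIFT descriptors (eq. 8).
# Landmark models and the threshold p0 are assumptions for this sketch.
import numpy as np

P0 = 50.0  # hypothetical association threshold

def mahalanobis_sq(d_obs, d_mean, var_diag):
    """Squared Mahalanobis distance with a diagonal covariance matrix."""
    diff = d_obs - d_mean
    return float(np.sum(diff * diff / var_diag))

def associate(d_obs, landmarks, p0=P0):
    """Return the index of the matching landmark, or None for a new landmark."""
    best_idx, best_dist = None, np.inf
    for idx, (d_mean, var_diag) in enumerate(landmarks):
        dist = mahalanobis_sq(d_obs, d_mean, var_diag)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    if best_dist > p0:
        return None  # no landmark is close enough: create a new one
    return best_idx
```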
In order to test this distance function, we recorded a set of images with small variations in viewpoint and distance (see figure 4). SIFT landmarks are easily tracked across consecutive frames, since the variance in the descriptor is low. In addition, we visually verified the correspondences across images. Based on these data we computed the matrix $S_i$ for each SIFT point tracked for more than 5 frames. Next, we computed the distance to the same class using equations (7) and (8). For each experiment, we selected the class that minimizes the distance function and, since the correspondences are already known, we could count the correct and incorrect matches. Table 1 shows the results of these experiments; a total of 3000 examples were used. As can be clearly seen, a raw comparison of two SIFT descriptors using the Euclidean distance does not provide total separation between landmarks, since the descriptor can vary significantly across viewpoints. The number of false correspondences is reduced by using the Mahalanobis distance.

By viewing different examples of the same landmark we are able to build a more complete model of it, which allows us to better separate each landmark from the others. We consider that this approach reduces the number of false correspondences and consequently produces better results in the estimation of the map and the path of the robot, as will be shown in section 5.
5 EXPERIMENTAL RESULTS
During the experiments we used a B21r robot equipped with a stereo head and an LMS laser range finder. We manually steered the robot through the rooms of building 79 of the University of Freiburg. A total of 507 stereo images at a resolution of 320x240 were collected. The total distance traversed by the robot is approximately 80 m. For each pair of stereo images a number of correspondences were established and observations $z_t = \{z_{t,1}, z_{t,2}, \ldots, z_{t,B}\} = \{(v_{t,1}, d_{t,1}), (v_{t,2}, d_{t,2}), \ldots, (v_{t,B}, d_{t,B})\}$ were obtained. After stereo correspondence, each point is tracked for a number of frames. By this procedure we can ensure that the SIFT point is stable and can be viewed from a significant number of robot poses.
Figure 5: RMS localization error when varying $p_0$.
In practice, when a landmark has been tracked for more than 5 frames, it is considered a new observation and is integrated into the filter. As mentioned in section 4, each descriptor is then represented by $d_{t,i} = \{\bar{d}_{t,i}, S_i\}$, where $\bar{d}_{t,i}$ is the SIFT vector computed as the mean of the $p$ tracked descriptors and $S_i$ is the corresponding diagonal covariance matrix.
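The class model described here can be built with a few lines of code; the sketch below assumes the $p$ descriptors collected while tracking a landmark are stacked in an array, and the small variance floor is an illustrative detail, not something specified in the paper.

```python
# Illustrative construction of a landmark's descriptor model from p tracked views.
import numpy as np

def descriptor_model(tracked_descriptors, var_floor=1e-6):
    """Return the mean descriptor and per-element variances (diagonal covariance)."""
    D = np.asarray(tracked_descriptors, dtype=float)  # shape (p, 128)
    d_mean = D.mean(axis=0)
    var_diag = D.var(axis=0) + var_floor  # floor avoids division by zero later
    return d_mean, var_diag
```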
5.1 Threshold Selection
In practice, given a SIFT descriptor and a predicted landmark position, the robot needs to decide whether a particular observation belongs to a landmark $c_{t,k}$ in the map or is a new one. In our approach we use equation (8) and set a threshold $p_0$. Given a particular observation $z_{t,k} = (v_{t,k}, d_{t,k})$, we find the landmark that minimizes $L$ among the landmarks that lie in a neighbourhood. When the distance $L$ exceeds $p_0$ for all the landmarks in the neighbourhood, we create a new landmark. If $L < p_0$ for a particular landmark in the map, we update its Kalman filter. In consequence, if $p_0$ is set too low, new landmarks will be frequently created; on the contrary, if $p_0$ is set too high, landmarks will be frequently confused, which affects the quality of the estimated path and the resulting map.
To test the influence of $p_0$ we repeatedly ran the algorithm with different values. Figure 5 shows the RMS error in the position of the robot as $p_0$ is varied. In order to assess the quality of our results, we compared the pose estimated by our method with the pose estimated from laser data recorded during exploration. To calculate the pose based on laser measurements, the method described in (Stachniss et al., 2004) was used. The SLAM algorithm was run several times for each value of $p_0$. Finally, to build the map we chose the value of $p_0$ that produced the best results. During SLAM the true pose of the robot is not known; in consequence, the best value of $p_0$ cannot be calculated in this way. However, we believe that the chosen value of $p_0$ will produce good results when mapping different environments.
Figure 6 shows the map constructed with 1, 10, and 100 particles. A total of 1500 landmarks were estimated. It can be seen that, with only 10 particles, the map is already topologically correct. As the figures show, some areas of the map do not contain any landmarks. This is due to the existence of featureless areas in the environment (e.g. textureless walls), where no SIFT features can be found.
Figure 7(a) shows the error in localization for each
movement of the robot during exploration using 200
particles. Again, we compare the estimated position
of the robot using our approach to the estimation us-
ing laser data.
In addition, we have compared the two data association approaches described in section 4. To do this, we ran a number of simulations, varying the number of particles used in each one and repeating the process with both data association methods. As can be seen in figure 7(b), for the same number of particles, better localization results are obtained when the Mahalanobis distance is used, thus improving the quality of the estimated map.
Compared to preceding approaches, our method uses fewer particles to achieve good results. For example, in (Sim et al., 2005) a total of 400 particles is needed to compute a topologically correct map, whereas our method builds correct maps with 50 particles. In addition, our maps typically consist of about 1500 landmarks, a much more compact representation than the one presented in (Sim et al., 2005), where the map typically contains around 100,000 landmarks.
6 CONCLUSION AND FUTURE
WORK
In this paper a solution to SLAM based on a Rao-
Blackwellized particle filter has been presented. This
filter uses visual information extracted from cameras.
We have used natural landmarks as features for the
construction of the map. The method is able to build
3D maps of a particular environment using relative
measurements extracted from a stereo pair of cam-
eras.
We have also proposed an alternative method to deal with the data association problem in the context of visual landmarks, addressing the problem from a pattern classification point of view. When different examples of a particular SIFT descriptor exist (all belonging to the same landmark), we obtain a probabilistic model for it.
Figure 6: Figure 6(a) shows a map created using 1 particle, Figure 6(b) a map created using 10 particles, and Figure 6(c) a map created using 100 particles, with the real path (continuous) and the estimated path using our approach (dashed) superposed. Figure 6(d) shows the real path (continuous) and the odometry of the robot (dashed). Axes show X and Y in metres.
We have also compared the results obtained using the Mahalanobis distance with those obtained using the Euclidean distance. By using the Mahalanobis distance the data association is improved and, consequently, better results are obtained, since most of the false correspondences are avoided.
In contrast to maps created by means of occupancy or certainty grids, the visual map generated by the approach presented in this paper does not directly represent the occupied or free areas of the environment. In consequence, some areas totally lack landmarks, yet they are not necessarily free areas through which the robot may navigate. For example, featureless areas such as blank walls provide no information to the robot. The map may therefore be used to effectively localize the robot, but it cannot be directly used for navigation. We believe that this limitation originates from the nature of the sensor and is not a failure of the proposed approach. Other low-cost sensors such as SONAR would definitely help the robot in its navigation tasks.
As future work, we believe it is of particular interest to further investigate exploration techniques when this representation of the world is used. We would also like to extend the method to the case where several robots explore an unmodified environment and construct a visual map of it.
ACKNOWLEDGEMENTS
This research has been supported by a Marie Curie Host fellowship with contract number HPMT-CT-2001-00251 and by the Spanish government (Ministerio de Educación y Ciencia, projects DPI2004-07433-C02-01 and PCT-G54016977-2005).
Figure 7: Figure 7(a) shows the absolute position error during the SLAM process; the final error in position is 0.34 m. Figure 7(b) shows the localization error when varying the number M of particles: the RMS error in odometry is shown as a dotted line, results using equation (7) as a dashed line, and results using equation (8) as a continuous line.
The authors would also like to thank Rudolph Triebel and Manuel Mucientes for their help and stimulating discussions.
REFERENCES
Dissanayake, G., Newman, P., Clark, S., Durrant-Whyte, H., and Csorba, M. (2001). A solution to the simultaneous localization and map building (SLAM) problem. IEEE Trans. on Robotics and Automation, 17:229–241.
Gil, A., Reinoso, O., Vicente, A., Fernández, C., and Payá, L. (2005). Monte Carlo localization using SIFT features. Lecture Notes in Computer Science (LNCS), 1(3523):623–630.
Little, J., Se, S., and Lowe, D. (2001). Vision-based
mobile robot localization and mapping using scale-
invariant features. In Proceedings of the IEEE In-
ternational Conference on Robotics and Automation
(ICRA), pages 2051–2058.
Little, J., Se, S., and Lowe, D. (2002). Global localization
using distinctive visual features. In Proceedings of the
2002 IEEE/RSJ Intl. Conference on Intelligent Robots
and Systems.
Lowe, D. (1999). Object recognition from local scale-
invariant features. In International Conference on
Computer Vision, pages 1150–1157.
Lowe, D. (2004). Distinctive image features from scale-
invariant keypoints. International Journal of Com-
puter Vision, 2(60):91–110.
Miró, J. V., Dissanayake, G., and Zhou, W. (2005). Vision-based SLAM using natural features in indoor environments. In Proceedings of the 2005 IEEE International Conference on Intelligent Networks, Sensor Networks and Information Processing, pages 151–156.
Montemerlo, M., Thrun, S., Koller, D., and Wegbreit, B. (2002). FastSLAM: A factored solution to the simultaneous localization and mapping problem. In AAAI.
Murphy, K. (1999). Bayesian map learning in dynamic environments. In Neural Information Processing Systems (NIPS).
Sim, R., Elinas, P., Griffin, M., and Little, J. J. (2005). Vision-based SLAM using the Rao-Blackwellised particle filter. In IJCAI Workshop on Reasoning with Uncertainty in Robotics.
Stachniss, C., Haehnel, D., and Burgard, W. (2004). Ex-
ploration with active loop-closing for FastSLAM. In
IEEE/RSJ Int. Conference on Intelligent Robots and
Systems.
Stachniss, C., Haehnel, D., and Burgard, W. (2005). Improving grid-based SLAM with Rao-Blackwellized particle filters by adaptive proposals and selective resampling. In IEEE Int. Conference on Robotics and Automation (ICRA).
Thrun, S. (2001). A probabilistic online mapping algorithm
for teams of mobile robots. International Journal of
Robotics Research, 20(5):335–363.
Wijk, O. and Christensen, H. I. (2000). Localization and navigation of a mobile robot using natural point landmarks extracted from sonar data. Robotics and Autonomous Systems, 1(31):31–42.