Figure 1: Kinect sensor and coordinate system (taken from Garcia and Zalevsky, 2008).
Although there are some works that use Kinect as an
input sensor for reactive navigation (see, for example,
(Cunha et al., 2011) and (Biswas and Veloso, 2011)), to
the best of the authors' knowledge the aforementioned
drawbacks have not been explicitly addressed.
In order to test the suitability of the proposed
approach, a set of experiments has been conducted
within cluttered scenarios. The experiments have
validated the improvement in performance of a
particular reactive navigator (Blanco et al., 2008),
although the solution presented here can be applied to
any other reactive algorithm that utilizes a 2D
representation of the space, i.e. one relying only on
2D laser scanners.
The next section gives a description of the Kinect
device. Section 3 presents the proposed solution for
using Kinect as an additional input sensor for
any reactive navigation algorithm based on a 2D
obstacle representation. In Section 4, the results of
the conducted experiments are presented and
discussed. Finally, some conclusions are outlined.
2 THE KINECT SENSOR
2.1 Description
The Kinect device (see Figure 1) is equipped with an
RGB camera, a depth sensor, a microphone array, a
motorized base that endows the sensor with a tilt
movement of 27º, and a 3-axis accelerometer.
Focusing on the depth sensor, also called the range
camera, it is composed of an infrared light projector
combined with a monochrome CMOS sensor. It
provides VGA resolution (640x480 pixels) with 11-bit
depth resolution and a data refresh rate of 30 Hz.
Its nominal field of view is 58º in the horizontal
plane and 45º in the vertical one. The operational
range reported by the manufacturer is from 1.2 m to
3.5 m, although in our experiments we have verified
that the sensor is able to detect objects placed at
0.5 m (see Section 2.3).
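As a compact reference, the nominal figures above can be gathered in a small Python structure; the class and field names below are purely illustrative and are not part of any Kinect API:

from dataclasses import dataclass

@dataclass(frozen=True)
class KinectDepthSpecs:
    width_px: int = 640                 # VGA horizontal resolution
    height_px: int = 480                # VGA vertical resolution
    depth_bits: int = 11                # depth quantization of the raw measurement
    refresh_hz: float = 30.0            # data refresh rate
    hfov_deg: float = 58.0              # nominal horizontal field of view
    vfov_deg: float = 45.0              # nominal vertical field of view
    nominal_min_range_m: float = 1.2    # manufacturer-reported operational range
    nominal_max_range_m: float = 3.5
    observed_min_range_m: float = 0.5   # minimum distance observed in our tests (Section 2.3)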
2.2 Working Principle of the Range
Camera
The range camera of Kinect consists of two devices:
an infrared (IR) light projector, which casts a pattern
of points onto the scene, and a standard CMOS
monochrome sensor (IR camera) (Freedman et al.,
2010). Both are aligned along the X axis, with
parallel optical axes separated by a distance (baseline)
of 75 mm (see Figure 1). This configuration eases the
computation of the range image, which is performed
through triangulation between the IR rays and
their corresponding dot projections onto the image.
The method to compute the correspondence between
rays and pixels relies on an innovative technique called
Light Coding (Garcia and Zalevsky, 2008), patented
by PrimeSense (PrimeSense, www.primesense.com),
which entails a very particular factory calibration. In
this calibration, a set of images of the point pattern,
as projected on a planar surface at different known
distances, is stored in the sensor. These are the so-
called reference images.
Kinect works like a correlation-based stereo
system with an ideal configuration (i.e. identical
cameras with aligned optical axes separated along the
X axis), where the IR rays are "virtually" replaced by
the lines of sight of the points in the reference images.
As in stereo, the depth at each image point is derived
from its disparity along the X axis, which is computed
by correlating a small window against the reference
image. Further information about this calculation
can be found in (Khoshelham, 2011).
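To make this principle concrete, the following Python sketch (not the actual PrimeSense implementation) correlates a small window of the live IR image against a single reference image taken at a known distance and converts the resulting disparity into depth with the ideal-stereo relation discussed in (Khoshelham, 2011). The focal length, window size, search range, and sign convention are illustrative assumptions:

import numpy as np

FOCAL_PX = 580.0     # assumed IR-camera focal length in pixels (illustrative)
BASELINE_M = 0.075   # projector-camera baseline of 75 mm
WIN = 9              # correlation window size in pixels
MAX_DISP = 64        # maximum disparity searched along the X axis

def disparity_at(live, ref, row, col):
    """Horizontal shift that best correlates a window of the live IR image
    with the reference image (normalized cross-correlation)."""
    h = WIN // 2
    patch = live[row - h:row + h + 1, col - h:col + h + 1].astype(np.float64)
    patch = (patch - patch.mean()) / (patch.std() + 1e-9)
    best_d, best_score = 0, -np.inf
    for d in range(MAX_DISP):
        c = col - d                    # search direction is an assumed convention
        if c - h < 0:
            break
        cand = ref[row - h:row + h + 1, c - h:c + h + 1].astype(np.float64)
        cand = (cand - cand.mean()) / (cand.std() + 1e-9)
        score = float((patch * cand).sum())
        if score > best_score:
            best_score, best_d = score, d
    return best_d

def depth_from_disparity(d_px, z_ref_m):
    """Ideal-stereo triangulation relative to the reference plane:
    1/Z = 1/Z_ref + d / (f * b)."""
    return 1.0 / (1.0 / z_ref_m + d_px / (FOCAL_PX * BASELINE_M))

# Example use: z = depth_from_disparity(disparity_at(live_ir, ref_ir, 240, 320), 2.0)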
Regarding the accuracy of this method, the
distance errors are lower than 2 cm for close objects
(up to 2 m), growing linearly to an average error
of 10 cm at 4 m.
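As a rough rule of thumb, these figures can be interpolated linearly; the short sketch below encodes only the two values quoted above and is not a calibrated error model:

def expected_depth_error_m(distance_m):
    """Approximate depth error: about 2 cm up to 2 m, growing linearly to
    roughly 10 cm at 4 m (linear interpolation is an assumption)."""
    if distance_m <= 2.0:
        return 0.02
    return 0.02 + (distance_m - 2.0) * (0.10 - 0.02) / (4.0 - 2.0)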
2.3 Kinect Reliability for Detecting
Thin Obstacles
To assess how reliable Kinect is for feeding a
robot reactive navigation system, it is of interest to
analyze its capacity to detect surrounding obstacles,
particularly those that are hardly detectable by other
sensors, either because of their small size or because
of their position in the scene, e.g. the protruding edge
of a table top, the legs of a chair, coat stands, etc.
We have performed a number of experiments
where sticks of different thicknesses (1, 2, and 4 cm)
were horizontally placed in front of the sensor, at a