VISUAL NAVIGATION FOR THE BLIND
Path and Obstacle Detection
J. José, J. M. H. du Buf and J. M. F. Rodrigues
Vision Laboratory, Institute for Systems and Robotics (ISR/IST)
University of the Algarve (FCT and ISE), 8005-139 Faro, Portugal
Keywords:
Path detection, Obstacle detection, Obstacle avoidance.
Abstract:
We present a real-time vision system to assist blind and visually impaired persons. This system complements
the white cane, and it can be used both indoors and outdoors. It detects borders of paths and corridors, obstacles
within the borders, and it provides guidance for centering and obstacle avoidance. Typical obstacles are
backpacks, trash cans, trees, light poles, holes, branches, stones and other objects at a distance of 2 to 5 meters
from the camera position. Walkable paths are detected by edges and an adapted Hough transform. Obstacles
are detected by a combination of three algorithms: zero crossings of derivatives, histograms of binary edges,
and Laws’ texture masks.
1 INTRODUCTION
Blind and visually impaired persons need to navigate
using the cane or, at best, an ultrasonic obstacle de-
tector. There are an estimated 180 million persons
with severe visual impairments, of whom 40-50 mil-
lion are completely blind, and every year 2 million
more become blind. The Portuguese project “Blav-
igator: a cheap and reliable navigation aid for the
blind” aims at developing a cheap portable GIS and
GPS-based navigation aid for the blind, for both out-
door and indoor navigation, with vision modules for
obstacle avoidance and object recognition.
There are a few recent systems for visually im-
paired users which may assist them in navigating,
with and without obstacle detection and avoidance.
One system integrates outdoor navigation with obsta-
cle detection (Lee et al., 2008). Another is the elec-
tronic travel aid called iSONIC (Kim et al., 2009).
The latter complements the conventional cane by de-
tecting obstacles at head height. CASBliP (2009) was an EU-funded project whose main aim was to develop
a system capable of interpreting and managing real-
world information from different sources in order to
improve autonomous mobility. For a detailed descrip-
tion of the state-of-the-art see (du Buf et al., 2011).
The work presented here concerns one of the vi-
sion modules of the Blavigator project. It uses a stereo
camera fixed to the chest of the blind user, at a height of
about 1.5 m from the ground. Results presented are
obtained by using only the right-side camera, and the
system performs equally well using a normal, inex-
pensive webcam with about the same resolution. The
resolution must be sufficient to resolve textures of
pavements and potential obstacles like holes with a
minimum size of about 10 cm at a distance of 2 to
5 meters from the camera. Our prototype works in
real-time on a normal netbook computer and is able
to detect borders of paths like outdoor sidewalks and
indoor corridors, and it can assist the user in navigat-
ing inside the borders. It also detects obstacles, both
indoor (trash cans, plant pots, backpacks and furni-
ture) and outdoor (light poles, tree branches, holes
and loose stones) inside the borders for alerting the
user and avoiding them.
2 PATH DETECTION
Paths are the areas where the user can walk in a cor-
ridor or on a sidewalk. These are delimited by left
and right border lines. The position where these lines
intersect is called the Vanishing Point (VP). Using
the Canny edge detector (Canny, 1986) in combina-
tion with an adapted version of the Hough transform
(Duda and Hart, 1972), it is possible to detect the bor-
ders and the VP.
Let I_I(x,y) be an input frame with fixed width W_I and height H_I. Because of perspective projection, the left and right path borders intersect at the vanishing point; see the yellow circle in the top-right image of Fig. 1.
Figure 1: Top row, from left: one input frame, its edge map I_b with left and right borders in blue and green, the corresponding AHS with magnified regions, and detected borders highlighted with the vanishing point VP (yellow circle) and the horizon line HL (red). Bottom row: the different path and obstacle windows I_b, PW, OW and ODW.
If the camera plane is exactly in the vertical
position, the horizon line HL (the red line in the same
image) is close to the middle of the frame. Since verti-
cal camera alignment is not fixed but varies over time
when the user walks, we use the VP in order to de-
termine the line: y_HL = y_VP. During an initialization phase, i.e., the first 5 frames which last less than one second, y_HL = H_I/2 is taken, after which the height
of HL is dynamically computed for each new frame
by averaging the heights of the previous 5 frames of
which the VP is known. The lower part of the image,
below HL, will contain path borders and obstacles on
the ground. This part, which is illustrated in Fig. 1
(bottom-left), will be analyzed.
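As an illustration, the horizon height can be tracked with a small buffer of the most recent VP heights; the sketch below (Python, with illustrative names not taken from the implementation) assumes y_VP is given in image coordinates.

```python
from collections import deque

class HorizonEstimator:
    """Track the horizon line height y_HL from the VPs of recent frames."""
    def __init__(self, frame_height, history=5):
        self.default = frame_height // 2          # y_HL = H_I/2 during initialization
        self.vp_heights = deque(maxlen=history)   # y_VP of the last 5 frames with a VP

    def update(self, y_vp):
        if y_vp is not None:                      # only frames in which the VP was found
            self.vp_heights.append(y_vp)

    def y_hl(self):
        if not self.vp_heights:                   # initialization phase
            return self.default
        return int(round(sum(self.vp_heights) / len(self.vp_heights)))
```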
In order to reduce CPU time, only grayscale in-
formation is processed after resizing the lower part
to a width of 300 pixels while maintaining the as-
pect ratio. Then two iterations of a smoothing filter
are applied in order to suppress noise. The Canny
edge detector is applied with σ = 1.0, T_l = 0.25 and T_h = 0.5. The result is a binary edge image I_b(x,y) of width W_b = 300 and height H_b = y_VP (W_b / W_I), with x ∈ [−W_b/2, W_b/2 − 1] and y ∈ [0, H_b − 1], and the origin of the coordinate system at the bottom-center of the image. One path frame and I_b are shown in the leftmost images of Fig. 1.
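A possible realization of this preprocessing chain, using OpenCV, is sketched below; the smoothing kernel is not specified in the text, so a 3×3 box filter is assumed, and the normalized hysteresis thresholds 0.25/0.5 are rescaled to the 0-255 gradient range.

```python
import cv2

def edge_map_below_horizon(frame_bgr, y_hl):
    """Crop the part below HL, convert to grayscale, resize to width 300,
    smooth twice and compute the binary edge map I_b with the Canny detector."""
    lower = frame_bgr[y_hl:, :]                          # rows below HL (top-left origin)
    gray = cv2.cvtColor(lower, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    gray = cv2.resize(gray, (300, int(h * 300.0 / w)))   # keep the aspect ratio
    for _ in range(2):                                   # two smoothing iterations
        gray = cv2.blur(gray, (3, 3))                    # 3x3 box filter (assumed)
    return cv2.Canny(gray, 0.25 * 255, 0.5 * 255)        # hysteresis thresholds T_l, T_h
```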
We then use the Hough transform to search for
border candidates in the left and right halves of
image I_b. As we want to check straight lines in both halves using polar coordinates, the Hough transform is applied to I_b, yielding the adapted Hough space I_AHS(ρ, θ): we restrict the Hough space to θ ∈ [20°, 69°] ∪ [111°, 160°], such that vertical and almost vertical lines in the intervals θ ∈ [0°, 20°] ∪ [160°, 180°] are ignored. The same is done for horizontal and almost horizontal lines (θ ∈ [70°, 110°]). For maximizing the number of searched lines and minimizing the computation time, we use ∆θ = 0.5°. We use ∆ρ = cos θ for θ ∈ [20°, 44°] and ∆ρ = sin θ for θ ∈ [45°, 69°]. Lines outside the image can be discarded, so the biggest value of ρ occurs for the straight lines that pass through the opposite corners relative to the bottom-center of the image (ρ ∈ [0, (W_b/2) cos θ + H_b sin θ]).
For optimization purposes, we can compute the
lines for the Hough transform by “mirroring” the
left and right lines about the y axis: we calculate
the lines for θ ∈ [20°, 69°] and, at the same time, the lines for θ ∈ [111°, 160°], as explained below. The right border is denoted by L_ρ,θ(x_r, y_r) and the left one by L_ρ,π−θ(x_l, y_l). For L_ρ,θ we use x_r = (ρ − y_r sin θ)/cos θ and y_r = (ρ − x_r cos θ)/sin θ. For L_ρ,π−θ we use x_l = −x_r − 1 and y_l = y_r. This results
in a reduction of CPU time of about 50%, as we need
little more than half of all computations involved in
building the normal Hough space.
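The following sketch generates the pixels of one right-border candidate L_ρ,θ together with its mirrored counterpart L_ρ,π−θ; it simply samples the line along y, whereas the text alternates between ∆ρ = cos θ and ∆ρ = sin θ, so this is an illustration rather than the exact scanning scheme.

```python
import numpy as np

def line_pixels(rho, theta_deg, w_b=300, h_b=150):
    """Pixels of L_{rho,theta} (right) and L_{rho,pi-theta} (left) in the
    bottom-center coordinate system of I_b."""
    theta = np.radians(theta_deg)
    right, left = [], []
    for y in range(h_b):
        x_r = int(round((rho - y * np.sin(theta)) / np.cos(theta)))
        if -w_b // 2 <= x_r <= w_b // 2 - 1:
            right.append((x_r, y))
            left.append((-x_r - 1, y))     # mirrored line: x_l = -x_r - 1, y_l = y_r
    return right, left
```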
The I_AHS space is filled by checking the pixels in I_b from top to bottom: left-to-right for the right border (L_ρ,θ) and right-to-left for the left border (L_ρ,π−θ). As for the normal Hough space, I_AHS is a histogram which is used to count the co-occurrences of aligned pixels in the binary edge map I_b. However, longer sequences of edge pixels count more than short sequences or not-connected edge pixels. A run of n connected edge pixels is counted by applying P_n = P_{n−1} + 2 with P_1 = 1, and will contribute n² to the relevant I_AHS bin. The final value of an I_AHS(ρ, θ) bin is the sum of the contributions of all sequences of ON pixels, each sequence having at least one ON pixel. The
AHS is shown in Fig. 1, 1st row 3rd column, with the
detected maxima in blue and green. The correspond-
ing borders are also shown colored in the edge map
(Fig. 1, 2nd column at top).
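A compact illustration of this run-length weighting, assuming the line pixels have already been converted to (row, column) indices of the binary edge map:

```python
def run_weighted_vote(line_cells, edge_map):
    """AHS contribution of one candidate line: each ON pixel adds a weight that
    grows by 2 along a run (1, 3, 5, ...), so a run of n pixels adds n^2 in total."""
    total, p = 0, 0
    for row, col in line_cells:
        if edge_map[row, col]:
            p = p + 2 if p else 1          # P_1 = 1, P_n = P_{n-1} + 2
            total += p
        else:
            p = 0                          # run interrupted: restart the weight
    return total
```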
For selecting the path borders we analyze the suc-
cessive highest values of I_AHS for each frame, in each
half, starting with the highest ones. This results in
an intersection point. If the intersection point of the
next left and/or right value(s) has a smaller Euclidean distance to the VP of the previous frame, we con-
tinue checking the next value(s) on the corresponding
side(s). If there is no pair with a smaller distance,
the current value is selected. After both values are
selected, their intersection corresponds to the frame’s
VP. In this search, all combinations of left and right
border candidates are considered. If in the left or right
regions where the I_AHS values are checked there is no
maximum which corresponds to at least one sequence
of not less than 10 connected ON pixels, it is checked
again without region restrictions. If still no corre-
spondence can be found, the border is considered not
found for that side. In this case, the average of the last
5 borders found is used. If after 5 consecutive frames
one or both borders are not found, the user is warned.
The HL value remains the same, and the scanned win-
dow expands to the left- and/or rightmost column ac-
cording to the missing border(s). In such a case the
user knows that the vision module temporarily cannot
detect a valid path, but he can continue walking using
the white cane, and the module will still check for ob-
stacles in front. Below, the walkable path is called the
Path Window (PW). It consists of the triangle delim-
ited by the left and right borders in I_b, as shown in the
second-from-left images in Fig. 1.
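The pairing of left and right candidates can be sketched as an exhaustive search over the strongest AHS maxima, each given as a (ρ, θ) pair; this is an illustration of the selection criterion, not the exact bookkeeping of the implementation.

```python
import numpy as np

def intersect(line_a, line_b):
    """Intersection point of two lines given in polar form (rho, theta in radians)."""
    (r1, t1), (r2, t2) = line_a, line_b
    A = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    return np.linalg.solve(A, np.array([r1, r2]))      # (x, y) of the crossing point

def select_borders(left_cands, right_cands, prev_vp):
    """Pick the left/right border pair whose intersection is closest to the VP
    of the previous frame."""
    best, best_d = None, np.inf
    for lb in left_cands:
        for rb in right_cands:
            vp = intersect(lb, rb)
            d = np.linalg.norm(vp - np.asarray(prev_vp, dtype=float))
            if d < best_d:
                best, best_d = (lb, rb, vp), d
    return best
```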
3 OBSTACLE DETECTION
The PW is wider than the area in front where the
blind person will walk and where obstacles must be
detected. Hence, the PW is narrowed by drawing new
lines through the VP: the positions of the original bor-
ders of the PW at the bottom line are shifted right (left
border) and left (right border), by 5% of W_b, which re-
sults in a narrower triangle. In addition, the height of
the window can be reduced because the top of the tri-
angle (near VP) is too far away. Hence, the height
is redefined for a corresponding range of at least 5
meters from the user, i.e., H_OW = 2/3 × y_VP, which yields a trapezoid with parallel top and bottom lines. This is called the Obstacle Window (OW, I_OW); see
Fig. 1, 2nd row 3rd column, and Fig. 2, the bright
area in the top-left image.
For obstacles in the immediate neighborhood be-
yond the reach of the white cane we have to consider
distances between 2 and 5 m in front of the user, tak-
ing into account the height of the camera and perspec-
tive projection of the lens. However, the resolution at
the bottom of the image is higher than that at the top
or even at the VP. Therefore, we define a new Obsta-
cle Detection Window (ODW) and use interpolation
to correct image resolution; see Fig. 1 (bottom-right);
also Fig. 2, the 2nd and 4th images from left on the
top row in the case of the OW in the frames to their
left. Hence, the trapezoidal OW is converted to the
rectangular ODW by maintaining the resolution at the
top of OW but reducing it at the bottom.
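The text does not prescribe a particular resampling scheme; one way to obtain a rectangular ODW that keeps the resolution of the OW's top edge is a perspective warp of the four OW corners, as sketched below with OpenCV (the output size is an arbitrary illustrative choice).

```python
import cv2
import numpy as np

def ow_to_odw(image, tl, tr, br, bl, out_w=200, out_h=100):
    """Map the trapezoidal OW, given by its corners (x, y), onto a rectangle."""
    src = np.float32([tl, tr, br, bl])
    dst = np.float32([[0, 0], [out_w - 1, 0],
                      [out_w - 1, out_h - 1], [0, out_h - 1]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (out_w, out_h))
```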
At this point we must stress that our system will
not detect obstacles at a distance of less than 2 m from
the user, for two reasons: (i) The user has al-
ready been alerted to a looming obstacle at a larger
distance and advised to adapt path trajectory. (ii) The
user will always check a detected obstacle using the
white cane at short distance.
To the ODW region we also apply a lowpass filter
with a 3 × 3 kernel in order to suppress noise. An ex-
ample is shown in the top-right image of Fig. 2. This
is the filtered ODW of the image to its left, and this
will be used for obstacle detection.
Below we explain the three obstacle detection
algorithms: (a) Zero-Crossing Counting, (b) His-
tograms of Binary Edges and (c) Laws’ Texture En-
ergy Masks. If at least two of these detect an anomaly
at about the same position, an obstacle is assumed. At
least 3 successive frames are required to confirm the
presence of an obstacle and before alerting the user.
(a) In Zero-Crossing Counting, we compute
derivatives in x and y in I_ODW, using a large kernel K = (−1, −1, −1, 0, 1, 1, 1). Then we sum the am-
plitudes of the maxima and minima near every zero-
crossing (ZC): each time the derivative changes sign,
we look for the minimum and maximum value on
both sides and sum the absolute values. For analyz-
ing variations on lines we use the x derivative, and
for columns we use the y derivative. This is done
for every line and column in the ODW. These arrays
are then filtered twice with a 7 × 1 smoothing kernel.
Filtered values below 3 in the histograms are due to
noise and are removed. All parameters and kernels
were determined experimentally using different test
sequences. Examples of x and y derivatives are shown
in the 1st and 2nd images on the 2nd row of Fig. 2,
with the two histograms in black.
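A minimal sketch of this measure for one row or column is given below; the kernel sign and the size of the neighbourhood searched around each crossing (three samples) are assumptions, since they are not fully specified in the text.

```python
import numpy as np

K = np.array([-1, -1, -1, 0, 1, 1, 1], dtype=float)     # derivative kernel

def zc_amplitude(deriv):
    """Sum the |min| + |max| amplitudes around every zero-crossing of a 1-D derivative."""
    total, s = 0.0, np.sign(deriv)
    for i in range(1, len(deriv)):
        if s[i] != 0 and s[i - 1] != 0 and s[i] != s[i - 1]:   # sign change at i
            total += abs(deriv[max(0, i - 3):i]).max() + abs(deriv[i:i + 3]).max()
    return total

def row_histogram(odw):
    """One bin per row: zero-crossing amplitudes of the x derivative of that row."""
    return np.array([zc_amplitude(np.convolve(r, K[::-1], mode="same")) for r in odw])
```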
Thresholds are applied to the histograms in order
to remove “noise” caused by the texture of the pave-
ment. Let T_max/min,dx/dy denote the maximum and minimum thresholds of the derivatives in x and y, and i be the frame index of a sequence. Values in the interval [T_min,dx/dy, T_max,dx/dy] are set to zero, as these
are caused by a textured pavement. In the first frames
during initialization (i ≤ 5), we must assume that no obstacle is present and T_max,dx/dy will be set to the corresponding maximum values of I_ODW,dx/dy.
Figure 2: Top row, from left: two frames with the OW highlighted, their rectangular ODWs to their right, and the result of
lowpass filtering. The latter is used for obstacle detection. Middle row, from left: x and y derivatives with histograms in black
and detected obstacle region; the histograms of horizontal and vertical edges in gray and detected obstacle region. Bottom
row, from left: the local energy images of Laws’ masks E5L5, R5R5, E5S5 and L5S5, the detected obstacle region, and the
final result, after combining the three methods, in the input frame.
Similarly, T_min,dx/dy are determined. Here the entire ODW is used.
After initialization (i ≥ 6), the same method is applied, but the maxima and minima are calculated using only the bottom half of the window, i.e., x ∈ [−W_ODW/2, W_ODW/2 − 1] and y ∈ [0, H_ODW/2]. If the maximum of a derivative of the current frame i is higher than the maximum threshold of the previous frame, max_i > T_max,i−1, or the minimum is lower than the minimum threshold, min_i < T_min,i−1, then T_max,i = max_i (and, analogously, T_min,i = min_i); otherwise T_max,i = (max_i + T_max,i−1)/2 and T_min,i = (min_i + T_min,i−1)/2. This allows the sys-
tem to adapt to different types of pavements while
walking, but it is still sensitive to deviations.
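The update rule itself is simple and can be summarized as follows (the same rule is applied separately to the maxima and minima of the x and y derivatives):

```python
def update_threshold(t_prev, extreme, is_max=True):
    """Adaptive threshold of algorithm (a): jump to a new extreme value at once,
    otherwise average the current extreme with the previous frame's threshold."""
    exceeded = extreme > t_prev if is_max else extreme < t_prev
    return extreme if exceeded else 0.5 * (extreme + t_prev)
```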
This procedure yields a binary image resulting
from the multiplication of the thresholded and back-
projected line and column histograms. The 3rd image
on the 2nd row of Fig. 2 shows the result as a bright
rectangular area in the ODW.
(b) In the Histograms of Binary Edges algorithm
Canny's edge detector is applied to I_ODW, with the same σ and T_l as used in path detection, but with T_h as described below. Canny's algorithm is used to calculate the first derivatives dx and dy, the edge magnitude I_ODWc,mag = (dx² + dy²)^(1/2), and the edge orientation I_ODWc,θ = arctan(dy/dx). This information is used as follows: (b-i) To compute dynamically the high hysteresis threshold T_h. During initialization, i ≤ 5, the maximum value of the magnitude is used: T_h = max_i[I_ODWc,mag]. For i ≥ 6, if the maximum in the bottom half of I_ODWc,mag(x,y), with x ∈ [−W_ODW/2, W_ODW/2 − 1] and y ∈ [0, H_ODW/2], is higher than the threshold of the previous frame (max_i > T_h,i−1), then T_h,i = max_i; otherwise T_h,i = (max_i + T_h,i−1)/2. As in algorithm (a), this allows the system to adapt to different types of pavements.
(b-ii) As we want to determine the region where
the obstacle is, we split the resulting Canny edge
map I_ODWc using the edge orientations. This yields two images I_ODWc,V/H with the vertical and horizontal edges. For this we check the angles in I_ODWc,θ. We apply four angle intervals: for horizontal edges θ_H ∈ [−67.5°, 67.5°] ∪ [112.5°, 247.5°], and for vertical ones θ_V ∈ [22.5°, 157.5°] ∪ [202.5°, 337.5°]. The edge pixels in I_ODWc for which the angles are in the θ_H and θ_V intervals are stored in I_ODWc,H/V.
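A sketch of this orientation-based split, with the edge map and the orientation image as NumPy arrays; the intervals are those given above (wrapped to [0°, 360°)) and deliberately overlap, so diagonal edges contribute to both maps.

```python
import numpy as np

def split_by_orientation(edges, theta_deg):
    """Split a binary edge map into horizontal and vertical edge images."""
    e = edges.astype(bool)
    t = np.mod(theta_deg, 360.0)
    horiz = (t <= 67.5) | (t >= 292.5) | ((t >= 112.5) & (t <= 247.5))
    vert = ((t >= 22.5) & (t <= 157.5)) | ((t >= 202.5) & (t <= 337.5))
    return e & horiz, e & vert
```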
For locating the region where an obstacle might
be, edge histograms are filled along columns and lines
in I_ODWc,V and I_ODWc,H. The 4th and 5th images on
the 2nd row of Fig. 2 show the edge maps in black
and the histograms in gray at the borders. All bins
in both histograms with values below 2 are discarded
for a better localization with left-right and top-bottom
limits. The final image results from the multiplication
of the back-projected histograms. This is shown by
the bright rectangular area in the rightmost image on
the 2nd row of Fig. 2.
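The localization can be illustrated as follows, assuming the column histogram counts vertical edge pixels per column and the row histogram counts horizontal edge pixels per row:

```python
import numpy as np

def localize_obstacle(edges_v, edges_h, min_count=2):
    """Back-project the thresholded column/row histograms and multiply them
    to obtain the left-right and top-bottom limits of the obstacle region."""
    cols = edges_v.sum(axis=0) >= min_count        # columns with enough vertical edges
    rows = edges_h.sum(axis=1) >= min_count        # rows with enough horizontal edges
    region = np.outer(rows, cols)                  # product of the back-projections
    ys, xs = np.nonzero(region)
    if xs.size == 0:
        return None
    return xs.min(), xs.max(), ys.min(), ys.max()  # left, right, top, bottom
```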
(c) The third algorithm is based on Laws’ Texture
Energy Masks (Laws, 1980) applied to I_ODW. The main idea is to detect changes of the image's textures in the window: textures that are already present in the frames before an obstacle enters the ODW will not be detected, because of the use of a threshold value. As in (Laws, 1980), our tests showed that the best masks are E5L5, R5R5, E5S5 and L5S5; below, the masks are denoted by the subscript lm.
Figure 3: Typical results of in- and outdoor test sequences. Only a few frames are shown.
After filtering with the masks, which results in
4 images I_ODW,lm, we compute the energy measures E_lm(x,y) = Σ_{i,j=−5}^{+5} I_ODW,lm(x + i, y + j)². The four
energy images are then normalized using the maximal
energy responses that each mask can achieve, such
that each mask contributes equally in final detection.
The four normalized energy images are then summed
and the maximum value is determined. All values
above 4% of the maximum are considered to be due
to a possible obstacle. The rest of the processing is the same as described for Zero-Crossing Counting. The bottom row in Fig. 2 shows, from left, the four E_lm and the summed and thresholded result.
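A sketch of the texture-energy computation with the standard 1-D Laws vectors; normalizing by the observed maximum response is used here as a stand-in for the theoretical maximal response of each mask.

```python
import numpy as np
from scipy.signal import convolve2d

L5 = np.array([ 1,  4, 6,  4,  1], dtype=float)
E5 = np.array([-1, -2, 0,  2,  1], dtype=float)
S5 = np.array([-1,  0, 2,  0, -1], dtype=float)
R5 = np.array([ 1, -4, 6, -4,  1], dtype=float)
MASKS = {"E5L5": np.outer(E5, L5), "R5R5": np.outer(R5, R5),
         "E5S5": np.outer(E5, S5), "L5S5": np.outer(L5, S5)}

def laws_energy_sum(odw):
    """Filter the ODW with the four masks, compute 11x11 local energy and sum
    the per-mask normalized energy images."""
    odw = np.asarray(odw, dtype=float)
    box = np.ones((11, 11))                            # window for i, j in [-5, 5]
    total = 0.0
    for mask in MASKS.values():
        f = convolve2d(odw, mask, mode="same", boundary="symm")
        e = convolve2d(f ** 2, box, mode="same", boundary="symm")
        total = total + e / (e.max() + 1e-9)           # equal contribution per mask
    return total
```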
The final result of the 3 combined methods is also
shown in Fig. 2 (bottom-right). The area where the regions detected by at least 2 algorithms overlap is highlighted. More results are shown in Fig. 3.
In summary, if an obstacle is detected (i) in at least
3 consecutive frames, (ii) by at least 2 of the 3 algo-
rithms, and (iii) with regions in the ODW whose in-
tersections are not empty, the user will be alerted.
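The decision logic can be summarized by a small helper; the region format (left, right, top, bottom) and the detector interface are assumptions made for the sketch.

```python
from collections import deque
from itertools import combinations

def rects_overlap(a, b):
    """Non-empty intersection of two regions given as (left, right, top, bottom)?"""
    return max(a[0], b[0]) <= min(a[1], b[1]) and max(a[2], b[2]) <= min(a[3], b[3])

class ObstacleConfirmer:
    """Alert only when at least 2 of the 3 detectors report overlapping regions
    in at least 3 consecutive frames."""
    def __init__(self, frames_needed=3):
        self.history = deque(maxlen=frames_needed)

    def update(self, regions):
        """regions: one (l, r, t, b) tuple or None per detection algorithm."""
        hits = [r for r in regions if r is not None]
        agree = any(rects_overlap(a, b) for a, b in combinations(hits, 2))
        self.history.append(agree)
        return len(self.history) == self.history.maxlen and all(self.history)
```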
4 CONCLUSIONS
Computationally fast methods intended to run in real-time are not easy to develop, as detection performance must not be sacrificed for speed. All algorithms presented here
can run on an inexpensive netbook at more than 5 fps,
with very satisfying results. In all sequences tested
so far, some with complex path and pavement struc-
tures, paths were correctly detected. Also most sim-
ple and complex obstacles were detected, only failing
when the obstacles were too similar to the pavement
or when multiple textures were present, yielding false
negatives and positives. A few examples of sequences
(not all frames) are shown in Fig. 3, where the user
can be alerted well in advance. The system is now be-
ing extended to cope also with sharp turns and other
frequent complications, and extensive field tests in-
volving blind users are being prepared with ACAPO,
the Portuguese association of blind and amblyopic persons, within the project named Blavigator (from Blind Navigator).
It should be stressed once more that the vision sys-
tem will complement the white cane beyond its reach;
it is not intended to replace the cane. In addition,
it only serves local navigation for path and obstacle
negotiation. Global GPS/GIS-based navigation will
complement the vision system (du Buf et al., 2011),
leading to improved and autonomous mobility.
ACKNOWLEDGEMENTS
This research was supported by the Portuguese Foun-
dation for Science and Technology (FCT), through
the pluriannual funding of the Institute for Systems
and Robotics (ISR/IST) through the PIDDAC Pro-
gramme funds, and by the FCT project Blavigator
(RIPD/ADA/109690/2009).
REFERENCES
Canny, J. (1986). A computational approach to edge de-
tection. IEEE Trans. Pattern Anal. Mach. Intell.,
8(6):679–698.
CASBliP (2009). Final activity report of the EU-funded
CASBliP project. http://casblipdif.webs.upv.es/.
du Buf, J., Barroso, J., Rodrigues, J., Paredes, H., Farrajota,
M., Fernandes, H., José, J., Teixeira, V., and Saleiro,
M. (2011). The SmartVision navigation prototype for
blind users. JDCTA: International Journal of Digital
Content Technology and its Applications, 5(5):351–
361.
Duda, R. and Hart, P. (1972). Use of the Hough transform
to detect lines and curves in pictures. Comm. ACM,
15:11–15.
Kim, L., Park, S., Lee, S., and Ha, S. (2009). An electronic
traveler aid for the blind using multiple range sensors.
IEICE Electronics Express, 11(6):794–799.
Laws, K. (1980). Textured image segmentation. PhD thesis,
USCIPI Report 940, Image Processing Inst., Univ. of
Southern California, Los Angeles (USA).
Lee, S., Kang, S., and Lee, S. (2008). A walking guid-
ance system for the visually impaired. Int. J. Pattern
Recogn. Artif. Intell., 22(6):1171–1186.