Filter. Finally, the performance is evaluated, first using an advanced real-time robotic simulator (Burtin et al., 2016), then with a real robot/sensor set-up.
2 RELATED WORK
Using segments to perform SLAM has been explored with either a lidar alone (Roumeliotis and Bekey, 2000), (Choi et al., 2008) or a camera alone (Huang, 2008), (Lemaire and Lacroix, 2007), (Micusik and Wildenauer, 2015), (Zuo et al., 2017).
SLAM systems are generally split between sparse and dense vision algorithms. Sparse SLAMs use only a few salient features of the image to compute the localization, for example ORB-SLAM (Mur-Artal et al., 2015). Each feature is represented and stored specifically in the map to be used later as a reference by the localization algorithm. These methods only need to track a small percentage of the pixels of the entire image, while dense methods use almost all of them. Because dense methods such as DTAM (Newcombe et al., 2011) use every pixel of the image, powerful hardware is needed to perform all the operations in real time; most of the time, GPU processing is used to improve the computation speed. RGB-D (Kinect, Xtion) and depth camera sensors brought new SLAM systems (Engelhard et al., 2011), (Schramm et al., 2018), with new approaches: they avoid the issue of initializing features from an unknown range. Monocular cameras have the weakness of being unable, from a single frame, to obtain the distance between the camera and the object seen at a given pixel of the image (scale effect). The LSD line segment detector (Von Gioi et al., 2012) is massively used by monocular and stereo-vision SLAM systems (Engel et al., 2014), (Pumarola et al., 2017), but it is too generic: it extracts all available segments while processing only black-and-white (B&W) images, so the process is not optimized enough. Moreover, the RGB to B&W conversion risks missing gradients, depending on the gray-level conversion method.
The idea of using both a lidar and a monocular camera has more often been applied to moving object detection and tracking (Premebida et al., 2007), (Asvadi et al., 2016), (De Silva et al., 2018), and more rarely to localization itself. In our case, we focus on the detection of vertical lines in the camera image because they are commonly found and invariant in our environment. SLAM systems using this type of feature are commonly referred to as "bearing-only" SLAM (Bekris et al., 2006); they have proven effective in minimalist set-ups with simple environments (Huang, 2008), (Choi et al., 2008), (Zuo et al., 2017).
In order to extract these vertical segments in the image (which are supposed to be the projections of 3D vertical structures of the scene: doors, angles of corridors), we rely on a classical edge segmentation composed of the following well-known steps (Nachar et al., 2014) (a generic sketch is given after the list):
1. gradient computation;
2. thinning;
3. closing;
4. linking;
5. and polygonal approximation.
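As an illustration only, the following sketch chains these five steps using off-the-shelf OpenCV calls; it is not our optimized implementation (the binarization threshold is an arbitrary placeholder, and cv::ximgproc::thinning requires the opencv_contrib modules).

    // Generic sketch of the five-step edge segmentation pipeline.
    #include <opencv2/opencv.hpp>
    #include <opencv2/ximgproc.hpp>   // thinning(), from opencv_contrib
    #include <vector>

    std::vector<std::vector<cv::Point>> extractSegments(const cv::Mat &bgr)
    {
        cv::Mat gray, gx, gy, mag, edges, thin, closed;
        cv::cvtColor(bgr, gray, cv::COLOR_BGR2GRAY);

        // 1. gradient computation (Sobel projections, then magnitude)
        cv::Sobel(gray, gx, CV_32F, 1, 0);
        cv::Sobel(gray, gy, CV_32F, 0, 1);
        cv::magnitude(gx, gy, mag);
        cv::threshold(mag, edges, 80.0, 255.0, cv::THRESH_BINARY); // placeholder threshold
        edges.convertTo(edges, CV_8U);

        // 2. thinning of the thick edge map
        cv::ximgproc::thinning(edges, thin);

        // 3. closing to fill small gaps
        cv::morphologyEx(thin, closed, cv::MORPH_CLOSE,
                         cv::getStructuringElement(cv::MORPH_RECT, {3, 3}));

        // 4. linking: chain connected edge pixels into contours
        std::vector<std::vector<cv::Point>> chains, polys;
        cv::findContours(closed, chains, cv::RETR_LIST, cv::CHAIN_APPROX_NONE);

        // 5. polygonal approximation of each chain
        for (const auto &c : chains) {
            std::vector<cv::Point> p;
            cv::approxPolyDP(c, p, 2.0, false);
            polys.push_back(p);
        }
        return polys;
    }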
Since the gradient computation is the most time-consuming step, we detail the choice of the appropriate algorithm in this section, and its adaptation and time-optimized implementation (Cabrol et al., 2005) in the next one.
Gradient computation algorithms can be classified into three categories according to their complexity and the size of their neighborhood:
• 2 × 2: Roberts (Roberts, 1965), proposed in the 60's;
• 3 × 3: Prewitt (Prewitt, 1970), Sobel (Sobel, 1978) and Kirsch (Kirsch, 1971) in the 70's;
• J.F. Canny (Canny, 1983) (implemented in OpenCV) and R. Deriche (Deriche, 1987) in the 80's.
Although algorithms of the last category yield better-quality results, for our real-time and embedded robotic application the best compromise between result quality and computational cost is given by the second one.
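For reference (these masks are standard and not specific to our implementation), the horizontal gradient kernels of this 3 × 3 category are:

$$
h_x^{Prewitt} = \begin{pmatrix} -1 & 0 & +1 \\ -1 & 0 & +1 \\ -1 & 0 & +1 \end{pmatrix}, \qquad
h_x^{Sobel} = \begin{pmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{pmatrix},
$$

with the vertical kernels obtained by transposition.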
The principle of the Prewitt and Sobel algorithms is to first compute the projections of the gradient $\vec{G}$ on the image axes $\vec{u}_x$ and $\vec{u}_y$ and, then, to perform a rectangular to polar transformation to obtain:
• the gradient magnitude, which reflects the transition between two regions;
• the gradient argument, which is orthogonal to the local edge direction. For the thinning step, this information is reduced to the direction of the neighboring pixel orthogonal to the edge, i.e. four directions.
$$
\|\vec{G}\| = \sqrt{(\vec{G}\cdot\vec{u}_x)^2 + (\vec{G}\cdot\vec{u}_y)^2}
$$
$$
\operatorname{Arg}(\vec{G}) = \arctan\left(\frac{\vec{G}\cdot\vec{u}_y}{\vec{G}\cdot\vec{u}_x}\right)
$$
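As a minimal per-pixel sketch (not our optimized implementation; the function and field names are illustrative only), the polar conversion and its reduction to four directions can be written as:

    #include <cmath>

    // Converts the two gradient projections (gx = G.ux, gy = G.uy)
    // into a magnitude and a direction index quantized to the four
    // neighbor directions used by the thinning step.
    struct PolarGradient {
        float magnitude;  // ||G||
        int   direction;  // 0: 0 deg, 1: 45 deg, 2: 90 deg, 3: 135 deg (mod 180)
    };

    PolarGradient toPolar(float gx, float gy)
    {
        PolarGradient g;
        g.magnitude = std::sqrt(gx * gx + gy * gy);

        // Full-precision argument in degrees, then quantization into
        // one of four 45-degree sectors (orientation is modulo 180).
        float angle = std::atan2(gy, gx) * 180.0f / 3.14159265f;
        if (angle < 0.0f) angle += 180.0f;
        g.direction = static_cast<int>((angle + 22.5f) / 45.0f) % 4;
        return g;
    }

The sqrt and atan2 calls are precisely the costly rectangular-to-polar operations that the diagonal projections discussed next aim to avoid.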
In order to obtain nearly the same results while avoiding the computations (in double-precision floats if done without approximation) of the rectangular to polar transform, Kirsch introduced two diagonal directions of projection: