3.2 CNN Architecture
For training, the following CNN architecture was chosen (it is similar to the architecture used by Kang et al. (Kang et al, 2014), which proved effective for classifying image fragments of similar size):
- Three convolution layers with 8, 16 and 32 filters (size 3x3, stride 1) and ReLU activation.
- Three max pooling layers (one after each convolution layer): the first two use 2x2 pools, the third uses a 3x3 pool; stride 2.
- Adam optimizer, categorical cross-entropy loss, learning rate 0.001.
- The convolution and max pooling layers are followed by a fully connected 50-unit layer with ReLU activation and a dropout layer (dropout rate 0.4). The fully connected layer is followed by an output layer with two outputs ('Good' and 'Poor') and softmax activation.
This CNN architecture proved to be quickly
trainable and returned reasonable results given the
noisy training and testing data.
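A minimal sketch of this architecture is given below, assuming a TensorFlow/Keras implementation with single-channel 32x32 input fragments; the framework, padding and input shape are assumptions and not taken from the original setup.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),  # single-channel 32x32 fragment (assumption)
    layers.Conv2D(8, (3, 3), strides=1, activation='relu', padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    layers.Conv2D(16, (3, 3), strides=1, activation='relu', padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    layers.Conv2D(32, (3, 3), strides=1, activation='relu', padding='same'),
    layers.MaxPooling2D(pool_size=(3, 3), strides=2),
    layers.Flatten(),
    layers.Dense(50, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(2, activation='softmax'),  # 'Good' / 'Poor'
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])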
3.3 Fragment Extraction and Tagging
From the pre-processed maps, fragments for training, validation and testing were extracted. When extracting fragments, the following parameters were considered (a sketch of the resulting fragment filter is given after the list):
- Fragment size. A smaller size is useful when considering the local quality of the map (resolution and local noise), while a larger fragment size better represents the structural quality of the map. A fragment size of 32x32 was chosen, similar to the work by Kang et al. (Kang et al, 2014).
- Minimum rate of significant cells. Large areas of occupancy grid maps generally consist of cells with 'unknown' values, which represent the unobserved environment. Only cells that contain significant information (occupied and free parts of the environment) should be used for quality evaluation. The minimum rate was chosen to be 0.4 (40% of all cells), although anything from 0.3 to 0.6 is reasonable (such rates are both representative and able to capture border areas of the environment).
- Minimum rate of occupied cells. It is difficult to determine map quality from the free-space representation alone. Occupied cells provide the most important information about the location of obstacles, so at least some part of the fragment should contain occupied cells. A rate of 0.025 (2.5% of all cells) was chosen as the minimum at which a fragment contained enough occupied cells to be evaluated by a human expert.
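The sketch below illustrates the fragment filter described above, assuming the map is stored as a NumPy array in which -1 marks unknown, 0 free and 1 occupied cells; the cell encoding and the random sampling strategy are assumptions.

import numpy as np

FRAGMENT_SIZE = 32
MIN_SIGNIFICANT_RATE = 0.4    # occupied + free cells
MIN_OCCUPIED_RATE = 0.025     # occupied cells only

def is_usable(fragment):
    # Reject fragments dominated by unknown cells or lacking obstacles.
    total = fragment.size
    significant = np.count_nonzero(fragment != -1)
    occupied = np.count_nonzero(fragment == 1)
    return (significant / total >= MIN_SIGNIFICANT_RATE
            and occupied / total >= MIN_OCCUPIED_RATE)

def extract_fragments(grid, count, max_attempts=10000, seed=0):
    # Randomly sample square fragments until enough usable ones are found.
    rng = np.random.default_rng(seed)
    h, w = grid.shape
    fragments = []
    for _ in range(max_attempts):
        if len(fragments) == count:
            break
        r = rng.integers(0, h - FRAGMENT_SIZE + 1)
        c = rng.integers(0, w - FRAGMENT_SIZE + 1)
        fragment = grid[r:r + FRAGMENT_SIZE, c:c + FRAGMENT_SIZE]
        if is_usable(fragment):
            fragments.append(fragment)
    return fragments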
The tagging of fragments was performed manually. For each extracted fragment, a human expert evaluated whether it belongs to the class 'good' or 'poor'. Only two classes are used to classify each fragment, because dividing the data set even into two classes is difficult enough for the expert, and more classes would make the task even more complicated.
It must be noted that expert evaluation is inherently subjective and based on the preferences of the expert. It has the benefit of introducing desirable properties into the evaluation, but it is also prone to noise introduced by human error. If such subjectivity is undesirable, the expert evaluation can be replaced with more formal metrics, assuming that ground truth maps are available, e.g. the map quality evaluation metric in (Varsadan et al, 2008).
4 EXPERIMENTAL RESULTS
To train and test the CNN, a data set of 37 maps of various quality was collected from several open source data sets:
- Pre-2014 Robotics 2D-Laser Datasets (http://www.ipb.uni-bonn.de/datasets/): MIT CSAIL (C. Stachniss), Freiburg Campus (C. Stachniss, G. Grisetti), Intel Research Lab (D. Haehnel), Seattle UW (D. Haehnel), MIT Infinite Corridor Dataset (M. Bosse, J. Leonard), Orebro (H. Andreasson, P. Larsson, T. Duckett), Belgioioso castle (D. Haehnel), FHW (D. Haehnel), ACES3 Austin (P. Beeson), Edmonton (N. Roy), Freiburg, Building 079 (C. Stachniss), Acapulco Convention Center, Mexico (N. Roy).
- Radish: Robotics Research Datasets (Howard and Roy, 2015): sdr_site_b (A. Howard), stanford-gates1 (B. Gerkey), intel_oregon (M. Batalin), ubremen-cartesium (C. Stachniss), csc-mezzanine (A. Howard), usc-sal200-021120 (A. Howard).
- Robot@Home Dataset (Ruiz-Sarmiento et al, 2017).
The data set also includes several unpublished maps collected at Riga Technical University.
From each map, 20 random fragments were extracted for CNN training and 8 fragments for testing and validation (4 for each). The decision to use the same maps for training and testing was made due to the limited number of available occupancy grid maps (37 in total). Initial tests showed that using too few maps (10 out of 37) for validation led