3.2.1 Initial Learning Phase
The application of a reinforcement learning method always raises the question of how much exploration and exploitation should be granted to the reinforcement learning agent. Typically, the reinforcement learning system starts with a random policy for maximum exploration. In this phase, we will use already known and classified objects from freely available data sets, e.g. from the large-scale hierarchical multi-view RGB-D object dataset of the University of Washington (Lai et al., 2011), since the decision whether the final category matches is straightforward. If the determined object class matches the input object class, the reward is a value ∈ [0, 1] depending on the accumulated computation time left with respect to t_max. Otherwise, the reward is 0 as well.
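The reward scheme above can be sketched as follows; the linear shaping by remaining time is an assumption, since the proposal only states that the reward lies in [0, 1] and depends on the computation time left:

```python
def reward(predicted_class, true_class, time_used, t_max):
    """Hypothetical reward sketch: on a correct classification within the
    time budget, return the fraction of the budget left; otherwise 0.
    The exact shaping function is an assumption, not fixed by the proposal."""
    if predicted_class != true_class or time_used > t_max:
        return 0.0
    return (t_max - time_used) / t_max
```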
3.2.2 Classification Phase and Adaptive
Learning
As the q-values become more stable, exploration is commonly reduced in favor of exploitation. This method is called ε-greedy, meaning that most of the time the action with the maximum expected reward is selected, but with probability ε a random action is selected. However, instead of selecting single actions randomly, we will adapt this concept to completely random policy episodes with the following behavior: during the regular classification of (unknown) objects we will use a max-q policy, always selecting the action with the highest proposed q-value. In these episodes no modifications of the q-values are made, and thus a reward is irrelevant. Occasionally we will interleave random policy episodes with known objects. In this way, the system adapts to changes of the environment over time and allows us to add new descriptors to the system.
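The episode-level variant of ε-greedy described above can be sketched as follows; the function and parameter names are assumptions for illustration, not the proposal's actual interface:

```python
import random

def choose_policy(q, actions, epsilon, labelled_object, rng=None):
    """Hedged sketch of episode-level exploration: with probability
    epsilon, and only when a labelled (known) object is available, the
    entire episode follows a random policy with q-value updates enabled;
    otherwise a greedy max-q policy classifies without modifying q."""
    rng = rng or random.Random()
    if labelled_object and rng.random() < epsilon:
        # random-policy episode: learning (q-updates) enabled
        return (lambda s: rng.choice(actions)), True
    # greedy max-q episode: pick the highest-valued action, no learning
    return (lambda s: max(actions, key=lambda a: q.get((s, a), 0.0))), False
```

Unlike per-action ε-greedy, the explore/exploit decision is made once per episode, so a random episode is only scheduled when a known, labelled object is at hand and a reward can be computed.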
3.2.3 Handling New Categories
A major advantage of our approach is that the number of object classes the system can recognize can grow dynamically. If no object class candidate is left during the classification phase, an explicit comparison with multiple accurate descriptors such as PFH and SHOT is envisaged. If the object still cannot be assigned to one of the object classes at hand, a new unlabeled class is created, which means that the reinforcement learning agent has learned a new object class autonomously. New classes should, of course, be labeled and reviewed for consistency from time to time.
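The fallback logic above can be sketched as follows; `accurate_match` abstracts the explicit comparison with accurate descriptors such as PFH or SHOT, and all names are illustrative assumptions:

```python
def resolve_category(candidates, accurate_match, classes):
    """Hedged sketch: `candidates` are the classes surviving the
    classification phase; `accurate_match(cls)` stands for an explicit
    comparison with accurate descriptors (e.g. PFH, SHOT)."""
    if candidates:                        # normal case: pipeline found a match
        return candidates[0]
    for cls in classes:                   # fall back to accurate descriptors
        if accurate_match(cls):
            return cls
    new_cls = f"unlabelled_{len(classes)}"  # autonomously learned class
    classes.append(new_cls)               # should later be labelled and reviewed
    return new_cls
```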
3.2.4 Evaluation
To evaluate the results, we will determine the recognition rates for each local 3-D feature descriptor algorithm individually, using the basic classification pipeline proposed in Section 3.1. Subsequently, the individual results can be compared with the results of our adaptive 3-D object classification approach.
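The per-descriptor recognition rate can be computed as the fraction of correctly classified objects; this is a standard definition, assumed here since the section does not spell one out:

```python
def recognition_rate(predictions, ground_truth):
    """Recognition rate as the fraction of correctly classified objects
    (standard definition, assumed; the proposal does not fix a metric)."""
    assert len(predictions) == len(ground_truth)
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)
```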
4 CONCLUSIONS
This proposal suggests a system that learns a strategy for selecting and applying 3-D point cloud descriptor algorithms with the goal of classifying a point cloud with high accuracy within a preset time limit. The proposed approach couples a reinforcement learning system with a 3-D classification pipeline for selecting local 3-D feature descriptor algorithms. Due to the properties of reinforcement learning, we expect the approach to be highly adaptive, e.g., allowing the integration of new descriptors and the on-line learning of new object categories.
REFERENCES
Alexandre, L. A. (2012). 3D descriptors for object and category recognition: A comparative evaluation. In Workshop on Color-Depth Camera Fusion in Robotics at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal.
Cholewa, M. and Sporysz, P. (2014). Classification of dynamic sequences of 3D point clouds. In Artificial Intelligence and Soft Computing, pages 672–683. Springer.
Filipe, S. and Alexandre, L. A. (2013). A comparative evaluation of 3D keypoint detectors. In 9th Conference on Telecommunications, Conftele 2013, pages 145–148, Castelo Branco, Portugal.
Frome, A., Huber, D., Kolluri, R., Bulow, T., and Malik, J. (2004). Recognizing objects in range data using regional point descriptors. In Proceedings of the European Conference on Computer Vision (ECCV).
Guo, Y., Bennamoun, M., Sohel, F., Lu, M., and Wan, J. (2014). 3D object recognition in cluttered scenes with local surface features: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11).
Johnson, A. E. and Hebert, M. (1998). Surface matching for object recognition in complex three-dimensional scenes. Image and Vision Computing, 16(9):635–651.
Lai, K., Bo, L., Ren, X., and Fox, D. (2011). A large-scale hierarchical multi-view RGB-D object dataset. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 1817–1824. IEEE.
Rusu, R., Blodow, N., and Beetz, M. (2009). Fast point feature histograms (FPFH) for 3D registration. In Robotics
ICINCO 2015 - 12th International Conference on Informatics in Control, Automation and Robotics