Adaptive 3-D Object Classiﬁcation with Reinforcement Learning

Jens Garstka and Gabriele Peters

Human-Computer Interaction, Faculty of Mathematics and Computer Science, University of Hagen,

D-58084, Hagen, Germany

Keywords:

3-D Object Classiﬁcation, Reinforcement Learning.

Abstract:

We propose an adaptive approach to 3-D object classiﬁcation. In this approach appropriate 3-D feature de-

scriptor algorithms for 3-D point clouds are selected via reinforcement learning depending on properties of the

objects to be classiﬁed. This approach is supposed to be able to learn strategies for an advantageous selection

of 3-D point cloud descriptor algorithms in an autonomous and adaptive way, and thus is supposed to yield

higher object classiﬁcation rates in unfamiliar environments than any of the single algorithms alone. In addi-

tion, we expect our approach to be able to adapt to subsequently added 3-D feature descriptor algorithms as

well as to autonomously learn new object categories when encountered in the environment without further user

assistance. We describe the 3-D object classiﬁcation pipeline based on local 3-D features and its integration

into the reinforcement learning environment.

1 MOTIVATION

In the ﬁeld of human-robot interaction it is essential

for an interactive system to recognize and classify ob-

jects in its environment. A number of 3-D object clas-

siﬁcation methods exist, which are successful in de-

ﬁned contexts such as scene understanding, naviga-

tion, or applications in robotics like grasping or ma-

nipulation. But none of them is ﬂexible enough to sat-

isfy the needs for arbitrary environments or in several

different applications or contexts. At the same time

in the ﬁeld of machine learning approaches have been

developed that are able to adapt to dynamic changes

in environments and to learn behavioral strategies au-

tonomously. One of these approaches is represented

by methods of reinforcement learning. It seems rea-

sonable to overcome some of the inﬂexibilities of

state-of-the-art 3-D object classiﬁcation methods by

an appropriate fusion with a reinforcement learning

approach.

There are different ways in which 3-D objects can be

recognized and classiﬁed, e. g., by direct matching of

global 3-D point cloud descriptions, pairwise match-

ing of local descriptions with clustering, or dictio-

nary based approaches, where local descriptions are

summarized in histograms, before they are compared.

Over the years local 3-D feature descriptor algorithms

have emerged as the most promising means to com-

pare 3-D point clouds.

In the present work we will focus only on lo-

cal 3-D feature descriptor algorithms for point clouds

without additional structural information like trian-

gle meshes and surfaces. These local 3-D feature de-

scriptor algorithms differ considerably with respect to

recognition rates and computation times. However,

in general the computational costs of calculation and

comparison for local feature descriptors are high. A

comparison of recent algorithms (Alexandre, 2012)

and a survey of local feature based approaches for 3-D

object recognition (Guo et al., 2014) show that there

is not a single best algorithm in the domain of 3-D

object recognition and classiﬁcation for all purposes.

This raises the question which descriptors should be

used in which cases. Furthermore, the question arises

whether a combination of two or more different algo-

rithms leads to better results.

This proposal provides a concept which may offer answers to these questions. We present a method in which reinforcement learning is used to learn sequences of 3-D point cloud descriptors that yield high classification rates. Section 2 gives a short overview of the required components, and section 3 describes our approach in detail.


Garstka, J. and Peters, G.
Adaptive 3-D Object Classification with Reinforcement Learning.
DOI: 10.5220/0005563803810385
In Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics (ICINCO-2015), pages 381-385
ISBN: 978-989-758-123-6
© 2015 SCITEPRESS (Science and Technology Publications, Lda.)

2 RELATED WORK

2.1 Reinforcement Learning

In general, reinforcement learning (Sutton and Barto,

1998) describes a class of machine learning algo-

rithms, in which an agent tries to achieve a goal by

trial and error. The agent acts in an environment and

learns to choose optimal actions in each state of the

environment. The strategy by which actions are chosen is called a

policy. Further, it is assumed that the goals of the

agent can be deﬁned by a reward function that assigns

a numerical value to each distinct action the agent

may take in each distinct state of the environment. In

this environment the task of the agent is to choose and

apply one of the available actions in the current state.

This changes the environment which leads the agent

to the next state, and the agent can observe the con-

sequences (the immediate reward). While repeating

these steps the reinforcement learning agent can learn

a policy. Typically, it is desired to ﬁnd a policy that

maximizes the accumulated reward.

Watkins (Watkins, 1989) introduced a reinforce-

ment learning algorithm called Q-learning. In his

method the agent exists within a world that can be

modeled as a Markov Decision Process, consisting of

a ﬁnite number of states and actions. In each step

the agent performs one of the available actions, ob-

serves the new state, and receives the reward. This

reward and the expected future reward result in the

so-called quality-value (q-value). With ongoing iter-

ations all q-values for each possible state-action pair

will be approximated. Watkins and Dayan (Watkins

and Dayan, 1992) proved that the discrete case of Q-

learning will converge to an optimal policy under cer-

tain conditions. A series of survey articles of rein-

forcement learning methods is collected in (Wiering

and Van Otterlo, 2012).
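The Q-learning update described above can be sketched in a few lines. This is a minimal tabular illustration, not code from this proposal: the learning rate `alpha`, the discount factor `gamma`, and the toy transition are assumptions chosen only for the example.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning step.

    Moves q[(state, action)] toward reward + gamma * max_a' q[(next_state, a')].
    alpha and gamma are illustrative defaults, not values from the paper.
    """
    # Value of the best action in the next state (0 if the state is terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical toy transition: action "go" in state "s0" yields reward 1
# and leads to a terminal state with no further actions.
q = defaultdict(float)
first = q_update(q, "s0", "go", 1.0, "end", next_actions=[])
second = q_update(q, "s0", "go", 1.0, "end", next_actions=[])
```

Repeating the same transition moves the q-value step by step toward the observed reward, which is the iterative approximation referred to in the text.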

2.2 3-D Keypoint Detectors

3-D keypoint detectors are essential for local 3-D fea-

ture based approaches. They reduce the computa-

tional complexity by identifying those regions of 3-D

point clouds, which are interesting for descriptors in

terms of high informational density. There has been

a lot of research in this ﬁeld in the last few years.

A good overview of keypoint detector performances

is provided in the comparative evaluations of Salti

et al. (Salti et al., 2011) and Filipe and Alexandre

(Filipe and Alexandre, 2013). Additionally, there is

an overview of the available 3-D keypoint detection

algorithms in (Guo et al., 2014). For our approach,

we will choose the intrinsic shape signatures (Zhong,

2009) based on the results of Salti et al. and Filipe

and Alexandre, where it yielded best results in terms

of repeatability.

2.3 Local 3-D Feature Descriptors

The number of published local 3-D feature descrip-

tor algorithms has grown considerably in the last

few years. But there are two notable early approaches,

the splash (Stein and Medioni, 1992) and

the widely used spin image (Johnson and Hebert,

1998). These two methods stand for numerous sub-

sequently developed algorithms that can be grouped

into two categories: signature and histogram based

methods. Alexandre and Guo et al. (Alexandre, 2012;

Guo et al., 2014) provide extensive comparisons of

state of the art local 3-D feature descriptor algorithms.

The methods that we will use for an initial implemen-

tation will be the spin images, the 3-D shape con-

text (3DSC) (Frome et al., 2004), the persistent fea-

ture histogram (PFH) (Rusu et al., 2008), the fast

point feature histogram (FPFH) (Rusu et al., 2009),

the signature of histograms of orientations (SHOT) (Tombari

et al., 2010b), and the unique shape context (USC)

(Tombari et al., 2010a).

2.4 Classiﬁcation

As already mentioned, there are different ways to

match local 3-D features of an unknown object under

inspection against previously experienced features.

For our approach, it is assumed that the existing point

clouds have already been segmented. Thus, the local

feature descriptions can be used directly for classiﬁ-

cation without a previous clustering (hypothesis gen-

eration). In this case bag-of-words models with quan-

tized local descriptors as used in recent classiﬁca-

tion pipelines (Toldo et al., 2010; Wu and Lin, 2011;

Cholewa and Sporysz, 2014) can easily be used. In these pipelines a dictionary of keywords (quantized feature descriptions) is determined by clustering. Each local 3-D feature description of an object is then assigned to its nearest keyword by a nearest neighbor search, and the number of occurrences of each keyword is counted in a histogram. Finally, these histograms can be learned and/or classified by support vector machines (SVM) or random trees (RT).
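The histogram construction of such a bag-of-words pipeline can be illustrated as follows. This is a minimal sketch under simplifying assumptions: the dictionary is given instead of being obtained by clustering, the descriptors are toy 2-D vectors, and the function names `nearest_word` and `bow_histogram` are hypothetical.

```python
import math

def nearest_word(descriptor, dictionary):
    """Index of the dictionary entry closest to the descriptor (Euclidean)."""
    return min(range(len(dictionary)),
               key=lambda i: math.dist(descriptor, dictionary[i]))

def bow_histogram(descriptors, dictionary):
    """Count how often each keyword is the nearest neighbor of a descriptor."""
    hist = [0] * len(dictionary)
    for d in descriptors:
        hist[nearest_word(d, dictionary)] += 1
    return hist

# Toy dictionary of three keywords and four local descriptors (made-up values).
dictionary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]
histogram = bow_histogram(descriptors, dictionary)
```

The resulting histogram has one bin per keyword and serves as the fixed-length input vector for the classifier, regardless of how many keypoints an object has.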

3 ADAPTIVE 3-D OBJECT CLASSIFICATION

As indicated in the introduction, the main goal of our approach is the autonomous learning of an optimized combination and application of various local 3-D feature descriptor algorithms with the purpose of increasing the overall classification rate of 3-D point clouds. Moreover, the combinations of these algorithms are supposed to vary depending on some basic properties of a point cloud. The steps by which we intend to realize this are described in the following sections.

[Figure 1 diagram. Pipeline stages (lower half): point cloud, keypoints, features, bag of words, classification. Reinforcement learning part (upper half): states (available algorithms, global properties), q-values, policy (max-q / random), actions (algorithms), selected algorithm. Feedback from the classification: >1 object class candidates: new action selection; matching object class: reward [0,1].]

Figure 1: The proposed classification system. The basic classification pipeline is shown in the lower half. The reinforcement learning part used for the selection of a local 3-D feature descriptor algorithm (action) is shown above. During the classification phase the decision which algorithm should be selected first/next (represented by the circle) will be made based on the current state (the currently available actions and global properties) and the learned q-values (max-q). During the initial learning phase the algorithms will be chosen randomly.

3.1 The Basic Classiﬁcation Pipeline

The basic classification pipeline consists of five main steps. Given a 3-D point cloud, we start with the collection of global properties. These properties are the total number of points, the point cloud resolution, the eigenvalues, which give the variance along each eigenvector, and the dimensions of the point cloud along the eigenvectors. These values will be used by the reinforcement learning agent to select the first algorithm. In the second step, the intrinsic shape signature keypoint detection algorithm (Zhong, 2009) will be used to determine points of interest. During the third step, one of the local 3-D feature descriptor algorithms stated in section 2.3 will be applied. As a result we will get a set of local 3-D feature descriptions. In step four, each of the determined feature descriptions is quantized and binned in a histogram. In the last step the values of the histogram will be used as the input vector for a classifier, i.e., for an SVM, to identify an appropriate object class. This pipeline is depicted in the lower half of figure 1.
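The global properties collected in the first step could be computed along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: the point cloud resolution is taken to be the mean nearest-neighbor distance, the eigenvalues are those of the covariance matrix of the points, and the function name `global_properties` as well as the use of NumPy are choices made for the example.

```python
import numpy as np

def global_properties(points):
    """Compute simple global properties of an (N, 3) point cloud (N >= 2)."""
    pts = np.asarray(points, dtype=float)
    n = pts.shape[0]

    # Assumed definition of resolution: mean distance from each point to its
    # nearest neighbor (brute force, O(N^2) memory; fine for a small example).
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)
    resolution = float(dists.min(axis=1).mean())

    # Eigen-decomposition of the covariance matrix; eigh returns the
    # eigenvalues in ascending order with matching eigenvector columns.
    centered = pts - pts.mean(axis=0)
    covariance = centered.T @ centered / n
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)

    # Extent of the point cloud along each eigenvector.
    projected = centered @ eigenvectors
    dimensions = projected.max(axis=0) - projected.min(axis=0)

    return {"num_points": n, "resolution": resolution,
            "eigenvalues": eigenvalues, "dimensions": dimensions}

# Toy cloud: the eight corners of a 1 x 2 x 4 box.
box = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 2.0) for z in (0.0, 4.0)]
props = global_properties(box)
```

For real point clouds the brute-force distance matrix would be replaced by a spatial index such as a k-d tree.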

3.2 Fusion with Reinforcement Learning

The basic classiﬁcation pipeline will be enhanced by

a reinforcement learning agent.

States consist of sets of local 3-D feature descrip-

tor algorithms that have not already been used (a de-

scriptor will be applied only once) and the global

properties mentioned above. In order to achieve a ﬁ-

nite number of possible states, the continuous global

properties will be divided into a ﬁxed number of

classes.
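One possible encoding of such a state is sketched below. The bin boundaries, the property names, and the helper names `discretize` and `make_state` are hypothetical; the proposal only states that continuous properties are divided into a fixed number of classes.

```python
from bisect import bisect_right

def discretize(value, boundaries):
    """Map a continuous value to a class index given sorted bin boundaries."""
    return bisect_right(boundaries, value)

def make_state(unused_algorithms, properties, boundaries):
    """Build a hashable state from unused descriptors and binned properties."""
    binned = tuple(discretize(properties[name], boundaries[name])
                   for name in sorted(boundaries))
    return (frozenset(unused_algorithms), binned)

# Hypothetical boundaries: two boundaries give three classes per property.
boundaries = {"num_points": [1000, 10000], "resolution": [0.001, 0.01]}
state = make_state({"SHOT", "FPFH"},
                   {"num_points": 5000, "resolution": 0.02},
                   boundaries)
```

Because the state is a tuple of a `frozenset` and a tuple of integers, it can be used directly as a key in a table of q-values.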

Actions correspond to the selection of a local 3-D

feature descriptor algorithm and the application of the

last three steps of the basic pipeline (see ﬁgure 1),

i. e., the computation of local 3-D feature descriptions

based on the selected algorithm and of the histogram

which is fed as input vector into the SVMs of remain-

ing object classes. Usually the object class with the

highest score would be used as a single result. In our

case, using the SVMs as binary classiﬁers trained with

the responses −1 and 1, we will reject all classes with

a corresponding output < 0 and keep all classes ≥ 0

as object class candidates.
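The rejection rule for object class candidates can be written compactly. In this sketch the decision values are made-up numbers standing in for the outputs of per-class binary SVMs, and `filter_candidates` is a hypothetical helper name.

```python
def filter_candidates(candidates, decision_values):
    """Keep classes whose binary SVM output is >= 0, reject those with < 0."""
    return {c for c in candidates if decision_values[c] >= 0}

candidates = {"mug", "bowl", "banana"}
# Hypothetical SVM outputs for the current histogram, one per remaining class.
scores = {"mug": 0.8, "bowl": 0.0, "banana": -0.3}
remaining = filter_candidates(candidates, scores)
```

Applying this filter after every action shrinks the candidate set monotonically, since a rejected class is never evaluated again.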

Without any restrictions the reinforcement learning agent would stop only if there is no remaining object class or 3-D feature descriptor algorithm left. But this “natural” termination alone is not desirable, since we propose a time limit t_max on how long a single object classification should take. Thus, the learning process breaks down into episodic tasks that end in four possible final states. These states are reached if all algorithms have been used once, if no object class is left, if exactly one object class is left, or if the accumulated computation time exceeds the predefined time limit t_max. In the only remaining situation, when there is more than one object class candidate left, the reinforcement learning agent continues with the next action, i.e., with the selection of the next algorithm (see figure 1).
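The four final states can be expressed as a single check that is evaluated after every action. The order in which the conditions are tested and the returned labels are assumptions made for this sketch; the text does not specify which final state takes precedence when several conditions hold at once.

```python
def final_state(unused_algorithms, candidates, elapsed, t_max):
    """Return a label for the final state reached, or None to continue."""
    if elapsed > t_max:
        return "time_limit_exceeded"
    if len(candidates) == 0:
        return "no_class_left"
    if len(candidates) == 1:
        return "one_class_left"
    if not unused_algorithms:
        return "all_algorithms_used"
    # More than one candidate and at least one unused algorithm:
    # the agent selects the next algorithm.
    return None

# Example: two candidates remain and one descriptor is still unused.
status = final_state({"FPFH"}, {"mug", "bowl"}, elapsed=1.5, t_max=10.0)
```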

At this point we have to clarify under which conditions the reinforcement learning agent gets an immediate reward. The reward for all states except for the final states is 0. When the time limit t_max is exceeded or no algorithm/object class is left, the reward is also 0. For the remaining case, when exactly one object class is left, the reward depends on the phase of the reinforcement learning system.

Adaptive3-DObjectClassificationwithReinforcementLearning

383

3.2.1 Initial Learning Phase

The application of a reinforcement learning method is always coupled with the question of how much exploration and exploitation should be granted to the reinforcement learning agent. Typically, the reinforcement learning system starts with a random policy for maximum exploration. In this phase, we will use already known and classified objects from freely available data sets, e.g., from the large-scale hierarchical multi-view RGB-D object dataset of the University of Washington (Lai et al., 2011), since the decision whether the final category matches is then straightforward. If the determined object class matches the input object class, the reward is a value ∈ [0, 1] depending on the accumulated computation time left with respect to t_max. Otherwise the reward is 0.
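A reward of this kind might look as follows. The linear dependence on the remaining time is an assumption of this sketch; the text only requires a value in [0, 1] that depends on the computation time left relative to the time limit. The function name `episode_reward` is hypothetical.

```python
def episode_reward(candidates, true_class, elapsed, t_max):
    """Reward at the end of a learning episode with a known object class.

    Returns 0 unless exactly the correct class is left within the time limit;
    otherwise the fraction of the time budget that remains (assumed linear).
    """
    if elapsed > t_max or candidates != {true_class}:
        return 0.0
    return (t_max - elapsed) / t_max

# Correct class found after 2 of 10 time units: 80 percent of the budget is left.
r = episode_reward({"mug"}, "mug", elapsed=2.0, t_max=10.0)
```

With this choice, a correct classification that uses fewer or cheaper descriptors earns a higher reward, which is what drives the agent toward short and effective descriptor sequences.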

3.2.2 Classification Phase and Adaptive Learning

Once the q-values become more stable, exploration is commonly reduced in favor of exploitation. A common method is ε-greedy action selection, meaning that most of the time those actions are selected for which the expected reward is maximized, but with probability ε a random action is selected. However, instead of selecting single actions randomly, we will adapt this concept to completely random policy episodes with the following behavior: during the regular classification of (unknown) objects we will use a max-q policy that always selects the action with the highest q-value. In these episodes no modifications of the q-values will be made, and thus a reward is irrelevant. Occasionally we will integrate random policy episodes with known objects. In this way, the system adapts to changes of the environment over time and allows us to add new descriptors to the system.
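The difference to per-action ε-greedy selection is that the random choice is made once per episode. A minimal sketch of this idea follows; the parameter `epsilon`, the mode labels, and the function names are assumptions, and how often random episodes are actually scheduled is left open in the text.

```python
import random

def choose_episode_mode(epsilon, rng=random):
    """Decide once per episode: random policy with probability epsilon."""
    return "random" if rng.random() < epsilon else "max_q"

def select_action(q, state, actions, mode, rng=random):
    """Pick a descriptor algorithm according to the episode mode."""
    ordered = sorted(actions)  # fixed order for reproducible tie-breaking
    if mode == "random":
        return rng.choice(ordered)
    # max-q policy: action with the highest learned q-value (default 0).
    return max(ordered, key=lambda a: q.get((state, a), 0.0))

# Hypothetical learned q-values for one state.
q = {("s", "SHOT"): 0.7, ("s", "FPFH"): 0.2}
best = select_action(q, "s", {"SHOT", "FPFH"}, mode="max_q")
```

In a max-q episode the q-values are only read, while in a random episode with a known object they would also be updated from the reward.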

3.2.3 Handling New Categories

A big advantage of our approach is that the number of object classes the system can recognize can increase dynamically. If no object class candidate is left during the classification phase, an explicit comparison with multiple accurate descriptors such as PFH and SHOT is envisaged. If the object still cannot be assigned to one of the object classes at hand, a new unlabeled class is created, which means that the reinforcement learning agent has learned a new object class autonomously. New classes, of course, should be labeled and reviewed for consistency from time to time.

3.2.4 Evaluation

To evaluate the results, we will determine the recog-

nition rates for each local 3-D feature descriptor al-

gorithm individually, using the basic classiﬁcation

pipeline proposed in section 3.1. Subsequently, the

individual results can be compared with the results of

our adaptive 3-D object classiﬁcation approach.

4 CONCLUSIONS

This proposal suggests a system which learns a strat-

egy to select and apply 3-D point cloud descriptor al-

gorithms with the goal to classify a point cloud with

high accuracy within a preset time limit. The proposed approach combines a 3-D classification pipeline with a reinforcement learning system that selects local 3-D feature descriptor algorithms. Due

to properties of reinforcement learning we expect the

approach to be highly adaptive, e. g., allowing the in-

tegration of new descriptors and the on-line learning

of new object categories.

REFERENCES

Alexandre, L. A. (2012). 3d descriptors for object and cate-

gory recognition: a comparative evaluation. In Work-

shop on Color-Depth Camera Fusion in Robotics at

the IEEE/RSJ International Conference on Intelligent

Robots and Systems (IROS), Vilamoura, Portugal.

Cholewa, M. and Sporysz, P. (2014). Classiﬁcation of dy-

namic sequences of 3d point clouds. In Artiﬁcial Intel-

ligence and Soft Computing, pages 672–683. Springer.

Filipe, S. and Alexandre, L. A. (2013). A comparative eval-

uation of 3d keypoint detectors. In 9th Conference on

Telecommunications, Conftele 2013, pages 145–148,

Castelo Branco, Portugal.

Frome, A., Huber, D., Kolluri, R., Bulow, T., and Malik,

J. (2004). Recognizing objects in range data using

regional point descriptors. In Proceedings of the Eu-

ropean Conference on Computer Vision (ECCV).

Guo, Y., Bennamoun, M., Sohel, F., Lu, M., and Wan, J.

(2014). 3d object recognition in cluttered scenes with

local surface features: A survey. IEEE Transactions

on Pattern Analysis and Machine Intelligence, 36(11).

Johnson, A. E. and Hebert, M. (1998). Surface matching

for object recognition in complex three-dimensional

scenes. Image and Vision Computing, 16(9):635–651.

Lai, K., Bo, L., Ren, X., and Fox, D. (2011). A large-

scale hierarchical multi-view rgb-d object dataset. In

Robotics and Automation (ICRA), 2011 IEEE Interna-

tional Conference on, pages 1817–1824. IEEE.

Rusu, R., Blodow, N., and Beetz, M. (2009). Fast point fea-

ture histograms (fpfh) for 3d registration. In Robotics

ICINCO2015-12thInternationalConferenceonInformaticsinControl,AutomationandRobotics

384

and Automation, 2009. ICRA ’09. IEEE International

Conference on, pages 3212–3217.

Rusu, R. B., Blodow, N., Marton, Z. C., and Beetz, M.

(2008). Aligning point cloud views using persistent

feature histograms. In Intelligent Robots and Systems,

2008. IROS 2008. IEEE/RSJ International Conference

on, pages 3384–3391. IEEE.

Salti, S., Tombari, F., and Stefano, L. D. (2011). A per-

formance evaluation of 3d keypoint detectors. In

3D Imaging, Modeling, Processing, Visualization and

Transmission (3DIMPVT), 2011 International Con-

ference on, pages 236–243. IEEE.

Stein, F. and Medioni, G. (1992). Structural indexing:

efficient 3-d object recognition. IEEE Trans. PAMI,

14:125–145.

Sutton, R. S. and Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press, Cambridge, MA.

Toldo, R., Castellani, U., and Fusiello, A. (2010). The bag

of words approach for retrieval and categorization of

3d objects. The Visual Computer, 26(10):1257–1268.

Tombari, F., Salti, S., and Di Stefano, L. (2010a). Unique

shape context for 3d data description. In Proceedings

of the ACM workshop on 3D object retrieval, pages

57–62. ACM.

Tombari, F., Salti, S., and Di Stefano, L. (2010b). Unique

signatures of histograms for local surface descrip-

tion. In Computer Vision-ECCV 2010, pages 356–369.

Springer.

Watkins, C. and Dayan, P. (1992). Technical note: Q-

learning. Machine Learning, 8(3-4):279–292.

Watkins, C. J. C. H. (1989). Learning from Delayed Re-

wards. PhD thesis, King’s College, Cambridge, UK.

Wiering, M. and Van Otterlo, M. (2012). Reinforcement

learning. In Adaptation, Learning, and Optimization,

volume 12. Springer.

Wu, C.-C. and Lin, S.-F. (2011). Efﬁcient model detection

in point cloud data based on bag of words classiﬁca-

tion. Journal of Computational Information Systems,

7(12):4170–4177.

Zhong, Y. (2009). Intrinsic shape signatures: A shape de-

scriptor for 3d object recognition. In Computer Vision

Workshops (ICCV Workshops), 2009 IEEE 12th Inter-

national Conference on, pages 689–696. IEEE.

Adaptive3-DObjectClassificationwithReinforcementLearning

385