Boosting 3D Shape Classiﬁcation with Global Veriﬁcation and

Redundancy-free Codebooks

Viktor Seib, Nick Theisen and Dietrich Paulus

Active Vision Group (AGAS), University of Koblenz-Landau, Universitätsstr. 1, 56070 Koblenz, Germany

http://agas.uni-koblenz.de

Keywords:

Shape Classiﬁcation, Global Veriﬁcation, Mobile Robotics, Implicit Shape Models, Point Clouds, Codebooks.

Abstract:

We present a competitive approach for 3D data classiﬁcation that is related to Implicit Shape Models and

Naive-Bayes Nearest Neighbor algorithms. Based on this approach we investigate methods to reduce the

amount of data stored in the extracted codebook with the goal to eliminate redundant and ambiguous feature

descriptors. The codebook is signiﬁcantly reduced in size and is combined with a novel global veriﬁcation

approach. We evaluate our algorithms on typical 3D data benchmarks and achieve competitive results despite

the reduced codebook. The presented algorithm can be run efﬁciently on a mobile computer making it suitable

for mobile robotics applications. The source code of the developed methods is made publicly available to

contribute to point cloud processing, the Point Cloud Library (PCL) and 3D classiﬁcation software in general.

1 INTRODUCTION

Current research on object classification and detection focuses on deep neural networks for 2D image data (Lin et al., 2017), (He et al., 2017). However, affordable 3D sensors increase the demand for 3D data processing. Consequently, approaches exploiting depth data from RGBD cameras have been proposed (Eitel et al., 2015), (Zia et al., 2017). Neural networks using volumetric (Maturana and Scherer, 2015), (Wu et al., 2015), (Garcia-Garcia et al., 2016) or point cloud data (Qi et al., 2017) for object classification are still rare. PointNet (Qi et al., 2017) is currently one of the best approaches in that area. Note that the algorithms presented in (Garcia-Garcia et al., 2016) and (Qi et al., 2017) are both dubbed "PointNet".

We are well aware of the advances achieved in the field of convolutional neural networks for object classification. In this work we nevertheless adhere to a classic approach without neural networks. This has certain benefits. The training phase takes significantly less time and computational resources. Further, the selection of training parameters is straightforward and does not require time-consuming tuning in a trial-and-error fashion. Finally, the trained model runs efficiently on a mobile computer, which makes it well suited for mobile robotics applications. One downside is that neural networks can use more training data without increasing the model size, whereas a codebook grows with the training data. Among others, this shortcoming of codebook-based approaches is addressed in this work.

The Point Cloud Library (PCL, http://pointclouds.org/) addresses the demand for 3D data processing by providing a framework with a standardized data format and many algorithms. Further, the PCL offers a complete processing pipeline for 3D object recognition and provides an adaptation of the well-known Implicit Shape Model (ISM) approach (Leibe et al., 2004) to 3D data. The 3D variant of ISM constructs a geometric alphabet of shape appearances, the codebook, rather than a visual alphabet of 2D image patches (Leibe et al., 2004).

We present several methods to estimate a descriptor's relevance during training to obtain a more descriptive codebook and omit redundancies. Our second contribution is a global verification approach that boosts the classification performance. Finally, as a third contribution, the source code of our contributions is made publicly available (code, documentation and examples at https://github.com/vseib/PointCloudDonkey). Our algorithm is competitive with standard approaches for 3D object classification on commonly used datasets. By providing the source code we hope to make a valuable contribution to open-source 3D classification software.

Seib, V., Theisen, N. and Paulus, D.

Boosting 3D Shape Classiﬁcation with Global Veriﬁcation and Redundancy-free Codebooks.

DOI: 10.5220/0007312402570264

In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019), pages 257-264

ISBN: 978-989-758-354-4


Figure 1: The pipeline used in this work. Blue steps are common for both, training (orange) and classiﬁcation (green). The

contributions of this work are marked with a red star. The maneki-neko (lucky cat) model is intellectual property of user bs3

(downloaded from https://www.thingiverse.com/thing:923097).

Section 2 presents related work on codebook reduction. Our point cloud processing pipeline is presented in Section 3 with the contributions for codebook cleaning (Section 4) and global verification (Section 5). Section 6 presents and discusses an extensive evaluation of the proposed algorithms, while Section 7 concludes the paper.

2 RELATED WORK

A common technique to reduce the size of the codebook is vector quantization. However, this is also one of the main reasons for the inferior performance of Nearest-Neighbor-based methods (Boiman et al., 2008). Consequently, feature clustering is omitted by approaches using 3D data (Salti et al., 2010), (Tombari and Di Stefano, 2010) and some Naive-Bayes Nearest Neighbor methods (McCann and Lowe, 2012).

The ISM algorithm contained in the PCL (Knopp et al., 2010) also uses vector quantization to reduce the codebook size. Other approaches argue against vector quantization (Salti et al., 2010), (Tombari and Di Stefano, 2010), (Seib et al., 2015). The latter readapts the continuous Hough space of the original ISM algorithm to 3D data, in contrast to the discrete voting spaces of other ISM adaptations.

Another way of handling ambiguous features is an optimization step during training that assigns weights to individual features (Liu et al., 2015), (McCann and Lowe, 2012). These approaches improve classifier performance in their respective domains. However, they do not aim at removing feature descriptors to reduce the codebook size.

Due to the limitations of clustered codebooks, a random feature selection was proposed (Cui et al., 2015). Surprisingly, in some cases a randomly reduced codebook performs even better than the complete codebook. These experiments show that some of the features are less descriptive than others. We are thus interested in finding an approach that can judge the features and maintain only the strong descriptors, while the weak or ambiguous ones are discarded.

Alternatively, the size of the codebook can be reduced by reducing the entry size instead of reducing the number of entries. Recently, (Prakhya et al., 2015) proposed to convert SHOT into a binary descriptor, B-SHOT, to reduce its memory requirements. Further, recent advances in deep learning make it possible to learn compact feature descriptors for 3D data (Khoury et al., 2017), (Schmidt et al., 2017). In (Khoury et al., 2017) Compact Geometric Features (CGF) are proposed that outperform hand-crafted features (including SHOT) in scan registration. Taking this recent research into account, we will compare our contributions with the B-SHOT and CGF descriptors.

3 PIPELINE DESCRIPTION

For our own approach we re-implement the complete ISM pipeline (Figure 1) using the PCL. We take inspiration from Naive-Bayes Nearest Neighbor (NBNN) (McCann and Lowe, 2012), (Boiman et al., 2008) and combine it with the localized Hough-voting scheme of ISM (Seib et al., 2015), (Tombari and Di Stefano, 2010). The design choices we make for each of the pipeline steps are shown in Figure 1. We extract keypoints on a dense grid and compute SHOT descriptors for the keypoint locations. Of all descriptors available in the PCL, SHOT performed best in terms of accuracy and speed in our experiments. In the training pipeline (orange) we omit clustering and retain all features. The resulting codebook is formed by a strict k-NN activation with k = 1, i.e. every feature only votes for itself. In the classification pipeline (green) we efficiently match descriptors using FLANN and choose a continuous voting space. Finally, a mean-shift maxima search yields object hypotheses. The described pipeline (without the steps marked with red stars in Figure 1) serves as a baseline for the evaluation of the contributions proposed here.

VISAPP 2019 - 14th International Conference on Computer Vision Theory and Applications

Figure 2: All features serve as input to the codebook cleaning pipeline. A set of nearest neighbors is found for each feature and their scores are updated (see text). All features are ranked and trimmed based on their score to obtain a clean codebook.
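The classification side of the baseline pipeline from Section 3 (descriptor matching, vote casting and a maxima search) can be illustrated with a small pure-Python sketch. It is not the released implementation: the dictionary layout of a codebook entry, the brute-force nearest-neighbor search standing in for FLANN, the flat-kernel mean shift and the omission of local reference frames are all simplifying assumptions made for the example.

```python
import math


def nearest_entry(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (brute force, stand-in for FLANN)."""
    return min(
        range(len(codebook)),
        key=lambda i: math.dist(descriptor, codebook[i]["descriptor"]),
    )


def cast_votes(features, codebook):
    """Each scene feature activates its nearest entry, which votes for an object center.

    `features` is a list of (keypoint, descriptor) pairs. The vote position is
    keypoint + stored offset (hypothetical layout, rotation is ignored here).
    """
    votes = []
    for keypoint, descriptor in features:
        entry = codebook[nearest_entry(descriptor, codebook)]
        center = tuple(p + o for p, o in zip(keypoint, entry["offset"]))
        votes.append((center, entry["label"]))
    return votes


def mean_shift_mode(start, points, bandwidth=0.5, iterations=20):
    """Shift `start` to the mean of all points within `bandwidth` until it stops moving."""
    mode = start
    for _ in range(iterations):
        near = [p for p in points if math.dist(mode, p) <= bandwidth]
        if not near:
            break
        new_mode = tuple(sum(c) / len(near) for c in zip(*near))
        if new_mode == mode:
            break
        mode = new_mode
    return mode
```

Running `mean_shift_mode` from several start positions in the continuous voting space and keeping the distinct results would yield the maxima that serve as object hypotheses.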

4 CODEBOOK CLEANING

We present two algorithms to judge the relevance of a feature in this section. Both share a common idea: they compute a score for each feature and rank the features according to that score. The score updates are applied to the k nearest neighbors of a query feature $f_q$ instead of the feature itself. Therefore, the resulting score value c of each feature f is a sum of individual update values $v_i$:

$$c = \sum_{i=1}^{m} v_i . \qquad (1)$$

The number of update values m varies from feature to feature as it indicates how often feature f was within the k nearest neighbors of all query features. Finally, a fixed ratio of features will be selected from the ranked list. The process of codebook cleaning is applied during training and is shown in Figure 2.
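The score accumulation of Equation 1 and the subsequent trimming can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the (feature id, update value) input format and the convention that a higher score ranks better are chosen for the example and are not taken from the released code.

```python
from collections import defaultdict


def accumulate_scores(updates):
    """Sum all update values per feature id, i.e. c = sum_i v_i (Eq. 1)."""
    scores = defaultdict(float)
    for feature_id, value in updates:
        scores[feature_id] += value
    return dict(scores)


def trim_codebook(scores, all_ids, keep_ratio=0.75, higher_is_better=True):
    """Rank all features by score and keep a fixed ratio of them.

    Features that never appeared among the k nearest neighbors of any query
    received no update and are ranked with a score of 0 (assumption).
    """
    ranked = sorted(
        all_ids,
        key=lambda i: scores.get(i, 0.0),
        reverse=higher_is_better,
    )
    n_keep = max(1, int(keep_ratio * len(ranked)))
    return ranked[:n_keep]
```

For example, with updates `[(0, 1.0), (1, 0.5), (0, 2.0), (2, 0.1)]` and four features in total, feature 0 collects two updates (m = 2) while feature 3 collects none and is the one removed at a ratio of 75%.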

4.1 Incremental Ranking

McCann and Lowe (McCann and Lowe, 2012) propose a classification rule that updates the posterior probability of a class by an increment derived from the k nearest neighbors of a query feature $f_q$. For the purpose of codebook cleaning we take inspiration from McCann and Lowe and define the Incremental ranking with the score $c_{inc}$. The idea is to derive a feature increment, instead of a class increment, that compares a feature's distance with a background distance. Our experiments have shown that an individual background distance $d_b$ per feature yields the best results and is less dependent on the choice of k.

Given a query feature $f_q$ and its k + 1 nearest neighbors $f_j$, $j \in \{1, \dots, k, k+1\}$, we update the coefficients of the first k neighbors by

$$v_{inc} = d_j - d_b = \|f_q - f_j\| - \|f_q - f_{j+1}\| . \qquad (2)$$
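A possible reading of the Incremental ranking update is sketched below: for each of the first k neighbors, the background distance is taken to be the distance to the next neighbor in the sorted list. The brute-force neighbor search, the function names and the choice not to exclude the query from its own neighbor list are assumptions of this sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def incremental_updates(query, codebook, k):
    """Return (feature index, v_inc) for the first k neighbors of `query`.

    v_inc = d_j - d_{j+1}: the distance to neighbor j minus the distance to
    the next neighbor, which acts as the individual background distance.
    """
    dists = sorted((euclidean(query, f), idx) for idx, f in enumerate(codebook))
    nearest = dists[: k + 1]
    n_updates = min(k, len(nearest) - 1)
    return [
        (nearest[j][1], nearest[j][0] - nearest[j + 1][0])
        for j in range(n_updates)
    ]
```

Because the neighbors are sorted by distance, every update produced this way is zero or negative; a neighbor that is much closer than the next one receives a strongly negative value. How the sign is used for ranking is not specified above and is left open here.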

4.2 KNN-Activation Ranking

The KNN-Activation ranking simulates the classification process during training. The updates $v_{ka}$ for the corresponding score $c_{ka}$ are defined in various ways. We define a base update value u = 1. In the simplest case we update the score of the k nearest neighbors $f_j$, $j \in \{1, \dots, k\}$, of a query feature $f_q$ by

$$v_{ka} = u . \qquad (3)$$

The second score increment uses the descriptor distance $d = \|f_q - f_j\|$ between the query feature $f_q$ and its j-th neighbor $f_j$ and is defined as

$$v_{ka} = \frac{u}{d + 1} . \qquad (4)$$

Thereby, the similarity of the features is considered in the update. The update becomes bigger with an increasing descriptor similarity (favoring similar features). Another possibility is to make the updates smaller with an increasing descriptor similarity (favoring unique features):

$$v_{ka} = u \cdot \exp(d) . \qquad (5)$$


Figure 3: The global veriﬁcation step takes the maxima found by the local classiﬁer as input. A global feature is computed

for the maxima locations. The global features are classiﬁed and merged to form the ﬁnal object hypothesis.

Table 1: Assignment of indexed names to the investigated variants of the KNN-Activation coefficient $c_{ka}$ from Section 4.2.

name      update value     equation
without keypoint position (u = 1):
c_ka1     1                Eq. 3
c_ka2     1 / (d + 1)      Eq. 4
c_ka3     1 · exp(d)       Eq. 5
with keypoint position (u from Eq. 6):
c_ka4     u                Eq. 3
c_ka5     u / (d + 1)      Eq. 4
c_ka6     u · exp(d)       Eq. 5

So far, only the descriptor distance d (feature similarity) was taken into account. We further redefine the base update value, so far u = 1, as

$$u = \exp(|c_q - c_j|) \qquad (6)$$

for all of the above equations. The center distances $c_q$ and $c_j$ denote the distance of the keypoint of the features $f_q$ and $f_j$, respectively, to the object's centroid. Thereby, the relative position of a feature's keypoint is also considered in the score increment. To be able to easily refer to the different variants of the $c_{ka}$ scores defined here, we summarize them in Table 1 and assign indexed names.
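The six KNN-Activation variants of Table 1 differ only in the base update value and in how the descriptor distance enters the update. The following sketch makes this explicit. The function names and the string keys `"ka1"` to `"ka6"` are chosen for illustration, and the base update follows Equation 6 as printed above.

```python
import math


def base_update(c_q=None, c_j=None):
    """Base update value u: 1 without keypoint position, exp(|c_q - c_j|) with it (Eq. 6)."""
    if c_q is None or c_j is None:
        return 1.0
    return math.exp(abs(c_q - c_j))


def ka_update(variant, d, c_q=None, c_j=None):
    """Update value for one neighbor at descriptor distance d.

    ka1/ka4 follow Eq. 3, ka2/ka5 follow Eq. 4, ka3/ka6 follow Eq. 5.
    ka4 to ka6 additionally use the center distances c_q and c_j.
    """
    uses_position = variant in ("ka4", "ka5", "ka6")
    u = base_update(c_q, c_j) if uses_position else 1.0
    if variant in ("ka1", "ka4"):
        return u
    if variant in ("ka2", "ka5"):
        return u / (d + 1.0)
    if variant in ("ka3", "ka6"):
        return u * math.exp(d)
    raise ValueError(f"unknown variant: {variant}")
```

With this formulation, `ka2` and `ka5` give larger updates to close neighbors (favoring similar features), whereas `ka3` and `ka6` give larger updates to distant neighbors (favoring unique features).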

5 GLOBAL VERIFICATION

While local features provide good robustness against

noise and clutter they still produce wrong maxima.

This effect is handled by including a veriﬁcation step

into the object recognition pipelines (Aldoma et al.,

2012), (Maji and Malik, 2009).

We propose a different verification strategy, shown in Figure 3. Our hypothesis verification is based on a classifier that takes global features as input (in the following named global classifier, as opposed to the local classifier based on SHOT features). The global classifier computes an object hypothesis independently from the hypotheses of the local classifier. In a second step, all local and global hypotheses are fused. The motivation behind this approach is to eliminate wrong maxima and strengthen maxima where the global classifier supports the classification hypothesis.

We distinguish two use cases for our global verification approach. In the first use case the input data contains a single object (e.g. a previously segmented object with known location, but unknown class label). In this case a global feature descriptor is computed on the input. This (single) result is merged with all hypotheses from the local classifier. In the second use case the input data contains an unknown number of objects and possibly some clutter. The locations of the maxima are used to extract a partial point cloud for each maximum and compute a global feature descriptor. Each of the classification results of the global classifier is merged with the corresponding maximum from the local classifier to obtain the final object label.

Although the PCL provides global feature descriptors, our own experiments have shown a better performance when using an adaptation of SHOT to a global scale. We train a two-class SVM in a "one-vs.-all" fashion for each of the classes with the global descriptors. The classification output is a class label and a score s ∈ [0,1] representing the probability of the found label.

The maxima from the local classifier and the corresponding classification results from the global classifier are merged to obtain the final classification result and weight. The merging is based on the local and global labels $l_l, l_g \in \mathbb{N}$, as well as on the local and global weights $w_l, w_g \in \mathbb{R}$ of each maximum. Additionally, the highest local weight $\hat{w}_l \in \mathbb{R}$ is used when classifying isolated objects. The merging function $f_m : \mathbb{N}^2 \times \mathbb{R}^3 \to \mathbb{N} \times \mathbb{R}$ takes both labels, both weights and the overall highest local weight as inputs and outputs the final classification label and weight.

We test multiple definitions of this function $f_m$ in our evaluation. In the first use case the global classification result is the same for all maxima obtained from the local classifier. We search for the global label among the top-ranked maxima from the local classifier. If the global label is among the top results and has a high weight, it is considered the true label:

$$f_1(\cdot) = \begin{cases} (l_g, \hat{w}_l + \varepsilon) & \text{if } w_l > t_r \cdot \hat{w}_l \wedge l_l = l_g \\ (l_l, w_l) & \text{otherwise.} \end{cases} \qquad (7)$$

In this case $t_r \in [0, 1]$ is the rate threshold determining which local maxima are considered to be top results. The function $f_1$ upweights a maximum by adding a small increment ε to the highest available weight.

Figure 4: Example objects contained in the datasets used for evaluation.

In the general case of an unknown number of objects in the input data, the preferred solution is to merge the classification results of the local and the global classifier per maximum. An individual global classification is carried out for each of the maxima from the local classifier. The corresponding maxima will be upvoted using

$$f_2(\cdot) = \begin{cases} (l_l, w_l \cdot c_f) & \text{if } l_l = l_g \\ (l_l, w_l) & \text{otherwise.} \end{cases} \qquad (8)$$

Function $f_2$ uses a fixed constant factor $c_f > 1$ to emphasize maxima where both classifiers agree on a label. Further, we evaluate the merging function $f_3$, which is similar to $f_2$ but is parameter-free:

$$f_3(\cdot) = \begin{cases} (l_l, w_l \cdot (1 + w_g)) & \text{if } l_l = l_g \\ (l_l, w_l) & \text{otherwise.} \end{cases} \qquad (9)$$

If both classifiers agree on the label, the resulting weight is determined based on the global weight. By adding the constant 1 we ensure that a maximum from the local classifier is never downweighted.
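The three merging functions of Section 5 can be written down compactly. The sketch below is an illustration under assumptions: labels are plain integers, each function receives only the inputs it actually uses, and the parameter names `t_r`, `eps` and `c_f` as well as the default value of `eps` are chosen for the example (the defaults 0.7 and 2.0 mirror the parameter values listed in Table 4).

```python
def merge_f1(l_local, w_local, l_global, w_max_local, t_r=0.7, eps=1e-3):
    """Single-object case (Eq. 7): promote a top-ranked maximum confirmed by the global label."""
    if l_local == l_global and w_local > t_r * w_max_local:
        return l_global, w_max_local + eps
    return l_local, w_local


def merge_f2(l_local, w_local, l_global, c_f=2.0):
    """Per-maximum case (Eq. 8): scale the local weight by a constant factor c_f > 1 on agreement."""
    if l_local == l_global:
        return l_local, w_local * c_f
    return l_local, w_local


def merge_f3(l_local, w_local, l_global, w_global):
    """Parameter-free variant (Eq. 9): scale by (1 + w_global) on agreement."""
    if l_local == l_global:
        return l_local, w_local * (1.0 + w_global)
    return l_local, w_local
```

In all three cases a maximum on which the classifiers disagree keeps its local label and weight, so the global classifier can only strengthen a hypothesis relative to the others and never removes one directly.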

6 EVALUATION AND RESULTS

We evaluate our approach on datasets used to benchmark 3D classification and shape retrieval algorithms. We convert the object meshes of these datasets to point clouds and scale each model to the unit circle for classification. Example objects are shown in Figure 4.

Aim@Shape-Watertight (ASW) (Giorgi et al., 2007): 20 object classes, 200 objects each for training and for testing.

McGill Dataset (MCG, www.cim.mcgill.ca/~shape/benchMark/): 19 classes with articulated objects, 234 for training and 223 for testing.

Princeton Shape Benchmark (PSB): 7 classes with 907 objects each for training and for testing.

Shrec-12 (SH12): shape retrieval benchmark with 60 classes, 600 objects each for training and for testing.

ModelNet (MN40 and MN10) (Wu et al., 2015): Full dataset (MN40, 40 classes) and its subset (MN10, 10 classes) used to benchmark neural networks for shape classification. MN40: 9843 objects for training, 2468 for testing. MN10: 3991 objects for training and 908 for testing. We use only a subset of the training data. Contrary to the other datasets, the standard metric for these two datasets is the average per class accuracy, which is reported accordingly in all tables.
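The two metrics used in the tables differ when classes are unbalanced. The short sketch below shows both; the function names and the list-based interface are assumptions made for the example.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all test objects that were classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_per_class_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies, so that every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For ground truth `["a", "a", "a", "b"]` and predictions `["a", "a", "a", "c"]` the overall accuracy is 0.75, while the average per class accuracy is 0.5, because the single object of class `b` is misclassified and that class contributes an accuracy of 0.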

We first compare our algorithm to other approaches to establish a baseline for further evaluations. In particular, we compare our work to (McCann and Lowe, 2012) and (Seib et al., 2015) because of their similarity to the default pipeline and to (Knopp et al., 2010) as it is the ISM implementation in the PCL. Additionally, we compare with the approach of Ganihar et al. (Ganihar et al., 2014), the descriptor B-SHOT (Prakhya et al., 2015) and the deep-learned feature CGF (Khoury et al., 2017). In the latter two cases we use the code provided by the respective authors. The authors of CGF provide different models trained on two distinct datasets. We report results for the best performing of these models in our pipeline (CGF descriptor with 40 dimensions trained on laser scan data). Finally, we compare our algorithm with recent neural networks for shape classification on the ModelNet dataset.

The baseline comparison with non-deep-learning approaches is presented in Table 2. The classification results reported for (McCann and Lowe, 2012) are based on our own implementation. Our base pipeline performs best on two datasets. We observe that the SHOT descriptor is a good choice since it outperforms the B-SHOT and CGF descriptors in the base pipeline. CGF performs well for scan registration as reported in (Khoury et al., 2017); however, it is not descriptive enough for shape classification.

6.1 Reduction of Codebook Size

Table 2: Comparison of our baseline results with approaches in literature. We report the overall accuracy for ASW, MCG, PSB and SH12 and the average per class accuracy for MN10 and MN40. Best result per dataset is shown in bold.

Dataset   McCann and Lowe            Seib et al.           Knopp et al. (PCL)     Ganihar et al.
          (McCann and Lowe, 2012)    (Seib et al., 2015)   (Knopp et al., 2010)   (Ganihar et al., 2014)
ASW       87.0                       85.0                  -                      -
MCG       82.5                       -                     -                      -
PSB       66.6                       61.6                  58.3                   67.9
SH12      73.2                       -                     -                      66.4

Dataset   this work (base pipeline) with different feature descriptors
          CGF (Khoury et al., 2017)   B-SHOT (Prakhya et al., 2015)   SHOT (Tombari et al., 2010)
ASW       80.5                        87.0                            90.0
MCG       73.1                        78.9                            85.2
PSB       58.7                        62.0                            67.0
SH12      58.3                        64.3                            70.2
MN10      -                           -                               62.4
MN40      -                           -                               71.9

Table 3: Comparison of the proposed ranking methods for codebook cleaning and the baseline. The results obtained refer to a codebook size of 75% compared to the baseline.

Coefficient       ASW     MCG
own baseline      90      85.2
c_inc             89      83.9
c_ka2             91      84.3
c_ka4             89      83.0
random average    88.9    83.7

Table 4: Comparison of merging functions for global verification. Values in brackets indicate the applied parameter values. All results are better than the baseline.

Merging function   ASW     MCG
f_1 (0.7)          91.5    86.6
f_2 (2.0)          92      86.6
f_3 (-)            91      86.6

PSB: shape.cs.princeton.edu/benchmark/
Shrec-12: www.itl.nist.gov/iad/vug/sharp/contest/2012/Generic3D/

The following evaluation is carried out with the two datasets ASW (rigid shapes) and MCG (articulated shapes) to find the best hyperparameters. Our proposed codebook size reduction methods are additionally compared to a random codebook generation (Cui et al., 2015). We randomly select 75% of all features to compare with our codebook cleaning method. The random selection was run 100 times and the average results are reported.
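The random baseline of Section 6.1 (keep 75% of all features at random, repeat 100 times, average the results) can be sketched as follows. The `evaluate` callback, which is expected to train and test a classifier on a given feature subset and return its accuracy, is hypothetical, as are the function names and the use of the run index as random seed.

```python
import random


def random_codebook(feature_ids, keep_ratio=0.75, seed=None):
    """Select a random subset containing `keep_ratio` of all features (without replacement)."""
    rng = random.Random(seed)
    ids = list(feature_ids)
    n_keep = int(keep_ratio * len(ids))
    return rng.sample(ids, n_keep)


def average_over_runs(evaluate, feature_ids, runs=100, keep_ratio=0.75):
    """Average the result of `evaluate` over several randomly reduced codebooks."""
    results = [
        evaluate(random_codebook(feature_ids, keep_ratio, seed=run))
        for run in range(runs)
    ]
    return sum(results) / len(results)
```

Averaging over many runs matters here because a single random subset may by chance drop mostly weak or mostly strong features.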

Table 3 reports the results on codebook cleaning using the proposed ranking methods (for clarity, we only report the best and the worst result for $c_{ka}$). The results were obtained by taking the best 75% of features according to their computed ranking. A codebook reduced with our approach hardly loses descriptiveness and the results stay almost the same as the baseline. The loss for the MCG dataset is slightly higher, since this dataset contains articulated objects.

6.2 Global Veriﬁcation

We have introduced three hypothesis merging functions for the purpose of global verification. Two of these functions depend on a parameter, while the third is a parameter-free approach. However, our experiments have shown that the choice of the parameter (in a reasonable range) hardly affects the classification results. The results of the global classifier combined with the base pipeline (without codebook reduction) are reported in Table 4. Adding global features and the proposed merging functions to the baseline classifier improves the classification by about 2 percentage points for the ASW dataset (92%) and about 1 percentage point for the MCG dataset (86%). The choice of the merging function (and its parameters) has only little influence on the result.

Table 5: Combination of global verification and codebook cleaning (reduction to 75%). Bold typeset indicates results that are equal or better than our baseline.

Dataset   c_ka3 + f_2 (2.0)   c_ka5 + f_2 (2.0)
ASW       93.0                93.0
MCG       86.1                84.3
PSB       67.8                67.1
SH12      71.7                73.8
MN10      -                   83.8
MN40      -                   75.4

6.3 Combining Codebook Reduction

and Global Veriﬁcation

In this evaluation we test all datasets and reduce their codebooks to 75%. For clarity, we do not report results for all ranking and merging functions. However, we observe that the rankings not using the descriptor similarity ($c_{ka1}$ and $c_{ka4}$, Equation 3) perform worse throughout all experiments. Whether the keypoint position (Equation 6) is used or not has a less significant impact. The two ranking methods favoring similar features ($c_{ka2}$ and $c_{ka5}$, Equation 4) are better for small codebook reductions. On the other hand, if the codebook is reduced to a great extent, ranking methods favoring unique features ($c_{ka3}$ and $c_{ka6}$, Equation 5) perform best. For a better overview we only show the results of two coefficients, both combined with the merging function $f_2$ (Equation 8), in Table 5. We observe that the combination of our proposed codebook cleaning technique with the global verification approach retains a high descriptiveness and overall good classification results. In fact, the codebooks with 75% of all features still perform better than the established baseline in almost all cases.

Table 6: Comparison of our proposed contributions with deep learning approaches on the MN10 and MN40 datasets.

Approach                                                 MN10    MN40
this work (codebook size 75% + global verification)      83.8    75.4
PointNet (Garcia-Garcia et al., 2016)                    76.7    -
ShapeNets (Wu et al., 2015)                              83.5    77.3
VoxNet (Maturana and Scherer, 2015)                      92.0    83.0
PointNet (Qi et al., 2017)                               -       86.2

Table 7: Comparison of the final evaluation results. The proposed contributions overcome the baseline on all datasets despite the reduction of the codebook data. Bold typeset indicates results better than state of the art and the baseline. We report the overall accuracy for ASW, MCG, PSB and SH12 and the average per class accuracy for MN10 and MN40.

Dataset   best results in related work           this work        this work (contributions), codebook size
                                                 (own baseline)   100%    75%     50%     25%
ASW       87.0 (McCann and Lowe, 2012)           90.0             92.0    93.0    93.5    92.0
MCG       82.5 (McCann and Lowe, 2012)           85.2             86.6    86.1    85.6    83.0
PSB       67.9 (Ganihar et al., 2014)            67.0             68.4    67.8    67.9    65.3
SH12      73.2 (McCann and Lowe, 2012)           70.2             74.5    73.8    70.8    65.5
MN10      92.0 (Maturana and Scherer, 2015)      62.4             67.3    83.8    83.1    81.4
MN40      86.2 (Qi et al., 2017)                 71.9             76.2    75.4    74.4    72.5

Table 6 compares our proposed pipeline with recent deep learning approaches on the MN10 and MN40 datasets. Note that our pipeline surpasses two of the deep learning approaches on the MN10 dataset. However, it is also far behind the leading approaches VoxNet (Maturana and Scherer, 2015) and PointNet (Qi et al., 2017).

Finally, Table 7 summarizes all evaluation results. Our contributions surpass the state of the art among classic (non-deep-learning) approaches on the four smaller datasets. This holds for most cases, even if the size of the codebook is significantly reduced. This supports our hypothesis that our codebook cleaning successfully eliminates less descriptive and ambiguous features. The more recent datasets MN10 and MN40 are usually applied to benchmark deep learning shape classification approaches. Although our contributions significantly improve over the baseline, some deep learning approaches are still better. However, Table 6 shows that our classic approach is competitive with some of the deep learning pipelines.

The classification times per object for our baseline method were measured on a six-year-old notebook with an Intel Core i7-2760QM CPU @ 2.40GHz. On average they are around 0.4 s for the ASW and MCG datasets, around 1 s for the PSB and SH12 datasets and around 2 to 5 s on the large MN40 dataset (depending on codebook size). The classification times increase by about 0.2 s due to the SVM classification if the described contributions are additionally applied. We therefore consider our approach suitable for mobile robotics applications.

7 CONCLUSION

The combination of the presented algorithms allows the amount of data used for codebook construction to be reduced significantly. The merging functions for global features mostly provide stable results, even if the parameters are varied. Despite the reduced codebook size, the classification performance is improved for all datasets compared to the baseline (Table 7), surpasses the state of the art of classic approaches and is competitive with some deep learning methods. The proposed codebook reduction method is computed during training and is computationally lightweight. We consider our methods and the provided open-source software a valuable contribution to the 3D vision community. In future work we will investigate methods to bridge the gap between classic and deep learning approaches.


REFERENCES

Aldoma, A., Marton, Z.-C., Tombari, F., Wohlkinger, W., Potthast, C., Zeisl, B., Rusu, R. B., Gedikli, S., and Vincze, M. (2012). Tutorial: Point cloud library: Three-dimensional object recognition and 6 dof pose estimation. IEEE Robotics & Automation Magazine, 19(3):80–91.

Boiman, O., Shechtman, E., and Irani, M. (2008). In defense of nearest-neighbor based image classification. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conf. on, pages 1–8. IEEE.

Cui, S., Schwarz, G., and Datcu, M. (2015). Image classification: No features, no clustering. In Image Processing (ICIP), 2015 IEEE Int. Conf. on, pages 1960–1964. IEEE.

Eitel, A., Springenberg, J. T., Spinello, L., Riedmiller, M., and Burgard, W. (2015). Multimodal deep learning for robust rgb-d object recognition. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ Int. Conf. on, pages 681–687. IEEE.

Ganihar, S. A., Joshi, S., Setty, S., and Mudenagudi, U. (2014). Metric tensor and christoffel symbols based 3d object categorization. In Asian Conf. on Computer Vision, pages 138–151. Springer.

Garcia-Garcia, A., Gomez-Donoso, F., Garcia-Rodriguez, J., Orts-Escolano, S., Cazorla, M., and Azorin-Lopez, J. (2016). Pointnet: A 3d convolutional neural network for real-time object class recognition. In Neural Networks (IJCNN), 2016 International Joint Conf. on, pages 1578–1584. IEEE.

Giorgi, D., Biasotti, S., and Paraboschi, L. (2007). Shape retrieval contest 2007: Watertight models track. SHREC competition, 8(7).

He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask r-cnn. In Computer Vision (ICCV), 2017 IEEE Int. Conf. on, pages 2980–2988. IEEE.

Khoury, M., Zhou, Q.-Y., and Koltun, V. (2017). Learning compact geometric features. arXiv preprint arXiv:1709.05056.

Knopp, J., Prasad, M., Willems, G., Timofte, R., and Van Gool, L. (2010). Hough transform and 3d surf for robust three dimensional classification. In ECCV (6), pages 589–602.

Leibe, B., Leonardis, A., and Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In ECCV' 04 Workshop on Statistical Learning in Computer Vision, pages 17–32.

Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017). Focal loss for dense object detection. In 2017 IEEE Int. Conf. on Computer Vision (ICCV), pages 2999–3007. IEEE.

Liu, Q., Puthenputhussery, A., and Liu, C. (2015). Novel general knn classifier and general nearest mean classifier for visual classification. In Image Processing (ICIP), 2015 IEEE Int. Conf. on, pages 1810–1814. IEEE.

Maji, S. and Malik, J. (2009). Object detection using a max-margin hough transform. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conf. on, pages 1038–1045. IEEE.

Maturana, D. and Scherer, S. (2015). Voxnet: A 3d convolutional neural network for real-time object recognition. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ Int. Conf. on, pages 922–928. IEEE.

McCann, S. and Lowe, D. G. (2012). Local naive bayes nearest neighbor for image classification. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conf. on, pages 3650–3656. IEEE.

Prakhya, S. M., Liu, B., and Lin, W. (2015). B-shot: A binary feature descriptor for fast and efficient keypoint matching on 3d point clouds. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ Int. Conf. on, pages 1929–1934. IEEE.

Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2017). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4.

Salti, S., Tombari, F., and Di Stefano, L. (2010). On the use of implicit shape models for recognition of object categories in 3d data. In ACCV (3), Lecture Notes in Computer Science, pages 653–666.

Schmidt, T., Newcombe, R., and Fox, D. (2017). Self-supervised visual descriptor learning for dense correspondence. IEEE Robotics and Automation Letters, 2(2):420–427.

Seib, V., Link, N., and Paulus, D. (2015). Pose estimation and shape retrieval with hough voting in a continuous voting space. In German Conf. on Pattern Recognition, pages 458–469. Springer.

Tombari, F. and Di Stefano, L. (2010). Object recognition in 3d scenes with occlusions and clutter by hough voting. In Image and Video Technology (PSIVT), 2010 Fourth Pacific-Rim Symposium on, pages 349–355. IEEE.

Tombari, F., Salti, S., and Di Stefano, L. (2010). Unique signatures of histograms for local surface description. In Proc. of the European Conf. on computer vision (ECCV), ECCV'10, pages 356–369. Springer-Verlag.

Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. (2015). 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1912–1920.

Zia, S., Yüksel, B., Yüret, D., and Yemez, Y. (2017). Rgb-d object recognition using deep convolutional neural networks. In 2017 IEEE Int. Conf. on Computer Vision Workshops (ICCVW), pages 887–894. IEEE.
