point $p_j$ and the estimated point $q_j$. The RMS distance err is computed as:

$$\mathrm{err} = \sqrt{\frac{1}{4}\sum_{j=1}^{4} \left\| p_j - q_j \right\|^2} \quad (3)$$
Since a high RMS error indicates that the algorithm has failed, we remove the pixels where err is greater than 10. We first used this precision evaluation to find the best values for our different parameters. Then, we compared our results with those of several different local features. Finally, we showed the difference in performance between the proposed method and a patch classification method.
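The failure test of Eq. (3) can be sketched as follows; this is a minimal NumPy version, with illustrative function names, operating on the four reference corner points and the four estimated corner points:

```python
import numpy as np

def rms_error(p, q):
    """RMS distance of Eq. (3) between the four reference corner
    points p and the four estimated corner points q (shape (4, 2))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sqrt(np.mean(np.sum((p - q) ** 2, axis=1)))

def pose_is_valid(p, q, threshold=10.0):
    """Treat a large RMS error as a detection failure, as in the text."""
    return rms_error(p, q) <= threshold
```

With all four corners off by the same (3, 4) offset, the RMS error is exactly 5, below the threshold of 10 used in the evaluation.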
5.1 Parameter Selection
We can change three main parameters for viewpoint
generative learning. One is the number of generated
patterns W. Because the rotation ranges are set to ϕ, θ ∈ [−75°, 75°], the interval angle determines the number of patterns W that must be generated. For example, the combinations of interval angle ϕ, interval angle θ and W are (30°, 30°, 36), (25°, 25°, 49), (15°, 15°, 121), (10°, 10°, 256) and (5°, 5°, 961). Since the interval angles ϕ and θ can be varied independently, we can test 25 combinations.
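The relation between the interval angles and the pattern count W can be verified with a short sketch (function names are illustrative): with ϕ, θ ∈ [−75°, 75°], an interval of s degrees yields 150/s + 1 sampled angles per axis, and W is the product of the two axis counts.

```python
def axis_samples(interval_deg, lo=-75, hi=75):
    """Number of sampled angles in [lo, hi] at the given interval."""
    return (hi - lo) // interval_deg + 1

def num_patterns(phi_interval, theta_interval):
    """Number of generated patterns W for one (phi, theta) interval pair."""
    return axis_samples(phi_interval) * axis_samples(theta_interval)

# Equal intervals reproduce the W values quoted in the text.
assert [num_patterns(s, s) for s in (30, 25, 15, 10, 5)] == [36, 49, 121, 256, 961]
```

The same function also explains the setting adopted later in this section: intervals of 30° for ϕ and 25° for θ give 6 × 7 = 42 patterns.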
Another is the number of stable keypoints N. The other is the number of clusters K for k-means++. To compare performance variation, we evaluated precision and learning time using the Graffiti and Wall sequences with SIFT. In particular, precision and computation time under large viewpoint changes are especially important. Thus, to decide these three parameters, precision is evaluated on views Nos. 3, 4 and 5, giving more weight to the case of large viewpoint change. Since SIFT detects 4654 keypoints on the reference image of Graffiti and 3393 keypoints on the reference image of Wall, we varied the number of stable keypoints from 500 to 6500. The learning time is measured from the input of the reference image to the end of the creation of the database of feature descriptors. All experiments were run on an Intel Core i3 3.07 GHz CPU with 2.92 GB RAM and a GeForce 310 (589 MHz) GPU.
The results are shown in Figure 3 and Table 1. The precision changes with respect to the number of clusters and the number of stable keypoints are presented in Figures 3(a) and 3(b). The more clusters there are, the greater the precision. However, beyond seven clusters there is little or no increase in precision. More stable keypoints also leads to higher precision, but beyond half the number of reference keypoints the precision increases little. Incidentally, these parameters barely affect the learning time thanks to the speed of k-means. On the other hand, the number of generated patterns directly affects the learning time (Table 1). To evaluate this in an integrated way, we use the average precision over the viewpoint change sequences. According to Figure 3(c), increasing the number of patterns does not necessarily improve the accuracy. As a fair trade-off between accuracy and learning speed, we adopted the following parameters: K = 7 for k-means, 30° for interval angle ϕ and 25° for interval angle θ. As a result, we generate 42 patterns, which takes 14 seconds for Graffiti and 25 seconds for Wall.
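The clustering step behind the parameter K can be sketched as a minimal k-means++ implementation in NumPy (function and variable names are illustrative, not from the original implementation): for each stable keypoint, the descriptors observed across the generated patterns are reduced to K = 7 representative centers that are stored in the database.

```python
import numpy as np

def kmeans_pp(descriptors, k=7, iters=20, seed=0):
    """Cluster one stable keypoint's descriptors into k representative
    centers: k-means++ seeding followed by standard Lloyd iterations."""
    rng = np.random.default_rng(seed)
    X = np.asarray(descriptors, float)
    # k-means++ seeding: prefer points far from the centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    # Lloyd iterations: assign each descriptor, then recompute the means.
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers  # the k cluster centers stored in the database

# Example: 100 synthetic 128-D "SIFT-like" descriptors for one keypoint.
descs = np.random.default_rng(1).random((100, 128))
db_entry = kmeans_pp(descs, k=7)
assert db_entry.shape == (7, 128)
```

Because each keypoint's descriptor set is small, this clustering adds little to the learning time, consistent with the observation above that K barely affects it.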
Table 1: The learning time (sec.) by changing the number
of patterns.
Number of patterns 36 49 121 256 961
Learning time (Graf) 12 16 49 118 611
Learning time (Wall) 21 30 92 241 1680
As described in Sec. 3.3, the number of features in the database is N × K. When N × K does not exceed the number of reference keypoints, the matching run-time is no greater than that of the default use. Therefore, to meet real-time requirements without accuracy degradation, the parameters N and K should be chosen with care.
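The run-time consideration can be illustrated with a brute-force nearest-neighbor matcher, whose cost grows linearly with the database size N × K (all names below are illustrative):

```python
import numpy as np

def match_to_database(query_descs, database_descs):
    """Brute-force nearest-neighbor matching: each query descriptor is
    matched to its closest database descriptor. The cost is proportional
    to len(query_descs) * len(database_descs), i.e. to N * K."""
    q = np.asarray(query_descs, float)
    d = np.asarray(database_descs, float)
    # Squared L2 distances via the expansion ||q||^2 + ||d||^2 - 2 q.d
    dists = (q ** 2).sum(1)[:, None] + (d ** 2).sum(1)[None] - 2 * q @ d.T
    dists = np.maximum(dists, 0.0)          # guard against float round-off
    nearest = dists.argmin(axis=1)          # index of the best match
    return nearest, np.sqrt(dists[np.arange(len(q)), nearest])

# With N stable keypoints and K clusters, the database holds N * K rows.
N, K = 500, 7
database = np.random.default_rng(0).random((N * K, 128))
queries = np.random.default_rng(1).random((8, 128))
idx, dist = match_to_database(queries, database)
assert idx.shape == (8,)
```

Doubling N or K doubles the number of database rows and hence the matching cost, which is why both are capped relative to the reference keypoint count.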
5.2 Test Local Features
The proposed viewpoint generative learning (VGL) can be adopted for any local feature consisting of a keypoint detector and a descriptor. We used SIFT as implemented in SiftGPU (www.cs.unc.edu/~ccwu/siftgpu/), SURF and CenSurE (STAR) as implemented in OpenCV (opencv.willowgarage.com), and the M-SURF (Modified-SURF) descriptor implemented in OpenSURF (www.chrisevansdev.com/computer-vision-opensurf.html) in combination with CenSurE. For Harris/Hessian-Laplacian and Harris/Hessian-Affine with GLOH, we used the implementation provided by the dataset provider (www.robots.ox.ac.uk/~vgg/research/affine/). For a fair comparison, the same parameters are set to extract these local features across all the approaches. We also use the same number of stable keypoints as the number of reference keypoints. The other learning parameters are set as described in Sec. 5.1.
Figure 4 shows the precision and Table 2 shows the RMS error. The higher the precision, the better; the lower the RMS error, the better. Compared with default feature matching, VGL improves the precision and achieves accurate pose estimation in most cases. In particular, in Nos. 4 and 5 Graffiti
Stable Keypoint Recognition using Viewpoint Generative Learning