Each image of Lion was calibrated separately using our simple DLT+MLE method (Section 5) under RANSAC. The mean and standard deviation of the calibration results over all images are compared in Table 2. The RMS reprojection error of our technique is smaller than that of [Bouguet, 2008] in most cases. Because we take only one image for calibration, our result is tightly coupled to that particular image, which is reflected in the low RMS error and the moderate standard deviation over all images.
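For concreteness, the following is a minimal sketch of the single-image calibration pipeline described above: a RANSAC loop around a DLT estimate of the 3x4 projection matrix, nonlinear (MLE) refinement of the reprojection error on the inliers, and an RQ decomposition into intrinsics and pose. The inlier threshold, the iteration count, and the choice to refine all twelve entries of P directly are illustrative assumptions, not the paper's exact settings.

```python
# Sketch of DLT + MLE calibration under RANSAC. Thresholds and the
# refinement parametrisation are assumptions made for illustration.
import numpy as np
from scipy.linalg import rq, svd
from scipy.optimize import least_squares

def dlt(X, x):
    """Estimate the 3x4 projection matrix P from n >= 6 3D-2D correspondences."""
    n = X.shape[0]
    Xh = np.hstack([X, np.ones((n, 1))])           # homogeneous 3D points
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh                              # u-equations
    A[0::2, 8:12] = -x[:, 0:1] * Xh
    A[1::2, 4:8] = Xh                              # v-equations
    A[1::2, 8:12] = -x[:, 1:2] * Xh
    _, _, Vt = svd(A)
    return Vt[-1].reshape(3, 4)                    # null vector of A

def project(P, X):
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    p = (P @ Xh.T).T
    return p[:, :2] / p[:, 2:3]

def refine(P, X, x):
    """MLE refinement: minimise reprojection error over all 12 entries of P
    (a simple choice; K, R, t could also be parametrised separately)."""
    def residuals(p):
        return (project(p.reshape(3, 4), X) - x).ravel()
    return least_squares(residuals, P.ravel(), method='lm').x.reshape(3, 4)

def calibrate_ransac(X, x, iters=500, thresh=8.0):   # assumed settings
    best = None
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(X.shape[0], 6, replace=False)   # minimal DLT sample
        err = np.linalg.norm(project(dlt(X[idx], x[idx]), X) - x, axis=1)
        inl = err < thresh
        if best is None or inl.sum() > best.sum():
            best = inl
    P = refine(dlt(X[best], x[best]), X[best], x[best])
    K, R = rq(P[:, :3])                            # K upper triangular
    s = np.diag(np.sign(np.diag(K)))               # make diag(K) positive
    K, R = K @ s, s @ R
    t = np.linalg.solve(K, P[:, 3])
    return K / K[2, 2], R, t, best
```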
RMS Error Analysis. In the next experiment we analyse the RMS reprojection error over images of various types, sizes and focal lengths for all our modelling methods. For each model (snap1, snap2 and tmap), we calibrated all the images of Lion from the 'query image dataset' (more than 500 images with width resolutions of 640, 800, 1024, 1600 and 2040 pixels) and collected their RMS errors. The distribution of the RMS error is shown as a histogram in Figure 4, and the mean RMS error is given in Table 3. Although the error of our method is spread over a wide range because of the variety of images used, the mean RMS error is small, which makes our method a quick and reliable tool for camera calibration where extreme accuracy is not a concern.
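For reference, a minimal sketch of the per-image RMS reprojection error underlying Figure 4 and Table 3 follows; the variable names (P for the estimated projection matrix, X and x for the inlier 3D-2D correspondences) are illustrative assumptions.

```python
# Per-image RMS reprojection error in pixels, as collected over the dataset.
import numpy as np

def rms_reprojection_error(P, X, x):
    """RMS distance between detected keypoints x and the reprojections
    of their 3D model points X under the projection matrix P."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    proj = (P @ Xh.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.sqrt(np.mean(np.sum((proj - x) ** 2, axis=1)))

# Collecting one value per calibrated image and binning them, e.g. with
# np.histogram, gives per-model error distributions like those in Figure 4.
```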
7 CONCLUSION
We have presented methods for creating feature-augmented models from CAD models for the purpose of 6DOF object recognition and camera calibration. The fully automatic procedure produces models that can be recognized in a single image with high accuracy using different flavours of the online stage, and that can serve as natural markers for camera calibration.
In future work we plan to consider view-dependent global features (e.g., 2D shape context) computed on our virtual snapshots, in an attempt to match them in query images. In this way we plan to include geometric information along with texture in our trained models.
ACKNOWLEDGEMENTS
This work was partially funded by the BMBF project
DYNAMICS (01IW15003).
REFERENCES
3Digify (2015). 3Digify. http://3digify.com/.
Aldoma, A., Tombari, F., Rusu, R., and Vincze, M. (2012). OUR-CVFH: Oriented, unique and repeatable clustered viewpoint feature histogram for object recognition and 6DOF pose estimation. In Pinz, A., Pock, T., Bischof, H., and Leberl, F., editors, Pattern Recognition, volume 7476 of Lecture Notes in Computer Science, pages 113–122. Springer Berlin Heidelberg.
Bouguet, J. Y. (2008). Camera calibration toolbox for Matlab. http://www.vision.caltech.edu/bouguetj/calib_doc/.
Collet Romea, A., Martinez Torres, M., and Srinivasa, S. (2011). The MOPED framework: Object recognition and pose estimation for manipulation. International Journal of Robotics Research, 30(10):1284–1306.
Collet Romea, A. and Srinivasa, S. (2010). Efficient multi-
view object recognition and full pose estimation. In
2010 IEEE International Conference on Robotics and
Automation (ICRA 2010).
D’Apuzzo, N. (2006). Overview of 3D surface digitization technologies in Europe. In Electronic Imaging 2006, pages 605605–605605. International Society for Optics and Photonics.
Dementhon, D. and Davis, L. (1995). Model-based ob-
ject pose in 25 lines of code. International Journal
of Computer Vision, 15(1-2):123–141.
Hao, Q., Cai, R., Li, Z., Zhang, L., Pang, Y., Wu, F., and Rui, Y. (2013). Efficient 2D-to-3D correspondence filtering for scalable 3D object recognition. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 899–906.
Heikkila, J. and Silven, O. (1997). A four-step camera cal-
ibration procedure with implicit image correction. In
Computer Vision and Pattern Recognition, 1997. Pro-
ceedings., 1997 IEEE Computer Society Conference
on, pages 1106–1112.
Irschara, A., Zach, C., Frahm, J.-M., and Bischof, H.
(2009). From structure-from-motion point clouds to
fast location recognition. In Computer Vision and
Pattern Recognition, 2009. CVPR 2009. IEEE Con-
ference on, pages 2599–2606.
Lepetit, V., Moreno-Noguer, F., and Fua, P. (2009). EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vision, 81(2):155–166.
Lowe, D. (2004). Distinctive image features from scale-
invariant keypoints. International Journal of Com-
puter Vision, 60(2):91–110.
Rusu, R., Bradski, G., Thibaux, R., and Hsu, J. (2010). Fast 3D recognition and pose using the viewpoint feature histogram. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages 2155–2162.
Skrypnyk, I. and Lowe, D. (2004). Scene modelling, recog-
nition and tracking with invariant image features. In
Mixed and Augmented Reality, 2004. ISMAR 2004.
Third IEEE and ACM International Symposium on,
pages 110–119.