Figure 7: Our outdoor ground truth dataset comprises a precise 3D model of the environment and over 45,000 camera images with sensor readings and 6DoF ground truth poses. Example images are shown as insets with their ground truth poses rendered as frusta.
ACKNOWLEDGEMENTS
This work was supported in part by the German Federal Ministry of Education and Research (BMBF, reference number 16SV5745, PASSAge) and the German Federal Ministry of Economics and Technology (BMWi, reference number 01MS11020A, CRUMBS). The authors further wish to thank Darko Stanimirović and Marion März for their help on the ground truth dataset and FARO Europe for providing us with the laser scans.
REFERENCES
Arth, C., Mulloni, A., and Schmalstieg, D. (2012). Exploiting Sensors on Mobile Phones to Improve Wide-Area Localization. In Proc. Int. Conf. on Pattern Recognition (ICPR).

Baatz, G., Köser, K., Chen, D., Grzeszczuk, R., and Pollefeys, M. (2012). Leveraging 3D city models for rotation invariant place-of-interest recognition. Int. Journal of Computer Vision (IJCV), 96(3):315–334.

Chittaro, L. and Burigat, S. (2005). Augmenting audio messages with visual directions in mobile guides: an evaluation of three approaches. In Proc. Int. Conf. on Human Computer Interaction with Mobile Devices and Services (Mobile HCI).

Chum, O. and Matas, J. (2005). Matching with PROSAC - Progressive Sample Consensus. In Proc. Int. Conf. on Computer Vision and Pattern Recognition (CVPR).

Fritz, M., Saenko, K., and Darrell, T. (2010). Size matters: Metric visual search constraints from monocular metadata. In Advances in Neural Information Processing Systems (NIPS).

Irschara, A., Zach, C., Frahm, J.-M., and Bischof, H. (2009). From structure-from-motion point clouds to fast location recognition. In Proc. Int. Conf. on Computer Vision and Pattern Recognition (CVPR).

Klein, G. and Murray, D. (2009). Parallel tracking and mapping on a camera phone. In Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR).

Knopp, J., Sivic, J., and Pajdla, T. (2010). Avoiding confusing features in place recognition. In Proc. European Conf. on Computer Vision (ECCV).

Kurz, D. and Benhimane, S. (2011). Inertial sensor-aligned visual feature descriptors. In Proc. Int. Conf. on Computer Vision and Pattern Recognition (CVPR).

Kurz, D., Meier, P., Plopski, A., and Klinker, G. (2013). An Outdoor Ground Truth Evaluation Dataset for Sensor-Aided Visual Handheld Camera Localization. In Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR).

Kurz, D., Olszamowski, T., and Benhimane, S. (2012). Representative Feature Descriptor Sets for Robust Handheld Camera Localization. In Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR).

Lieberknecht, S., Benhimane, S., Meier, P., and Navab, N. (2009). A dataset and evaluation methodology for template-based tracking algorithms. In Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR).

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. Int. Journal of Computer Vision (IJCV), 60(2):91–110.

Reitmayr, G. and Drummond, T. W. (2007). Initialisation for visual tracking in urban environments. In Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR).

Rosten, E. and Drummond, T. (2006). Machine learning for high-speed corner detection. In Proc. European Conf. on Computer Vision (ECCV).

Smith, E. R., Radke, R. J., and Stewart, C. V. (2012). Physical scale keypoints: Matching and registration for combined intensity/range images. Int. Journal of Computer Vision (IJCV), 97(1):2–17.

Sturm, J., Engelhard, N., Endres, F., Burgard, W., and Cremers, D. (2012). A Benchmark for the Evaluation of RGB-D SLAM Systems. In Proc. Int. Conf. on Intelligent Robot Systems (IROS).

Ventura, J. and Höllerer, T. (2012). Wide-area scene mapping for mobile visual tracking. In Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR).

Wulf, O., Nüchter, A., Hertzberg, J., and Wagner, B. (2007). Ground truth evaluation of large urban 6D SLAM. In Proc. Int. Conf. on Intelligent Robot Systems (IROS).

Zhang, Z. (2000). A flexible new technique for camera calibration. Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 22(11):1330–1334.