Client-side Mobile Visual Search

Andreas Hartl, Dieter Schmalstieg, Gerhard Reitmayr


Visual search systems offer a simple way to obtain information about our surroundings, our location or an object of interest. Typically, mobile applications of visual search connect remotely to large-scale systems capable of dealing with millions of images. Querying such systems may introduce considerable delays, which can severely harm usability or even lead to complete rejection by the user. In this paper, we investigate an interim solution and system design using a local visual search system for embedded devices. We optimized a traditional visual search system to reduce both runtime and storage space so that it scales to thousands of training images on current off-the-shelf smartphones. We demonstrate practical applicability in a prototype for mobile visual search on the same target platform. Compared with the unmodified version of the pipeline, we achieve up to a two-fold speed-up in runtime, save 85% of storage space and provide substantially increased recognition performance. In addition, we integrate the pipeline with a popular Augmented Reality SDK on Android devices and use it as a pre-selector for tracking datasets. This makes a large number of tracking targets instantly usable without requiring user intervention or costly server-side recognition.


Paper Citation

in Harvard Style

Hartl A., Schmalstieg D. and Reitmayr G. (2014). Client-side Mobile Visual Search. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-009-3, pages 125-132. DOI: 10.5220/0004672901250132

in Bibtex Style

author={Andreas Hartl and Dieter Schmalstieg and Gerhard Reitmayr},
title={Client-side Mobile Visual Search},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2014)},

in EndNote Style

JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2014)
TI - Client-side Mobile Visual Search
SN - 978-989-758-009-3
AU - Hartl A.
AU - Schmalstieg D.
AU - Reitmayr G.
PY - 2014
SP - 125
EP - 132
DO - 10.5220/0004672901250132