learning on heterogeneous distributed systems. arXiv
preprint arXiv:1603.04467.
Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., and Sivic,
J. (2016). NetVLAD: CNN architecture for weakly
supervised place recognition. In 2016 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 5297–5307.
Babenko, A. and Lempitsky, V. (2015). Aggregating local
deep features for image retrieval. In The IEEE International
Conference on Computer Vision (ICCV).
Babenko, A., Slesarev, A., Chigorin, A., and Lempitsky, V.
(2014). Neural Codes for Image Retrieval, pages 584–599.
Springer International Publishing, Cham.
Bui, T., Ribeiro, L., Ponti, M., and Collomosse, J.
(2016). Generalisation and sharing in triplet convnets
for sketch based visual search. arXiv preprint
arXiv:1611.05301.
Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P.,
Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S.,
Su, H., Xiao, J., Yi, L., and Yu, F. (2015). ShapeNet:
An Information-Rich 3D Model Repository. Technical
Report arXiv:1512.03012 [cs.GR], Stanford University,
Princeton University, Toyota Technological
Institute at Chicago.
Chen, K. and Salman, A. (2011). Extracting speaker-specific
information with a regularized siamese deep
network. In Shawe-Taylor, J., Zemel, R. S., Bartlett,
P. L., Pereira, F., and Weinberger, K. Q., editors, Advances
in Neural Information Processing Systems 24,
pages 298–306. Curran Associates, Inc.
Chopra, S., Hadsell, R., and LeCun, Y. (2005). Learning a
similarity metric discriminatively, with application to
face verification. In Computer Vision and Pattern Recognition,
2005. CVPR 2005. IEEE Computer Society
Conference on, volume 1, pages 539–546. IEEE.
Eitz, M., Hays, J., and Alexa, M. (2012). How do humans
sketch objects? ACM Trans. Graph., 31(4):44:1–44:10.
Gordo, A., Almazán, J., Revaud, J., and Larlus, D. (2016).
Deep image retrieval: Learning global representations
for image search. In ECCV.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual
learning for image recognition. In 2016 IEEE
Conference on Computer Vision and Pattern Recognition
(CVPR), pages 770–778.
Ioffe, S. and Szegedy, C. (2015). Batch normalization:
Accelerating deep network training by reducing internal
covariate shift. CoRR, abs/1502.03167.
Jégou, H., Douze, M., Schmid, C., and Pérez, P. (2010). Aggregating
local descriptors into a compact image representation.
In 2010 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition,
pages 3304–3311.
Kingma, D. P. and Ba, J. (2014). Adam: A method for
stochastic optimization. CoRR, abs/1412.6980.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998).
Gradient-based learning applied to document recognition.
Proceedings of the IEEE, 86(11):2278–2324.
LeCun, Y. and Cortes, C. (2010). MNIST handwritten digit
database.
Maaten, L. v. d. and Hinton, G. (2008). Visualizing data
using t-SNE. Journal of Machine Learning Research,
9(Nov):2579–2605.
Mikulik, A., Perdoch, M., Chum, O., and Matas, J. (2013).
Learning vocabularies over a fine quantization. International
Journal of Computer Vision, 103(1):163–175.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and
Ng, A. Y. (2011). Reading digits in natural images
with unsupervised feature learning. In NIPS workshop
on deep learning and unsupervised feature learning,
volume 2011, page 5.
Paulin, M., Douze, M., Harchaoui, Z., Mairal, J., Perronnin,
F., and Schmid, C. (2015). Local convolutional features
with unsupervised training for image retrieval.
In The IEEE International Conference on Computer
Vision (ICCV).
Perronnin, F., Sánchez, J., and Mensink, T. (2010). Improving
the Fisher Kernel for Large-Scale Image Classification,
pages 143–156. Springer Berlin Heidelberg,
Berlin, Heidelberg.
Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A.
(2007). Object retrieval with large vocabularies and
fast spatial matching. In 2007 IEEE Conference on
Computer Vision and Pattern Recognition, pages 1–8.
Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman,
A. (2008). Lost in quantization: Improving particular
object retrieval in large scale image databases. In
2008 IEEE Conference on Computer Vision and Pattern
Recognition, pages 1–8.
Radenović, F., Tolias, G., and Chum, O. (2016). CNN
Image Retrieval Learns from BoW: Unsupervised
Fine-Tuning with Hard Examples, pages 3–20. Springer
International Publishing, Cham.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh,
S., Ma, S., Huang, Z., Karpathy, A., Khosla, A.,
Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015).
ImageNet Large Scale Visual Recognition Challenge.
International Journal of Computer Vision (IJCV),
115(3):211–252.
Sangkloy, P., Burnell, N., Ham, C., and Hays, J. (2016). The
sketchy database: Learning to retrieve badly drawn
bunnies. ACM Transactions on Graphics (proceedings
of SIGGRAPH).
Schneider, R. G. and Tuytelaars, T. (2014). Sketch classification
and classification-driven analysis using Fisher
vectors. ACM Trans. Graph., 33(6):174:1–174:9.
Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E.
(2015). Multi-view convolutional neural networks for
3D shape recognition. In Proceedings of the 2015
IEEE International Conference on Computer Vision
(ICCV), ICCV ’15, pages 945–953, Washington, DC,
USA. IEEE Computer Society.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov,
D., Erhan, D., Vanhoucke, V., and Rabinovich, A.
(2015). Going deeper with convolutions. In Proceedings
of the IEEE Conference on Computer Vision and
Pattern Recognition, pages 1–9.
Tolias, G., Sicre, R., and Jégou, H. (2016). Particular object
retrieval with integral max-pooling of CNN activati-
VISAPP 2018 - International Conference on Computer Vision Theory and Applications