Farinella, G., Signorello, G., Battiato, S., Furnari, A., Ra-
gusa, F., Leonardi, R., Ragusa, E., Scuderi, E., Lopes,
A., Santo, L., et al. (2019). VEDI: Vision Exploita-
tion for Data Interpretation. In International Confer-
ence on Image Analysis and Processing, pages 753–
763. Springer.
Furnari, A. and Farinella, G. M. (2019). What would you
expect? anticipating egocentric actions with rolling-
unrolling lstms and modality attention. In Interna-
tional Conference on Computer Vision.
Giuliano, R., Marzovillo, M., Mazzenga, F., and Vari, M.
(2014). Visitors localization in cultural heritages for
experience enhancement. In 2014 Euro Med Telco
Conference (EMTC), pages 1–6. IEEE.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In 2016 IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 770–778.
Horn, G. V., Mac Aodha, O., Song, Y., Shepard, A., Adam,
H., Perona, P., and Belongie, S. J. (2017). The inatu-
ralist challenge 2017 dataset. CoRR, abs/1707.06642.
Huang, G., Liu, Z., v. d. Maaten, L., and Weinberger, K. Q.
(2017). Densely connected convolutional networks. In
2017 IEEE Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 2261–2269.
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K.,
Dally, W. J., and Keutzer, K. (2016). Squeezenet:
Alexnet-level accuracy with 50x fewer parameters and
<0.5mb model size.
Joly, A., Go
¨
eau, H., Glotin, H., Spampinato, C., Bon-
net, P., Vellinga, W.-P., Lombardo, J.-C., Planque,
R., Palazzo, S., and M
¨
uller, H. (2017). LifeCLEF
2017 Lab Overview: Multimedia Species Identifica-
tion Challenges. In Jones, G. J., Lawless, S., Gonzalo,
J., Kelly, L., Goeuriot, L., Mandl, T., Cappellato, L.,
and Ferro, N., editors, CLEF: Cross-Language Evalu-
ation Forum, volume LNCS of Experimental IR Meets
Multilinguality, Multimodality, and Interaction, pages
255–274, Dublin, Ireland. Springer.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. In Proceedings of the 25th International
Conference on Neural Information Processing Sys-
tems - Volume 1, NIPS’12, pages 1097–1105, USA.
Curran Associates Inc.
Kumar, N., Belhumeur, P. N., Biswas, A., Jacobs, D. W.,
Kress, W. J., Lopez, I. C., and Soares, J. V. B. (2012).
Leafsnap: A computer vision system for automatic
plant species identification. In Fitzgibbon, A. W.,
Lazebnik, S., Perona, P., Sato, Y., and Schmid, C.,
editors, ECCV (2), volume 7573 of Lecture Notes in
Computer Science, pages 502–516. Springer.
Milotta, F. L., Furnari, A., Battiato, S., Signorello, G., and
Farinella, G. M. (2019a). Egocentric visitors localiza-
tion in natural sites. Journal of Visual Communication
and Image Representation, page 102664.
Milotta, F. L. M., Furnari, A., Battiato, S., Salvo, M. D.,
Signorello, G., and Farinella, G. M. (2019b). Visitors
localization in natural sites exploiting egovision and
gps. In International Conference on Computer Vision
Theory and Applications (VISAPP).
Ragusa, F., Furnari, A., Battiato, S., Signorello, G., and
Farinella, G. M. (2019). Egocentric visitors localiza-
tion in cultural sites. Journal on Computing and Cul-
tural Heritage (JOCCH), 12(2):11.
Seidenari, L., Baecchi, C., Uricchio, T., Ferracani, A.,
Bertini, M., and Bimbo, A. D. (2017). Deep artwork
detection and retrieval for automatic context-aware
audio guides. ACM Transactions on Multimedia Com-
puting, Communications, and Applications (TOMM),
13(3s):35.
Simonyan, K. and Zisserman, A. (2015). Very deep con-
volutional networks for large-scale image recognition.
In International Conference on Learning Representa-
tions.
Wegner, J. D., Branson, S., Hall, D., Schindler, K., and Per-
ona, P. (2016). Cataloging public objects using aerial
and street-level images; urban trees. In 2016 IEEE
Conference on Computer Vision and Pattern Recogni-
tion (CVPR), pages 6014–6023.
Localizing Visitors in Natural Sites Exploiting Modality Attention on Egocentric Images and GPS Data
617