Authors:
Giovanni Maria Pasqualino¹; Stefano Scafiti¹; Antonino Furnari¹ and Giovanni Maria Farinella¹,²
Affiliations:
¹ Department of Mathematics and Computer Science, University of Catania, Catania, Italy
² Cognitive Robotics and Social Sensing Laboratory, ICAR-CNR, Palermo, Italy
Keyword(s):
Egocentric (First Person) Vision, Localization, GPS, Multi-modal Data Fusion.
Abstract:
Localizing the visitors of an outdoor natural site can be advantageous both to study their behavior and to inform them about where they are and what to visit in the site. Although GPS can generally be used for outdoor localization, we show that this kind of signal is not always accurate enough in real-world scenarios. In contrast, localization based on egocentric images can be more accurate, but it is generally more computationally expensive. In this paper, we investigate how fusing image- and GPS-based predictions can achieve efficient and accurate localization of the visitors of a natural site. Specifically, we compare different fusion techniques, including a modality attention approach which is shown to provide the best performance. Results show that the proposed technique achieves promising results, matching the performance of very deep models (e.g., DenseNet) with a less expensive architecture (e.g., SqueezeNet) that employs a memory footprint of about 3 MB and an inference time of about 25 ms.
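The modality attention fusion mentioned in the abstract can be illustrated with a minimal sketch. The idea is to weight the per-class predictions of the image and GPS branches with attention coefficients computed from both modalities, so that the fused prediction leans on whichever signal is more reliable for a given sample. The gating layer shape, weight names (`W`, `b`), and the use of concatenated class scores as gating input are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modality_attention_fusion(img_logits, gps_logits, W, b):
    """Fuse image- and GPS-based class scores with attention weights.

    img_logits, gps_logits: (batch, n_classes) scores from each branch.
    W: (2 * n_classes, 2) and b: (2,) parameterize a hypothetical linear
    gating layer that maps the concatenated scores to one attention
    weight per modality (softmax-normalized so they sum to 1).
    """
    feats = np.concatenate([img_logits, gps_logits], axis=1)   # (batch, 2C)
    alpha = softmax(feats @ W + b, axis=1)                     # (batch, 2)
    # Convex combination of the two per-modality probability vectors.
    fused = (alpha[:, :1] * softmax(img_logits, axis=1)
             + alpha[:, 1:] * softmax(gps_logits, axis=1))
    return fused, alpha

# Toy usage with random scores over, say, 9 site locations.
rng = np.random.default_rng(0)
img_logits = rng.normal(size=(4, 9))
gps_logits = rng.normal(size=(4, 9))
W = rng.normal(scale=0.1, size=(18, 2))
b = np.zeros(2)
fused, alpha = modality_attention_fusion(img_logits, gps_logits, W, b)
```

Because the attention weights and each per-modality distribution are normalized, the fused output is itself a valid probability distribution over locations, which makes it straightforward to compare against either single-modality prediction.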