Localization based on Wireless and BLE Devices.
Localization can be performed employing several devices and signals, such as antennas, RGB cameras, mobile wireless devices (Alahi et al., 2015), and Bluetooth Low Energy (BLE) (Ishihara et al., 2017b; Ishihara et al., 2017a).
Alahi et al. (Alahi et al., 2015) developed a method to improve GPS-based human localization by employing a set of fixed antennas coupled with fixed RGB cameras and mobile wireless devices (i.e., smartphones/beacons). The authors used a multimodal approach in which visual information (RGB) is considered jointly with wireless signals (W), obtaining the so-called RGB-W data. Signal trilateration and propagation models are at the core of this wireless-based approach. These signals are used jointly with tracking methods in the RGB domain to localize users in indoor environments.
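Although our method does not rely on wireless ranging, the trilateration step at the core of such wireless-based approaches can be sketched as follows. Assuming distance estimates to a set of fixed anchors of known position, the circle equations are linearized against the first anchor and solved in the least-squares sense (the function name, anchor coordinates, and example point are illustrative):

```python
import numpy as np

def trilaterate(anchors, distances):
    """Estimate a 2-D position from anchor coordinates and range estimates.

    Linearizes the circle equations |x - a_i|^2 = d_i^2 by subtracting
    the first one, then solves the resulting linear system in the
    least-squares sense.
    """
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    a0, d0 = anchors[0], d[0]
    # 2 (a_i - a_0) . x = d_0^2 - d_i^2 + |a_i|^2 - |a_0|^2
    A = 2.0 * (anchors[1:] - a0)
    b = (d0 ** 2 - d[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(a0 ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Example: a point at (2, 3) ranged by three anchors with exact distances.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = np.array([2.0, 3.0])
dists = [np.linalg.norm(true_pos - np.array(a)) for a in anchors]
est = trilaterate(anchors, dists)  # close to (2, 3)
```

With noisy range estimates (as with real wireless signals), the same least-squares formulation yields the position minimizing the linearized residuals, which is why more than three anchors are typically deployed.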
Ishihara et al. (Ishihara et al., 2017b) have shown how user localization can be performed through a beacon-guided approach, instrumenting the environment with Bluetooth Low Energy (BLE) signal emitters. The authors designed a method in which radio-wave-based localization is combined with Structure from Motion (SfM) starting from visual input. However, as stated by the authors, SfM is still a challenging task in real-world contexts (particularly in natural outdoor scenarios), since it does not perform well in environments with little to no distinctive visual features, or with a large amount of repetitive features, as in a natural site (e.g., a garden with many plants, as in our case). An improvement of this approach has been proposed in (Ishihara et al., 2017a), where inference machines are trained on previously collected pedestrian paths to perform user localization. In this way, the authors managed to reduce the localization and orientation errors with respect to their previous method.
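BLE-based localization approaches of this kind typically map the received signal strength (RSSI) of each beacon to a distance through a propagation model. A minimal sketch, assuming the common log-distance path-loss model (the default parameters below, a calibrated RSSI of -59 dBm at 1 m and a path-loss exponent of 2 for free space, are illustrative values, not taken from the cited works):

```python
import math

def rssi_to_distance(rssi, rssi_at_1m=-59.0, path_loss_exp=2.0):
    """Invert the log-distance path-loss model
    RSSI(d) = RSSI(1 m) - 10 * n * log10(d)
    to estimate the beacon distance d in metres."""
    return 10.0 ** ((rssi_at_1m - rssi) / (10.0 * path_loss_exp))

d_near = rssi_to_distance(-59.0)  # 1.0 m (at the calibration point)
d_far = rssi_to_distance(-79.0)   # 10.0 m under these parameters
```

In practice the path-loss exponent must be calibrated per environment, and RSSI fluctuations are usually smoothed (e.g., by averaging) before the distances are fed to a trilateration or fingerprinting stage.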
While the exploitation of Wireless and BLE devices is convenient in indoor settings, this is generally not the case in large outdoor and natural environments. The main problems are due to the lack of existing infrastructure (e.g., WiFi) and to the difficulties arising from the installation of specific hardware in such settings. Therefore, in this paper, we consider the exploitation of visual and GPS signals, which do not require the installation of specific hardware at the site.
Image-based Localization. In this paper, we address the localization of visitors as a classification problem, where each class represents a context of a large outdoor natural place. This approach has already been considered by other authors, as briefly summarized in the following.
Furnari et al. (Furnari et al., 2017) considered the
problem of recognizing personal locations specified
by the user from egocentric videos. The segmentation
problem is addressed as an “open-set” classification
problem where the classes specified by the user have
to be identified and the other environments, which are
unseen during training, need to be rejected.
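Open-set rejection of this kind can be illustrated, as a minimal sketch rather than the cited authors' actual method, by thresholding the maximum softmax probability of a closed-set classifier: predictions whose confidence falls below the threshold are rejected as belonging to an unseen environment (class names and threshold are illustrative):

```python
import numpy as np

def open_set_predict(logits, labels, threshold=0.5):
    """Turn a closed-set classifier into an open-set one by rejecting
    predictions whose maximum softmax probability is below a threshold."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                      # numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax
    k = int(np.argmax(probs))
    return labels[k] if probs[k] >= threshold else "rejected"

labels = ["garden", "fountain", "entrance"]
confident = open_set_predict([5.0, 0.1, 0.2], labels)  # "garden"
uncertain = open_set_predict([0.1, 0.0, 0.05], labels) # "rejected"
```

The threshold trades off false acceptances of unseen environments against false rejections of known ones, and is typically tuned on a validation set containing negative (unseen) samples.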
Body-mounted video cameras have been employed by Starner et al. (Starner et al., 1998) to localize users from first-person images. Localization is in this case considered at the room level in a “closed-set” scenario in which the users can move within a limited set of environments.
Santarcangelo et al. (Santarcangelo et al., 2016) have investigated the use of multimodal signals collected from shopping carts to localize customers in a retail store. The inferred location information is then exploited to infer the behavior of customers for marketing purposes in a “Visual Market Basket Analysis” scenario.
Ragusa et al. (Ragusa et al., 2018; Ragusa et al., 2019) considered the problem of localizing the visitors of a cultural site from egocentric video. In the considered settings, museum curators and site managers could take advantage of the inferred information to improve the arrangement of their collections and increase the interest of their audience. The system has been extended to automatically produce summaries of the tours, to be sent to the visitors of the cultural site as a digital memory.
Classification-based localization has been studied by Weyand et al. (Weyand et al., 2016). Specifically, the authors presented PlaNet, a deep network able to localize images of places through different cues, such as landmarks, weather patterns, vegetation, road markings, or architectural details.
Similarly to the aforementioned works, we tackle
localization as a classification problem, dividing the
space of interest into areas. Differently from the
above approaches, we explore the combination of
GPS and visual input to achieve better accuracy at a
low computational cost.
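As a minimal illustration of such a combination (the late-fusion scheme and weight below are illustrative assumptions, not our final method), two per-area probability distributions, one from a visual classifier and one from a GPS-based one, can be merged by a weighted average, with the predicted area given by the fused maximum:

```python
import numpy as np

def fuse_predictions(p_visual, p_gps, w_visual=0.6):
    """Late fusion of two per-area probability distributions by a
    weighted average; returns the predicted area index and the
    fused distribution."""
    p_v = np.asarray(p_visual, dtype=float)
    p_g = np.asarray(p_gps, dtype=float)
    fused = w_visual * p_v + (1.0 - w_visual) * p_g
    return int(np.argmax(fused)), fused

# Example over three areas: the GPS evidence overturns the visual one.
idx, fused = fuse_predictions([0.5, 0.3, 0.2], [0.2, 0.7, 0.1])  # idx == 1
```

A convex combination of valid distributions is itself a valid distribution, so the fused scores remain directly comparable across areas; the modality weight can be tuned on validation data.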
Joint Exploitation of Images and GPS for Localization. Previous works investigated the combination of GPS and vision to localize users in an environment. Capi et al. (Capi et al., 2014) presented an assistive robotic system to guide visually impaired people in urban environments. Electronic navigation aid is achieved through a multimodal approach: data from GPS, compass, laser range finders, and visual information are merged together and used for training neural networks. The assistive robotic system
VISAPP 2019 - 14th International Conference on Computer Vision Theory and Applications