this field, followed by a general explanation of the active vision system for human guidance in Section 3, and a detailed explanation of the human-control module in Section 4. The experimental results are presented in Section 5, after which we conclude the paper and discuss future work in Section 6.
2 PREVIOUS WORK
Assistive technology for people living with visual impairments is a growing research area (Manduchi and Coughlan, 2012; Khoo and Zhu, 2016). In recent years, the increase in mobile processing power and improvements in computer vision have led to research into the use of smartphone cameras to augment or enhance a user's vision and help them find objects or other points of interest. Earlier attempts at the problem involved placing special markers or barcodes around an environment, which the user then scanned with a smartphone or similar mobile device (Gude et al., 2013; Iannizzotto et al., 2005; Manduchi, 2012). The device then uses a feedback mode, e.g. Braille or sound, to guide the user towards the target.
Another approach is to discard tags completely and rely on computer vision to perform the object detection, something that has become more practical with recent improvements to feature detectors and deep networks (Huang et al., 2017; Redmon et al., 2016). SIFT- and SURF-based object detectors have also been used to detect known objects when they are in the camera's view and to guide the user to them using sonified instructions (Schauerte et al., 2012). This type of system is more flexible than the tag-based ones, but it shares the drawback of being passive, in the sense that it relies on having the object within the camera's view in the first place. Moreover, no clear performance metrics are reported in that work. The VizWiz system (Bigham et al., 2010) offloads the object recognition task to an Amazon Mechanical Turk worker, who then provides feedback on where the object of interest is located relative to the user. VizWiz has the advantage of being fairly robust: it can classify a wide range of objects with little effort from the user and can provide natural, human-generated and curated directions. However, this approach does not enhance user independence, since a person with visual impairments is now beholden to an online worker instead of a relative, friend or bystander. Furthermore, a good internet connection is required on the device, possibly limiting its use in areas with poor reception.
Previous researchers have implemented active search and perception strategies in robots and image classifiers (Bajcsy et al., 2017) in an attempt to optimise their classification and planning tasks, for example by exploiting the structured nature of human environments and object placements. Two research teams have recently incorporated an active object search strategy into their image classifiers (Caicedo and Lazebnik, 2015; Gonzalez-Garcia et al., 2015). Their approaches use different methods but conceptually similar models to generate windows of interest for visual classification. The sizes and locations of the windows within the image are generated using the spatial relationships between objects, taken from the SUNCG and PASCAL datasets (Song et al., 2017; Everingham et al., 2010), and are iteratively refined based on the output of the respective models. The advantage of these approaches is that fewer windows are generated and submitted to the classifier, resulting in lower object classification times while maintaining state-of-the-art accuracy.
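To make the underlying idea concrete, the sketch below (in Python, assuming a NumPy image array) shows one greedy variant of such an iterative window refinement loop. The name `score_window` is a placeholder for any classifier confidence function and does not come from the cited works, which additionally condition their window proposals on learned spatial relationships between objects; that step is omitted here.

```python
# Hypothetical sketch of iterative window-of-interest refinement.
# `score_window` stands in for any object classifier that returns a
# confidence for a given image crop; it is a placeholder, not an API
# from the cited works.

def refine_window(image, score_window, steps=10, shrink=0.9):
    """Greedily shift and shrink a search window towards the
    highest-scoring region of the image."""
    h, w = image.shape[:2]
    x, y, win_w, win_h = 0, 0, w, h  # start with the full image
    best = (x, y, win_w, win_h)
    best_score = score_window(image[y:y + win_h, x:x + win_w])
    for _ in range(steps):
        # Candidate moves: shift the window or shrink it in place.
        step_x, step_y = max(1, win_w // 4), max(1, win_h // 4)
        candidates = [
            (min(x + step_x, w - win_w), y, win_w, win_h),
            (max(x - step_x, 0), y, win_w, win_h),
            (x, min(y + step_y, h - win_h), win_w, win_h),
            (x, max(y - step_y, 0), win_w, win_h),
            (x, y, max(8, int(win_w * shrink)), max(8, int(win_h * shrink))),
        ]
        scored = [(score_window(image[cy:cy + ch, cx:cx + cw]), (cx, cy, cw, ch))
                  for cx, cy, cw, ch in candidates]
        score, cand = max(scored, key=lambda s: s[0])
        if score <= best_score:
            break  # no candidate improves on the current window
        best_score, best = score, cand
        x, y, win_w, win_h = cand
    return best, best_score
```

Because only a handful of candidate windows are scored per iteration, far fewer crops reach the classifier than in exhaustive sliding-window search, which is the efficiency gain these approaches target.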
Similar strategies have been incorporated into robotic platforms to improve autonomous object search, manipulation and localisation tasks. For example, some researchers have developed a planning algorithm for a robotic manipulator that performs an optimal object search in a cluttered environment (Dogar et al., 2014). Another team implemented an MDP that generates an optimal object search strategy for a room, operating over a belief state of object positions and configurations (Aydemir et al., 2011). However, the authors trained their MDP on a custom object-placement and configuration scenario, so their results are sensitive to changes in this distribution.
In summary, much research has been conducted on recognition of and guidance towards target objects, including active vision solutions for image classifiers and robotic systems. However, to our knowledge, no previous work has been done on active object search and guidance for humans, which would especially benefit people with visual impairments. In this paper, we implement such an active vision system with a human in the loop that guides the user towards an out-of-view target object. Our system exploits prior knowledge of the objects' spatial distribution within an indoor environment, learned from a dataset of real-world images, and the history of past object observations made during the search.
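As a rough, hypothetical illustration of this idea, the Python sketch below maintains a belief over candidate target locations, seeded with a spatial prior and updated after each observation made during the search. The location names, prior values and update rule are invented for the sketch and are not the system's learned model.

```python
# Hypothetical illustration of combining a spatial prior with the
# history of past observations. All values here are made up.

# Prior probability that the target (e.g. a mug) is near each landmark
# object, as could be learned from a dataset of indoor images.
prior = {"table": 0.5, "counter": 0.3, "shelf": 0.15, "floor": 0.05}

def update_belief(belief, observed_location, detected, miss_rate=0.2):
    """Posterior over target locations after looking at one location.

    If the target was not detected there, most of that location's
    probability mass (all but a miss-rate fraction, to allow for
    detector misses) is redistributed over the remaining locations."""
    if detected:
        return {loc: float(loc == observed_location) for loc in belief}
    posterior = dict(belief)
    posterior[observed_location] *= miss_rate
    total = sum(posterior.values())
    return {loc: p / total for loc, p in posterior.items()}

# Example search: repeatedly look at the currently most likely location.
belief = dict(prior)
for _ in range(3):
    target_loc = max(belief, key=belief.get)
    print(f"look near the {target_loc}  (belief: {belief})")
    belief = update_belief(belief, target_loc, detected=False)
```

The actual system, described in the following sections, learns these distributions from real-world images and couples the resulting belief with guidance of the user's camera.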
3 ACTIVE VISION SYSTEM
The work presented in this paper is a fundamental step towards the broader project goal of developing a stand-alone system that can guide a person with visual impairments to his/her destination with minimal user