it. In fact, it will probably become a near-term feature
of image library management tools and web search engines,
complementing other research efforts focusing
on classification, annotation and so on.
2 BACKGROUND
The problem of image retrieval is two-fold. In the first
place, we need fast and effective techniques to convey
visual similarity to the user. In the second place, we
need an effective technique to allow the user to man-
age the results.
Regarding the first problem, a large body of literature
exists. Within it, we think that
the natural choice is a global feature representation,
providing a compact summary by aggregating some
information extracted at every pixel location of the
image. The bag-of-words approach, a global representation
built from local features like SIFT
(Lowe, 2004) or SURF (Bay et al., 2008) clustered into
a visual dictionary, is generally considered the state of
the art. For a complete comparison of the performance of
local features in CBIR, please refer to (Mikolajczyk
and Schmid, 2005). Most of these local descriptors
use luminance information only. Nevertheless, both
color and shape are widely considered important visual
characteristics in a cognitive context, so an interesting
way to account for this information is the
covariance region descriptor proposed by Tuzel et
al. (Tuzel et al., 2008), which aggregates the correlations
of a customizable set of elementary sources
of information (such as color, shape, spatial information,
gradients). Moreover, great interest has been devoted to
the GIST feature, a statistical summary of the spatial
layout properties (Spatial Envelope representation) of the
scene (Oliva and Torralba, 2006).
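As an illustration of how a covariance region descriptor aggregates per-pixel information, the following minimal sketch computes the sample covariance of a small feature vector over a grayscale region. The feature set (pixel coordinates, intensity and absolute first derivatives) and the function names are assumptions chosen for brevity; they are not the configuration used in this paper.

```python
def pixel_features(img):
    """Map each pixel of a grayscale image (list of rows) to a feature vector.

    The feature set [x, y, intensity, |dI/dx|, |dI/dy|] is an illustrative
    assumption; the descriptor accepts any per-pixel features.
    """
    h, w = len(img), len(img[0])
    feats = []
    for y in range(h):
        for x in range(w):
            # Central differences, with neighbor indices clamped at the border.
            dx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            dy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            feats.append([float(x), float(y), float(img[y][x]), abs(dx), abs(dy)])
    return feats


def covariance_descriptor(feats):
    """Return the d x d sample covariance matrix of the feature vectors."""
    n, d = len(feats), len(feats[0])
    mean = [sum(f[i] for f in feats) / n for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in feats:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (f[i] - mean[i]) * (f[j] - mean[j])
    # Unbiased estimate: divide by n - 1 (requires at least two pixels).
    return [[c / (n - 1) for c in row] for row in cov]
```

The descriptor size depends only on the number of features, not on the region size. Comparing two such matrices requires a metric suited to symmetric positive definite matrices, which this sketch does not cover.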
To solve the second problem, as pioneered by
Rennison (Rennison, 1994), a presentation strategy
is required. The classical spatial arrangement of
images is their placement on a grid, typically in row-
major ordering based on relevance. Despite its sim-
plicity, this visualization is unable to convey infor-
mation on the structure of the collection, for example
the presence of a cluster of similar images. As described
in (Heesch, 2008), alongside more standard
approaches based on static hierarchies or clustering,
the main approaches are built around network-based
or dimensionality-reduction-based representations.
Multi-Dimensional Scaling (MDS) solves a
nonlinear optimization problem by determining the
low-dimensional mapping that best approximates the
high-dimensional pairwise distances between data points.
One of the
initial proposals was the Sammon mapping (Sammon, 1969).
An interesting proposal of this kind is
the Hyperbolic-MDS (Walter, 2004), which exploits
the hyperbolic space H² to map the most significant
images to the center of the projection (thus
visualizing them in greater detail) while displacing
the others along the curve of H², falling towards
infinity at a smaller scale; moreover, this projection
has the advantage of allowing the view to be focused on
different points by applying the Möbius transformation.
A number of other nonlinear projections have
been proposed to reduce the prohibitive computational
costs, for example the isometric mapping (ISOMAP)
(Tenenbaum et al., 2000), the stochastic neighbor
embedding (SNE) (Hinton and Roweis, 2002) and the
locally linear embedding (LLE) (Roweis and Lawrence,
2000). Finally, an older yet effective approach, especially
in large scale contexts, is FastMap (Faloutsos
and Lin, 1995), which exploits a set of pivot objects to
project points into the reduced space. This technique,
also exploited in this paper, has the advantage of
easily allowing fast insertion of new objects into the
map.
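The pivot-based projection and the cheap insertion of new objects can be sketched as follows. This is a minimal illustration of the FastMap idea under stated assumptions, not the implementation used in this paper: the function names, the pivot selection heuristic (farthest object from object 0, then farthest from that one) and the requirement that `dist(x, x)` is zero are all choices made for the example.

```python
import math


def fastmap(objects, dist, k):
    """Project objects into k dimensions; returns (coords, pivots)."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]
    pivots = []

    def residual_sq(i, j, level):
        # Squared distance left after removing the first `level` coordinates.
        d2 = dist(objects[i], objects[j]) ** 2
        for c in range(level):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)

    for level in range(k):
        # Pivot heuristic (assumption): farthest from object 0, then farthest from it.
        a = max(range(n), key=lambda i: residual_sq(0, i, level))
        b = max(range(n), key=lambda i: residual_sq(a, i, level))
        dab2 = residual_sq(a, b, level)
        pivots.append((a, b))
        if dab2 == 0.0:
            continue  # no residual spread left; this coordinate stays at 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][level] = (residual_sq(a, i, level) + dab2
                                - residual_sq(b, i, level)) / (2.0 * dab)
    return coords, pivots


def fastmap_insert(new_obj, objects, coords, pivots, dist):
    """Place a new object using only its distances to the stored pivots."""
    point = []
    for level, (a, b) in enumerate(pivots):
        def res_sq(p):
            d2 = dist(new_obj, objects[p]) ** 2
            for c in range(level):
                d2 -= (point[c] - coords[p][c]) ** 2
            return max(d2, 0.0)

        # Pivot a sits at 0 and pivot b at d(a, b) on this axis.
        dab = coords[b][level] - coords[a][level]
        if dab == 0.0:
            point.append(0.0)
        else:
            point.append((res_sq(a) + dab ** 2 - res_sq(b)) / (2.0 * dab))
    return point
```

Inserting a new object costs two distance computations per output dimension and leaves the existing coordinates untouched, which is what makes the map easy to extend incrementally.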
3 RELEVANCE FEEDBACK FOR
IMAGE SURFING
The first task in image searching on large scale
collections is clearly managing the scalability problem.
Many techniques for approximate nearest neighbor
(ANN) search, from LSH (Andoni and
Indyk, 2006) up to product quantization (Jégou
et al., 2011), greatly improve performance
by using vocabulary codes (with precomputed
distances) in place of real features. Moreover, image
search based on contextual information (as done by
all search engines) has proven effective.
The real limitation of today's multimedia systems
lies in the interaction possibilities.
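To make the idea of vocabulary codes with precomputed distances concrete, the sketch below encodes vectors in the style of product quantization and compares two codes by table lookups only. It is a minimal illustration under assumptions: the function names are hypothetical, the codebooks are passed in ready-made (in practice they would be learned, typically with k-means, on training data), and the vector length is assumed to be divisible by the number of codebooks.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def encode(vec, codebooks):
    """Quantize each sub-vector to the index of its nearest centroid."""
    step = len(vec) // len(codebooks)
    code = []
    for s, book in enumerate(codebooks):
        sub = vec[s * step:(s + 1) * step]
        code.append(min(range(len(book)), key=lambda c: sq_dist(sub, book[c])))
    return code


def build_tables(codebooks):
    """Precompute squared distances between all centroid pairs, per sub-space."""
    return [[[sq_dist(ci, cj) for cj in book] for ci in book]
            for book in codebooks]


def approx_sq_dist(code_a, code_b, tables):
    """Approximate squared distance between two codes using lookups only."""
    return sum(tables[s][a][b] for s, (a, b) in enumerate(zip(code_a, code_b)))
```

Once the tables are built, comparing two images touches only their short codes, so the cost per comparison depends on the number of sub-spaces rather than on the feature dimensionality.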
The most important way in which the user can
help the system cross the semantic gap and interact
with the retrieval results, i.e. relevance feedback,
becomes prohibitive first of all in large scale contexts.
Just consider the usual approaches: query point movement
(QPM), feature space warping (FSW) or machine
learning approaches (Chang et al., 2009). QPM
notoriously suffers from slow convergence and does not
guarantee finding the intended targets; a fast QPM
technique that tries to fix this problem has been proposed
by (Liu et al., 2009). FSW requires a full re-encoding
of the space, and to the best of our knowledge
no proposal takes FSW into account in large scale scenarios.
Finally, learning is notoriously a heavy procedure,
often requiring offline processing and hardly capa-