Figure 2: A result corresponding to the query car on an
annotated database.
images seem to have little relevance for the query,
they all had captions such as “in the car,” “in our car,”
“seen from my car,” and so on. For all of them, after
stop-word removal and stemming, the only stem left
was CAR, which therefore received, in the normalized
vector model, a weight of 1. These images therefore
had maximal similarity with the query and, in the
absence of provisions for diversifying the results,
the system guilelessly put them in the first places. The
system here is stuck in a “semantic rut”: many images
share the same haphazard characteristic that makes
them falsely suitable for the query and, given the
frequency of occurrence of this trait, they hoard the
first positions of the result list.
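The effect can be reproduced with a minimal sketch of a normalized vector model. The stop-word list, the toy stemmer, and the fourth caption below are our own assumptions, chosen only to mirror the captions quoted above; they are not the components of the system that produced figure 2.

```python
import math
import re
from collections import Counter

# Assumed stop-word list, chosen so that the quoted captions reduce to CAR.
STOP_WORDS = {"in", "the", "our", "my", "seen", "from", "a", "on"}

def stems(text):
    """Tokenize, drop stop words, and apply a toy stemmer (strip a final 's')."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [(t[:-1] if len(t) > 3 and t.endswith("s") else t).upper() for t in kept]

def unit_vector(text):
    """Term-frequency vector scaled to unit Euclidean length."""
    counts = Counter(stems(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {s: c / norm for s, c in counts.items()} if norm else {}

def cosine(u, v):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(w * v.get(s, 0.0) for s, w in u.items())

query = unit_vector("car")
captions = ["in the car", "in our car", "seen from my car",
            "a red car on the beach"]  # last caption is hypothetical
scores = {c: cosine(query, unit_vector(c)) for c in captions}
```

Under these assumptions the three quoted captions reduce to the single stem CAR and obtain the maximal score, while the more descriptive caption, which retains three stems, scores lower and is ranked below them.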
These examples are not a slapdash collection of
fortuitous cases, but are representative of a general
phenomenon: relevance alone is not a suitable basis for
satisfactory retrieval. In information retrieval, the
Robertsonian hypothesis that relevance is a property of a
document has all but been abandoned. It is, we argue,
time that the multimedia community follow suit.
In the multimedia community, attempts have been
made to solve the problems caused by examples such
as that of figure 1 through near-duplicate elimination.
Unfortunately, these inchoate techniques do not relate
near-duplicity to the information content of the result
set. Consequently, they fail to take a global view of
the result set, that is, to consider it as a mathematical
object with a precise function, a function that can be
invalidated by elements that are not necessarily
duplicates. This makes near-duplicate elimination of
limited usefulness in annotation-based systems (vide the
example of figure 2, in which no near-duplicates are
present) or in hybrid similarity-annotation systems
(Rasiwasia et al., 2010).
In this paper, we propose a non-Robertsonian
framework to deal with these issues, one based on the
notions of novelty and diversity, with which Information
Retrieval conceptualizes the problems caused by
the independence hypothesis. These concepts, and the
measures that come with them, will allow us, on the one
hand, to avoid the limitations of near-duplicate removal
and, on the other hand, to formulate a coherent
theory that applies to visual similarity systems,
annotation-based systems, and their hybrids.
Diversity is the notion that allows the result set to
deal with query ambiguity, while novelty deals with
query underspecification. Consider a query composed
of the keyword manhattan. The query is ambiguous,
as it can have several interpretations: it may refer
to one of the “boroughs” of New York, to the cocktail,
to the Woody Allen movie, or to the Indian tribe
from which the Dutch bought the island. A result set
with high diversity will cover all these interpretations,
possibly in a measure proportional to an a priori
estimation of the interest in each one of them. If we
concentrate on a single interpretation (say, the borough),
there are many different aspects in which one may
be interested. We may be interested in the history of
Manhattan, in its attractions, in getting around it, or
in the housing prices. While interpretations are
assumed to be mutually exclusive (if I am interested in
the movie, I am probably not interested in the Indian
tribe), aspects are inclusive: I am more or less
interested in all of them. An item in a result set is novel
to the extent to which it covers aspects of a query not
covered by other items in the result set, that is, to the
extent to which items are non-redundant: removing
a novel item would lead to a result set that would not
cover one or more of the aspects covered by the set
before the removal. Diversity is a global property of a data
set, while novelty is the corresponding property of a
document with respect to a set.
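The set-theoretic reading of novelty given above can be illustrated with a small sketch. It assumes that each item comes with an explicit set of the aspects it covers, which is a simplification of ours; the item names and aspect labels are hypothetical, and the functions are a toy illustration rather than one of the measures discussed in the literature.

```python
def covered(items, aspects_of):
    """Union of the aspects covered by a collection of items."""
    result = set()
    for item in items:
        result |= aspects_of[item]
    return result

def novelty(item, result_set, aspects_of):
    """Aspects that `item` covers and that no other item in the set covers."""
    others = [i for i in result_set if i != item]
    return aspects_of[item] - covered(others, aspects_of)

def is_redundant(item, result_set, aspects_of):
    """An item is redundant if removing it loses no covered aspect."""
    return not novelty(item, result_set, aspects_of)

# Hypothetical items for the "borough" interpretation of the query manhattan.
aspects_of = {
    "doc_history": {"history"},
    "doc_guide": {"attractions", "getting around"},
    "doc_subway": {"getting around"},
}
results = ["doc_history", "doc_guide", "doc_subway"]

novel_in_guide = novelty("doc_guide", results, aspects_of)
subway_redundant = is_redundant("doc_subway", results, aspects_of)
```

Here `doc_subway` is redundant, since its only aspect is already covered by `doc_guide`, whereas removing `doc_guide` would leave the aspect "attractions" uncovered.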
In the last few years, various methods have been
proposed both to measure the diversity and novelty
of a set of items (Chapelle et al., 2009; Clarke et al.,
2009; Santini and Castells, 2011) and to generate
result sets that maximize novelty and/or diversity (Zhai
et al., 2003; Agrawal et al., 2009). Unfortunately,
unlike the Robertsonian model (whose complexity
without indices is O(|D| log n)), for virtually all measures
of interest, maximizing novelty and diversity is
NP-complete (Santini, 2011), so approximate solutions
have to be used. No formal, workable definition of
novelty and diversity has hitherto been proposed for
multimedia.
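The contrast in cost can be made concrete with a sketch. We read n as the length of the result list, which is our assumption. Ranking by relevance alone needs only a bounded heap over the collection D, while the second function is a generic greedy heuristic that repeatedly picks the item covering the most aspects not yet covered; it is not the algorithm of any of the works cited above, and the scores and aspect sets are hypothetical.

```python
import heapq

def top_n_by_relevance(scores, n):
    """Relevance-only ranking: a size-n heap gives O(|D| log n) time."""
    return heapq.nlargest(n, scores, key=scores.get)

def greedy_novel(scores, aspects_of, n):
    """Greedy approximation: prefer items adding uncovered aspects,
    breaking ties by relevance score."""
    selected, covered = [], set()
    candidates = set(scores)
    while candidates and len(selected) < n:
        best = max(candidates,
                   key=lambda d: (len(aspects_of[d] - covered), scores[d]))
        selected.append(best)
        covered |= aspects_of[best]
        candidates.remove(best)
    return selected

# Hypothetical collection: a, b, c share the same trait, as in a "semantic rut".
scores = {"a": 1.0, "b": 0.99, "c": 0.98, "d": 0.6, "e": 0.5}
aspects_of = {
    "a": {"car interior"}, "b": {"car interior"}, "c": {"car interior"},
    "d": {"exterior", "racing"}, "e": {"dealership"},
}

by_relevance = top_n_by_relevance(scores, 3)
by_novelty = greedy_novel(scores, aspects_of, 3)
```

With these values the relevance-only list is filled by the three near-equivalent items, while the greedy list covers four distinct aspects with the same budget of three results. The greedy choice trades guaranteed optimality for tractability, which is the kind of compromise the NP-completeness result imposes.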
As a final epistemological note, we point out that
while novelty and diversity are often maximized at
the same time, they have quite different implications
and affect the results in quite different ways. From
the point of view of the final user, novelty should
always be maximized, as it avoids receiving redundant
results and uses the “result budget” (the limited
number of items that can appear in the result list) to
cover different aspects of interest to the user. Diversity
is, from the point of view of the user, a nuisance.
Each user would of course like to minimize diversity
by receiving results only about the interpretation that
SIGMAP 2018 - International Conference on Signal Processing and Multimedia Applications