its components, associated to an initial part of the con-
structed ranking, being a set or interval of values. Al-
though we have not fully explored this flexibility in
our empirical setting, both of these features could be
useful in practical applications, where one would at-
tempt to actually determine what D should be.
A first approach to determining D could be to
measure the diversity µ(X) of the set X of options
that are available, and set each component of D to
equal {µ(X)}. This would effectively ask for some
form of uniformity of the diversity exhibited in any
initial part of a ranking presented to the user. Thus,
the user would not only wish for the entire list of op-
tions presented to them to be diverse, but also for the
first handful of results in that ranking — which prac-
tically is what a user gets to see when making their
choice — to also be equally diverse.
Variations are, of course, possible. Instead of in-
sisting on uniformity, the user could ask that each ini-
tial part of the ranking is at least as diverse as the en-
tire list by setting each component of D to equal the
interval [µ(X), 1], so that the shuffling algorithm will
not seek to reduce the diversity of a part of the ranking
if this happens to be higher than µ(X). Analogously,
the user may wish to relax the minimum bound on the
diversity and replace the interval [µ(X), 1] with, say,
[µ(X)/2, 1], insisting that some diversity is minimally
and uniformly required in each part of the ranking.
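The three choices of D discussed above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the evaluation: the names `build_targets` and `prefix_ok`, the encoding of each component of D as a closed interval (lo, hi), and the premise that µ takes values in [0, 1] are all introduced here for the example.

```python
from typing import List, Tuple

# Closed interval [lo, hi]; a single required value v is encoded as (v, v).
Interval = Tuple[float, float]


def build_targets(mu_x: float, n: int, mode: str = "uniform") -> List[Interval]:
    """Return a hypothetical D with one interval per initial part of the ranking."""
    if not 0.0 <= mu_x <= 1.0:
        raise ValueError("mu_x is assumed to lie in [0, 1]")
    if mode == "uniform":       # each component equals {mu(X)}
        component = (mu_x, mu_x)
    elif mode == "at_least":    # each component equals [mu(X), 1]
        component = (mu_x, 1.0)
    elif mode == "relaxed":     # each component equals [mu(X)/2, 1]
        component = (mu_x / 2.0, 1.0)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [component] * n


def prefix_ok(prefix_diversities: List[float],
              targets: List[Interval],
              tol: float = 1e-9) -> bool:
    """Check that the diversity of every initial part lies in its target interval."""
    return all(lo - tol <= d <= hi + tol
               for d, (lo, hi) in zip(prefix_diversities, targets))
```

Under this encoding, a ranking whose initial parts are more diverse than µ(X) is accepted in the "at_least" mode but rejected in the "uniform" mode.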
An interesting direction for future research would
be to determine D using actual user feedback in real
scenarios. Namely, it would be interesting to see to
what extent and under what conditions users desire
ranked lists presented to them to be diverse and, on
top of that, what part of the list is of significant im-
portance — for instance, in typical web search tasks,
the first page of the returned results is usually of the
highest significance to a user.
One approach to learning the diversity that a user
may be seeking in their ranked options is to track the
user’s past choices from ranked lists. The precise
choice from each specific ranking can, naturally, be
used as a signal of how the user wishes the options to
be ranked to begin with. But in addition, looking at
the choices C^t of the user after t rounds of being
presented with a ranking and making a choice, one could
consider the diversity µ(C^t) of C^t as an indication of
the level of diversity that the user would wish to see
in subsequent rounds of rankings. Setting each component
of D to equal µ(C^t), and presumably updating
D over time as C^t is computed for larger and larger
values of t, would capture this intuition.
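A minimal sketch of this idea is given below. It assumes a placeholder diversity measure, `toy_diversity` (the share of distinct options among those chosen), which stands in for µ and is not the measure used in this paper; the class name `ChoiceTracker` and its methods are likewise hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def toy_diversity(options: Sequence[Hashable]) -> float:
    """Placeholder for mu: share of distinct options, in [0, 1] (an assumption)."""
    return len(set(options)) / len(options) if options else 0.0


class ChoiceTracker:
    """Accumulates the user's choices and derives D from their diversity."""

    def __init__(self, n: int,
                 mu: Callable[[Sequence[Hashable]], float] = toy_diversity) -> None:
        self.n = n                          # number of components of D
        self.mu = mu                        # diversity measure (assumed given)
        self.choices: List[Hashable] = []   # the choices made so far

    def record(self, choice: Hashable) -> None:
        """Store the option chosen in the current round."""
        self.choices.append(choice)

    def targets(self) -> List[Tuple[float, float]]:
        """Set every component of D to the diversity of the choices so far."""
        level = self.mu(self.choices)
        return [(level, level)] * self.n
```

Calling `targets()` after each round gives the updated D; how quickly D should react to new choices is left open here.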
A more involved solution would be to maintain a
list of user choices as above, but partitioned based on
the position in the ranking of each choice made. So,
if during round t a user chooses the option at position i
in their presented ranking, then that option would be
added to list C^t_i. D would then be defined by setting its
i-th component to equal µ(C^t_i). Further accounting for
the amount of data within each list C^t_i of choices, one
could insist that each component of D is some confidence
interval around µ(C^t_i), whose size is determined
by the size of C^t_i. This would then be able to capture
situations where a user might signal that different lev-
els of diversity are desired for different initial parts
of the ranking; e.g., no diversity preference for the
first handful of results (so that the original ranking is
maintained), but then maximum diversity in the sec-
ond handful of results (so that the user can explore
alternatives that might be less preferred according to
the original ranking, but are highly diverse).
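The position-partitioned variant can be sketched in the same spirit. Everything specific in the code is an assumption made for illustration: `toy_diversity` is a placeholder for µ, the half-width 1/sqrt(m) for a list of m choices is just one simple way of letting the interval shrink as data accumulates, and positions with no recorded choices are given the uninformative interval [0, 1].

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def toy_diversity(options: Sequence[Hashable]) -> float:
    """Placeholder for mu: share of distinct options, in [0, 1] (an assumption)."""
    return len(set(options)) / len(options) if options else 0.0


def positional_targets(history: Iterable[Tuple[int, Hashable]],
                       n: int) -> List[Tuple[float, float]]:
    """Build D from (position, option) pairs; positions are 1-based."""
    by_position: Dict[int, List[Hashable]] = defaultdict(list)
    for position, option in history:
        by_position[position].append(option)

    targets = []
    for i in range(1, n + 1):
        chosen = by_position.get(i, [])
        if not chosen:
            targets.append((0.0, 1.0))   # no data: no diversity preference
            continue
        centre = toy_diversity(chosen)
        half_width = 1.0 / math.sqrt(len(chosen))   # hypothetical shrinking width
        targets.append((max(0.0, centre - half_width),
                        min(1.0, centre + half_width)))
    return targets
```

With few observations at a position the interval is wide, so the corresponding component of D barely constrains the shuffling; it tightens as more choices are recorded at that position.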
5.2 Measuring Shuffling Deviation
A choice that we made for the purposes of our empirical
evaluation was to implement the deviation restriction,
R, required by Algorithm 3, through the use
of ℓ_1 as a simple-to-compute yet informative way to
quantify the distance between two ranked lists. As we
explained in Section 4.2, ℓ_1 is naturally interpreted
as the total “displacement” required to move from a
ranking r_1 to another ranking r_2. However, what matters
on most occasions when a ranking is used is the first
part of the returned ranked list. So, ℓ_1 may not be the
most appropriate choice, since it does not distinguish
between changes in the first, middle, or last
places of a ranking r. A natural adaptation of ℓ_1 that
accommodates the above is to introduce weights w_k, with
∑_{k=1}^{n} w_k = 1, such that moving, say, the first element
of r(X) three places down the list matters more than
moving the 15th element to position 18.
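One possible reading of this weighted adaptation is sketched below. The text does not fix how the weights enter the sum, so the code assumes that the displacement of each element is multiplied by the weight of its position in the original ranking; the harmonic weights (proportional to 1/k) and the function names are also illustrative choices.

```python
from typing import Hashable, List, Optional, Sequence


def harmonic_weights(n: int) -> List[float]:
    """Weights proportional to 1/k, normalised to sum to 1 (illustrative choice)."""
    raw = [1.0 / k for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_l1(r1: Sequence[Hashable],
                r2: Sequence[Hashable],
                weights: Optional[Sequence[float]] = None) -> float:
    """Sum of per-element displacements, weighted by position in r1.

    With weights=None every displacement counts equally, which gives the
    plain total displacement between the two rankings.
    """
    if len(r1) != len(r2) or set(r1) != set(r2):
        raise ValueError("r1 and r2 must rank the same options")
    if weights is None:
        weights = [1.0] * len(r1)
    pos_in_r2 = {item: k for k, item in enumerate(r2)}
    return sum(weights[k] * abs(k - pos_in_r2[item])
               for k, item in enumerate(r1))


items = list(range(1, 21))
# Move the first element three places down.
top_move = items[1:4] + [items[0]] + items[4:]
# Move the 15th element to position 18.
low_move = items[:14] + items[15:18] + [items[14]] + items[18:]
```

Both shufflings have the same unweighted total displacement, whereas under the harmonic weights the change near the top of the list counts for more than the change further down.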
Other than distance-based deviation constraints
that apply globally on a ranked list, it would also be
meaningful to explore more complex restrictions R,
including ones that are not necessarily based on dis-
tance metrics. For instance, in some scenarios a user
may wish to explicitly state that the initial handful of
options in the original ranking should not be shuffled
and that subsequent options could be shuffled more
the later in the ranking they appear.
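A restriction of this kind could be expressed as a simple predicate on shufflings, as in the following sketch. The parameters `frozen` (how many leading options must stay in place) and `growth` (how fast the permitted displacement increases with position) are hypothetical, as is the linear form of the bound.

```python
from typing import Hashable, Sequence


def allowed(original: Sequence[Hashable],
            shuffled: Sequence[Hashable],
            frozen: int = 3,
            growth: int = 1) -> bool:
    """Return True if `shuffled` respects the position-dependent restriction.

    The first `frozen` options of `original` may not move at all; the option
    at 0-based index k >= frozen may move by at most (k - frozen + 1) * growth
    places, so later options may be shuffled more.
    """
    if len(original) != len(shuffled) or set(original) != set(shuffled):
        return False
    new_pos = {item: k for k, item in enumerate(shuffled)}
    for k, item in enumerate(original):
        max_shift = 0 if k < frozen else (k - frozen + 1) * growth
        if abs(new_pos[item] - k) > max_shift:
            return False
    return True
```

For the ranking a, b, c, d, e, f, g, h with the default parameters, swapping d and e is permitted, while swapping a and b is not.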
5.3 Efficient Search of Shufflings
In the implementation of Algorithm 3, as used in
the empirical evaluation presented in Section 4, we
adopted a greedy breadth-first strategy to search the
space of all shufflings of r(X). The only heuristic we
have implemented to alter the order in which nodes
are expanded is, as mentioned in Section 3, that when