Maximizing the Relevant Diversity of Social Swarming Information

Peter Terlecky¹, Yurong Jiang², Xing Xu², Amotz Bar-Noy¹ and Ramesh Govindan²
¹CUNY Graduate Center, New York, NY, USA
²University of Southern California, Los Angeles, CA, USA

Keywords: Information Diversity, Social Swarming, Maximum Coverage.
Abstract:
In social swarming applications, users are equipped with smartphones and generate data on specific tasks in the form of pictures, video, audio, and text. A central commander would like to gain access to data relevant to a particular query. Which data should be wirelessly uploaded to the commander to maximize the amount of diverse information received, subject to a bandwidth constraint? We model this problem in two distinct ways: first as a maximum coverage with group budget constraints problem, and then as a variant of the maximum edge-weighted clique problem. We show that the algorithm for the maximum coverage model outperforms a heuristic for the clique-based model both theoretically and practically, with both performing very well experimentally compared to an upper-bound benchmark.
1 INTRODUCTION
In social swarming applications, users are equipped with smartphones with which they generate data on specific tasks. The data take the form of text, video, audio, and pictures. A central commander would like to wirelessly receive information relevant to a query from the users that is collectively comprehensive.
Furthermore, he does not wish the users to flood the network with all of their data, so he requires each user to send only a limited amount of the relevant collected data. The commander would like to avoid duplicate data, or data that is similar in nature. That is, the commander would like to maximize the dissimilarity of the received data set while keeping the data set relevant to the query/task. Which data should be uploaded from which users to maximize the diversity of information while respecting the bandwidth constraints?
One way in which we model such a problem is as a maximum coverage with group budgets problem. Chekuri and Kumar (Chekuri and Kumar, 2004) introduced and analyzed the maximum coverage with group budgets problem; they give a 2-approximation for the cardinality version of the problem and a 12-approximation for the cost version.
We also model such a problem as a variant of the maximum edge-weighted clique problem, which was studied in (Dijkhuizen and Faigle, 1993; Macambira and De Souza, 2000; Hunting et al., 2001; Park et al., 1996).
Receiving a diverse set of high-quality information is important in many applications. Consider a shopper deciding whether to buy a product. The shopper would like to receive reviews of the product, which may be grouped by category or rating. In particular, the shopper would like to see a diverse subset of high-quality reviews that avoids duplicate or similar reviews. Which subset of reviews should be shown to the shopper? Tsaparas et al. (Tsaparas et al., 2011) focus on the problem of selecting a comprehensive set of high-quality reviews covering many different aspects of the reviewed item. They model the problem as a maximum coverage problem and provide algorithms which they evaluate in a user study on Amazon Mechanical Turk. Lappas et al. (Lappas et al., 2012) seek to select a subset of reviews that maintains the statistical properties of the review corpus. Yu et al. (Yu et al., 2013) seek to select a small subset of high-quality reviews which are opinion-diversified and cover a large set of attributes. Zhuang et al. (Zhuang et al., 2006) consider the problem of extracting the features on which movie reviewers express their opinions and determining whether the opinions are positive or negative; they propose an effective multi-knowledge-based solution.
Chen et al. (Chen et al., 1997) consider the problem of finding an optimal subset of features, that is, selecting, from a large set of candidate features,
a subset of features able to represent the given examples (samples) consistently. They prove that finding an optimal subset of features is NP-hard and present a heuristic solution.
Margules et al. (Margules et al., 1988) consider the problem of selecting a set of vendors in a manufacturing environment, proposing a decision-support approach to selecting vendors under the conflicting criteria of minimizing annual material costs, reducing the number of suppliers, and maximizing suppliers' delivery and quality performance.
The context of our work is social swarming, which has also been the setting for the following works. Liu et al. (Liu et al., 2012a) considered maximizing the credibility of social swarming information. In (Liu et al., 2012b), Liu et al. were interested in maximizing the number of timely reports sent to a commander. Jiang et al. (Jiang et al., 2013) developed a system called Mediascope for selective, timely retrieval of media from mobile devices.
2 MODEL
A commander has $m$ reporters in the field collecting data in various forms: photos, video, sound recordings, and text. The commander would like to receive as much information as possible on a particular event or circumstance. He would like the reporters to upload some subset of their reports so that the total "information" collected is maximized. Each report has a particular size, and a limited amount of bandwidth is assigned to each reporter. Given the bandwidth constraints, he would like to determine which reports should be uploaded so as to maximize the total information received. Here information is modeled as tags which the reporters attach to the reports. For example, if a reporter takes a photo, he tags the objects in the photo. If a reporter reports via text, he includes keywords as his tags. The tags are the elements of the information universe $I$.
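To make the tag-based information model concrete, the following minimal Python sketch (ours, with hypothetical tag names) represents reports as tag sets drawn from the universe $I$:

```python
# A minimal illustration of the tag-based information model; the tag names
# are hypothetical and not from the original paper.
I = frozenset({"smoke", "crowd", "ambulance", "fire_truck", "barricade"})
photo_1 = frozenset({"smoke", "crowd"})        # tags attached by reporter 1
photo_2 = frozenset({"crowd", "ambulance"})    # tags attached by reporter 2
covered = photo_1 | photo_2                    # information obtained
print(len(covered))                            # -> 3 distinct elements of I
```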
2.1 Single Format
First, let us assume all reports are of one format, say photos. The information universe is comprised of the elements of the set $I = \{i_1, \ldots, i_n\}$. A photo or information set $J$ is a subset of $I$. Assume there are $m$ users. User $k$'s smartphone holds $u_k$ photos; denote this photo set by $P_k$, and let $p_{kj}$ denote photo $j$ of user $k$. We wish to select $s_k \le u_k$ photos from user $k$, for $k \in \{1, \ldots, m\}$, so as to maximize the information obtained, i.e., the cardinality of the union of the selected information sets. We may assume without loss of generality that $s_k = 1$ for each $k$: if $s_k > 1$, we may make $s_k$ copies of user $k$'s photo set and select one photo from each copy, and if $s_k = 0$, we may ignore the photo set. We call this problem the Single Format Maximum Information Coverage Problem (SFINFOCOVER) and can represent it by the following program.
$$\max \; \Bigl|\, \bigcup_{(k,j):\, x_{kj} = 1} p_{kj} \,\Bigr|$$
$$\text{s.t.} \quad \sum_{j=1}^{u_k} x_{kj} \le 1 \qquad \forall k \in \{1, \ldots, m\} \tag{1}$$
$$x_{kj} \in \{0,1\} \qquad \forall k, j \tag{2}$$

It is represented as an integer program in the following way:
$$\max \; \sum_{i=1}^{n} y_i$$
$$\text{s.t.} \quad \sum_{j=1}^{u_k} x_{kj} \le 1 \qquad \forall k \in \{1, \ldots, m\} \tag{3}$$
$$y_i \le \sum_{(k,j):\, i \in p_{kj}} x_{kj} \qquad \forall i \in \{1, \ldots, n\} \tag{4}$$
$$y_i, x_{kj} \in \{0,1\} \qquad \forall i, j, k \tag{5}$$
The variables $y_i$, $i = 1, \ldots, n$, are indicator variables for selecting the $i$-th element. The variables $x_{kj}$ are indicator variables for selecting the $j$-th photo from photo set $k$. Inequality (3) ensures that at most one photo is selected from each photo set. Inequality (4) guarantees that if no selected photo contains element $i$, then element $i$ is not counted as covered.
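To sanity-check the formulation on small instances, here is a brute-force Python sketch of SFINFOCOVER (our own illustration; it enumerates at most one photo per user and is exponential in $m$):

```python
# Brute-force SFINFOCOVER for tiny instances; photo_sets is a list of lists
# of frozensets of tags, one inner list per user.
from itertools import product

def sf_brute_force(photo_sets):
    best_cover, best_choice = set(), None
    options = [range(-1, len(ps)) for ps in photo_sets]   # -1 = select nothing
    for choice in product(*options):
        cover = set().union(
            *(photo_sets[k][j] for k, j in enumerate(choice) if j >= 0))
        if best_choice is None or len(cover) > len(best_cover):
            best_cover, best_choice = cover, choice
    return best_choice, best_cover
```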
2.2 Multi-format
We now turn our attention to the multi-format scenario. In this scenario each reporter has a set of reports of different formats (video, audio, text, etc.) that can be uploaded to the commander. Let $R_{ij}$ denote report $j$ from reporter $i$ and let $r_i = \{R_{i1}, \ldots, R_{iu_i}\}$ be the collection of the $u_i$ reports of reporter $i$. Let $I = \{i_1, \ldots, i_n\}$ denote the information universe. Each report $R_{ij}$ is a subset of the information universe, $R_{ij} \subseteq I$, for all $i, j$.
Each report has a size, which represents the file size of that report; let $s(R_{ij})$ denote the size of report $R_{ij}$. Reports of different formats naturally have different sizes: video files tend to be much larger than text files or photos, but they can also offer more information. Even within a format, files can be of different sizes; video, audio, and text files may be of different durations or lengths, and the size of a file grows with its duration or length.
Reporter $i$ has a fixed amount of bandwidth with which to upload reports, and this constrains the total size of the reports that can be uploaded by a reporter. Let $b_i$ denote the total size which can be uploaded by reporter $i$. Given the information universe $I = \{i_1, \ldots, i_n\}$, bandwidth constraints $b_1, \ldots, b_m$ for reporters $r_1, \ldots, r_m$ respectively, reports $\{R_{ij}\}$, and report sizes $\{s(R_{ij})\}$, which reports should be uploaded to maximize the number of information elements obtained subject to the bandwidth constraints $b_1, \ldots, b_m$?
This problem is called the Multi-Format Maximum Information Coverage Problem (MFINFOCOVER), and it can be represented by the following IP.
$$\max \; \sum_{i=1}^{n} y_i$$
$$\text{s.t.} \quad \sum_{t=1}^{u_i} s(R_{it})\, x_{it} \le b_i \qquad \forall i \in \{1, \ldots, m\} \tag{6}$$
$$y_i \le \sum_{(k,j):\, i \in R_{kj}} x_{kj} \qquad \forall i \in \{1, \ldots, n\}$$
$$y_i, x_{kj} \in \{0,1\} \qquad \forall i, j, k \tag{7}$$
The significant difference between the IP for MFINFOCOVER and the IP for SFINFOCOVER is inequality (6), which bounds the sum of the sizes of the reports uploaded by a reporter by the bandwidth available to that reporter.
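For illustration, the MFINFOCOVER IP can be written down directly with a MIP modeling layer; the sketch below uses the open-source PuLP library, a choice of ours rather than anything used in the paper, with hypothetical input conventions (`reports[k]` is a list of `(tagset, size)` pairs and `budgets[k]` is $b_k$):

```python
# Sketch: MFINFOCOVER as an integer program in PuLP (assumed installed).
import pulp

def mf_info_cover(reports, budgets, universe):
    prob = pulp.LpProblem("MFINFOCOVER", pulp.LpMaximize)
    x = {(k, t): pulp.LpVariable(f"x_{k}_{t}", cat="Binary")
         for k, lst in enumerate(reports) for t in range(len(lst))}
    y = {i: pulp.LpVariable(f"y_{i}", cat="Binary") for i in universe}
    prob += pulp.lpSum(y.values())                    # maximize covered elements
    for k, lst in enumerate(reports):                 # bandwidth constraint (6)
        prob += pulp.lpSum(lst[t][1] * x[k, t]
                           for t in range(len(lst))) <= budgets[k]
    for i in universe:                                # coverage linking constraint
        prob += y[i] <= pulp.lpSum(x[k, t]
                                   for k, lst in enumerate(reports)
                                   for t, (tags, _) in enumerate(lst) if i in tags)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [(k, t) for (k, t), v in x.items() if v.value() == 1]
```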
2.3 Clique Model
In this subsection, we present the clique model. Assume the information universe is comprised of the elements of the set $I = \{i_1, \ldots, i_n\}$. Let $R_{ij}$ denote report $j$ from reporter $i$; in particular, $R_{ij} \subseteq I$. Represent each report $R_{ij}$, for all $i, j$, by a vertex, where each vertex $R_{ij}$ has a size denoted by $s(R_{ij})$. The set of vertices is partitioned into $m$ classes $r_1, \ldots, r_m$, where a class represents a reporter's set of reports; that is, $r_i = \{R_{i1}, \ldots, R_{iu_i}\}$, where $u_i$ is the number of reports of reporter $i$. Each class $r_i$ has a capacity $b_i$, corresponding to the bandwidth allocated to a reporter to upload his reports. This report graph is a complete graph with edge $e_{it,jk}$ having weight $w_{it,jk}$. The weight of an edge is the size of the symmetric difference of the information sets of the vertices connected by the edge. That is,
$$w_{it,jk} = |R_{it} \cup R_{jk}| - |R_{it} \cap R_{jk}| \tag{8}$$
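In code, this weight is simply the size of the symmetric difference of the two tag sets; a minimal check (ours):

```python
# Edge weight (8): the number of elements on which two reports differ.
def edge_weight(r_it, r_jk):
    return len(r_it | r_jk) - len(r_it & r_jk)   # equivalently len(r_it ^ r_jk)

assert edge_weight({1, 2, 3}, {3, 4}) == 3       # the reports differ on {1, 2, 4}
```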
This weight measures how distinct two reports are by counting the number of elements in which they collectively differ. The optimization objective is to select a sub-clique of vertices whose sum of edge weights is maximum over all feasible sub-cliques. A sub-clique is feasible if, for every class, the sum of the sizes of the vertices selected from that class is at most the capacity of the class, i.e., $\sum_{t=1}^{u_i} s(R_{it})\, x_{it} \le b_i$ for $i = 1, \ldots, m$. We call this problem the Clique Information Coverage Problem (CLIQUEINFOCOVER). We can formulate CLIQUEINFOCOVER as the following integer program.
$$\max \; \sum_{it \ne jk} w_{it,jk}\, y_{it,jk}$$
$$\text{s.t.} \quad y_{it,jk} \le x_{it} \qquad \forall\, it \ne jk \tag{9}$$
$$y_{it,jk} \le x_{jk} \qquad \forall\, it \ne jk \tag{10}$$
$$x_{it} + x_{jk} - y_{it,jk} \le 1 \qquad \forall\, it \ne jk \tag{11}$$
$$\sum_{t=1}^{u_i} s(R_{it})\, x_{it} \le b_i \qquad i = 1, \ldots, m \tag{12}$$
$$x_{it} \in \{0,1\} \qquad \forall\, it$$
$$y_{it,jk} \in \{0,1\} \qquad \forall\, it \ne jk$$
In the integer program, variable $x_{it}$ is the indicator variable for selecting vertex $R_{it}$ for the clique. Similarly, variable $y_{it,jk}$ is the indicator variable for selecting edge $e_{it,jk}$ for the clique. Inequalities (9) and (10) ensure that edge $e_{it,jk}$ is not selected if either vertex $it$ or vertex $jk$ is not selected. Inequality (11) guarantees that $y_{it,jk}$ is selected if both vertices $it$ and $jk$ are selected. Inequality (12) ensures that the total size of the vertices chosen from class $r_i$ does not exceed $b_i$.
We analyze the integrality gap of the above integer program. Consider the following example with $n$ an even integer. There are two classes, with $b_1 = b_2 = 1$ and $I = \{1, \ldots, n\}$. Class 1 consists of the reports $\{1\}, \{2\}, \ldots, \{n/2\}$, each of size 1. Class 2 consists of the reports $\{n/2+1\}, \ldots, \{n\}$, each also of size 1. Note that any two reports are disjoint, so every edge has weight 2. The optimal integer programming solution chooses any one report from class 1 and any one report from class 2, obtaining an objective value of 2. A linear programming solution that sets $x_{it} = 1/(n/2)$ for all $it$, and $y_{it,jk} = 1/(n/2)$ for every edge, is feasible and obtains an objective of $2\binom{n}{2} \cdot \frac{1}{n/2} = 2(n-1)$. Thus, the integrality gap of this integer program is at least $n-1$. This result implies that rounding a linear programming relaxation is not a promising solution approach, and that a linear programming upper bound on the optimum could be very loose. With this in mind, we also consider a quadratic programming formulation of CLIQUEINFOCOVER.
MaximizingtheRelevantDiversityofSocialSwarmingInformation
367
The following quadratic programming formulation of CLIQUEINFOCOVER is similar to the quadratic program presented in (Alidaee et al., 2007):

$$\max \; \sum_{it \ne jk} w_{it,jk}\, x_{it}\, x_{jk}$$
$$\text{s.t.} \quad \sum_{t=1}^{u_i} s(R_{it})\, x_{it} \le b_i \qquad i = 1, \ldots, m$$
$$x_{it} \in \{0,1\} \qquad \forall\, it$$
The intuition behind the optimization goal of selecting a sub-clique with the largest sum of edge weights is that a subset of reports with high pairwise distinction should, one hopes, maximize the total number of elements covered. This need not hold. Consider the following instance: let $I = \{1, \ldots, n\}$, let reporter 1 have the reports $\{1, \ldots, n\}$ and $\{\lfloor n/2 \rfloor - 1\}$, each of size 1, and let reporter 2 have the single report $\{\lfloor n/2 \rfloor, \ldots, n\}$ of size 1. We have $b_1 = b_2 = 1$, so each reporter can upload exactly one report. The reports maximizing the sum of dissimilarity are $\{\lfloor n/2 \rfloor - 1\}$ and $\{\lfloor n/2 \rfloor, \ldots, n\}$, giving a dissimilarity of $n - \lfloor n/2 \rfloor + 2$, while the reports maximizing the amount of information obtained are $\{1, \ldots, n\}$ and $\{\lfloor n/2 \rfloor, \ldots, n\}$, as together they give the full information set $I$. The difference in obtained information is $\lfloor n/2 \rfloor - 2$ elements.
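A quick numerical check of this example at $n = 10$ (our own verification, with $\lfloor n/2 \rfloor = 5$):

```python
# The example above at n = 10.
A = frozenset(range(1, 11))       # reporter 1: {1,...,n}
B = frozenset({4})                # reporter 1: {floor(n/2) - 1}
C = frozenset(range(5, 11))       # reporter 2: {floor(n/2),...,n}
w = lambda r, s: len(r ^ s)       # symmetric-difference weight (8)
assert w(B, C) == 7 and w(A, C) == 4    # dissimilarity prefers B with C
assert len(A | C) - len(B | C) == 3     # but covers floor(n/2) - 2 fewer items
```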
3 ALGORITHMS AND HEURISTICS
In this section we provide approximation algorithms and heuristics for the defined problems, all of which are NP-hard. The hardness of SFINFOCOVER and MFINFOCOVER follows from a trivial reduction from the Maximum Coverage problem. CLIQUEINFOCOVER is NP-hard even for one class with 0/1 edge weights; its hardness follows from a reduction from the Maximum Clique problem, and this reduction also implies that there is no constant-factor approximation for CLIQUEINFOCOVER.
The following greedy approximation algorithm for SFINFOCOVER, called GREEDY, gives a 2-approximation; the analysis is given in Chekuri and Kumar (Chekuri and Kumar, 2004). The idea behind GREEDY is as follows: for each photo set from which a photo has not already been selected, find the photo which covers the maximum number of uncovered elements, breaking ties arbitrarily. For the given round, select, among these candidate photos, the one which covers the maximum number of uncovered elements. Remove the newly covered elements and the chosen photo set from consideration in future rounds.
Algorithm 1: GREEDY.
1: H ← ∅, I′ ← I, S ← ∅
2: while S ≠ {1, . . . , m} and I′ ≠ ∅ do
3:   for all reporters k ∈ {1, . . . , m} do
4:     if k ∉ S then
5:       l ← argmax_j |p_{kj} ∩ I′|
6:       G_k ← p_{kl}
7:     else
8:       G_k ← ∅
9:     end if
10:  end for
11:  c ← argmax_i |G_i ∩ I′|
12:  H ← H ∪ {G_c}, S ← S ∪ {c}, I′ ← I′ \ G_c
13: end while
OUTPUT: H, I \ I′
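For concreteness, a direct Python rendering of GREEDY, under the assumption that photo sets are given as sets of tags (a sketch, not the authors' implementation):

```python
# GREEDY sketch: photo_sets[k] is a list of frozensets of tags for reporter k;
# universe is the information universe I.
def greedy(photo_sets, universe):
    covered, used, chosen = set(), set(), []
    while len(used) < len(photo_sets) and covered != set(universe):
        best = None                 # (new coverage, reporter, photo index)
        for k, photos in enumerate(photo_sets):
            if k in used or not photos:
                continue
            j = max(range(len(photos)), key=lambda t: len(photos[t] - covered))
            gain = len(photos[j] - covered)
            if best is None or gain > best[0]:
                best = (gain, k, j)
        if best is None:
            break                   # no selectable photo set remains
        gain, k, j = best
        chosen.append((k, j))
        used.add(k)
        covered |= photo_sets[k][j]
    return chosen, covered
```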
Next, we present an approximation algorithm for MFINFOCOVER called MFGREEDY; it is a 12-approximation algorithm (Chekuri and Kumar, 2004). The algorithm adds reports greedily in the following sense: in a given round, it computes the coverage per size of every remaining feasible report and adds a feasible report with the largest coverage per size.
Algorithm 2: MFGREEDY.
1: H ← ∅, I′ ← I, b′_k ← b_k for all k ∈ {1, . . . , m}, G_c ← R_{11}
2: while G_c ≠ ∅ do
3:   for all reporters k ∈ {1, . . . , m} do
4:     for all reports R_{kt} do
5:       if s(R_{kt}) > b′_k then
6:         R_{kt} ← ∅
7:       end if
8:     end for
9:     l ← argmax_j |R_{kj} ∩ I′| / s(R_{kj})
10:    G_k ← R_{kl}
11:  end for
12:  c ← argmax_i |G_i ∩ I′| / s(G_i)
13:  H ← H ∪ {G_c}, I′ ← I′ \ G_c, b′_c ← b′_c − s(G_c)
14: end while
OUTPUT: H, I \ I′
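A Python sketch of MFGREEDY under the same hypothetical conventions, with `reports[k]` a list of `(tagset, size)` pairs (sizes positive) and `budgets[k]` the bandwidth $b_k$:

```python
# MFGREEDY sketch: greedily add the feasible report with the best
# coverage-per-size ratio until no feasible report adds new elements.
def mf_greedy(reports, budgets, universe):
    covered, chosen = set(), []
    budgets = list(budgets)
    remaining = {(k, t) for k, lst in enumerate(reports)
                 for t in range(len(lst))}
    while covered != set(universe):
        best = None                 # (coverage per size, reporter, report)
        for k, t in remaining:
            tags, size = reports[k][t]
            if size > budgets[k]:
                continue            # infeasible: would exceed b'_k
            ratio = len(tags - covered) / size
            if best is None or ratio > best[0]:
                best = (ratio, k, t)
        if best is None or best[0] == 0:
            break                   # no feasible report adds new elements
        _, k, t = best
        chosen.append((k, t))
        covered |= reports[k][t][0]
        budgets[k] -= reports[k][t][1]
        remaining.discard((k, t))
    return chosen, covered
```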
We propose the following heuristic for CLIQUEINFOCOVER, which we call CLIQUE-MAXSUM. For each reporter $i$ and each report $R_{it}$, $t \in \{1, \ldots, u_i\}$, belonging to reporter $i$, compute the ratio of the sum of the report's edge weights to the size of the report. Until capacity $b_i$ would be exceeded, add the feasible report with the largest ratio from reporter $i$ (and remove this report from $r_i$).
SENSORNETS2014-InternationalConferenceonSensorNetworks
368
Algorithm 3: CLIQUE-MAXSUM.
1: C ← ∅
2: for all reporters i ∈ {1, . . . , m} do
3:   for all reports R_{it}, t ∈ {1, . . . , u_i} do
4:     S_{it} ← Σ_{jk} w_{it,jk}
5:     compute the ratio k_{it} = S_{it} / s(R_{it})
6:   end for
7:   while there is a report which can be added within capacity b_i do
8:     C ← C ∪ {feasible report R_{it} with maximum k_{it}}
9:     remove R_{it} from r_i
10:  end while
11: end for
OUTPUT: C and the sum of edge weights of C
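A Python sketch of CLIQUE-MAXSUM under the same conventions; the edge weight is the symmetric difference from Equation (8):

```python
# CLIQUE-MAXSUM sketch: reports[i] is a list of (tagset, size) pairs with
# positive sizes, and budgets[i] is the class capacity b_i.
def clique_maxsum(reports, budgets):
    def w(r, s):                    # symmetric-difference edge weight (8)
        return len(r ^ s)
    flat = [(i, t) for i, lst in enumerate(reports) for t in range(len(lst))]
    chosen = []
    for i, lst in enumerate(reports):
        # ratio k_it = (sum of edge weights to all other reports) / size
        ratio = {t: sum(w(tags, reports[j][u][0])
                        for j, u in flat if (j, u) != (i, t)) / size
                 for t, (tags, size) in enumerate(lst)}
        remaining, budget = set(ratio), budgets[i]
        while True:
            feasible = [t for t in remaining if lst[t][1] <= budget]
            if not feasible:
                break
            t = max(feasible, key=ratio.get)
            chosen.append((i, t))
            budget -= lst[t][1]
            remaining.discard(t)
    return chosen
```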
4 SIMULATIONS
In the simulation environment, we are interested in comparing the performance of the SFINFOCOVER model to that of the CLIQUEINFOCOVER model. We evaluate the CLIQUE-MAXSUM (abbreviated CLIQUE) and GREEDY algorithms, as well as an LP relaxation (LP) of the SFINFOCOVER IP, on randomly generated instances. LP is an upper bound on the optimal solution and is therefore a good scalable benchmark for comparison. The following parameters are varied in the simulations: the number of reporters m, the number of elements n in the information universe I, the number of photos per reporter p, and the probability that an element appears in a photo.
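For reproducibility, a sketch of the random instance generator the simulations describe (function and parameter names are ours):

```python
# Random photo matrix: each element of the universe appears in each photo
# independently with probability p.
import random

def random_instance(m, photos_per_reporter, n, p, seed=None):
    rng = random.Random(seed)
    universe = frozenset(range(n))
    photo_sets = [[frozenset(i for i in universe if rng.random() < p)
                   for _ in range(photos_per_reporter)] for _ in range(m)]
    return photo_sets, universe
```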
In Simulation 1, we set the number of reporters to 3, with each reporter having 3 photos. The number of items is varied from 1000 to 1050, and for each n, 10 runs are performed. In a run, each item appears in each photo with probability 0.5. GREEDY outperforms CLIQUE on average by about 5 items.
In Simulation 2, the number of reporters is 4, and each reporter has 2 photos, from which at most one can be selected. The number of items varies from 2000 to 2100, with 10 random runs performed for each n. In a run, each item appears in each photo with probability 0.5. GREEDY outperforms CLIQUE, with GREEDY covering on average about 96% of the items.
In Simulation 3, the number of reporters is varied from 1 to 5. There are 20 items and 2 photos per reporter; in a run, each item appears in each photo with probability 0.5. With both the GREEDY and CLIQUE algorithms, the number of items obtained increases with the number of reporters. GREEDY outperforms CLIQUE in the number of items covered, with both algorithms converging to covering all 20 items at m = 5.
In Simulation 4, the total number of items is 40, and there are two photos per reporter. The number of reporters varies from 1 to 6. GREEDY outperforms CLIQUE in the number of items covered, but only slightly. With 1 reporter, both algorithms cover on average 20 items, as expected since each item appears in a photo with probability 0.5.
In Simulation 5, the number of items is varied from 20 to 40. There are 3 reporters and 2 photos per reporter. The linear programming relaxation of the IP is simulated along with GREEDY and CLIQUE. Both GREEDY and CLIQUE perform quite close to LP, which is an upper bound on the IP optimum. GREEDY yet again outperforms CLIQUE, but only slightly. There were 15 runs for each n; in a run, each item appears in each photo with probability 0.5.
In Simulation 6, we set the number of reporters to 3, with 2 photos per reporter. The number of items is varied from 100 to 140. A photo matrix is randomly generated with each item appearing in each photo with probability 0.5. All three algorithms are run twenty times for a given n. GREEDY once again outperforms CLIQUE, with GREEDY lying on average about 4 items below the LP upper bound on the optimum.
In Simulation 7, we vary the number of photos per reporter p from 1 to 10, holding the number of reporters at 3 and the number of items at 20. The number of runs for each p for each algorithm is 30. The performance of CLIQUE falls off drastically from that of LP and GREEDY: CLIQUE maintains a mean of about 15 items over all p, whereas both LP and GREEDY increase as p increases, with GREEDY covering on average about 18 items and LP about 20 at p = 10.
In Simulation 8, we vary n from 100 to 140, setting the probability of an item being in a photo to 0.2. For each n, there are 100 runs for each of the 3 algorithms. The number of reporters is 3 and the number of photos per reporter is 2. GREEDY and CLIQUE perform comparably, with CLIQUE for some n barely outperforming GREEDY on average. Both algorithms cover on average about 2 items fewer than LP.
In Simulation 9, we vary the probability of an item being included in a photo from 0 to 1 in steps of 0.05. We hold fixed the number of items at 4, the number of reporters at 4, and the number of photos per reporter at 3. Both LP and GREEDY converge to covering all 4 items at a probability of 0.4; CLIQUE takes longer, converging at a probability of 0.8.
Figure 1: Simulation 1 (# of elements covered vs. n; m=5, p=3, 10 runs per n, photo signature uniformly generated; curves: greedy, clique).
Figure 2: Simulation 2 (# of elements covered vs. n; m=4, p=2, 10 runs, photo signature uniformly generated; curves: greedy, clique).
Figure 3: Simulation 3 (# of elements covered vs. m; n=20, p=2, photo signature uniformly generated; curves: greedy, clique).
Figure 4: Simulation 4 (# of elements covered vs. m; n=40, p=2, photo signature uniformly generated; curves: greedy, clique).
Figure 5: Simulation 5 (# of elements covered vs. n; m=3, p=2; curves: greedy, clique, lp-set).
Figure 6: Simulation 6 (# of elements covered vs. n; m=3, p=2; curves: greedy, clique, lp-set).
Figure 7: Simulation 7 (# of elements covered vs. p; n=20, m=2, 30 runs per p; curves: greedy, clique, lp-set).
Figure 8: Simulation 8 (# of elements covered vs. n; pr=.2, 100 runs; curves: greedy, clique, lp-set).
Figure 9: Simulation 9 (# of elements covered vs. pr; n=4, p=3, m=4; curves: greedy, clique, lp-set).
Figure 10: Simulation 10 (# of elements covered vs. pr; n=10, p=4, m=4, 15 runs per probability; curves: greedy, clique, lp-set).
Figure 11: Simulation 11 (# of elements covered vs. pr; n=30, p=3, m=4, 15 runs per pr; curves: greedy, clique, lp-set).
Figure 12: Simulation 12 (# of elements covered vs. pr; n=30, p=3, m=3, 15 runs; curves: greedy, clique, lp-set).
Figure 13: Simulation 13 (# of elements covered vs. pr; n=80, m=8, p=10; curves: greedy, clique, lp-set).
Figure 14: Simulation 14 (# of elements covered vs. p; n=30, m=2, 30 runs; curves: greedy, clique, lp-set).
In Simulation 10, the probability that an item is included in a photo is varied from 0 to 1 in steps of 0.05. The number of items is 10, the number of reporters is 4, and there are 4 photos per reporter. The number of runs for each probability is 15. The performance of GREEDY is quite close to that of LP: LP converges to covering all items at a probability of about 0.4, GREEDY at about 0.5, and CLIQUE at 0.9. CLIQUE's performance matches LP and GREEDY for probabilities from 0 to 0.3, but then falls off slightly.
SENSORNETS2014-InternationalConferenceonSensorNetworks
370
In Simulation 11, we set the number of items to be covered to 30. The number of reporters is 4, and each reporter has 3 photos, from which he must choose one. The probability of an item being included in a photo varies from 0 to 1 in steps of 0.05; the number of runs for each probability is 15. LP converges to covering all items at a probability of about 0.5, GREEDY at 0.6, and CLIQUE at 0.85. While CLIQUE's performance is on par with LP and GREEDY for probabilities between 0 and 0.4, it underperforms both from 0.4 to 0.85.
Simulation 12 repeats this setup with 3 reporters instead of 4. The results are essentially the same: LP converges to covering all items at a probability of about 0.5, GREEDY at 0.6, and CLIQUE at 0.85, with CLIQUE underperforming both between probabilities 0.4 and 0.85.
In Simulation 13, we once again vary the probability that an item is included in a photo from 0 to 1 in steps of 0.05. This time there are 80 items, the number of reporters is 8, and the number of photos per reporter is 10. All three algorithms converge to covering all items rather quickly: LP at a probability of about 0.15, GREEDY at about 0.25, and CLIQUE at about 0.4. At a probability of about 0.2, a large gap of about 7 items on average separates GREEDY and CLIQUE.
In Simulation 14, we vary the number of photos per reporter, fixing the number of reporters at 2 and the number of items to be covered at 30. The probability of an item appearing in a photo is 0.6. We run each of the three algorithms 30 times for each number of photos per reporter. The performance of CLIQUE falls off significantly compared to the other two algorithms as p increases, selecting on average 23 items over all p and remaining relatively flat.
From these simulations, we see that GREEDY outperforms CLIQUE, with GREEDY performing very well compared to the LP-relaxation upper bound on the optimal solution.
5 CONCLUSIONS AND FUTURE WORK
We propose two novel models, a maximum-coverage-based model and a clique-based model, for the problem of maximizing the amount of relevant diversity of social swarming data received by a commander. We provide well-performing algorithms/heuristics for both models, with the maximum coverage algorithm outperforming the clique heuristic. In future work, we plan to experimentally analyze the multi-format models and to develop a physical system based on our models.
ACKNOWLEDGEMENTS
Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
REFERENCES
Alidaee, B., Glover, F., Kochenberger, G., and Wang,
H. (2007). Solving the maximum edge weight
clique problem via unconstrained quadratic program-
ming. European journal of operational research,
181(2):592–597.
Chekuri, C. and Kumar, A. (2004). Maximum cover-
age problem with group budget constraints and ap-
plications. Approximation, Randomization, and Com-
binatorial Optimization. Algorithms and Techniques,
pages 72–83.
Chen, B., Hong, J., and Wang, Y. (1997). The problem of finding optimal subset of features. Chinese Journal of Computers (Chinese Edition), 20:133–138.
Dijkhuizen, G. and Faigle, U. (1993). A cutting-plane ap-
proach to the edge-weighted maximal clique prob-
lem. European Journal of Operational Research,
69(1):121–130.
Hunting, M., Faigle, U., and Kern, W. (2001). A lagrangian
relaxation approach to the edge-weighted clique prob-
lem. European Journal of Operational Research,
131(1):119–131.
Jiang, Y., Xu, X., Terlecky, P., Abdelzaher, T. F., Bar-Noy,
A., and Govindan, R. (2013). Mediascope: selective
MaximizingtheRelevantDiversityofSocialSwarmingInformation
371
on-demand media retrieval from mobile devices. In
IPSN, pages 289–300.
Lappas, T., Crovella, M., and Terzi, E. (2012). Select-
ing a characteristic set of reviews. In Proceedings
of the 18th ACM SIGKDD international conference
on Knowledge discovery and data mining, pages 832–
840. ACM.
Liu, B., Terlecky, P., Bar-Noy, A., Govindan, R., Neely,
M. J., and Rawitz, D. (2012a). Optimizing infor-
mation credibility in social swarming applications.
IEEE Transactions on Parallel and Distributed Sys-
tems, 23(6):1147–1158.
Liu, B., Terlecky, P., Xu, X., Bar-Noy, A., Govindan, R.,
and Rawitz, D. (2012b). Timely report delivery in so-
cial swarming applications. In DCOSS, pages 75–82.
Macambira, E. and De Souza, C. (2000). The edge-
weighted clique problem: valid inequalities, facets
and polyhedral computations. European Journal of
Operational Research, 123(2):346–371.
Margules, C., Nicholls, A., and Pressey, R. (1988). Select-
ing networks of reserves to maximise biological diver-
sity. Biological conservation, 43(1):63–76.
Park, K., Lee, K., and Park, S. (1996). An extended formu-
lation approach to the edge-weighted maximal clique
problem. European Journal of Operational Research,
95(3):671–682.
Tsaparas, P., Ntoulas, A., and Terzi, E. (2011). Select-
ing a comprehensive set of reviews. In Proceedings
of the 17th ACM SIGKDD international conference
on Knowledge discovery and data mining, pages 168–
176. ACM.
Yu, W., Zhang, R., He, X., and Sha, C. (2013). Selecting
a diversified set of reviews. In Web Technologies and
Applications, volume 7808 of Lecture Notes in Com-
puter Science, pages 721–733.
Zhuang, L., Jing, F., and Zhu, X.-Y. (2006). Movie re-
view mining and summarization. In Proceedings of
the 15th ACM international conference on Informa-
tion and knowledge management, CIKM ’06, pages
43–50.
SENSORNETS2014-InternationalConferenceonSensorNetworks
372