Table 1: Jaccard-based pairwise similarity measure for
predicates concerning John Wayne on the imdb dataset.
Predicate a   Predicate b   sim(a, b)
actedIn       directed      0.100
actedIn       hasWebsite    0.101
bornOnDate    actedIn       0.147
bornOnDate    directed      0.055
bornOnDate    hasChild      0.026
bornOnDate    hasWebsite    0.095
bornOnDate    hasWonPrize   0.086
bornOnDate    isMarriedTo   0.133
directed      hasWebsite    0.025
hasChild      actedIn       0.008
hasChild      directed      0.005
hasChild      hasWebsite    0.013
hasChild      hasWonPrize   0.063
hasWonPrize   actedIn       0.028
hasWonPrize   directed      0.027
hasWonPrize   hasWebsite    0.027
isMarriedTo   actedIn       0.036
isMarriedTo   directed      0.016
isMarriedTo   hasChild      0.083
isMarriedTo   hasWebsite    0.033
isMarriedTo   hasWonPrize   0.135
and b = “directed” (i.e. entities that are both actors
and directors), and that there are 1000 entities in the
dataset that are incident with the “actedIn” or “directed”
predicate (actors or directors). The similarity measure
in such a case would have the value sim(a, b) = 0.2.
Table 1 presents the values of the similarity measure
computed in this way on the imdb dataset for all pairs
of predicates incident with the entity JohnWayne. For
this example, the optimal layout computed by maximising
the vim measure, using the sim measure as described,
is presented in Figure 2. This example is given only as
an illustration of the discussed concepts and leaves
considerable room for improvement. For example, we
observed that the proposed simple sim similarity measure
gives unintuitive values for some pairs (e.g. “isMarriedTo”,
“hasWonPrize”); nevertheless, it is promising given its
simplicity. Likewise, the proposed variant of the vim
visual integrity measure should be treated only as
a basis for further improvements.
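The Jaccard-based sim measure discussed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce Table 1: the function name jaccard_sim and the toy entity sets are assumptions, and the overlap of 200 entities is chosen here only because it is consistent with the stated value sim(a, b) = 0.2 for a union of 1000 entities.

```python
def jaccard_sim(entities_a, entities_b):
    """Jaccard similarity of two predicates, each given as the
    collection of entities incident with that predicate."""
    a, b = set(entities_a), set(entities_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty predicates have similarity 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy data (integer ids stand in for entities):
# 600 entities incident with "actedIn", 600 with "directed",
# 200 incident with both, hence 1000 incident with either.
acted_in = set(range(0, 600))
directed = set(range(400, 1000))

sim_value = jaccard_sim(acted_in, directed)
print(f"sim(actedIn, directed) = {sim_value:.3f}")
```

Representing each predicate by the set of entities incident with it keeps the measure symmetric, so each unordered pair needs to be computed only once, as in Table 1.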
3 CONCLUSIONS
We introduced a novel notion of integrity in the context
of diversity-aware information selection and visualisation
tasks and illustrated it on the example of the semantic
entity summarisation problem. As diversity-awareness
has proved to be an important approach in many
applications, we argue that integrity-awareness is
a necessary next step to improve these approaches.
Figure 2: Optimal layout of graphical entity summary of
John Wayne computed on imdb knowledge graph with k=7.
Integrity measure for this layout L: VIM(L) = 0.62.
ACKNOWLEDGEMENTS
The work is supported by the Polish National Science
Centre grant 2012/07/B/ST6/01239 “DISQUSS”.
Thanks are due to M. Andruchów for computing the
example in Table 1 and Figure 2.