7 CONCLUSIONS
Generating word clouds from social streams is difficult
because users often refer to the same entity by multiple
aliases. This directly degrades the utility of word clouds
for accessing this complex data source. We proposed a
technique that groups aliases of the same entity and
represents them with a single canonical term. The method
improves both the coverage of word clouds and access to
relevant content. Because state-of-the-art named entity
recognition methods are imperfect, they often increase the
redundancy of terms in word clouds, so a method for
diversifying terms is necessary. We found that the proposed
technique not only significantly decreased redundancy but
also attained significantly higher coverage than the
baseline word cloud generation method, leading to better
word clouds and therefore improved information access.
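The grouping step summarized above can be illustrated with a minimal sketch. It assumes a small hand-written alias table (`ALIASES`), tweets already reduced to lists of extracted terms, frequency-based term selection, and coverage defined as the fraction of tweets that contain at least one cloud term. All of these names and simplifications are hypothetical and are not the implementation evaluated in this work.

```python
from collections import Counter

# Hypothetical alias table; how aliases are actually resolved is not
# specified in this section.
ALIASES = {
    "obama": "Barack Obama",
    "barack obama": "Barack Obama",
    "potus": "Barack Obama",
    "nyc": "New York City",
    "new york": "New York City",
}


def canonical(term):
    """Map a term to its canonical form, or lowercase it if no alias is known."""
    key = term.lower()
    return ALIASES.get(key, key)


def baseline(term):
    """Baseline normalization without entity grouping: lowercase only."""
    return term.lower()


def build_cloud(tweets, size, normalize):
    """Select the `size` most frequent terms after normalization.

    Each tweet is a list of already-extracted terms; a term is counted
    at most once per tweet.
    """
    counts = Counter()
    for terms in tweets:
        counts.update({normalize(t) for t in terms})
    return [term for term, _ in counts.most_common(size)]


def coverage(tweets, cloud, normalize):
    """Fraction of tweets containing at least one cloud term (assumed definition)."""
    if not tweets:
        return 0.0
    cloud_terms = set(cloud)
    covered = sum(
        1 for terms in tweets if cloud_terms & {normalize(t) for t in terms}
    )
    return covered / len(tweets)


if __name__ == "__main__":
    sample = [
        ["Obama", "speech"],
        ["POTUS", "visit"],
        ["Barack Obama", "NYC"],
        ["New York", "traffic"],
    ]
    for name, fn in (("baseline", baseline), ("grouped", canonical)):
        cloud = build_cloud(sample, 2, fn)
        print(name, cloud, coverage(sample, cloud, fn))
```

In this toy input every surface form occurs in only one tweet, so a two-term baseline cloud can cover at most half of the tweets, whereas the grouped cloud covers all of them with the same number of terms.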
An extrinsic user evaluation supported our hypothesis that
word clouds with grouped named entities are significantly
more relevant and diverse than word clouds with no entity
grouping. Furthermore, word clouds with grouped named
entities that attain higher MAP scores are more likely to
be rated as relevant by users.
Finally, we showed that the previously proposed MAP metric
for automatic cloud evaluation predicts extrinsic human
evaluations of cloud quality. Thus, when designing word
cloud generation techniques, the MAP metric should be used
as a quality predictor, enabling automatic assessment of
word cloud quality without a human in the loop.
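As an illustration of how such automatic assessment might be scripted, the sketch below computes mean average precision over ranked result lists, one list per cloud term. It assumes that MAP here denotes mean average precision in the standard information retrieval sense and that binary relevance judgments are available for each ranked list. The function names are hypothetical, and average precision is normalized by the number of relevant items retrieved, which is a simplification.

```python
def average_precision(relevance):
    """Average precision of one ranked list.

    `relevance` is a list of booleans in rank order (True = relevant).
    Assumption: normalized by the number of relevant items that appear
    in the list, not by all relevant items in the collection.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the average precision over all ranked lists."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


if __name__ == "__main__":
    # Two hypothetical cloud terms with judged result lists.
    judged = [[True, False, True], [False, True]]
    print(mean_average_precision(judged))
```

A score computed this way could then be compared across cloud generation techniques in place of a user study, under the assumptions stated above.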
ACKNOWLEDGEMENTS
This work was partially supported by the European
Union under grant agreement No. 611233 PHEME.
WEBIST 2015 - 11th International Conference on Web Information Systems and Technologies