VISUALLY SUMMARIZING THE EVOLUTION OF DOCUMENTS UNDER A SOCIAL TAG

André Gohr, Myra Spiliopoulou, Alexander Hinneburg

Abstract

Tags are used intensively on social platforms to annotate resources. Tagging is a social phenomenon, because users annotate not only to organize their own resources but also to associate semantics with resources contributed by third parties. This often leads to semantic ambiguities: popular tags are associated with very disparate meanings, even to the extent that some tags (e.g. "beautiful" or "toread") are irrelevant to the semantics of the resources they annotate. We propose a method that learns a topic model for documents under a tag and visualizes the different meanings associated with the tag. Our approach deals with the following problems. First, tag miscellany is a temporal phenomenon: tags acquire multiple semantics gradually, as users apply them to disparate documents. Hence, our method must capture and visualize the evolution of the topics in a stream of documents. Second, the meanings associated with a tag must be presented in a human-understandable way; this concerns both the choice of words and the visualization of all meanings. Our method uses AdaptivePLSA, a variation of Probabilistic Latent Semantic Analysis for streams, to learn and adapt topics on a stream of documents annotated with a specific tag. We propose a visualization technique called Topic Table to visualize document prototypes derived from topics and their evolution over time. In a case study, we show how our method captures the evolution of tags selected as frequent and ambiguous, and visualizes their semantics in a comprehensible way. Additionally, we demonstrate its effectiveness by adding alien resources under a tag: the visualization indeed gives hints to the added documents.
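
To make the idea of learning and adapting topics on a stream more concrete, the following is a minimal sketch of plain PLSA fitted with EM on one window of documents, with the word distributions of the previous window used as a warm start for the next. It is not the authors' AdaptivePLSA: the function names, the fixed vocabulary, the toy counts, and the warm-start scheme are all assumptions made for illustration.

```python
import random

def normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]

def plsa(counts, n_topics, n_iter=50, seed=0, init_word_topic=None):
    """Fit plain PLSA with EM (illustrative, not AdaptivePLSA).

    counts: list of documents, each a list of word counts over a fixed vocabulary.
    init_word_topic: optional P(w|z) rows from an earlier window (assumed warm start).
    Returns (p_z_d, p_w_z): topic mixtures per document, word distributions per topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    if init_word_topic is None:
        p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
                 for _ in range(n_topics)]
    else:
        # Small offset keeps every word probability strictly positive.
        p_w_z = [normalize([p + 1e-9 for p in row]) for row in init_word_topic]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    # M-step accumulation: expected count of (d, w) under topic z.
                    mass = counts[d][w] * joint[z] / total
                    new_w_z[z][w] += mass
                    new_z_d[d][z] += mass
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_z_d, p_w_z

# Two hypothetical time windows of documents under one tag (4-word vocabulary).
window_1 = [[4, 3, 0, 0], [3, 5, 0, 0], [0, 0, 4, 2]]
window_2 = [[0, 0, 3, 4], [5, 2, 0, 0]]

_, topics_1 = plsa(window_1, n_topics=2)
# Adapt to the next window by starting from the previous word distributions.
doc_topics_2, topics_2 = plsa(window_2, n_topics=2, init_word_topic=topics_1)
```

Starting each window from the previous topics keeps topic indices comparable over time, which is what a table-like display of topic evolution would need; how AdaptivePLSA actually carries a model from one window to the next is described in the paper itself, not here.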



Paper Citation


in Harvard Style

Gohr A., Spiliopoulou M. and Hinneburg A. (2010). VISUALLY SUMMARIZING THE EVOLUTION OF DOCUMENTS UNDER A SOCIAL TAG. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR (IC3K 2010), ISBN 978-989-8425-28-7, pages 85-94. DOI: 10.5220/0003096100850094


in Bibtex Style

@conference{kdir10,
author={André Gohr and Myra Spiliopoulou and Alexander Hinneburg},
title={VISUALLY SUMMARIZING THE EVOLUTION OF DOCUMENTS UNDER A SOCIAL TAG},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={85-94},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003096100850094},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - VISUALLY SUMMARIZING THE EVOLUTION OF DOCUMENTS UNDER A SOCIAL TAG
SN - 978-989-8425-28-7
AU - Gohr A.
AU - Spiliopoulou M.
AU - Hinneburg A.
PY - 2010
SP - 85
EP - 94
DO - 10.5220/0003096100850094