SEMANTIC IDENTIFICATION AND VISUALIZATION OF SIGNIFICANT WORDS WITHIN DOCUMENTS - Approach to Visualize Relevant Words within Documents to a Search Query by Word Similarity Computation

Karolis Kleiza, Patrick Klein, Klaus-Dieter Thoben

Abstract

This paper first introduces similarity computation and text summarization of documents using probabilistic topic models, in particular Latent Dirichlet Allocation (LDA). It then discusses why end users need more transparency about the reasons that documents with a computed similarity are actually similar to a given search query. To address this need, the authors propose an approach that identifies and highlights words directly within documents according to their semantic relevance, and they provide both the theoretical background and an appropriate visual representation for the approach.
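The idea of highlighting words by their semantic relevance to a query can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give their scoring formula, so the example assumes a hand-written topic-word table (`TOPIC_WORD`), a crude query-to-topic estimate (`topic_mixture`), a bracket-based highlighter (`highlight`), and a `threshold` parameter, all of which are hypothetical. A real system would obtain the topic-word probabilities from a trained LDA model.

```python
# Hypothetical toy model: two topics with hand-picked word probabilities.
# A real system would estimate these values with LDA from a corpus.
TOPIC_WORD = {
    "engineering": {"gear": 0.4, "shaft": 0.3, "design": 0.2, "report": 0.1},
    "finance": {"budget": 0.5, "cost": 0.3, "report": 0.2},
}


def topic_mixture(words):
    """Crude stand-in for LDA inference: sum each topic's word
    probabilities over the given words and normalise to a distribution."""
    raw = {
        topic: sum(probs.get(w, 0.0) for w in words)
        for topic, probs in TOPIC_WORD.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No known word in the query: fall back to a uniform mixture.
        return {topic: 1.0 / len(raw) for topic in raw}
    return {topic: value / total for topic, value in raw.items()}


def word_relevance(word, mixture):
    """Score a word as the sum over topics of P(topic | query) * P(word | topic)."""
    return sum(
        weight * TOPIC_WORD[topic].get(word, 0.0)
        for topic, weight in mixture.items()
    )


def highlight(document, query, threshold=0.1):
    """Wrap every document word whose relevance reaches the threshold in brackets."""
    mixture = topic_mixture(query.lower().split())
    marked = []
    for token in document.split():
        score = word_relevance(token.lower(), mixture)
        marked.append(f"[{token}]" if score >= threshold else token)
    return " ".join(marked)


result = highlight("The gear design exceeded the budget", "shaft gear")
print(result)
```

In this sketch the query "shaft gear" loads entirely on the assumed engineering topic, so "gear" and "design" are marked even though "design" never appears in the query, while "budget" is left unmarked. That is the kind of transparency the abstract argues for: the reader can see which words drive the computed similarity.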



Paper Citation


in Harvard Style

Kleiza K., Klein P. and Thoben K. (2010). SEMANTIC IDENTIFICATION AND VISUALIZATION OF SIGNIFICANT WORDS WITHIN DOCUMENTS - Approach to Visualize Relevant Words within Documents to a Search Query by Word Similarity Computation. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 481-486. DOI: 10.5220/0003099004810486


in Bibtex Style

@conference{kdir10,
author={Karolis Kleiza and Patrick Klein and Klaus-Dieter Thoben},
title={SEMANTIC IDENTIFICATION AND VISUALIZATION OF SIGNIFICANT WORDS WITHIN DOCUMENTS - Approach to Visualize Relevant Words within Documents to a Search Query by Word Similarity Computation},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={481-486},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003099004810486},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - SEMANTIC IDENTIFICATION AND VISUALIZATION OF SIGNIFICANT WORDS WITHIN DOCUMENTS - Approach to Visualize Relevant Words within Documents to a Search Query by Word Similarity Computation
SN - 978-989-8425-28-7
AU - Kleiza K.
AU - Klein P.
AU - Thoben K.
PY - 2010
SP - 481
EP - 486
DO - 10.5220/0003099004810486