SEMANTIC IDENTIFICATION AND VISUALIZATION OF SIGNIFICANT WORDS WITHIN DOCUMENTS - Approach to Visualize Relevant Words within Documents to a Search Query by Word Similarity Computation

Karolis Kleiza, Patrick Klein, Klaus-Dieter Thoben



This paper first introduces similarity computation and text summarization of documents using a probabilistic topic model, in particular Latent Dirichlet Allocation (LDA). It then discusses why end users need greater transparency about the reasons that documents with a computed similarity are in fact similar to a given search query. To address this need, the authors propose an approach that identifies and highlights words directly within documents according to their semantic relevance, and they provide both the theoretical background and a suitable visual representation for that approach.


