VISUALIZATION OF DOCUMENT CLUSTERS - An Interactive Visual Tool to Browse Textual Documents

Faryel Allouti, Mohamed Nadif, Benoît Otjacques

Abstract

Handling collections of text documents has become a daily task for many professionals, regardless of their economic sector or position in the organization. In many cases, little metadata is attached to the documents, which makes it difficult to derive a semantic structure within the collection automatically. This paper describes a new tool that combines the clustering and visualization paradigms to help a user identify similar documents in an unstructured collection. Several clustering algorithms can be used to identify clusters of documents, which are then displayed on a plane. In this work, we use the Classification EM algorithm. The originality of our approach lies in allowing the user to refine the clustering process interactively through visual analysis of the results of the intermediate steps. The tool also provides enriched views of document content and lets the user incorporate a semantic analysis based on personal knowledge into the computer-based clustering process.
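
The Classification EM (CEM) step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the mixture model, so this sketch assumes a multinomial mixture over term counts, with Laplace smoothing added to avoid zero probabilities. The function name `cem_multinomial`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import math


def cem_multinomial(counts, k, init_labels, n_iter=20, smooth=1.0):
    """Hard-assignment (Classification) EM for an assumed multinomial mixture.

    counts      -- list of term-count vectors, one per document
    k           -- number of clusters
    init_labels -- starting cluster index for each document
    smooth      -- Laplace smoothing constant (an assumption of this sketch)
    """
    n_docs, n_terms = len(counts), len(counts[0])
    labels = list(init_labels)
    for _ in range(n_iter):
        # M-step: re-estimate proportions and term distributions
        # from the current hard partition.
        sizes = [0] * k
        term_totals = [[smooth] * n_terms for _ in range(k)]
        for doc, z in zip(counts, labels):
            sizes[z] += 1
            for j, c in enumerate(doc):
                term_totals[z][j] += c
        log_pi = [math.log((sizes[c] + smooth) / (n_docs + k * smooth))
                  for c in range(k)]
        log_theta = []
        for c in range(k):
            total = sum(term_totals[c])
            log_theta.append([math.log(t / total) for t in term_totals[c]])

        # E-step and C-step: score each document under every cluster
        # (log posterior up to a constant) and keep the best cluster.
        new_labels = []
        for doc in counts:
            scores = [log_pi[c] + sum(x * log_theta[c][j]
                                      for j, x in enumerate(doc))
                      for c in range(k)]
            new_labels.append(max(range(k), key=scores.__getitem__))

        if new_labels == labels:  # partition is stable, so stop
            break
        labels = new_labels
    return labels


# Toy term counts: three documents dominated by terms 0-1, two by terms 2-3.
toy_counts = [[5, 1, 0, 0], [4, 2, 0, 0], [6, 0, 1, 0],
              [0, 0, 3, 4], [0, 1, 5, 3]]
# The third document starts in the wrong cluster and is moved by CEM.
print(cem_multinomial(toy_counts, k=2, init_labels=[0, 0, 1, 1, 1]))
# -> [0, 0, 0, 1, 1]
```

Because each pass yields a complete hard partition, every intermediate result could in principle be displayed and corrected before the next pass, which is the kind of interactive refinement the abstract describes.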



Paper Citation


in Harvard Style

Allouti F., Nadif M. and Otjacques B. (2010). VISUALIZATION OF DOCUMENT CLUSTERS - An Interactive Visual Tool to Browse Textual Documents. In Proceedings of the International Conference on Imaging Theory and Applications and International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2010) ISBN 978-989-674-027-6, pages 157-160. DOI: 10.5220/0002838501570160


in Bibtex Style

@conference{ivapp10,
author={Faryel Allouti and Mohamed Nadif and Benoît Otjacques},
title={VISUALIZATION OF DOCUMENT CLUSTERS - An Interactive Visual Tool to Browse Textual Documents},
booktitle={Proceedings of the International Conference on Imaging Theory and Applications and International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2010)},
year={2010},
pages={157-160},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002838501570160},
isbn={978-989-674-027-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Imaging Theory and Applications and International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2010)
TI - VISUALIZATION OF DOCUMENT CLUSTERS - An Interactive Visual Tool to Browse Textual Documents
SN - 978-989-674-027-6
AU - Allouti F.
AU - Nadif M.
AU - Otjacques B.
PY - 2010
SP - 157
EP - 160
DO - 10.5220/0002838501570160