Authors:
Faryel Allouti (1), Mohamed Nadif (1) and Benoît Otjacques (2)
Affiliations:
(1) LIPADE, UFR MI, Paris Descartes University, France; (2) Centre de Recherche Public - Gabriel Lippmann, Luxembourg
Keyword(s):
Information visualization, Clustering, Documents.
Related Ontology Subjects/Areas/Topics:
Abstract Data Visualization; Computer Vision, Visualization and Computer Graphics; General Data Visualization; Interactive Visual Interfaces for Visualization; Visual Representation and Interaction
Abstract:
Handling collections of text documents has become a daily task for many professionals, whatever their economic sector or position in the organization. In many cases, little metadata is attached to the documents, which makes it difficult to automatically derive a semantic structure within the collection. This paper describes a new tool that combines the clustering and visualization paradigms to help a user identify similar documents in an unstructured collection. Several clustering algorithms can be used to identify clusters of documents, which are then displayed on a plane. In this work, we use the Classification EM (CEM) algorithm. The originality of our approach is that it lets the user refine the clustering process interactively through a visual analysis of the results of the intermediate steps. In addition, the tool shows enriched views of the document contents and allows the user to bring a semantic analysis based on personal knowledge into the computer-based clustering process.
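The abstract names the Classification EM (CEM) algorithm without detailing it. The following is a minimal sketch, not the authors' implementation, assuming documents are represented as term-count vectors and modelled by a multinomial mixture (the abstract does not say which mixture model the tool uses). CEM alternates an E-step (posterior cluster probabilities), a C-step (assign each document to its most probable cluster) and an M-step (re-estimate parameters from the resulting partition). The function name `cem_multinomial`, the smoothing parameter `eps` and the caller-supplied initial partition are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cem_multinomial(X: Sequence[Sequence[int]], init: List[int], k: int,
                    max_iter: int = 100, eps: float = 1.0) -> List[int]:
    """Hypothetical sketch: cluster the rows of a document-term count matrix X
    with Classification EM under an assumed multinomial mixture model."""
    n, d = len(X), len(X[0])
    z = list(init)  # current hard partition: z[i] is the cluster of document i
    for _ in range(max_iter):
        # M-step: cluster proportions and per-cluster term probabilities,
        # estimated from the current hard partition.
        sizes = [0] * k
        counts = [[0.0] * d for _ in range(k)]
        for i, row in enumerate(X):
            sizes[z[i]] += 1
            for j, x in enumerate(row):
                counts[z[i]][j] += x
        log_pi = [math.log(s / n) if s > 0 else -math.inf for s in sizes]
        log_alpha = []
        for c in range(k):
            # Additive smoothing (an assumption) avoids log(0) for unseen terms.
            total = sum(counts[c]) + eps * d
            log_alpha.append([math.log((counts[c][j] + eps) / total)
                              for j in range(d)])
        # E-step + C-step: the cluster with the highest posterior is the one
        # maximising log pi_c + sum_j x_ij log alpha_cj (the multinomial
        # coefficient is the same for every cluster and is dropped).
        new_z = []
        for row in X:
            scores = [log_pi[c] + sum(x * log_alpha[c][j] for j, x in enumerate(row))
                      for c in range(k)]
            new_z.append(max(range(k), key=lambda c: scores[c]))
        if new_z == z:  # stop once the partition no longer changes
            break
        z = new_z
    return z


# Toy corpus: four documents over a three-term vocabulary.
docs = [[5, 0, 0], [4, 1, 0], [0, 0, 5], [0, 1, 4]]
print(cem_multinomial(docs, init=[0, 0, 0, 1], k=2))  # → [0, 0, 1, 1]
```

Because the C-step produces a hard partition, each intermediate result is a plain labelling of the documents, which is the kind of intermediate output a user could inspect visually before deciding how to proceed. Like other EM variants, CEM can stop in a local optimum that depends on the initial partition.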