view in real time the files that are closest in terms of
either Euclidean distance or chi-square distance. The
user can thus analyze these files in more detail (see
Figure 3).
One can also select a subset of files to be clustered
and projected. It is also possible to select a further
subset from the previous one. Thus, the result of the
computer-based clustering is refined by the user in an
iterative loop (see Figure 3).
4.3 Visualization of the Pre-post-clustering Relationship
The visualization of the pre-post-clustering relationship
distinguishes four cases: files that were in the same In
class and are in the same Out class; files that were not
in the same In class and are not in the same Out class;
files that were in the same In class but are not in the
same Out class; and files that were not in the same In
class but are in the same Out class.
In particular, this view makes it possible to identify
wrongly classified files.
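The four cases can be made concrete with a short sketch that assigns every pair of files to one of them. It assumes that the In and Out classes are available as dictionaries mapping file names to class labels; the function name `pair_relationships`, the case names and the sample labels are hypothetical and are not taken from the tool.

```python
from itertools import combinations

def pair_relationships(in_labels, out_labels):
    """Group every pair of files by its pre/post-clustering relationship."""
    cases = {
        "same_in_same_out": [],
        "diff_in_diff_out": [],
        "same_in_diff_out": [],
        "diff_in_same_out": [],
    }
    for a, b in combinations(sorted(in_labels), 2):
        same_in = in_labels[a] == in_labels[b]
        same_out = out_labels[a] == out_labels[b]
        key = "{}_in_{}_out".format(
            "same" if same_in else "diff",
            "same" if same_out else "diff",
        )
        cases[key].append((a, b))
    return cases

# Hypothetical labels: In classes are letters, Out classes are integers.
in_labels = {"f1": "A", "f2": "A", "f3": "B", "f4": "B"}
out_labels = {"f1": 0, "f2": 0, "f3": 0, "f4": 1}
cases = pair_relationships(in_labels, out_labels)
```

Pairs falling into the two mixed cases (`same_in_diff_out` and `diff_in_same_out`) are the ones where the clustering disagrees with the initial classes, which is where wrongly classified files would be looked for.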
4.4 Information Retrieval
Concerning information retrieval, our tool allows the
user to search for documents by specifying a word. To
illustrate the result of the search, we assign a color
to the files containing the specified word, and we
indicate to the user the number of such files (see
Figure 2). The user can choose to display only the
files containing the specified word and thus analyze
them in more detail. By clicking on one of these
files, one can, for example, visualize the occurrences
of the specified word in the selected file.
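The word search described above can be sketched as follows. This is only an illustration under stated assumptions: documents are held in memory as a dictionary from file name to text, matching is case-insensitive and on whole words, and the searched word consists of ordinary word characters. The name `search_word` and the sample documents are hypothetical.

```python
import re

def search_word(documents, word):
    """Map each file containing `word` to the character offsets of its occurrences."""
    # Assumption: whole-word, case-insensitive matching.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = {}
    for name, text in documents.items():
        positions = [match.start() for match in pattern.finditer(text)]
        if positions:
            hits[name] = positions
    return hits

# Hypothetical documents.
documents = {
    "a.txt": "Clustering of text. Text clustering.",
    "b.txt": "Visualization only.",
}
hits = search_word(documents, "text")
number_of_matching_files = len(hits)
```

The keys of `hits` give the files to highlight, `number_of_matching_files` is the count reported to the user, and the offsets can be used to mark the occurrences when a file is opened.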
5 CONCLUSIONS
In this paper, we presented an interactive tool that
combines clustering and visualization methods for
textual data. The tool makes it possible to identify
similar documents in an unstructured collection.
Specifically, we used the multinomial mixture model to
cluster the documents and MDSDCA for visualization. The
originality of our approach is to allow the user to
interactively refine the clustering process based on a
visual analysis of the results of the intermediate steps.
In addition, our tool offers visual cues in the text
view to help the user identify the most relevant words
in the document as well as in the whole class. The tool
also provides enriched views of the content of
documents, allowing the user to include a semantic
analysis based on personal knowledge in the
computer-based clustering process.
We plan to apply our tool to other real data sets.
Figure 3: Clustering and projection of a subset of documents.
ACKNOWLEDGEMENTS
The research work reported in this paper has been
supported by a grant of the National Research Fund
(FNR) of Luxembourg. For this, we thank the FNR.
REFERENCES
Allouti, F., Nadif, M., Le Thi, H. A., Otjacques, B., 2009.
Mixture model and MDSDCA for textual data. In
Proceedings of the 6th International Conference on
Cooperative Design, Visualization and Engineering
(CDVE 2009), 20-23 September 2009, Luxembourg,
published in Cooperative Design, Visualization, and
Engineering, Lecture Notes in Computer Science, vol.
5738, Springer, Berlin, Germany, pp. 240-244.
Blanc-Brude, T., Scapin, D., 2007. What do People Recall
about their Documents? Implications for Desktop
Search Tools. In IUI’07, 2007 International Confer-
ence on Intelligent User Interfaces, ACM Press.
Barreau, D., 1995. Context as a factor in personal informa-
tion management systems. In Journal of the American
Society for Information Science, 46(5), 327-339.
Gonçalves, D., Jorge, J. A., 2008. In Search of Personal
Information: Narrative-Based Interfaces. In Proceed-
ings of the 13th international Conference on intelli-
gent User interfaces (Gran Canaria, Spain, January
13-16, 2008). IUI’08. ACM, New York, NY, 179-188.
Govaert, G., Nadif, M., 2007. Clustering of contingency ta-
ble and mixture model. European Journal of Opera-
tional Research. 36, 1055-1066.
Le Thi, H. A., Pham Dinh, T., 2001. D.C. Programming
Approach for Solving the Multidimensional Scaling
Problem. Nonconvex Optimizations and Its Applica-
tions, Kluwer Academic Publishers. 231-276.
Paulovich, F. V., Minghim, R., 2008. HiPP: A novel hierarchical
point placement strategy and its application to
the exploration of document collections. IEEE Trans-
actions on Visualization and Computer Graphics 14,
6 (2008), 1229-1236.
IVAPP 2010 - International Conference on Information Visualization Theory and Applications