Coupled with incremental MDS techniques, e.g., incBoard
and incSpace, it is well suited to handling text
streams and time-stamped document collections with
limited recalculation.
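To illustrate why an incremental vector space model limits recalculation, the Python sketch below keeps running document frequencies, so each newly arriving document is weighted against the current statistics without re-scanning the collection. It is a generic incremental TF-IDF sketch under our own assumptions (whitespace tokens, weights tf · log(N/df), cosine distance), not the iVSM formulation itself; the class and function names are hypothetical.

```python
import math
from collections import Counter


class IncrementalTfIdf:
    """Hypothetical incremental TF-IDF model (illustration only, not iVSM).

    Keeps the number of documents seen so far and each term's document
    frequency, so a new document can be weighted without re-scanning
    the whole collection.
    """

    def __init__(self) -> None:
        self.n_docs = 0
        self.df = Counter()  # term -> number of documents containing it

    def add(self, tokens: list[str]) -> None:
        """Register a newly arrived document: an O(len(tokens)) update."""
        self.n_docs += 1
        self.df.update(set(tokens))

    def vector(self, tokens: list[str]) -> dict[str, float]:
        """Weights tf * log(N / df) under the current statistics.

        Terms never registered via add() are skipped (their df is zero).
        """
        tf = Counter(tokens)
        return {
            term: count * math.log(self.n_docs / self.df[term])
            for term, count in tf.items()
            if self.df[term] > 0
        }


def cosine_distance(u: dict[str, float], v: dict[str, float]) -> float:
    """1 - cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen here: zero vectors are maximally distant
    return 1.0 - dot / (norm_u * norm_v)


# Usage: three documents arrive one at a time.
model = IncrementalTfIdf()
for doc in ["text stream visualization", "incremental text projection", "stream of documents"]:
    model.add(doc.split())
query = model.vector("text stream".split())  # each term: 1 * log(3 / 2)
```

In this sketch, vectors computed earlier are not refreshed automatically as N and df change; deciding when to update them is exactly where an incremental scheme can save work.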
Some string-based metrics also performed well in
the comparisons, in particular Qgram, string-based
Cosine and Overlapping Coefficient. Their major
advantage is that they do not require intermediate text
representations such as the vector models, although
their distance calculations are computationally expensive.
A next step is to evaluate iVSM and the string measures
in a truly incremental setup, by applying them to display
text streams with, e.g., incBoard or incSpace.
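The string-based measures named above operate directly on the raw text. The Python sketch below gives textbook definitions under our own assumptions (character 3-grams with '#' padding for Qgram; whitespace-token sets for Cosine and the Overlapping Coefficient); the exact parameters used in the comparisons are not restated here, and all function names are hypothetical. The pairwise loop makes the cost visible: n documents require n(n-1)/2 string comparisons, each over the full text.

```python
import math
from collections import Counter
from typing import Callable


def qgrams(text: str, q: int = 3) -> Counter:
    """Multiset of character q-grams. The '#' padding (an assumption) makes
    every original character appear in exactly q of the q-grams."""
    padded = "#" * (q - 1) + text + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def qgram_distance(a: str, b: str, q: int = 3) -> int:
    """q-gram distance: L1 difference between the two q-gram profiles."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return sum(abs(ga[g] - gb[g]) for g in ga.keys() | gb.keys())


def token_cosine(a: str, b: str) -> float:
    """Set-based cosine similarity: |X & Y| / sqrt(|X| * |Y|)."""
    x, y = set(a.split()), set(b.split())
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))


def overlap_coefficient(a: str, b: str) -> float:
    """Overlap coefficient: |X & Y| / min(|X|, |Y|)."""
    x, y = set(a.split()), set(b.split())
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


def distance_matrix(docs: list[str], dist: Callable[[str, str], float]) -> list[list[float]]:
    """Symmetric matrix filled with n * (n - 1) / 2 calls to `dist`."""
    n = len(docs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(docs[i], docs[j])
    return d
```

In an incremental setting, each arriving document would add one row of n comparisons rather than forcing a full recomputation of the matrix.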
The approaches considered disregard any kind of
semantic analysis of text. Stemming in preprocessing,
for instance, affects semantics in a rather unpredictable
manner. Although this type of processing and
dissimilarity calculation suffices for many applications,
semantic-based distances deserve further investigation,
as semantics cannot be ignored in some text analytics
applications. The impact of the language model also
needs further study.
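The remark on stemming can be made concrete with a deliberately crude stemmer. The Python sketch below is a hypothetical toy suffix stripper, not the Porter stemmer or any other specific algorithm; its suffix list and minimum stem length are arbitrary assumptions. Even so, it shows how stemming can merge terms with distinct meanings, such as "news" and "new", into a single dimension.

```python
from collections import defaultdict

# Hypothetical suffix list, longer suffixes first so "es" wins over "e" and "s".
SUFFIXES = ("ity", "al", "es", "e", "s")
MIN_STEM = 3  # arbitrary: never cut a word below three characters


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def conflation_classes(words: list[str]) -> dict[str, set[str]]:
    """Group the words that the stemmer maps to the same term."""
    classes: dict[str, set[str]] = defaultdict(set)
    for w in words:
        classes[toy_stem(w)].add(w)
    return {stem: ws for stem, ws in classes.items() if len(ws) > 1}


merged = conflation_classes(
    ["university", "universe", "universal", "news", "new", "stream"]
)
# merged == {"univers": {"university", "universe", "universal"}, "new": {"news", "new"}}
```

Whether such merges help or hurt depends on the collection, which is one reason the effect of stemming on semantics is hard to anticipate.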
ACKNOWLEDGEMENTS
The authors acknowledge the support of FAPESP and
CNPq. Ideas and opinions expressed are those of the
authors and do not necessarily reflect those of their
employers or host organizations.
REFERENCES
Alsakran, J., Chen, Y., Luo, D., Zhao, Y., Yang, J., Dou,
W., and Liu, S. (2012). Real-Time Visualization of
Streaming Text with a Force-Based Dynamic System.
IEEE Comp. Graph. and Applic., 32(1):34–45.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent
Dirichlet allocation. J. of Mach. Learn. Res., 3:993–
1022.
Cohen, W., Ravikumar, P., and Fienberg, S. (2003). A
Comparison of String Distance Metrics for Name-
Matching Tasks. In Proc. IJCAI-2003 Workshop on
Information Integration on the Web, pages 73–78.
Cuadros, A. M., Paulovich, F. V., Minghim, R., and Telles,
G. P. (2007). Point Placement by Phylogenetic Trees
and its Application to Visual Analysis of Document
Collections. In Proc. 2007 IEEE Symp. Vis. Analytics
Sci. and Techn., pages 99–106.
Huang, S., Ward, M., and Rundensteiner, E. (2005). Ex-
ploration of Dimensionality Reduction for Text Vi-
sualization. In Proc. Coord. and Mult. Views in Ex-
ploratory Vis., pages 63–74.
Kempken, S., Luther, W., and Pilz, T. (2006). Comparison
of distance measures for historical spelling variants.
In Artif. Intel. Theory and Prac., pages 295–304.
Landauer, T. K., McNamara, D. S., Dennis, S., and Kintsch,
W. (2007). Handbook of Latent Semantic Analysis.
Lawrence Erlbaum Assoc.
Lopes, A. A., Pinho, R., Paulovich, F. V., and Minghim,
R. (2007). Visual text mining using association rules.
Comp. & Graph., 31(3):316–326.
Paiva, J. G. S., Florian, L., Pedrini, H., Telles, G. P., and
Minghim, R. (2011). Improved Similarity Trees and
their Application to Visual Data Classification. IEEE
Trans. Vis. and Comp. Graph., 17(12):2459–2468.
Paulovich, F. V. and Minghim, R. (2008). HiPP: A Novel
Hierarchical Point Placement Strategy and its Appli-
cation to the Exploration of Document Collections.
IEEE Trans. Vis. and Comp. Graph., 14(6):1229–1236.
Paulovich, F. V., Nonato, L. G., Minghim, R., and Lev-
kowitz, H. (2008). Least Square Projection: A Fast
High-Precision Multidimensional Projection Tech-
nique and its Application to Document Mapping.
IEEE Trans. Vis. and Comp. Graph., 14(3):564–575.
Pinho, R., de Oliveira, M. C. F., and Lopes, A. A. (2009).
Incremental board: a grid-based space for visualizing
dynamic data sets. In Proc. 2009 ACM Symp. Appl.
Comp., pages 1757–1764.
Pinho, R., de Oliveira, M. C. F., and Lopes, A. A. (2010).
An incremental space to visualize dynamic data sets.
Multimedia Tools and Appl., 50(3):533–562.
Salton, G., Wong, A., and Yang, C. S. (1975). A Vec-
tor Space Model for Automatic Indexing. Commun.
ACM, 18(11):613–620.
Tan, P. N., Steinbach, M., and Kumar, V. (2005). Introduc-
tion to Data Mining. Addison-Wesley.
Telles, G. P., Minghim, R., and Paulovich, F. V. (2007). Nor-
malized compression distance for visual analysis of
document collections. Comp. & Graph., 31(3):327–
337.
Wei, F., Liu, S., Song, Y., Pan, S., Zhou, M. X., Qian, W.,
Shi, L., Tan, L., and Zhang, Q. (2010). TIARA: A
Visual Exploratory Text Analytic System. In Proc.
16th ACM SIGKDD Int. Conf. on Knowl. Discovery
and Data Min., pages 153–162.
Wise, J. A., Thomas, J. J., Pennock, K., Lantrip, D., Pottier,
M., Schur, A., and Crow, V. (1995). Visualizing the
non-visual: spatial analysis and interaction with
information from text documents. In Proc. 1995 IEEE
Symp. Inf. Vis., pages 51–58.
IVAPP 2013 - International Conference on Information Visualization Theory and Applications