The Visual Exploration of Aggregate Similarity for Multi-dimensional Clustering

James Twellmeyer, Marco Hutter, Michael Behrisch, Jörn Kohlhammer, Tobias Schreck

2015

Abstract

We present a visualisation prototype for the support of a novel approach to clustering called TRIAGE. TRIAGE uses aggregation functions which are more adaptable and flexible than the weighted mean for similarity modelling. While TRIAGE has proven itself in practice, the use of complex similarity models makes the interpretation of TRIAGE clusterings challenging. We address this challenge by providing analysts with a linked, matrix-based visualisation of all relevant data attributes. We employ data sampling and matrix seriation to support both effective overviews and fluid, interactive exploration using the same visual metaphor for heterogeneous attributes. The usability of our prototype is demonstrated and assessed with the help of real-world usage scenarios from the cyber-security domain.
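The abstract does not say which aggregation functions TRIAGE uses. As one illustration of an aggregation function that is more flexible than the weighted mean, the sketch below compares a weighted mean with an ordered weighted average (OWA), which ties weights to the rank of a score rather than to a fixed attribute. The function names, similarity values, and weights are hypothetical and chosen only for illustration; this is not the TRIAGE implementation.

```python
def weighted_mean(scores, weights):
    """Weighted mean: each weight is tied to a fixed attribute (position)."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def owa(scores, weights):
    """Ordered weighted average: each weight is tied to a rank.

    Scores are sorted in descending order first, so weights[0] applies to
    the largest score, whichever attribute produced it.
    """
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered)) / sum(weights)


# Hypothetical per-attribute similarities for two pairs of records.
# Both pairs agree strongly on two attributes, but on different ones.
pair_a = [0.9, 0.8, 0.1]
pair_b = [0.1, 0.9, 0.8]

# Read as OWA weights: "average the two best-matching attributes".
# Read as weighted-mean weights: "average attributes 0 and 1".
weights = [0.5, 0.5, 0.0]

wm_a = weighted_mean(pair_a, weights)  # about 0.85
wm_b = weighted_mean(pair_b, weights)  # about 0.5
owa_a = owa(pair_a, weights)           # about 0.85
owa_b = owa(pair_b, weights)           # about 0.85, same as owa_a

print(f"weighted mean: {wm_a:.2f} vs {wm_b:.2f}")
print(f"OWA:           {owa_a:.2f} vs {owa_b:.2f}")
```

With these assumed values the weighted mean scores the two pairs differently because it fixes which attributes matter, while OWA scores them equally because it only asks that some two attributes match well.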



Paper Citation


in Harvard Style

Twellmeyer J., Hutter M., Behrisch M., Kohlhammer J. and Schreck T. (2015). The Visual Exploration of Aggregate Similarity for Multi-dimensional Clustering. In Proceedings of the 6th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2015) ISBN 978-989-758-088-8, pages 40-50. DOI: 10.5220/0005304100400050


in Bibtex Style

@conference{ivapp15,
author={James Twellmeyer and Marco Hutter and Michael Behrisch and Jörn Kohlhammer and Tobias Schreck},
title={The Visual Exploration of Aggregate Similarity for Multi-dimensional Clustering},
booktitle={Proceedings of the 6th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2015)},
year={2015},
pages={40-50},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005304100400050},
isbn={978-989-758-088-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2015)
TI - The Visual Exploration of Aggregate Similarity for Multi-dimensional Clustering
SN - 978-989-758-088-8
AU - Twellmeyer J.
AU - Hutter M.
AU - Behrisch M.
AU - Kohlhammer J.
AU - Schreck T.
PY - 2015
SP - 40
EP - 50
DO - 10.5220/0005304100400050