Data Visualization using Decision Trees and Clustering

Olivier Parisot, Yoanne Didry, Pierrick Bruneau, Benoît Otjacques

Abstract

Decision trees are simple and powerful tools for knowledge extraction and visual analysis. However, when applied to today's complex datasets, they tend to become large and difficult to visualize. This difficulty can be overcome by clustering the dataset and representing the decision tree of each cluster independently. To apply clustering more effectively, we propose a method that adapts clustering results in order to simplify the decision tree obtained from each cluster. A prototype has been implemented, and the benefits of the proposed method are demonstrated through several experiments on UCI benchmark datasets.
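
The baseline idea of clustering a dataset and building one decision tree per cluster can be sketched in a few lines of Python. This is only an illustrative sketch on a hypothetical toy dataset, not the authors' prototype: the k-means routine, its naive initialisation, the Gini-based tree builder and all function names are assumptions introduced here, and the paper's step of adapting the clustering results to simplify the trees is not reproduced.

```python
from collections import Counter

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (k >= 2) with a naive deterministic initialisation."""
    # Assumption: initial centroids are spread evenly over the input order.
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assign == assign:  # assignments are stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels):
    """Greedy binary tree on numeric features; a leaf is a bare label,
    an internal node is a tuple (feature, threshold, left, right)."""
    if len(set(labels)) == 1:
        return labels[0]
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None or best[0] >= gini(labels):
        # No split reduces impurity: emit a majority-class leaf.
        return Counter(labels).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li]),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri]))

def tree_size(node):
    """Number of nodes (internal nodes plus leaves)."""
    if not isinstance(node, tuple):
        return 1
    return 1 + tree_size(node[2]) + tree_size(node[3])

# Hypothetical toy data: two groups separated on the first feature,
# each with its own threshold rule on the second feature.
rows = [(0, y) for y in range(1, 10)] + [(10, y) for y in range(1, 10)]
labels = [int(y > 3) for y in range(1, 10)] + [int(y > 7) for y in range(1, 10)]

global_size = tree_size(build_tree(rows, labels))

assign = kmeans(rows, k=2)
cluster_sizes = []
for c in sorted(set(assign)):
    idx = [i for i, a in enumerate(assign) if a == c]
    tree = build_tree([rows[i] for i in idx], [labels[i] for i in idx])
    cluster_sizes.append(tree_size(tree))

print("global tree nodes:", global_size)
print("per-cluster tree nodes:", cluster_sizes)
```

On this toy input the single global tree has 7 nodes, while each of the two per-cluster trees has 3, which is the kind of size reduction that makes each tree easier to display on its own.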



Paper Citation


in Harvard Style

Parisot O., Didry Y., Bruneau P. and Otjacques B. (2014). Data Visualization using Decision Trees and Clustering. In Proceedings of the 5th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2014) ISBN 978-989-758-005-5, pages 80-87. DOI: 10.5220/0004740800800087


in Bibtex Style

@conference{ivapp14,
author={Olivier Parisot and Yoanne Didry and Pierrick Bruneau and Benoît Otjacques},
title={Data Visualization using Decision Trees and Clustering},
booktitle={Proceedings of the 5th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2014)},
year={2014},
pages={80-87},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004740800800087},
isbn={978-989-758-005-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2014)
TI - Data Visualization using Decision Trees and Clustering
SN - 978-989-758-005-5
AU - Parisot O.
AU - Didry Y.
AU - Bruneau P.
AU - Otjacques B.
PY - 2014
SP - 80
EP - 87
DO - 10.5220/0004740800800087