A Hierarchical Tree Distance Measure for Classification

Kent Munthe Caspersen, Martin Bjeldbak Madsen, Andreas Berre Eriksen, Bo Thiesson

2017

Abstract

In this paper, we explore the problem of classification where class labels exhibit a hierarchical tree structure. Many multiclass classification algorithms assume a flat label space, where hierarchical structures are ignored. We take advantage of hierarchical structures and the interdependencies between labels. In our setting, labels are structured in a product and service hierarchy, with a focus on spend analysis. We define a novel distance measure between classes in a hierarchical label tree. This measure penalizes paths through high levels in the hierarchy. We use a known classification algorithm that aims to minimize distance between labels, given any symmetric distance measure. The approach is global in that it constructs a single classifier for an entire hierarchy by embedding hierarchical distances into a lower-dimensional space. Results show that combining our novel distance measure with the classifier induces a trade-off between accuracy and lower hierarchical distances on misclassifications. This is useful in a setting where erroneous predictions vastly change the context of a label.
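
To make the idea of a distance that penalizes paths through high levels of the hierarchy concrete, the following is a minimal sketch, not the measure defined in the paper. It assumes a geometric edge weighting (the `decay` parameter), a toy label tree (`PARENT`), and helper names (`path_to_root`, `edge_weight`, `tree_distance`) that are all hypothetical and chosen only for illustration. Edges touching the root cost the most, so two labels whose connecting path climbs high in the tree end up farther apart than siblings deep in the tree.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy (child -> parent); the root has parent None.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "goods": "root",
    "services": "root",
    "paper": "goods",
    "pens": "goods",
    "cleaning": "services",
}


def path_to_root(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the nodes from `label` up to and including the root."""
    path = [label]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def edge_weight(child_depth: int, decay: float = 0.5) -> float:
    """Assumed weighting: the edge above a depth-1 node costs 1,
    and each level further down costs `decay` times less."""
    return decay ** (child_depth - 1)


def tree_distance(a: str, b: str, parent: Dict[str, Optional[str]],
                  decay: float = 0.5) -> float:
    """Sum the edge weights on the path from a to b via their
    lowest common ancestor. Symmetric by construction."""
    path_a = path_to_root(a, parent)
    path_b = path_to_root(b, parent)
    ancestors_a = set(path_a)
    # The first node on b's upward path that is also an ancestor of a.
    lca = next(node for node in path_b if node in ancestors_a)

    total = 0.0
    for path in (path_a, path_b):
        for node in path:
            if node == lca:
                break
            depth = len(path_to_root(node, parent)) - 1
            total += edge_weight(depth, decay)
    return total


if __name__ == "__main__":
    # Siblings under the same low-level parent are close.
    print(tree_distance("paper", "pens", PARENT))
    # Labels whose path crosses the root are much farther apart.
    print(tree_distance("paper", "cleaning", PARENT))
```

Under these assumptions, a misclassification between "paper" and "pens" is penalized less than one between "paper" and "cleaning", which mirrors the stated goal of keeping erroneous predictions close to the true label in the hierarchy. Because the function is symmetric, it could in principle be supplied to a classifier that accepts any symmetric distance measure.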

Paper Citation


in Harvard Style

Munthe Caspersen K., Bjeldbak Madsen M., Berre Eriksen A. and Thiesson B. (2017). A Hierarchical Tree Distance Measure for Classification. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-222-6, pages 502-509. DOI: 10.5220/0006198505020509


in Bibtex Style

@conference{icpram17,
author={Kent Munthe Caspersen and Martin Bjeldbak Madsen and Andreas Berre Eriksen and Bo Thiesson},
title={A Hierarchical Tree Distance Measure for Classification},
booktitle={Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2017},
pages={502-509},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006198505020509},
isbn={978-989-758-222-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - A Hierarchical Tree Distance Measure for Classification
SN - 978-989-758-222-6
AU - Munthe Caspersen K.
AU - Bjeldbak Madsen M.
AU - Berre Eriksen A.
AU - Thiesson B.
PY - 2017
SP - 502
EP - 509
DO - 10.5220/0006198505020509