Behavior of Consolidated Trees when using Resampling Techniques

Jesús Mª Pérez, Javier Muguerza, Olatz Arbelaitz, Ibai Gurrutxaga, Jose I. Martin

Abstract

Many areas of machine learning use subsampling techniques for different purposes: reducing the size of the training set, correcting class imbalance or non-uniform error costs, etc. Subsampling severely affects the behavior of classification algorithms. Decision trees induced from different subsamples of the same data set differ greatly in accuracy and structure. This undermines the explanation of the classification, which is very important in some domains. This paper presents a new methodology for building decision trees. The final classifier is a single decision tree, so it retains the explanatory capacity of the classification. Our algorithm is compared with the C4.5 algorithm in terms of error and structural stability. The decision trees generated by the new algorithm achieve smaller error rates and are structurally more stable than those of C4.5 when subsampling techniques are used.
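The instability described in the abstract can be illustrated with a minimal sketch. It is not the consolidation methodology of the paper, and it does not use C4.5: as a stand-in, it induces a one-level tree (a decision stump chosen by Gini impurity) from several class-balanced subsamples of one synthetic data set. The data generator, the helper names (`make_data`, `balanced_subsample`, `best_stump`) and all sizes are assumptions made for illustration only.

```python
import random
from collections import Counter


def make_data(rng, n_major=200, n_minor=40):
    """Synthetic, imbalanced two-class data (an assumption, not the paper's data)."""
    data = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0) for _ in range(n_major)]
    data += [((rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)), 1) for _ in range(n_minor)]
    return data


def balanced_subsample(data, size_per_class, rng):
    """Draw the same number of examples from every class, without replacement."""
    by_class = {}
    for features, label in data:
        by_class.setdefault(label, []).append((features, label))
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.sample(by_class[label], size_per_class))
    return sample


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(sample):
    """Return the (feature index, threshold) split with the lowest weighted Gini."""
    best = None
    for f in range(len(sample[0][0])):
        values = sorted({x[f] for x, _ in sample})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for x, y in sample if x[f] <= t]
            right = [y for x, y in sample if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(sample)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]


rng = random.Random(0)
data = make_data(rng)
# One stump per subsample: the root split is the simplest structural summary.
stumps = [best_stump(balanced_subsample(data, 30, rng)) for _ in range(10)]
for feature, threshold in stumps:
    print(f"split on feature {feature} at threshold {threshold:.3f}")
```

In a run of this sketch the chosen feature and threshold will typically vary from one subsample to the next, which is the kind of structural variation the abstract refers to. A full tree learner would amplify the effect, since every differing root split changes all the splits beneath it.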

Paper Citation


in Harvard Style

Pérez J. M., Muguerza J., Arbelaitz O., Gurrutxaga I. and Martin J. I. (2004). Behavior of Consolidated Trees when using Resampling Techniques. In Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2004) ISBN 972-8865-01-5, pages 139-148. DOI: 10.5220/0002665601390148


in Bibtex Style

@conference{pris04,
author={Jesús Mª Pérez and Javier Muguerza and Olatz Arbelaitz and Ibai Gurrutxaga and Jose I. Martin},
title={Behavior of Consolidated Trees when using Resampling Techniques},
booktitle={Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2004)},
year={2004},
pages={139-148},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002665601390148},
isbn={972-8865-01-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2004)
TI - Behavior of Consolidated Trees when using Resampling Techniques
SN - 972-8865-01-5
AU - Pérez J. M.
AU - Muguerza J.
AU - Arbelaitz O.
AU - Gurrutxaga I.
AU - Martin J. I.
PY - 2004
SP - 139
EP - 148
DO - 10.5220/0002665601390148