# MultiResolution Complexity Analysis - A Novel Method for Partitioning Datasets into Regions of Different Classification Complexity

### G. Armano, E. Tamponi

#### Abstract

Systems for complexity estimation typically aim to quantify the overall complexity of a domain, with the goal of comparing the hardness of different datasets or of associating a classification task with the algorithm deemed best suited to it. In this work we describe MultiResolution Complexity Analysis, a novel method for partitioning a dataset into regions of different classification complexity, with the aim of highlighting sources of complexity or noise inside the dataset. Initial experiments carried out on relevant datasets demonstrate the effectiveness of the proposed method.

#### Paper Citation

##### In Harvard Style

Armano G. and Tamponi E. (2015). **MultiResolution Complexity Analysis - A Novel Method for Partitioning Datasets into Regions of Different Classification Complexity**. In *Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM*, ISBN 978-989-758-076-5, pages 334-341. DOI: 10.5220/0005247003340341

##### In BibTeX Style

@conference{icpram15,

author={G. Armano and E. Tamponi},

title={MultiResolution Complexity Analysis - A Novel Method for Partitioning Datasets into Regions of Different Classification Complexity},

booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},

year={2015},

pages={334-341},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0005247003340341},

isbn={978-989-758-076-5},

}

##### In EndNote Style

TY - CONF

JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM

TI - MultiResolution Complexity Analysis - A Novel Method for Partitioning Datasets into Regions of Different Classification Complexity

SN - 978-989-758-076-5

AU - Armano G.

AU - Tamponi E.

PY - 2015

SP - 334

EP - 341

DO - 10.5220/0005247003340341