Improving Cascade Classifier Precision by Instance Selection and Outlier Generation

Judith Neugebauer; Oliver Kramer; Michael Sonnenschein

doi:10.5220/0005702100960104

2016

Abstract

Besides the curse of dimensionality and imbalanced classes, unfavorable data distributions can hamper classification accuracy, and the problem grows with the dimensionality of the classification task. One classifier that can handle high-dimensional and imbalanced data sets is the cascade classification method for time series. The cascade classifier can compound unfavorable data distributions by projecting the high-dimensional data set onto low-dimensional subsets. A classifier is trained for each low-dimensional subset, and the predictions are aggregated into an overall result. Because the errors of the individual classifiers accumulate in the overall result, small improvements in each low-dimensional classifier can improve the overall classification accuracy. We therefore propose two data preprocessing methods to improve the cascade classifier. The first is instance selection, a technique for selecting representative examples for the classification task. The second addresses infeasible examples: artificial infeasible examples can improve classification performance, but even when high-dimensional infeasible examples are available, they cannot be projected to low-dimensional space because of projection errors. We therefore propose a method for generating artificial infeasible examples in low-dimensional space. For power production time series of micro combined heat and power (CHP) plants and for a complex artificial data set, we show that the proposed preprocessing methods increase the performance of the cascade classifier by increasing the selectivity of the learned decision boundaries.
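
The cascade idea described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy feasibility rule (neighbouring values differ by at most 0.1), the use of pairs of neighbouring time steps as low-dimensional subsets, the 1-nearest-neighbour base classifier, the logical-AND aggregation, and the rejection-sampling outlier generator with its hypothetical `EPS` threshold. Instance selection is not shown.

```python
import math
import random

EPS = 0.1  # hypothetical minimum distance between outliers and feasible points


def make_feasible_series(rng, dim):
    """Toy feasible series: consecutive values differ by at most 0.1."""
    series = [rng.random()]
    for _ in range(dim - 1):
        step = rng.uniform(-0.1, 0.1)
        series.append(min(1.0, max(0.0, series[-1] + step)))
    return series


def generate_outliers(feasible_2d, count, rng):
    """Rejection-sample 2-D points farther than EPS from every feasible point."""
    outliers = []
    while len(outliers) < count:
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, p) > EPS for p in feasible_2d):
            outliers.append(candidate)
    return outliers


class CascadeSketch:
    """One 1-nearest-neighbour classifier per pair of neighbouring dimensions."""

    def fit(self, feasible, rng, outliers_per_step=200):
        dim = len(feasible[0])
        self.pairs = [(i, i + 1) for i in range(dim - 1)]
        self.training_sets = []
        for i, j in self.pairs:
            # Project the feasible examples onto dimensions (i, j).
            positives = [(row[i], row[j]) for row in feasible]
            # Create infeasible examples in the 2-D space itself.
            negatives = generate_outliers(positives, outliers_per_step, rng)
            labelled = [(p, True) for p in positives]
            labelled += [(n, False) for n in negatives]
            self.training_sets.append(labelled)
        return self

    def predict(self, series):
        """Accept a series only if every low-dimensional classifier accepts."""
        for (i, j), labelled in zip(self.pairs, self.training_sets):
            point = (series[i], series[j])
            _, label = min(labelled, key=lambda item: math.dist(item[0], point))
            if not label:
                return False
        return True


rng = random.Random(0)
feasible = [make_feasible_series(rng, dim=4) for _ in range(200)]
cascade = CascadeSketch().fit(feasible, rng)

print(cascade.predict(feasible[0]))           # one of the training examples
print(cascade.predict([0.0, 1.0, 0.0, 1.0]))  # large jumps between steps
```

Because a series is accepted only when every low-dimensional classifier accepts it, a single wrong rejection anywhere in the cascade rejects the whole series, which is one way to read the abstract's remark that the errors of the individual classifiers accumulate.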

Paper Citation

in Harvard Style

Neugebauer J., Kramer O. and Sonnenschein M. (2016). Improving Cascade Classifier Precision by Instance Selection and Outlier Generation. In Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-172-4, pages 96-104. DOI: 10.5220/0005702100960104

in Bibtex Style

@conference{icaart16,
author={Judith Neugebauer and Oliver Kramer and Michael Sonnenschein},
title={Improving Cascade Classifier Precision by Instance Selection and Outlier Generation},
booktitle={Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2016},
pages={96-104},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005702100960104},
isbn={978-989-758-172-4},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Improving Cascade Classifier Precision by Instance Selection and Outlier Generation
SN - 978-989-758-172-4
AU - Neugebauer J.
AU - Kramer O.
AU - Sonnenschein M.
PY - 2016
SP - 96
EP - 104
DO - 10.5220/0005702100960104