GEML: Evolutionary Unsupervised and Semi-Supervised Learning of Multi-class Classification with Grammatical Evolution

Jeannie M. Fitzgerald, R. Mohammed Atif Azad, Conor Ryan

2015

Abstract

This paper introduces a novel evolutionary approach which can be applied to supervised, semi-supervised and unsupervised learning tasks. The method, Grammatical Evolution Machine Learning (GEML), adapts machine learning concepts from decision tree learning and clustering methods, and integrates these into a Grammatical Evolution framework. With minor adaptations to the objective function, the system can be trivially modified to work with the conceptually different paradigms of supervised, semi-supervised and unsupervised learning. The framework generates human-readable solutions which explain the mechanics behind the classification decisions, offering a significant advantage over existing paradigms for unsupervised and semi-supervised learning. GEML is studied on a range of multi-class classification problems and is shown to be competitive with several state-of-the-art multi-class classification algorithms.
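
The idea of switching learning paradigms by changing only the objective function can be illustrated with a minimal sketch. It is not the objective used in the paper: the `accuracy` and `compactness` scores, the `mode` and `weight` parameters, and the toy threshold rule are all assumptions chosen for illustration. The rule stands in for a human-readable, decision-tree-like expression of the kind an evolved individual might encode.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type: an evolved rule maps a feature vector to a class/cluster id.
Rule = Callable[[Sequence[float]], int]


def accuracy(pred: Sequence[int], labels: Sequence[Optional[int]]) -> float:
    """Fraction of labelled rows predicted correctly; unlabelled rows (None) are skipped."""
    pairs = [(p, y) for p, y in zip(pred, labels) if y is not None]
    if not pairs:
        return 0.0
    return sum(p == y for p, y in pairs) / len(pairs)


def compactness(rows: Sequence[Sequence[float]], pred: Sequence[int]) -> float:
    """Score in (0, 1]: higher when rows sharing a predicted id lie close to their centroid."""
    groups: dict = {}
    for row, p in zip(rows, pred):
        groups.setdefault(p, []).append(row)
    total = 0.0
    for members in groups.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((a - c) ** 2 for a, c in zip(m, centroid)) for m in members
        )
    return 1.0 / (1.0 + total / len(rows))


def fitness(
    rule: Rule,
    rows: Sequence[Sequence[float]],
    labels: Optional[Sequence[Optional[int]]] = None,
    mode: str = "supervised",
    weight: float = 0.5,
) -> float:
    """Assumed objective: only this function changes between learning paradigms."""
    pred: List[int] = [rule(r) for r in rows]
    if mode == "supervised":
        return accuracy(pred, labels)
    if mode == "unsupervised":
        return compactness(rows, pred)
    if mode == "semi-supervised":
        # Blend label agreement on the labelled rows with cluster quality on all rows.
        return weight * accuracy(pred, labels) + (1 - weight) * compactness(rows, pred)
    raise ValueError(f"unknown mode: {mode}")


# Toy data and two candidate rules (illustrative only).
rows = [[0.0], [0.2], [5.0], [5.2]]
good_rule: Rule = lambda r: 0 if r[0] < 2.5 else 1
bad_rule: Rule = lambda r: 0

supervised_score = fitness(good_rule, rows, [0, 0, 1, 1], mode="supervised")
unsup_good = fitness(good_rule, rows, mode="unsupervised")
unsup_bad = fitness(bad_rule, rows, mode="unsupervised")
semi_score = fitness(good_rule, rows, [0, None, None, 1], mode="semi-supervised")
```

In this sketch the same candidate rule is scored under all three modes without touching the rule representation, which is the property the abstract describes. A practical unsupervised objective would likely also reward separation between groups, which is omitted here for brevity.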

Paper Citation


in Harvard Style

Fitzgerald, J. M., Azad, R. M. A. and Ryan, C. (2015). GEML: Evolutionary Unsupervised and Semi-Supervised Learning of Multi-class Classification with Grammatical Evolution. In Proceedings of the 7th International Joint Conference on Computational Intelligence - Volume 1: ECTA, ISBN 978-989-758-157-1, pages 83-94. DOI: 10.5220/0005599000830094


in Bibtex Style

@conference{ecta15,
author={Jeannie M. Fitzgerald and R. Mohammed Atif Azad and Conor Ryan},
title={GEML: Evolutionary Unsupervised and Semi-Supervised Learning of Multi-class Classification with Grammatical Evolution},
booktitle={Proceedings of the 7th International Joint Conference on Computational Intelligence - Volume 1: ECTA},
year={2015},
pages={83-94},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005599000830094},
isbn={978-989-758-157-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Computational Intelligence - Volume 1: ECTA
TI - GEML: Evolutionary Unsupervised and Semi-Supervised Learning of Multi-class Classification with Grammatical Evolution
SN - 978-989-758-157-1
AU - Fitzgerald, J. M.
AU - Azad, R. M. A.
AU - Ryan, C.
PY - 2015
SP - 83
EP - 94
DO - 10.5220/0005599000830094