Classification Confusion within Nefclass Caused by Feature Value Skewness in Multi-dimensional Datasets

Jamileh Yousefi, Andrew Hamilton-Wright

2016

Abstract

This paper presents a model for treating the effect of skewness on the accuracy of the Nefclass classifier by changing the discretization method embedded within the classifier. Nefclass is a common example of the popular construction of a neuro-fuzzy system. The classifier exhibits surprising behaviour when the feature values of the training and testing data sets are significantly skewed. Because skewed feature values are commonly observed in biological data sets, this behaviour bears directly on the applicability of such a classifier to these types of problems. The study shows that the effect of skewness on classification accuracy is significant and must be considered in work dealing with skewed data distributions. We compared the accuracy of the original Nefclass classifier with that of two modified versions in which the MME and CAIM discretization methods are embedded. Both the CAIM and MME discretization methods improved the classification accuracy of Nefclass relative to the original EqualWidth technique.
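To illustrate why the choice of discretization matters for skewed features, the following minimal sketch compares equal-width binning with equal-frequency binning on a synthetic log-normal feature. It is not the authors' implementation: the function names and the synthetic data are hypothetical, equal-frequency binning is used here only as an assumed stand-in for an entropy-maximizing partition such as MME, CAIM is not shown, and crisp bins are used in place of the fuzzy sets that Nefclass actually builds.

```python
import random
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equal-length intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior cut points chosen so each interval holds about the same number of points.

    Assumption: used here as a simple stand-in for an entropy-maximizing partition.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def bin_counts(values, edges):
    """Count how many values fall into each interval defined by the cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


# Hypothetical right-skewed feature: 3000 log-normal samples with a fixed seed.
rng = random.Random(0)
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(3000)]

ew_counts = bin_counts(skewed, equal_width_edges(skewed, 3))
ef_counts = bin_counts(skewed, equal_frequency_edges(skewed, 3))

print("equal-width counts:    ", ew_counts)
print("equal-frequency counts:", ef_counts)
```

With a right-skewed feature, equal-width intervals place most samples in the lowest interval and leave the upper intervals sparsely populated, whereas the equal-frequency cut points spread the samples evenly. This is one plausible reading of why a data-driven discretization could help a classifier on skewed data; the paper itself should be consulted for the actual mechanism and results.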



Paper Citation


in Harvard Style

Yousefi J. and Hamilton-Wright A. (2016). Classification Confusion within Nefclass Caused by Feature Value Skewness in Multi-dimensional Datasets. In Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 2: FCTA, (IJCCI 2016) ISBN 978-989-758-201-1, pages 21-29. DOI: 10.5220/0006033800210029


in Bibtex Style

@conference{fcta16,
author={Jamileh Yousefi and Andrew Hamilton-Wright},
title={Classification Confusion within Nefclass Caused by Feature Value Skewness in Multi-dimensional Datasets},
booktitle={Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 2: FCTA, (IJCCI 2016)},
year={2016},
pages={21-29},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006033800210029},
isbn={978-989-758-201-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 2: FCTA, (IJCCI 2016)
TI - Classification Confusion within Nefclass Caused by Feature Value Skewness in Multi-dimensional Datasets
SN - 978-989-758-201-1
AU - Yousefi J.
AU - Hamilton-Wright A.
PY - 2016
SP - 21
EP - 29
DO - 10.5220/0006033800210029