5 CONCLUSIONS
The results of this study indicate that the classification
accuracy of the NEFCLASS classifier degrades as the
skewness of the feature values increases. Further, the
choice of initial discretization method affects the
classification accuracy of NEFCLASS, and this effect is
particularly strong in skewed data sets. Using the MME
or CAIM discretization methods within the NEFCLASS
classifier improved classification accuracy.
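As an illustration of why the initial partition matters for skewed features, the sketch below compares two partitions of a synthetic, positively skewed sample. It is not the NEFCLASS implementation, and it rests on stated assumptions: equal-width binning is used as a baseline, equal-frequency (quantile) binning stands in as a rough approximation of MME, and the log-normal test data and all function names are hypothetical.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def equal_width_edges(values, bins):
    """Interior cut points that split the observed range into equal widths."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + i * step for i in range(1, bins)]


def equal_frequency_edges(values, bins):
    """Interior cut points at sample quantiles (assumed stand-in for MME)."""
    return statistics.quantiles(values, n=bins, method="inclusive")


def bin_counts(values, edges):
    """Count how many values fall into each interval defined by the edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v > e for e in edges)] += 1
    return counts


# Hypothetical skewed feature: 1000 log-normal draws with a fixed seed.
rng = random.Random(0)
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]

skew = sample_skewness(skewed)
width_counts = bin_counts(skewed, equal_width_edges(skewed, 3))
freq_counts = bin_counts(skewed, equal_frequency_edges(skewed, 3))

print("skewness:", round(skew, 2))
print("equal-width bin counts:", width_counts)
print("equal-frequency bin counts:", freq_counts)
```

Under these assumptions, the equal-width partition places most of the sample in the lowest interval, while the quantile-based partition spreads the sample evenly across intervals. This is one plausible reading of why a data-driven initial discretization is less sensitive to skewness.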
ACKNOWLEDGEMENTS
The authors gratefully acknowledge the ongoing grant
support of NSERC, the Natural Sciences and Engineering
Research Council of Canada.