designed to analyse the behaviour and sensitivity of
these measures in imbalanced problems. The results
obtained suggest that AUC, Gm, IBA, wAUC and cwA
are more suitable for dealing with imbalance. How-
ever, two of these metrics, AUC and Gm, do not de-
tect the exchange of positive and negative values in
the confusion matrix, so they may not recognize the
asymmetry of class results. In the case of the F-measure,
which has also been proposed as a favorable metric
on imbalanced data sets, the previous analysis showed
that this measure is highly correlated with the results
on the negative class.
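The symmetry issue noted above can be illustrated with a minimal sketch: swapping the positive and negative labels exchanges TP with TN and FP with FN, and Gm (the geometric mean of the true positive and true negative rates) is unchanged by that exchange, whereas the F-measure, computed on the positive class only, is not. The confusion-matrix counts below are hypothetical values chosen for illustration.

```python
import math

def gmean(tp, fn, fp, tn):
    """Geometric mean of true positive rate and true negative rate."""
    tpr = tp / (tp + fn)  # recall on the positive class
    tnr = tn / (tn + fp)  # recall on the negative class
    return math.sqrt(tpr * tnr)

def f_measure(tp, fn, fp, tn):
    """F1: harmonic mean of precision and recall on the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced problem: 10 positives, 90 negatives.
tp, fn, fp, tn = 8, 2, 10, 80

# Exchanging class labels maps (tp, fn, fp, tn) -> (tn, fp, fn, tp).
print(gmean(tp, fn, fp, tn), gmean(tn, fp, fn, tp))          # identical
print(f_measure(tp, fn, fp, tn), f_measure(tn, fp, fn, tp))  # different
```

Because Gm only multiplies the two class-wise recalls, relabelling the classes merely swaps the factors, so the product is unaffected; the F-measure changes because precision depends on which class is designated positive.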
ACKNOWLEDGEMENTS
This work has partially been supported by the
Spanish Ministry of Education and Science un-
der grants CSD2007–00018, AYA2008–05965–0596
and TIN2009–14205, the Fundaci´o Caixa Castell´o-
Bancaixa under grant P1–1B2009–04, and the Gener-
alitat Valenciana under grant PROMETEO/2010/028.
ON THE SUITABILITY OF NUMERICAL PERFORMANCE MEASURES FOR CLASS IMBALANCE PROBLEMS