Table 1: Mean results for each algorithm.
Algorithm   ACC     NPV     TNR     MCC
SurfOpt     88.5%   11%     38.4%   0.156
SVM         97.5%   39.8%   0.7%    0.0409
Adaboost    97.3%   34.5%   7.4%    0.148
Adabag      97.3%   32.8%   8.5%    0.154
Table 1 shows the mean results over 500 synthetic
datasets. The SurfOpt algorithm attains a mean
Matthews Correlation Coefficient (MCC) comparable
to that of the ensemble methods Adabag and
Adaboost. Support Vector Machines (SVM) show
little classification value in terms of mean MCC.
The low mean MCC of the SVM models can be
explained by the fact that SVM is unable to deal
with trade-off regions, even with the radial kernel,
because it optimizes for distance to points. As
expected, the skewness biased the SVM models
toward the class with the most points. Although the
SurfOpt algorithm shows a mean MCC similar to
those of the Adaboost and Adabag ensemble methods,
its mean True Negative Rate (TNR) is higher,
meaning that it correctly predicted a higher
proportion of the negative class data points. From
the observation of those three metrics, the SurfOpt
algorithm can be identified as a True Positive Rate
(TPR) detrimental algorithm, as it trades overall
accuracy for a better classification of the minority
class. The SVM, Adaboost and Adabag classifiers are
True Negative Rate (TNR) detrimental, as they
predict the minority class only when it is a very
safe bet.
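To make the trade-off between these metrics concrete, the following minimal sketch computes ACC, NPV, TNR, TPR and MCC from confusion-matrix counts. It assumes, as the discussion above suggests, that the negative class is the minority class. The function name `imbalance_metrics`, the convention of reporting an undefined ratio as 0.0, and the two sets of counts are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Return ACC, NPV, TNR, TPR and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn

    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) is 0.0.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": ratio(tp + tn, total),
        "NPV": ratio(tn, tn + fn),
        "TNR": ratio(tn, tn + fp),
        "TPR": ratio(tp, tp + fn),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts: 970 majority (positive) and 30 minority (negative) points.
# A classifier that predicts the minority class only when it is a safe bet:
majority_biased = imbalance_metrics(tp=968, fp=29, tn=1, fn=2)
# A classifier that gives up some accuracy to recover more minority points:
minority_aware = imbalance_metrics(tp=870, fp=18, tn=12, fn=100)

for name, result in (("majority-biased", majority_biased),
                     ("minority-aware", minority_aware)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these illustrative counts, the first classifier has the higher accuracy while the second has the higher TNR and MCC, which mirrors the pattern in Table 1: accuracy alone hides how poorly the minority class is recovered.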
6 FUTURE WORK
We expect to investigate the properties of an
ensemble approach to the SurfOpt algorithm, using
sets of non-uniform curves with complementary
optimization criteria to compensate for the
disadvantages of individual classifiers. Other
studies may also test SurfOpt performance with
oversampling and undersampling approaches, explore
the classifier's properties with noisy data, and find
a generalization to n-dimensional spaces.
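As an illustration of the kind of resampling such a study could start from, the sketch below implements plain random oversampling: minority points are duplicated at random until every class has as many points as the largest one. This is a generic baseline, not part of SurfOpt; the function name `random_oversample`, the fixed seed, and the toy data are assumptions made for this example.

```python
import random


def random_oversample(points, labels, seed=0):
    """Duplicate random minority points until all classes have equal size."""
    rng = random.Random(seed)

    # Group the points by class label.
    by_class = {}
    for point, label in zip(points, labels):
        by_class.setdefault(label, []).append(point)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_points, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for point in group + extra:
            out_points.append(point)
            out_labels.append(label)
    return out_points, out_labels


# Toy two-dimensional data: four majority points (1), one minority point (0).
points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.5, 0.5), (0.9, 0.8)]
labels = [1, 1, 1, 1, 0]
balanced_points, balanced_labels = random_oversample(points, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Random undersampling would be the mirror image: instead of duplicating minority points, majority points are discarded until the classes match in size.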
7 CONCLUSION
In this paper, we have introduced the SurfOpt
algorithm. It brings together classification
performance with the benefits of optimizing a
classification surface. In contrast to common
algorithms, the SurfOpt algorithm can better classify
the minority class of a binary set even under severe
skewness. We devised a diagnostic of classification
performance for imbalanced data based on three basic
metrics, as an effort to study classification under
skewness. The source code of this work is being made
available online (Da Silva, 2018).
ACKNOWLEDGMENT
The authors would like to thank CAPES/Brazil and
CNPq/Brazil for the financial support offered
through the project Abys (401364/2014-3).
REFERENCES
Alfaro, E., Gamez, M., Garcia, N., et al. (2013). Adabag:
An R Package For Classification with Boosting and
Bagging. Journal of Statistical Software, 54(2):1–35.
Barber, C. B., Dobkin, D. P., and Huhdanpaa, H. (1996).
The Quickhull Algorithm For Convex Hulls. ACM
Transactions on Mathematical Software (TOMS),
22(4):469–483.
Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992). A
Training Algorithm for Optimal Margin Classifiers. In
Proceedings of the 5th Annual Workshop on
Computational Learning Theory, pages 144–152. ACM.
Boughorbel, S., Jarray, F., and El-Anbari, M. (2017).
Optimal Classifier for Imbalanced Data Using
Matthews Correlation Coefficient Metric. PLoS ONE,
12(6):e0177678.
Carneiro, N., Figueira, G., and Costa, M. (2017). A Data
Mining Based System For Credit-Card Fraud Detec-
tion In E-tail. Decision Support Systems, 95:91–101.
Da Silva, A. R. (2018). SurfOpt.
https://github.com/andreblumenau/SurfOpt.
Devroye, L. and Toussaint, G. T. (1981). A Note on Linear
Expected Time Algorithms for Finding Convex Hulls.
Computing, 26(4):361–366.
Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D.,
Weingessel, A., and Leisch, M. F. (2006). The e1071
Package. Misc Functions of Department of Statistics
(e1071), TU Wien.
Duchi, J., Hazan, E., and Singer, Y. (2011). Adaptive
Subgradient Methods for Online Learning and Stochastic
Optimization. Journal of Machine Learning Research,
12(Jul):2121–2159.
Guo, P., Liu, T., Zhang, Q., Wang, L., Xiao, J., Zhang, Q.,
Luo, G., Li, Z., He, J., Zhang, Y., et al. (2017).
Developing a Dengue Forecast Model using Machine
Learning: A Case Study in China. PLoS Neglected
Tropical Diseases, 11(10):e0005973.
Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue,
H., and Bing, G. (2017). Learning from Class-
imbalanced Data: Review of Methods and Applica-
tions. Expert Systems with Applications, 73:220–239.
Hofmann, M. (2006). Support Vector Machines—Kernels
and the Kernel Trick. Notes, 26.
SurfOpt: A New Surface Method for Optimizing the Classification of Imbalanced Dataset