tistically comparable to that of COCOA and ECC. We should also note that in only one case (against COCOA, on average precision) did NaNUML-100% achieve statistically inferior performance. These results demonstrate the effectiveness of the proposed method, NaNUML, relative to existing schemes dedicated to multi-label learning and class-imbalance mitigation. Note also that, being an undersampling scheme, NaNUML reduces the complexity associated with classifier modeling.
7 CONCLUSION
In this work, we have presented NaNUML, a novel label-specific undersampling scheme for multi-label datasets. NaNUML is based on a parameter-free natural neighbor search; the critical factor, the neighborhood size 'k', is determined without any parameter optimization. Our scheme eliminates majority instances that lie close to the minority class, while preserving the critical structure of the majority class, identified through the majority natural-neighbor counts of majority instances. A further advantage of the scheme is that a single natural neighbor search suffices for all labels. Undersampling schemes intrinsically reduce complexity in the classifier-modeling phase (through the reduction in training data), and NaNUML is no exception. The performance of NaNUML indicates its ability to mitigate the class-imbalance issue in multi-label datasets to a considerable extent.
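As an illustrative sketch (not the authors' exact algorithm), the ideas above can be combined as follows: a parameter-free natural neighbor search that grows the neighborhood round until the count of "orphan" points (points with no reverse neighbors) stabilizes, followed by a per-label majority undersampling step that reuses the same neighbor sets. The function names, stopping rule, and removal criterion here are simplified assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_neighbor_search(X, max_r=30):
    """Sketch of a parameter-free natural neighbor search: grow the
    neighborhood round r until the number of points with no reverse
    neighbors stops changing; mutual r-NN pairs are natural neighbors."""
    n = len(X)
    cap = min(n - 1, max_r)
    _, idx = NearestNeighbors(n_neighbors=cap + 1).fit(X).kneighbors(X)
    knn = [set() for _ in range(n)]       # forward r-NN sets
    rev = np.zeros(n, dtype=int)          # reverse-neighbor counts
    prev_orphans, r = -1, 0
    for r in range(1, cap + 1):
        for i in range(n):
            j = idx[i, r]                 # i's r-th nearest neighbor
            knn[i].add(j)
            rev[j] += 1
        orphans = int((rev == 0).sum())   # points nobody points to yet
        if orphans == prev_orphans:       # stable -> stop growing r
            break
        prev_orphans = orphans
    nan_sets = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return nan_sets, r                    # r plays the role of 'k'

def undersample_label(nan_sets, y, ratio=1.0):
    """For one label (y: 1 = minority, 0 = majority), drop majority
    instances whose natural neighbors include minority points, keeping
    majority instances with many majority natural neighbors (the
    'critical' backbone of the majority class)."""
    maj = [i for i, v in enumerate(y) if v == 0]
    excess = max(0, len(maj) - int(sum(y)))
    n_remove = int(round(ratio * excess))
    def risk(i):  # most minority NaNs first; ties: fewest majority NaNs
        mins = sum(1 for j in nan_sets[i] if y[j] == 1)
        return (-mins, len(nan_sets[i]) - mins)
    to_remove = set(sorted(maj, key=risk)[:n_remove])
    return [i for i in range(len(y)) if i not in to_remove]
```

Calling `undersample_label` once per label on the shared `nan_sets` mirrors the single-search property noted above: the comparatively expensive neighbor search runs only once, however many labels the dataset has.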
In future work, we would like to design a natural-neighborhood-based oversampling scheme for class-imbalanced datasets. We would also like to explore whether label correlations can be incorporated into our undersampling scheme.
Parameter-Free Undersampling for Multi-Label Data