tistically comparable to that of COCOA and ECC. We
should also note that in only one case did NaNUML-100%
achieve a statistically inferior performance (against
COCOA, on average precision). The results summarized
above confirm the suitability of the proposed method,
NaNUML, relative to existing schemes dedicated to
multi-label learning and class-imbalance mitigation.
Note also that, being an undersampling scheme, NaNUML
reduces the complexity associated with classifier
modeling.
In this work, we have presented a novel label-specific
undersampling scheme, NaNUML, for multi-label datasets.
NaNUML is based on the parameter-free natural neighbor
search, so the critical factor, the neighborhood size k,
is determined without invoking any parameter
optimization. In our scheme, we eliminate the majority
instances that lie closer to the minority class. In
addition, we preserve the critical lattices of the
majority class by looking at the majority natural
neighbor count of majority-class instances. Another
advantage of the scheme is that only one natural
neighbor search is required for all labels.
Undersampling schemes have the intrinsic characteristic
of reducing the complexity of the classifier modeling
phase (through the reduction in training data), and
NaNUML is no exception. The performance of NaNUML
indicates its ability to mitigate the class-imbalance
issue in multi-label datasets to a considerable extent.
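
The steps summarized above can be illustrated with a minimal
sketch. It is not the NaNUML implementation: the stopping test in
the neighbor search is a simplified version of the usual natural
neighbor criterion, and the removal rule (drop a majority instance
only when its minority natural neighbors outnumber its majority
natural neighbors) is an assumption made for illustration. The
function names natural_neighbors, keep_indices and
undersample_all_labels are hypothetical.

```python
import numpy as np

def natural_neighbors(X):
    """Simplified parameter-free natural neighbor search.

    Grows the neighborhood size r until every point is someone's
    neighbor (or the number of such "orphan" points stops changing),
    then returns the mutual r-nearest-neighbor relation and r.
    """
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is never its own neighbor
    order = np.argsort(d, axis=1)          # neighbors sorted by distance
    knn = np.zeros((n, n), dtype=bool)     # knn[i, j]: j is among i's r nearest
    rows = np.arange(n)
    r, prev_orphans = 0, -1
    for r in range(1, n):
        knn[rows, order[:, r - 1]] = True  # add each point's r-th neighbor
        orphans = int((knn.sum(axis=0) == 0).sum())
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    return knn & knn.T, r                  # natural neighbors are mutual

def keep_indices(nan, y):
    """Indices retained for one binary label vector y.

    ASSUMED rule (illustrative only): a majority instance is removed
    when it has minority natural neighbors and they outnumber its
    majority natural neighbors; otherwise it is preserved.
    """
    y = np.asarray(y)
    minority = 1 if 2 * y.sum() <= len(y) else 0
    is_min = y == minority
    min_count = nan[:, is_min].sum(axis=1)   # minority natural neighbors
    maj_count = nan[:, ~is_min].sum(axis=1)  # majority natural neighbors
    drop = ~is_min & (min_count > 0) & (min_count > maj_count)
    return np.flatnonzero(~drop).tolist()

def undersample_all_labels(X, Y):
    """One natural neighbor search, reused for every label column of Y."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y)
    nan, r = natural_neighbors(X)
    return [keep_indices(nan, Y[:, j]) for j in range(Y.shape[1])], r

# Toy data: a majority cluster (rows 0-3), a minority cluster (rows 4-6)
# and one majority instance (row 7) sitting inside the minority cluster.
X = [[0.0, 0.0], [0.0, 1.1], [1.2, 0.0], [1.3, 1.4],
     [5.0, 5.0], [5.0, 6.1], [6.2, 5.0], [5.3, 5.4]]
Y = [[0, 1], [0, 0], [0, 0], [0, 0],
     [1, 0], [1, 0], [1, 0], [0, 0]]
keeps, r = undersample_all_labels(X, Y)
print(r, keeps)
```

On this toy data, the first label loses only the majority instance
embedded in the minority cluster, while the second label keeps all
instances. Both decisions reuse the same neighbor relation, which
mirrors the single-search property noted above.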
In future work, we would like to design a
natural-neighborhood-based oversampling scheme for
class-imbalanced datasets. We would also like to explore
whether label correlations can be incorporated into our
undersampling scheme.
Parameter-Free Undersampling for Multi-Label Data