Integrating Unsupervised Clustering and Label-Specific Oversampling to Tackle Imbalanced Multi-Label Data

Payel Sadhukhan, Arjun Pakrashi, Sarbani Palit, Brian Mac Namee

2023

Abstract

Multi-label datasets often contain a mixture of very frequent and very infrequent labels. This variation in label frequency, a type of class imbalance, creates a significant challenge for building efficient multi-label classification algorithms. In this paper, we tackle this problem by proposing a minority class oversampling scheme, UCLSO, which integrates Unsupervised Clustering and Label-Specific data Oversampling. Clustering is performed to identify the key distinct and locally connected regions of a multi-label dataset (irrespective of the label information). Next, for each label, we explore the distributions of minority points in the cluster sets. Only the intra-cluster minority points are used to generate the synthetic minority points. Although the same cluster set is shared across all labels, we use the label-specific class information to vary the distributions of the synthetic minority points across the labels (in congruence with the label-specific class memberships within the clusters). The training dataset is augmented with the set of label-specific synthetic minority points, and classifiers are trained to predict the relevance of each label independently. Experiments using 12 multi-label datasets and several multi-label algorithms show the competency of the proposed method over other competing algorithms in the given context.
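The two stages described in the abstract (label-agnostic clustering, then per-label oversampling restricted to minority points within a cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmeans` and `oversample_label` are hypothetical, plain k-means stands in for whatever clustering the paper uses, synthetic points are made by SMOTE-style linear interpolation between two minority points of the same cluster, and clusters are drawn in proportion to their minority counts. All of these choices are assumptions made for illustration only.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step).

    Labels are never consulted here; returns one cluster index per point.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre (squared Euclidean distance).
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each centre to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def oversample_label(points, labels, assign, n_new, minority=1, seed=0):
    """Create n_new synthetic minority points for ONE label.

    Both parents of a synthetic point come from the same cluster, so no
    point is interpolated across cluster boundaries.
    """
    rng = random.Random(seed)
    groups = {}
    for p, y, a in zip(points, labels, assign):
        if y == minority:
            groups.setdefault(a, []).append(p)
    # A cluster needs at least two minority points to supply a pair.
    usable = [g for g in groups.values() if len(g) >= 2]
    if not usable:
        return []
    # Assumption: clusters are sampled in proportion to their minority count.
    weights = [len(g) for g in usable]
    synthetic = []
    for _ in range(n_new):
        group = rng.choices(usable, weights=weights)[0]
        p, q = rng.sample(group, 2)
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(p, q)])
    return synthetic
```

In this sketch the cluster assignment is computed once and reused for every label, while `oversample_label` is called separately per label column, so the synthetic sets differ across labels even though the clusters do not. Each label's classifier would then be trained on the original data plus that label's synthetic points.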


Paper Citation


in Harvard Style

Sadhukhan P., Pakrashi A., Palit S. and Mac Namee B. (2023). Integrating Unsupervised Clustering and Label-Specific Oversampling to Tackle Imbalanced Multi-Label Data. In Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-623-1, pages 489-498. DOI: 10.5220/0011901200003393


in Bibtex Style

@conference{icaart23,
author={Payel Sadhukhan and Arjun Pakrashi and Sarbani Palit and Brian Mac Namee},
title={Integrating Unsupervised Clustering and Label-Specific Oversampling to Tackle Imbalanced Multi-Label Data},
booktitle={Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2023},
pages={489-498},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011901200003393},
isbn={978-989-758-623-1},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Integrating Unsupervised Clustering and Label-Specific Oversampling to Tackle Imbalanced Multi-Label Data
SN - 978-989-758-623-1
AU - Sadhukhan P.
AU - Pakrashi A.
AU - Palit S.
AU - Mac Namee B.
PY - 2023
SP - 489
EP - 498
DO - 10.5220/0011901200003393