Authors:
S. Garcia López¹; J. A. Jaramillo-Garzón²; J. C. Higuita-Vásquez¹ and C. G. Castellanos-Domínguez¹
Affiliations:
¹ Universidad Nacional de Colombia, Colombia
² Universidad Nacional de Colombia and Instituto Tecnológico Metropolitano, Colombia
Keyword(s):
Class imbalance, Filter, PSO, Separability criterion, Subsampling, Wrapper.
Related Ontology Subjects/Areas/Topics:
Bioinformatics; Biomedical Engineering; Genomics and Proteomics; Pattern Recognition, Clustering and Classification
Abstract:
Recent advances in proteomic research have generated an unprecedented amount of stored data. Given the size of current databases, manual annotation has become an almost intractable process, paving the way for computational methods. Because a single protein can belong to several functional classes, the task becomes a multi-label classification problem. The most common way to cope with such problems is to train one classifier per class, so that the membership of a protein in each class is decided independently. However, since each classifier contrasts the proteins of its class against all remaining proteins, this methodology produces a high degree of class imbalance, magnifying the disparity already present in class sizes. Current balancing techniques optimize a criterion to select the subset of samples that best represents the data, and most sample selection criteria rely on Wrapper-type metrics, which are computationally expensive. This work presents a comparative analysis of Wrapper and Filter metrics as sample selection criteria in balancing techniques. To this end, a subsampling technique based on Particle Swarm Optimization (PSO) is used to obtain the optimal balanced subset. The results show that Filter metrics notably reduced the computational cost while achieving performance similar to that of Wrapper-type metrics.
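PSO-based subsampling driven by a Filter criterion can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The binary PSO variant with a sigmoid transfer function, the per-feature Fisher ratio used as the separability criterion, the size penalty, and all names and parameter values (`pso_subsample`, `fisher_separability`, `fitness`, `w`, `c1`, `c2`, `v_max`) are hypothetical choices. Each particle is a keep/drop mask over the majority-class samples. A Wrapper variant would replace `fisher_separability` with training and validating a classifier on every candidate subset, which is where its extra cost comes from.

```python
import math
import random

# Hypothetical sketch: binary PSO undersampling of the majority class,
# scored by an assumed Filter criterion (Fisher ratio), not the paper's code.


def fisher_separability(class_a, class_b):
    """Assumed Filter criterion: sum over features of
    (mean_a - mean_b)^2 / (var_a + var_b). No classifier is trained."""
    score = 0.0
    for j in range(len(class_a[0])):
        a = [x[j] for x in class_a]
        b = [x[j] for x in class_b]
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
        var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
        score += (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)
    return score


def fitness(mask, majority, minority, penalty=1.0):
    """Separability of the kept majority samples vs. the minority class,
    penalised when the subset size departs from len(minority)."""
    subset = [x for x, keep in zip(majority, mask) if keep]
    if len(subset) < 2:
        return -math.inf
    size_gap = abs(len(subset) - len(minority)) / len(minority)
    return fisher_separability(subset, minority) - penalty * size_gap


def pso_subsample(majority, minority, n_particles=20, n_iter=50,
                  w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO (sigmoid transfer) over majority-class inclusion masks.
    Returns the best subset found and its fitness."""
    rng = random.Random(seed)
    dim = len(majority)
    p_keep = len(minority) / dim  # expected initial subset size = len(minority)
    pos = [[int(rng.random() < p_keep) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p, majority, minority) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (best_pos[i][d] - pos[i][d])
                     + c2 * r2 * (g_pos[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid maps velocity to the probability of keeping sample d.
                pos[i][d] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])))
            f = fitness(pos[i], majority, minority)
            if f > best_fit[i]:  # update personal and global bests
                best_pos[i], best_fit[i] = pos[i][:], f
                if f > g_fit:
                    g_pos, g_fit = pos[i][:], f
    return [x for x, keep in zip(majority, g_pos) if keep], g_fit
```

For example, with 40 majority and 8 minority two-dimensional samples, `pso_subsample(majority, minority)` returns a subset of the majority class whose size is pulled toward 8 by the penalty term. The Fisher ratio was chosen only because it is cheap and classifier-free; any other separability measure could be substituted.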