rics offer performance similar, or even superior, to
that of Wrapper metrics. Filter-type metrics also allow
a drastic reduction in computational cost. This work
therefore suggests an alternative for evaluating the
sample-selection criteria: it reduces the computational
time required to predict protein location without
decreasing accuracy, and it even performs better than
Wrapper metrics. Nevertheless, a methodology that
includes class information still needs to be developed
in order to better understand how this feature
influences the performance of class balance when
filter metrics are used.
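The cost difference between the two families of metrics can be illustrated with a minimal sketch. It is not the evaluation used in this work: the Fisher-style separability score, the nearest-mean classifier, the one-dimensional toy data and the names filter_score and wrapper_score are all assumptions chosen for illustration. A filter criterion scores a candidate subset of majority-class samples directly from the data, whereas a wrapper criterion has to retrain and test a classifier for every candidate subset.

```python
from statistics import mean, pvariance


def filter_score(minority, majority_subset):
    """Hypothetical filter criterion: Fisher-style separability of 1-D
    features, computed from the data alone (no classifier is trained)."""
    gap = (mean(minority) - mean(majority_subset)) ** 2
    spread = pvariance(minority) + pvariance(majority_subset)
    return gap / spread if spread > 0 else float("inf")


def wrapper_score(minority, majority_subset):
    """Hypothetical wrapper criterion: leave-one-out accuracy of a
    nearest-mean classifier, refitted once per held-out sample."""
    data = [(x, 1) for x in minority] + [(x, 0) for x in majority_subset]
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        mu1 = mean(v for v, c in train if c == 1)
        mu0 = mean(v for v, c in train if c == 0)
        predicted = 1 if abs(x - mu1) < abs(x - mu0) else 0
        hits += predicted == label
    return hits / len(data)


# Toy data (assumed): one minority class and two candidate majority subsets.
minority = [2.0, 2.2, 1.9, 2.1]
subset_a = [0.0, 0.1, -0.1, 0.2]  # well separated from the minority class
subset_b = [1.8, 2.0, 2.3, 1.7]   # overlaps the minority class

for name, subset in (("A", subset_a), ("B", subset_b)):
    print(name, filter_score(minority, subset), wrapper_score(minority, subset))
```

In a PSO-based search, each particle would encode one candidate subset and one of these functions would act as its fitness. The filter score needs a single pass over the data, while the wrapper score refits the classifier once per held-out sample, so its cost grows much faster with the subset size.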
ACKNOWLEDGEMENTS
This work was partially funded by the Research
Office (DIMA) at the Universidad Nacional de
Colombia at Manizales and by the Colombian National
Research Centre (COLCIENCIAS) through grant
No. 111952128388 and the "Jóvenes Investigadores e
Innovadores - 2010 Virginia Gutiérrez de Pineda"
fellowship.
WRAPPER AND FILTER METRICS FOR PSO-BASED CLASS BALANCE APPLIED TO PROTEIN SUBCELLULAR
LOCALIZATION