Authors:
S. García-López (1); J. A. Jaramillo-Garzón (2); L. Duque-Muñoz (2) and C. G. Castellanos-Domínguez (1)
Affiliations:
(1) Universidad Nacional de Colombia, Colombia; (2) Universidad Nacional de Colombia and Instituto Tecnológico Metropolitano, Colombia
Keyword(s):
Molecular Functions Prediction, Proteins, Cuckoo Search, Cost Sensitive Learning, Class Imbalance.
Related Ontology Subjects/Areas/Topics:
Algorithms and Software Tools; Artificial Intelligence; Bioinformatics; Biomedical Engineering; Computational Intelligence; Genomics and Proteomics; Pattern Recognition, Clustering and Classification; Sequence Analysis; Soft Computing
Abstract:
Genomics and proteomics research generates large amounts of data, and computational methods have become an important tool for analyzing it. However, tools based on machine learning face several problems associated with the nature of the data, one of which is the class-imbalance problem. Several balancing techniques, such as boosting and resampling, exist to improve prediction performance, but they have multiple weaknesses in difficult data spaces. Cost-sensitive learning is an alternative solution, yet obtaining an appropriate cost matrix that induces a good prediction model is complex and remains an open problem. In this paper, a methodology is proposed to obtain an optimal cost matrix for training models based on cost-sensitive learning. The results show that cost-sensitive learning with a proper cost matrix can be very competitive and can even outperform many state-of-the-art class-balancing strategies. Tests were applied to the prediction of molecular functions in Embryophyta plants.
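To illustrate why the choice of cost matrix matters under class imbalance, the following minimal Python sketch applies the standard minimum-expected-cost decision rule to a single sample. It is not the authors' methodology: the function name, the probabilities, and the cost values are all hypothetical, and the search for a good cost matrix (which the keywords associate with Cuckoo Search) is not shown.

```python
def min_expected_cost_class(probs, cost):
    """Return the index of the class with the lowest expected cost.

    probs[j]   : estimated probability that the true class is j
    cost[j][k] : cost of predicting class k when the true class is j
    """
    n_classes = len(probs)
    # Expected cost of each candidate prediction k, averaged over true classes j.
    expected = [
        sum(probs[j] * cost[j][k] for j in range(n_classes))
        for k in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical values: class 1 plays the role of the rare (minority) class.
probs = [0.8, 0.2]                 # the model thinks the sample is probably class 0
uniform_cost = [[0, 1], [1, 0]]    # every error costs the same
skewed_cost = [[0, 1], [10, 0]]    # missing a minority sample costs 10 times more

pred_uniform = min_expected_cost_class(probs, uniform_cost)
pred_skewed = min_expected_cost_class(probs, skewed_cost)
```

With uniform costs the sample is assigned to the majority class, while the skewed matrix flips the decision toward the minority class. The off-diagonal entries therefore act as the tunable quantities that a cost-matrix optimization would adjust.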