Authors: Nic Herndon and Doina Caragea
Affiliation: Kansas State University, United States
Keyword(s): Naïve Bayes, Domain Adaptation, Supervised Learning, Semi-supervised Learning, Self-training, Biological Sequences, Protein Localization.
Related Ontology Subjects/Areas/Topics: Bioinformatics; Biomedical Engineering; Data Mining and Machine Learning; Sequence Analysis
Abstract:
The growing volume of biological data calls for automated computational tools to analyze it. Although machine learning methods have been used successfully with biological sequences in a supervised framework, their accuracy usually suffers when a classifier is learned on a source domain and applied to a different, less studied domain, that is, in a domain adaptation setting. To address this issue, we propose an algorithm that combines labeled sequences from a well-studied organism (the source domain) with labeled and unlabeled sequences from a related, less studied organism (the target domain). Our experimental results show that this algorithm achieves high classification accuracy on the target domain.
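The general idea of combining labeled source sequences with labeled and unlabeled target sequences can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a generic weighted multinomial naïve Bayes classifier over k-mer counts, wrapped in a plain self-training loop. All names (`kmers`, `train`, `posterior`, `predict`, `self_train`) and all parameters (`source_weight`, `threshold`, `rounds`, `k`) are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=2):
    """Count the overlapping k-mers of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train(examples, k=2):
    """Fit a weighted multinomial naive Bayes model.

    examples: iterable of (sequence, label, weight) triples.
    """
    class_weight = defaultdict(float)
    feature_weight = defaultdict(Counter)
    vocab = set()
    for seq, label, weight in examples:
        class_weight[label] += weight
        for kmer, count in kmers(seq, k).items():
            feature_weight[label][kmer] += weight * count
            vocab.add(kmer)
    return class_weight, feature_weight, vocab


def posterior(model, seq, k=2):
    """Return normalized class probabilities for one sequence."""
    class_weight, feature_weight, vocab = model
    total = sum(class_weight.values())
    log_scores = {}
    for label, weight in class_weight.items():
        # Laplace (add-one) smoothing over the shared k-mer vocabulary.
        denom = sum(feature_weight[label].values()) + len(vocab)
        score = math.log(weight / total)
        for kmer, count in kmers(seq, k).items():
            if kmer in vocab:  # k-mers never seen in training are skipped
                score += count * math.log((feature_weight[label][kmer] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    scaled = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(scaled.values())
    return {label: value / norm for label, value in scaled.items()}


def predict(model, seq, k=2):
    """Return the most probable label for one sequence."""
    probs = posterior(model, seq, k)
    return max(probs, key=probs.get)


def self_train(source, target_labeled, target_unlabeled,
               source_weight=0.5, threshold=0.9, rounds=5, k=2):
    """Self-training with down-weighted source data (illustrative only).

    source, target_labeled: lists of (sequence, label) pairs.
    target_unlabeled: list of sequences.
    """
    # Assumption: source examples count less than target examples.
    labeled = [(seq, label, source_weight) for seq, label in source]
    labeled += [(seq, label, 1.0) for seq, label in target_labeled]
    pool = list(target_unlabeled)
    model = train(labeled, k)
    for _ in range(rounds):
        confident, rest = [], []
        for seq in pool:
            probs = posterior(model, seq, k)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((seq, label, 1.0))
            else:
                rest.append(seq)
        if not confident:
            break  # nothing new to learn from
        # Treat confident predictions as labels and retrain.
        labeled += confident
        pool = rest
        model = train(labeled, k)
    return model
```

Calling `self_train(source, target_labeled, target_unlabeled)` returns a model that can be passed to `predict`. The fixed `source_weight` and the confidence `threshold` are the simplest possible choices here; how the source and target domains are actually balanced is a design decision specific to each method.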