Naïve Bayes Domain Adaptation for Biological Sequences

Nic Herndon, Doina Caragea


The increased volume of biological data requires automated computational tools to analyze it. Although machine learning methods have been used successfully with biological sequences in a supervised framework, their accuracy usually suffers in a domain adaptation framework, where a classifier is learned on a source domain and applied to a different, less-studied target domain. To address this issue, we propose an algorithm that combines labeled sequences from a well-studied organism, the source domain, with labeled and unlabeled sequences from a related, less-studied organism, the target domain. Our experimental results show that this algorithm achieves high classification accuracy on the target domain.
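The abstract does not say how the three data sources are combined. As a rough illustration only, the sketch below shows one generic way to do it. It assumes a multinomial naïve Bayes over overlapping k-mer counts. Labeled source sequences are down-weighted, and unlabeled target sequences are folded in through an EM-style loop of soft labels. The k-mer representation, the weights `source_weight` and `unlabeled_weight`, and all function and class names are hypothetical choices made for this example. This is not the authors' published method.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Count overlapping k-mers in a sequence (assumed feature representation)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class WeightedNB:
    """Multinomial naive Bayes trained on examples with fractional class weights."""

    def fit(self, examples, classes, alpha=1.0):
        # examples: list of (k-mer Counter, {class: weight}) pairs
        self.classes = list(classes)
        self.vocab = set()
        prior = {c: 0.0 for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for feats, weights in examples:
            self.vocab.update(feats)
            for c, w in weights.items():
                prior[c] += w
                for f, n in feats.items():
                    counts[c][f] += w * n
        total = sum(prior.values())
        v = len(self.vocab)
        # Laplace-smoothed log priors and log likelihoods
        self.log_prior = {
            c: math.log((prior[c] + alpha) / (total + alpha * len(self.classes)))
            for c in self.classes
        }
        self.log_lik = {}
        for c in self.classes:
            denom = sum(counts[c].values()) + alpha * v
            self.log_lik[c] = {f: math.log((counts[c][f] + alpha) / denom) for f in self.vocab}
        return self

    def posterior(self, feats):
        """Return P(class | sequence); k-mers never seen in training are skipped."""
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for f, n in feats.items():
                if f in self.vocab:
                    s += n * self.log_lik[c][f]
            scores[c] = s
        m = max(scores.values())
        z = sum(math.exp(s - m) for s in scores.values())
        return {c: math.exp(s - m) / z for c, s in scores.items()}


def adapt(source, target_labeled, target_unlabeled, classes, k=3,
          source_weight=0.5, unlabeled_weight=0.5, iterations=5):
    """Hypothetical EM-style adaptation: weighted source + labeled target + soft-labeled target."""
    src = [(kmers(s, k), {y: source_weight}) for s, y in source]
    tgt = [(kmers(s, k), {y: 1.0}) for s, y in target_labeled]
    unl = [kmers(s, k) for s in target_unlabeled]
    model = WeightedNB().fit(src + tgt, classes)
    for _ in range(iterations):
        # E-step: soft labels for unlabeled target sequences from the current model
        soft = [(f, {c: unlabeled_weight * p for c, p in model.posterior(f).items()})
                for f in unl]
        # M-step: refit on all three sources using the fractional weights
        model = WeightedNB().fit(src + tgt + soft, classes)
    return model


if __name__ == "__main__":
    model = adapt(
        source=[("AAAAAAAA", "pos"), ("CCCCCCCC", "neg")],
        target_labeled=[("AAAAAAAT", "pos"), ("CCCCCCCG", "neg")],
        target_unlabeled=["AAAAAATT", "CCCCCCGG"],
        classes=["pos", "neg"],
    )
    print(model.posterior(kmers("AAAAAAAA")))
```

Down-weighting the source examples is one simple way to let target evidence dominate as more target sequences become available. The weights and iteration count here are arbitrary illustrative values, not tuned settings.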


