Authors:
Dawit Nigatu and Werner Henkel
Affiliation:
Jacobs University Bremen, Germany
Keyword(s):
Essential Genes, Information-theoretic Features, Machine Learning, SVM, Markov Order Estimation.
Abstract:
Computational tools have made it relatively simple to predict essential genes (EGs), which would otherwise
have to be identified through costly and tedious gene knockout experiments. We present a machine-learning-based
predictor using information-theoretic features derived exclusively from DNA sequences. We used
entropy, mutual information, conditional mutual information, and Markov chain models as features. We employed
a support vector machine (SVM) classifier and predicted the EGs in 15 prokaryotic genomes. A fivefold
cross-validation on the bacteria E. coli, B. subtilis, and M. pulmonis resulted in AUC scores of 0.85, 0.81,
and 0.89, respectively. In cross-organism prediction, the EGs of a given bacterium are predicted using a model
trained on the rest of the bacteria. AUC scores ranging from 0.66 to 0.9 and averaging 0.8 were obtained. The
average AUC of the classifier on a one-to-one prediction among E. coli, B. subtilis, and Acinetobacter is 0.85.
The performance of our predictor is comparable with recent and state-of-the-art predictors. Considering that
we used only sequence information on a problem that is much more complicated, the achieved results are very
good.
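
To make the kind of sequence-only feature mentioned in the abstract concrete, the following is a minimal sketch of how Shannon entropy and mutual information might be computed from a DNA string. It is an illustration, not the authors' feature set: the function names, the `gap` and `max_gap` parameters, and the choice of single-nucleotide symbols are assumptions introduced here, and the conditional mutual information and Markov chain features are not covered.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the symbol distribution in seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(seq: str, gap: int = 1) -> float:
    """Mutual information (in bits) between symbols `gap` positions apart.

    `gap` is a hypothetical parameter chosen for this sketch.
    """
    pairs = list(zip(seq, seq[gap:]))
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (x, y) pairs
    left = Counter(a for a, _ in pairs)    # marginal counts of x
    right = Counter(b for _, b in pairs)   # marginal counts of y
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written in terms of counts
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def feature_vector(seq: str, max_gap: int = 3) -> list:
    """Entropy followed by mutual information at gaps 1..max_gap (assumed layout)."""
    return [shannon_entropy(seq)] + [
        mutual_information(seq, k) for k in range(1, max_gap + 1)
    ]
```

A vector of this kind, computed per gene, could then be passed to any standard classifier such as an SVM; the classifier settings used in the paper are not stated in the abstract.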