Prediction of Essential Genes based on Machine Learning and Information Theoretic Features
Dawit Nigatu, Werner Henkel
2017
Abstract
Computational tools have enabled relatively simple prediction of essential genes (EGs), which would otherwise require costly and tedious gene knockout experiments. We present a machine-learning-based predictor that uses information-theoretic features derived exclusively from DNA sequences. We used entropy, mutual information, conditional mutual information, and Markov chain models as features. We employed a support vector machine (SVM) classifier and predicted the EGs in 15 prokaryotic genomes. Fivefold cross-validation on the bacteria E. coli, B. subtilis, and M. pulmonis resulted in AUC scores of 0.85, 0.81, and 0.89, respectively. In cross-organism prediction, the EGs of a given bacterium are predicted using a model trained on the remaining bacteria; this yielded AUC scores ranging from 0.66 to 0.9, with an average of 0.8. In one-to-one prediction among E. coli, B. subtilis, and Acinetobacter (training on one organism and testing on another), the classifier achieved an average AUC of 0.85. The performance of our predictor is comparable to that of recent, state-of-the-art predictors. Considering that only sequence information was used for a problem as complex as gene essentiality, these results are very good.
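The abstract names entropy, mutual information, and conditional mutual information as features computed from DNA sequence alone. The following Python sketch is a rough illustration only. It computes plug-in (frequency-count) estimates of these quantities for a single gene sequence.

Several choices here are assumptions for illustration and are not the authors' exact feature definitions:

- the k-mer lengths,
- the nucleotide distances d,
- the use of consecutive nucleotides for the conditional term,
- the layout returned by the hypothetical `feature_vector` helper.

The Markov chain features and the SVM stage are not shown.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int) -> float:
    """Plug-in Shannon entropy (bits) of overlapping k-mer frequencies."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(seq: str, d: int) -> float:
    """Plug-in I(X; Y) in bits, with X = seq[i] and Y = seq[i + d]."""
    pairs = Counter((seq[i], seq[i + d]) for i in range(len(seq) - d))
    n = sum(pairs.values())
    px, py = Counter(), Counter()
    for (a, b), c in pairs.items():
        px[a] += c  # marginal counts of X
        py[b] += c  # marginal counts of Y
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pairs.items())


def conditional_mutual_information(seq: str) -> float:
    """Plug-in I(X; Z | Y) in bits, assuming X, Y, Z are consecutive bases."""
    triples = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    n = sum(triples.values())
    cy, cxy, cyz = Counter(), Counter(), Counter()
    for t, c in triples.items():
        cy[t[1]] += c
        cxy[t[:2]] += c
        cyz[t[1:]] += c
    # p(x,y,z) * log2(p(y) p(x,y,z) / (p(x,y) p(y,z))); the factors of n cancel.
    return sum((c / n) * math.log2(cy[t[1]] * c / (cxy[t[:2]] * cyz[t[1:]]))
               for t, c in triples.items())


def feature_vector(seq: str) -> list[float]:
    """Hypothetical layout: entropy for k=1..3, MI for d=1..3, then CMI.

    Assumes seq is longer than 3 bases; shorter inputs divide by zero.
    """
    seq = seq.upper()
    feats = [kmer_entropy(seq, k) for k in (1, 2, 3)]
    feats += [mutual_information(seq, d) for d in (1, 2, 3)]
    feats.append(conditional_mutual_information(seq))
    return feats


if __name__ == "__main__":
    print(feature_vector("ATGGCTAAAGCGCTTACCGGATAA"))
```

Each gene would yield one such fixed-length vector. Vectors like these could then be given to an SVM implementation such as scikit-learn's `SVC` and evaluated with fivefold cross-validation and AUC, as in the abstract. That step is not shown here.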
Paper Citation
in Harvard Style
Nigatu D. and Henkel W. (2017). Prediction of Essential Genes based on Machine Learning and Information Theoretic Features. In BIOINFORMATICS (BIOSTEC 2017). SciTePress. DOI: 10.5220/0006165700001488
in Bibtex Style
@conference{bioinformatics17,
author={Dawit Nigatu and Werner Henkel},
title={Prediction of Essential Genes based on Machine Learning and Information Theoretic Features},
booktitle={BIOINFORMATICS (BIOSTEC 2017)},
year={2017},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006165700001488},
}
in EndNote Style
TY - CONF
JO - BIOINFORMATICS (BIOSTEC 2017)
TI - Prediction of Essential Genes based on Machine Learning and Information Theoretic Features
AU - Nigatu D.
AU - Henkel W.
PY - 2017
DO - 10.5220/0006165700001488