HOW TO DEAL WITH SMALL OPEN READING FRAMES?

Małgorzata Wańczyk, Paweł Błażej, Paweł Mackiewicz, Stanisław Cebrat

2012

Abstract

Current ‘classical’ algorithms for recognizing protein-coding sequences do not work effectively on short sequences. To deal with this problem, we propose improvements to existing gene finders that do not rely on any arbitrarily assumed threshold. The introduced parameters describe the position of a tested sequence in the ranking of all small Open Reading Frames (ORFs) and short protein-coding genes found in the analyzed genome. The sequences can be ranked according to the coding potential calculated by ‘standard’ gene prediction algorithms. As an example, we used two gene recognition algorithms to test a set of small ORFs selected from prokaryotic genomes using sequence similarity methods. The applied approach made it possible to identify promising sequences that may code for small proteins.
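The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that coding-potential scores have already been produced by some gene finder, and the function name `rank_position` and the tie-handling rule (ties count as half) are hypothetical choices made only for this illustration. The returned value is the fraction of background small ORFs and short genes that score below the tested sequence, so no fixed score threshold is needed.

```python
from bisect import bisect_left, bisect_right


def rank_position(test_score, background_scores):
    """Relative position of test_score among background_scores.

    background_scores: coding-potential scores of all small ORFs and
    short genes in one genome (assumed to come from an external gene
    finder; not computed here).
    Returns a value in [0, 1]; ties are counted as half (an assumption).
    """
    if not background_scores:
        raise ValueError("background_scores must not be empty")
    ordered = sorted(background_scores)
    # Number of background scores strictly below the tested score.
    below = bisect_left(ordered, test_score)
    # Number of background scores equal to the tested score.
    ties = bisect_right(ordered, test_score) - below
    return (below + 0.5 * ties) / len(ordered)


# Hypothetical scores, for illustration only.
background = [0.1, 0.4, 0.4, 0.7, 0.9]
position = rank_position(0.8, background)
```

A value close to 1 means the tested sequence scores higher than most small ORFs in the same genome; how such positions are turned into the parameters used in the paper is not specified in the abstract.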



Paper Citation


in Harvard Style

Wańczyk M., Błażej P., Mackiewicz P. and Cebrat S. (2012). HOW TO DEAL WITH SMALL OPEN READING FRAMES? In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012) ISBN 978-989-8425-90-4, pages 246-250. DOI: 10.5220/0003856202460250


in Bibtex Style

@conference{bioinformatics12,
author={Małgorzata Wańczyk and Paweł Błażej and Paweł Mackiewicz and Stanisław Cebrat},
title={HOW TO DEAL WITH SMALL OPEN READING FRAMES?},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},
year={2012},
pages={246-250},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003856202460250},
isbn={978-989-8425-90-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)
TI - HOW TO DEAL WITH SMALL OPEN READING FRAMES?
SN - 978-989-8425-90-4
AU - Wańczyk M.
AU - Błażej P.
AU - Mackiewicz P.
AU - Cebrat S.
PY - 2012
SP - 246
EP - 250
DO - 10.5220/0003856202460250