A n^2 RNA SECONDARY STRUCTURE PREDICTION ALGORITHM

Markus E. Nebel, Anika Scheid

Abstract

Several state-of-the-art tools for predicting RNA secondary structures have worst-case time and space requirements of O(n^3) and O(n^2), respectively, for sequence length n, which limits their applicability in practice. Biologists are therefore interested in obtaining results faster and would willingly tolerate a moderate loss of accuracy. For this reason, we propose a novel algorithm for structure prediction that reduces the time complexity by a linear factor to O(n^2) while still producing high-quality results. Our method relies on a probabilistic sampling approach based on an appropriate stochastic context-free grammar (SCFG). For a given sequence, it uses either a well-known or a newly introduced sampling strategy to generate a random set of candidate structures (from the ensemble of all feasible foldings) according to a "noisy" distribution, which is obtained by heuristically approximating the inside-outside values. A corresponding prediction can then be derived efficiently from this sample. Sampling can easily be parallelized. Furthermore, it can be done in-place, i.e. only the best (most probable) candidate structure generated so far needs to be stored and finally communicated. Together, these properties make it possible to efficiently handle the increased sample sizes needed to achieve competitive prediction accuracy with the noisy distribution.
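
The in-place and parallel properties mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy sampler below draws from a small, hand-written list of dot-bracket structures with made-up probabilities, standing in for the SCFG-based sampling under the noisy distribution. The names `best_of_samples`, `make_toy_sampler` and `merge_bests` are hypothetical and chosen only for this illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A candidate is (dot-bracket structure, log-probability under the sampling model).
Candidate = Tuple[str, float]


def best_of_samples(sample_once: Callable[[random.Random], Candidate],
                    num_samples: int,
                    rng: random.Random) -> Candidate:
    """Draw num_samples candidates but store only the most probable one so far."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    best = sample_once(rng)
    for _ in range(num_samples - 1):
        candidate = sample_once(rng)
        if candidate[1] > best[1]:
            best = candidate  # the previous best is discarded (in-place)
    return best


def make_toy_sampler(candidates: Sequence[Tuple[str, float]]
                     ) -> Callable[[random.Random], Candidate]:
    """ASSUMPTION: a stand-in sampler over a fixed list of (structure, probability)
    pairs; a real sampler would generate structures from the grammar instead."""
    structures = [s for s, _ in candidates]
    weights = [p for _, p in candidates]

    def sample_once(rng: random.Random) -> Candidate:
        i = rng.choices(range(len(structures)), weights=weights, k=1)[0]
        return structures[i], math.log(weights[i])

    return sample_once


def merge_bests(bests: List[Candidate]) -> Candidate:
    """Combine the per-worker results: each worker reports only its best candidate."""
    return max(bests, key=lambda c: c[1])


if __name__ == "__main__":
    toy = make_toy_sampler([("((..))", 0.6), ("(....)", 0.3), ("......", 0.1)])
    # Simulate four independent workers, each with its own seed.
    per_worker = [best_of_samples(toy, 50, random.Random(seed)) for seed in range(4)]
    print(merge_bests(per_worker))
```

Each simulated worker keeps a single candidate regardless of how many samples it draws, so increasing the sample size costs time but no additional memory, and the only data exchanged between workers is one candidate each.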


Paper Citation


in Harvard Style

Nebel, M. E. and Scheid, A. (2012). A n^2 RNA SECONDARY STRUCTURE PREDICTION ALGORITHM. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012), ISBN 978-989-8425-90-4, pages 66-75. DOI: 10.5220/0003764600660075


in Bibtex Style

@conference{bioinformatics12,
author={Markus E. Nebel and Anika Scheid},
title={A $n^2$ RNA SECONDARY STRUCTURE PREDICTION ALGORITHM},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},
year={2012},
pages={66-75},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003764600660075},
isbn={978-989-8425-90-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)
TI - A n^2 RNA SECONDARY STRUCTURE PREDICTION ALGORITHM
SN - 978-989-8425-90-4
AU - Nebel, Markus E.
AU - Scheid, Anika
PY - 2012
SP - 66
EP - 75
DO - 10.5220/0003764600660075