derived with a purpose-built proof-of-concept implementation of the described methods. A more sophisticated tool is planned, in the hope that the proposed prediction approach will also yield acceptable accuracy for types of RNA whose molecules exhibit a great variety of structural features (owing to their large sequence lengths). In fact, we considered only exemplary applications to one particular tRNA molecule here, in order to confirm that (at least) the MP predictions obtained via approximated SCFG-based sampling can be of high quality. Accordingly, more general experiments are needed, e.g., with RNA molecules of sizes n = 3000 to 30000 (for which the memory requirements of our approach are not restrictive, assuming 1 GB of memory per core) in which long-distance base pairs in a global folding are of interest. In such a scenario, the proposed algorithm could be the method of choice, provided it performs similarly well.
This line of research is work in progress, but we found the first impressions presented in this note encouraging enough to share them with the scientific community at this early stage, primarily because the work leaves a number of open questions that may inspire further research by other groups. For instance, recall that we used a sophisticated SCFG (a formal-language counterpart of the thermodynamic model applied in the Sfold program) as the probabilistic basis for the sampling strategies considered. However, other SCFG designs could be employed as well, for example one of the well-known lightweight grammars evaluated by Dowell and Eddy (2004). This might well lead to noticeable, if not significant, changes in the resulting sampling quality, which would be an interesting subject to explore.
It should also be noted that a similar approximate approach could be considered for reducing the worst-case time complexity of the sampling extension of the PF approach. Since sequence information enters the (equilibrium) PFs and the corresponding sampling probabilities only through particular sequence-dependent free energy contributions, it seems plausible that the time complexity of the forward step (preprocessing) could be reduced by a linear factor, from $O(n^3)$ to $O(n^2)$, by using approximated (averaged) free energy contributions that do not depend on the actual sequence (but retain as much sequence information as possible). This would be analogous to the approximated preprocessing step (inside and outside calculations) considered in this work, where we ultimately only had to use averaged emission terms instead of the exact emission probabilities in order to save time.
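To make the averaging argument above concrete, the following toy Python sketch considers the extreme case in which the averaged terms carry no sequence information at all. It is a hypothetical illustration, not the authors' implementation: it assumes the lightweight grammar G6 from Dowell and Eddy (2004) (S → LS | L, L → s | dFd, F → dFd | LS), made-up rule probabilities, and constant emission terms `e_unpaired` and `e_pair` standing in for the exact emission probabilities; all names (`ToyParams`, `inside_by_length`, `inside_by_span`) are hypothetical. Under these assumptions the inside value of a span depends only on its length, so n entries per nonterminal, each filled by an $O(n)$ split sum, replace the usual span table whose filling takes $O(n^3)$ time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyParams:
    """Hypothetical parameters for grammar G6 (S -> LS | L, L -> s | dFd,
    F -> dFd | LS). The rule probabilities of each nonterminal sum to 1."""
    p_s_ls: float = 0.8
    p_s_l: float = 0.2
    p_l_s: float = 0.9
    p_l_dfd: float = 0.1
    p_f_dfd: float = 0.6
    p_f_ls: float = 0.4
    # Hypothetical averaged emission terms replacing e(x_i) and e(x_i, x_j);
    # they ignore the actual sequence entirely (an extreme simplification).
    e_unpaired: float = 0.25
    e_pair: float = 1.0 / 16.0


def inside_by_length(n: int, p: ToyParams):
    """Inside values indexed by span length m = 1..n (index 0 is the empty
    span, value 0). n cells per nonterminal, O(m) work each: O(n^2) total."""
    S, L, F = [0.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for m in range(1, n + 1):
        inner = F[m - 2] if m >= 2 else 0.0  # F on the span enclosed by a pair
        L[m] = p.p_l_s * p.e_unpaired if m == 1 else p.p_l_dfd * p.e_pair * inner
        # Split sum over L (length k) followed by S (length m - k), shared by
        # the rules S -> LS and F -> LS.
        split = sum(L[k] * S[m - k] for k in range(1, m))
        S[m] = p.p_s_ls * split + p.p_s_l * L[m]
        F[m] = p.p_f_ls * split + p.p_f_dfd * p.e_pair * inner
    return S, L, F


def inside_by_span(n: int, p: ToyParams):
    """Reference inside recursion over all spans (i, j), 0-based and inclusive,
    with the same constant emissions: O(n^2) cells, O(n^3) total work."""
    S = [[0.0] * n for _ in range(n)]
    L = [[0.0] * n for _ in range(n)]
    F = [[0.0] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            inner = F[i + 1][j - 1] if length >= 3 else 0.0
            L[i][j] = p.p_l_s * p.e_unpaired if length == 1 else p.p_l_dfd * p.e_pair * inner
            split = sum(L[i][k] * S[k + 1][j] for k in range(i, j))
            S[i][j] = p.p_s_ls * split + p.p_s_l * L[i][j]
            F[i][j] = p.p_f_ls * split + p.p_f_dfd * p.e_pair * inner
    return S, L, F


if __name__ == "__main__":
    n = 40
    S_len, _, _ = inside_by_length(n, ToyParams())
    S_span, _, _ = inside_by_span(n, ToyParams())
    # Every span of a given length carries the same inside value.
    same = all(S_span[i][i + m - 1] == S_len[m]
               for m in range(1, n + 1) for i in range(n - m + 1))
    print("length-indexed table matches span table:", same)
```

A realistic approximation would, as stated above, retain as much sequence information as possible, so the actual savings depend on how the averaged terms are defined; the sketch only shows why removing the dependence on the actual subword removes one linear factor from the inside computation.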
REFERENCES
Akutsu, T. (1999). Approximation and exact algorithms for
RNA secondary structure prediction and recognition
of stochastic context-free languages. J. Comb. Optim.,
3(2–3):321–336.
Backofen, R., Tsur, D., Zakov, S., and Ziv-Ukelson, M.
(2011). Sparse RNA folding: Time and space efficient
algorithms. Journal of Discrete Algorithms, 9:12–31.
Ding, Y., Chan, C. Y., and Lawrence, C. E. (2004). Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Research, 32:W135–W141.
Ding, Y. and Lawrence, C. E. (2003). A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Research, 31(24):7280–7301.
Do, C. B., Woods, D. A., and Batzoglou, S. (2006). CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22(14):e90–e98.
Dowell, R. D. and Eddy, S. R. (2004). Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 5:71.
Frid, Y. and Gusfield, D. (2010). A simple, practical and complete $O(n^3/\log(n))$-time algorithm for RNA folding using the Four-Russians speedup. Algorithms for Molecular Biology, 5(1):5–13.
Hofacker, I., Fontana, W., Stadler, P., Bonhoeffer, S., Tacker, M., and Schuster, P. (1994). Fast folding and comparison of RNA secondary structures (the Vienna RNA package). Monatsh. Chem., 125(2):167–188.
Hofacker, I. L. (2003). The Vienna RNA secondary structure server. Nucleic Acids Research, 31(13):3429–3431.
Knudsen, B. and Hein, J. (1999). RNA secondary structure
prediction using stochastic context-free grammars and
evolutionary history. Bioinformatics, 15(6):446–454.
Knudsen, B. and Hein, J. (2003). Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Research, 31(13):3423–3428.
McCaskill, J. S. (1990). The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29:1105–1119.
Nebel, M. E. and Scheid, A. (2011). Evaluation of a sophisticated SCFG design for RNA secondary structure prediction. Submitted.
Wexler, Y., Zilberstein, C., and Ziv-Ukelson, M. (2007). A study of accessible motifs and RNA folding complexity. Journal of Computational Biology, 14(6):856–872.
Zuker, M. (1989). On finding all suboptimal foldings of an
RNA molecule. Science, 244:48–52.
Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31(13):3406–3415.
AN $n^2$ RNA SECONDARY STRUCTURE PREDICTION ALGORITHM