each molecule. The shape is a topological concept
widely used by Bon et al. (2008) and Reidys et al.
(2011). Moreover, it is also possible to define an
algorithm that computes topological invariants such as
genus and crossing number (Vernizzi et al., 2016).
Another procedure over the word can easily be defined
to detect whether or not a pseudoknot belongs to a
given class.
Knowing whether two structures are characterized by
the same pseudoknots helps in choosing the particular
algorithm for comparing them, taking into account
biologically relevant operations such as addition,
deletion, and substitution of nucleotides or base pairs.
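As an illustration of such a computation, the following Python sketch derives the genus and a count of crossing arc pairs directly from a list of base pairs. It is a minimal sketch under stated assumptions, not the algorithm of Vernizzi et al. (2016): the structure is assumed to be given as 0-based index pairs on a single backbone, the genus is obtained from the standard Euler-characteristic argument on the arc diagram with the backbone closed into a circle, and the crossing count is simply the number of pairs of arcs that cross, which may differ from the crossing number used in the cited work. The function names are hypothetical.

```python
from itertools import combinations


def _normalize(pairs):
    """Return arcs as sorted (i, j) tuples with i < j; reject reused positions."""
    arcs = sorted((min(i, j), max(i, j)) for i, j in pairs)
    endpoints = [p for arc in arcs for p in arc]
    if len(set(endpoints)) != len(endpoints):
        raise ValueError("each position may take part in at most one base pair")
    return arcs


def count_crossings(pairs):
    """Count pairs of arcs (i, j), (k, l) that interleave as i < k < j < l."""
    arcs = _normalize(pairs)
    # Arcs are sorted by left endpoint, so i < k holds for every combination.
    return sum(1 for (i, j), (k, l) in combinations(arcs, 2) if k < j < l)


def genus(pairs):
    """Genus of the arc diagram (assumption: single backbone closed into a circle)."""
    arcs = _normalize(pairs)
    n = len(arcs)
    if n == 0:
        return 0
    # Unpaired positions do not affect the genus: relabel endpoints as 0..2n-1.
    endpoints = sorted(p for arc in arcs for p in arc)
    rank = {p: r for r, p in enumerate(endpoints)}
    partner = {}
    for i, j in arcs:
        partner[rank[i]] = rank[j]
        partner[rank[j]] = rank[i]
    # Boundary components are the cycles of x -> partner[x] + 1 (mod 2n).
    seen = set()
    faces = 0
    for start in range(2 * n):
        if start in seen:
            continue
        faces += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = (partner[x] + 1) % (2 * n)
    # Euler characteristic with one vertex and n edges: 2 - 2g = 1 - n + faces.
    return (n + 1 - faces) // 2


if __name__ == "__main__":
    nested = [(0, 3), (1, 2)]    # two nested arcs, no pseudoknot
    h_type = [(0, 10), (5, 15)]  # two crossing arcs
    print(genus(nested), count_crossings(nested))
    print(genus(h_type), count_crossings(h_type))
```

Under these assumptions, nested or parallel arcs leave the genus at zero, while a single pair of crossing arcs already yields genus one, so the two quantities give a quick first test of whether two structures can share the same pseudoknot type.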
5 CONCLUSIONS
The biological function of an RNA molecule depends
on its structure. As a consequence, the molecule cannot
sustain substantial changes to its secondary and
tertiary structures if it is to preserve its function.
Thus, knowledge of the structure is very important,
and the ability to compare RNA structure motifs
supports the study of the function and evolution of RNA.
In this paper, we proposed a measure to compare
RNA secondary structures with pseudoknots in terms
of interactions among loops. From a biological point
of view, it is useful to identify the structures that
are conserved during evolution, since the primary
structure is often not preserved. The measure detects
global properties of the molecules by taking advantage
of set theory. Consequently, it can be computed
quickly, and its properties make it easy to handle
theoretically. A statistical study over a large set of
molecules can be performed in order to determine a new
clustering. This clustering can be compared with others
obtained from different approaches in the literature.
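To illustrate why a set-based measure is cheap to compute, the sketch below compares two structures through the Jaccard index of their sets of loop interactions. This is only an assumed stand-in: the encoding of interactions as unordered pairs of loop labels and the function name `jaccard_similarity` are hypothetical, and the Jaccard index is not necessarily one of the similarity functions studied in this paper.

```python
def jaccard_similarity(a, b):
    """Jaccard index |A & B| / |A | B| of two collections of hashable items."""
    a, b = frozenset(a), frozenset(b)
    if not a and not b:
        # Convention chosen here: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical encoding: each interaction is an unordered pair of loop labels.
    s1 = {frozenset({"h1", "h2"}), frozenset({"h2", "h3"})}
    s2 = {frozenset({"h1", "h2"}), frozenset({"h1", "h3"})}
    print(jaccard_similarity(s1, s2))
```

With hash-based sets, the comparison takes time roughly linear in the number of interactions, which is consistent with the remark that a set-theoretic measure can be computed quickly.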
We plan to improve the developed software that
implements the measure and the whole methodology
presented in this paper, in order to investigate and
analyze in statistical terms the correlations between
the proposed measure and the functions of RNAs.
Moreover, we plan to evaluate the five similarity
functions in order to compare and rank their
performance. To reach these goals, we have decided to
compare molecules extracted from the Rfam database
(Nawrocki et al., 2015). This database classifies
non-coding RNAs into families whose members possess a
similar secondary structure, suggesting evolutionary
relationships and similar functions. Moreover, it
provides a consensus secondary structure for each family.
ACKNOWLEDGEMENTS
We acknowledge the financial support of the Future
and Emerging Technologies (FET) programme within the
Seventh Framework Programme (FP7) for Research of the
European Commission, under the FET-Proactive grant
agreement TOPDRIM (www.topdrim.eu), number
FP7-ICT-318121.
REFERENCES
Blin, G. and Touzet, H. (2006). How to compare arc-
annotated sequences: The alignment hierarchy. In In-
ternational Symposium on String Processing and In-
formation Retrieval, pages 291–303. Springer.
Bon, M., Vernizzi, G., Orland, H., and Zee, A. (2008).
Topological classification of RNA structures. Journal
of molecular biology, 379(4):900–911.
Burke, D. H., Scates, L., Andrews, K., and Gold, L. (1996).
Bent pseudoknots and novel RNA inhibitors of type 1
human immunodeficiency virus (HIV-1) reverse
transcriptase. Journal of molecular biology, 264(4):650–666.
Chen, J.-L., Blasco, M. A., and Greider, C. W. (2000).
Secondary structure of vertebrate telomerase RNA. Cell,
100(5):503–514.
Corpet, F. and Michot, B. (1994). RNAlign program:
alignment of RNA sequences using both primary and
secondary structures. Computer applications in the
biosciences: CABIOS, 10(4):389–399.
Dill, K. (1990). Dominant forces in protein folding. Bio-
chemistry, 29(31):7133–55.
Ding, Y., Chan, C. Y., and Lawrence, C. E. (2005). RNA
secondary structure prediction by centroids in a
Boltzmann weighted ensemble. RNA, 11(8):1157–1166.
Dulucq, S. and Tichit, L. (2003). RNA secondary structure
comparison: exact analysis of the Zhang–Shasha tree
edit algorithm. Theoretical Computer Science,
306(1-3):471–484.
Evans, P. (1999). Algorithms and Complexity for Anno-
tated Sequences Analysis. PhD thesis, University of
Victoria.
Evans, P. A. (2011). Finding common RNA pseudoknot
structures in polynomial time. Journal of Discrete
Algorithms, 9(4):335–343.
Ferré-D'Amaré, A. and Doudna, J. (1999). RNA folds:
insights from recent crystal structures. Annual review of
biophysics and biomolecular structure, 28(1):57–73.
Harrison, M. A. (1978). Introduction to formal language
theory. Addison-Wesley Longman Publishing Co.,
Inc.
Herrbach, C., Denise, A., and Dulucq, S. (2010). Average
complexity of the Jiang–Wang–Zhang pairwise tree
alignment algorithm and of a RNA secondary structure
alignment algorithm. Theoretical Computer Science,
411(26):2423–2432.
BIOINFORMATICS 2018 - 9th International Conference on Bioinformatics Models, Methods and Algorithms