Multiple RNA Interaction
Formulations, Approximations, and Heuristics
Saad Mneimneh¹, Syed Ali Ahmed¹ and Nancy L. Greenbaum²
¹Department of Computer Science, Hunter College, City University of New York, New York, U.S.A.
²Department of Chemistry, Hunter College, City University of New York, New York, U.S.A.
Keywords: Multiple RNA Interaction, RNA Structure, NP-hardness, Dynamic Programming, Approximation Algorithms, Heuristics.
Abstract: The interaction of two RNA molecules involves a complex interplay between folding and binding that warranted recent developments in RNA-RNA interaction algorithms. However, there are also biological mechanisms in which more than two RNAs take part in an interaction. Therefore, we formulate multiple RNA interaction as a computational problem, which, not surprisingly, turns out to be NP-hard. Our experiments with approximation algorithms and heuristics for the problem suggest that this formulation is indeed useful for determining the interaction patterns of multiple RNAs. This is because information about which RNAs interact is not necessarily available (as opposed to the case of two RNAs, where one must interact with the other). It is also because the resulting RNA structure often cannot be predicted by existing algorithms when RNAs are simply handled in pairs.
1 INTRODUCTION
The interaction of two RNA molecules has been independently formulated as a computational problem in several works, e.g. (Pervouchine, 2004; Alkan et al., 2006; Mneimneh, 2009). In their most general form, these formulations lead to NP-hard problems. To overcome this hurdle, researchers have either resorted to approximation algorithms or imposed algorithmic restrictions. For instance, some restrictions are analogous to the avoidance of pseudoknot formation in the folding of RNAs.
While these algorithms had limited use in the beginning, they became important vehicles for (and in fact popularized) an interesting biological fact: RNAs interact. For instance, micro-RNAs (miRNAs) bind to a complementary part of messenger RNAs (mRNAs) and inhibit their translation (Meyer, 2008). One might argue that such a simple interaction does not present a pressing need for RNA-RNA interaction algorithms. However, more complex forms of RNA-RNA interaction exist. In E. coli, CopA binds to the ribosome binding site of CopT, also as a regulation mechanism to prevent translation (Kolb et al., 2000); so does OxyS to fhlA (Argaman and Altuvia, 2000). In both of these structures, the simultaneous folding (within the RNA) and binding (to the other RNA) are difficult to predict as separate events. To account for this, most RNA-RNA interaction algorithms calculate the probability for a pair of subsequences (one from each RNA) to participate in the interaction. In doing so, they generalize the energy model used for the partition function of a single RNA to the case of two RNAs (Muckstein et al., 2006; Chitsaz et al., 2009a; Chitsaz et al., 2009b; Salari et al., 2010; Huang et al., 2009; Li et al., 2010). This generalization takes into account the simultaneous nature of folding and binding.

Supported by NSF Award CCF-AF 1049902.
Supported by the above and a CUNY Graduate Center Science Fellowship.
Supported by NSF Award MCB 0929394.
Not surprisingly, there exist other mechanisms in which more than two RNA molecules take part in an interaction. Typical scenarios include the following:
- multiple small nucleolar RNAs (snoRNAs) interact with ribosomal RNAs (rRNAs) to guide the methylation of the rRNAs (Meyer, 2008);
- multiple small nuclear RNAs (snRNAs) interact with mRNAs in the splicing of introns (Sun and Manley, 1995).
A computational framework for the interaction of two RNAs already exists. Even so, it is reasonable to believe that interactions involving multiple RNAs are generally too complex to be treated pairwise. In addition, given a pool of RNAs, it is not trivial to predict which RNAs interact without some prior biological information.

Mneimneh S., Ali Ahmed S. and L. Greenbaum N. Multiple RNA Interaction - Formulations, Approximations, and Heuristics. DOI: 10.5220/0004341402420249. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS-2013), pages 242-249. ISBN: 978-989-8565-35-8. Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)
To the best of our knowledge, no formulations or algorithms exist for the problem of multiple RNA interaction. Based on the above narrative, we formulate this problem from an optimization perspective. Each part of an RNA contributes a certain weight to the entire interaction when binding to different parts of other RNAs, and we therefore seek to maximize the total weight. This notion of weight can be obtained by using existing RNA-RNA interaction algorithms on pairs of RNAs. We call our formulation the Pegs and Rubber Bands problem. We show that under certain restrictions, which are similar to those against pseudoknots, the problem remains NP-hard. In fact, it becomes equivalent to a special instance of the interaction of two RNAs. We describe a polynomial time approximation scheme (PTAS) for the problem, some heuristics, and experimental results. For instance, consider a pool of RNAs in which interactions between pairs of RNAs are known. Our algorithm is capable of identifying those pairs and satisfactorily predicting the pattern of interaction between them (Chitsaz et al., 2009a). Moreover, our algorithm finds the correct interaction of a given instance of splicing consisting of two snRNAs (a modified human U2-U6 snRNA complex) and two structurally autonomous parts of an intron (Zhao et al.), a total of four RNAs. When the two examples are (partially) mixed in one pool, our algorithm structurally separates them.
2 PEGS AND RUBBER BANDS: A FORMULATION
We introduce an optimization problem, which we call Pegs and Rubber Bands, that will serve as a base framework for the multiple RNA interaction problem. The link between the two problems is made right after the description of Pegs and Rubber Bands.
Consider $m$ levels numbered 1 to $m$, with $n_l$ pegs in level $l$ numbered 1 to $n_l$. There is an infinite supply of rubber bands that can be placed around two pegs in consecutive levels. For instance, we can place a rubber band around peg $i$ in level $l$ and peg $j$ in level $l + 1$; we call it a rubber band at $[l, i, j]$. Every such pair of pegs $[l, i]$ and $[l + 1, j]$ contributes its own weight $w(l, i, j)$. The Pegs and Rubber Bands problem is to maximize the total weight by placing rubber bands around pegs in such a way that no two rubber bands intersect. In other words, each peg can have at most one rubber band around it. Moreover, if a rubber band is placed at $[l, i_1, j_1]$ and another at $[l, i_2, j_2]$, then $i_1 < i_2 \iff j_1 < j_2$. We assume without loss of generality that $w(l, i, j) \neq 0$, to avoid the unnecessary placement of rubber bands. Therefore, either $w(l, i, j) > 0$ or $w(l, i, j) = -\infty$. Figure 1 shows an example.
Figure 1: Pegs and Rubber Bands. All positive weights are
equal to 1 and are represented by dashed lines. The optimal
solution achieves a total weight of 8.
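The two constraints of the definition can be checked mechanically. The minimal sketch below scores a candidate placement. It assumes that weights are stored in a plain dictionary keyed by `(l, i, j)`, and it treats a missing key as $-\infty$, in line with the convention above. The name `placement_weight` is ours and is not part of the paper.

```python
import math
from typing import Dict, List, Tuple

# A rubber band at [l, i, j] joins peg i of level l with peg j of level l + 1.
Band = Tuple[int, int, int]


def placement_weight(bands: List[Band], w: Dict[Band, float]) -> float:
    """Total weight of a placement, or -inf if it violates the constraints.

    Weights missing from `w` are treated as -inf (a forbidden pair).
    """
    used = set()  # pegs (level, index) that already carry a rubber band
    for l, i, j in bands:
        for peg in ((l, i), (l + 1, j)):
            if peg in used:
                return -math.inf  # at most one rubber band per peg
            used.add(peg)
    # Bands between the same two levels must not cross: i1 < i2 iff j1 < j2.
    for a in range(len(bands)):
        for b in range(a + 1, len(bands)):
            (l1, i1, j1), (l2, i2, j2) = bands[a], bands[b]
            if l1 == l2 and (i1 < i2) != (j1 < j2):
                return -math.inf
    return sum(w.get(band, -math.inf) for band in bands)
```

For example, with unit weights, the two bands `[1, 1, 1]` and `[1, 2, 2]` score 2. Swapping their lower endpoints makes them cross, so that placement scores $-\infty$.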
Given an optimal solution, it can always be reconstructed from left to right by repeatedly placing some rubber band at $[l, i, j]$. At the time of each placement, no rubber band is around peg $[l, k]$ for $k > i$, and no rubber band is around peg $[l + 1, k]$ for $k > j$. This process can be carried out by a dynamic programming algorithm that computes the maximum weight (in exponential time). To do so, we define $W(i_1, i_2, \ldots, i_m)$ to be the maximum weight when we truncate the levels at pegs $[1, i_1], [2, i_2], \ldots, [m, i_m]$ (see Figure 2). The maximum weight is given by $W(n_1, n_2, \ldots, n_m)$, and the optimal solution can be obtained by standard backtracking. When all levels have $O(n)$ pegs, this algorithm runs in $O(mn^m)$ time and $O(n^m)$ space.
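The recurrence of Figure 2 can be sketched in Python as follows. This is an illustrative, unoptimized implementation under our own conventions: levels are numbered from 1, and a weight callable `w(l, i, j)` returns `-math.inf` for forbidden pairs. It enumerates all $\prod_l (n_l + 1)$ truncation vectors, so it is only practical for tiny instances.

```python
import itertools
import math
from typing import Callable, Sequence


def max_weight(n: Sequence[int], w: Callable[[int, int, int], float]) -> float:
    """Exponential-time DP of Figure 2.

    n[l - 1] is the number of pegs on level l (levels are 1-based, as in the text).
    w(l, i, j) is the weight of a band between peg i of level l and peg j of
    level l + 1, and returns -math.inf for a forbidden pair.
    """
    m = len(n)
    W = {}
    # product() yields index vectors in lexicographic order, so every predecessor
    # (one or two coordinates decreased by 1) has already been computed.
    for idx in itertools.product(*(range(nl + 1) for nl in n)):
        best = 0.0 if not any(idx) else -math.inf  # W(0, ..., 0) = 0
        for t in range(m):
            # Case 1: the last peg of level t + 1 gets no rubber band.
            if idx[t] > 0:
                prev = list(idx)
                prev[t] -= 1
                best = max(best, W[tuple(prev)])
        for t in range(m - 1):
            # Case 2: a band joins the last pegs of levels t + 1 and t + 2.
            if idx[t] > 0 and idx[t + 1] > 0:
                prev = list(idx)
                prev[t] -= 1
                prev[t + 1] -= 1
                best = max(best, W[tuple(prev)] + w(t + 1, idx[t], idx[t + 1]))
        W[idx] = best
    return W[tuple(n)]
```

With two levels and unit weights on equal labels, the recurrence reduces to longest common subsequence. For example, `max_weight([4, 5], lambda l, i, j: 1.0 if "ABCB"[i - 1] == "BDCAB"[j - 1] else -math.inf)` evaluates to `3.0`.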
2.1 Multiple RNA Interaction as Pegs and Rubber Bands
To provide some initial context, we now describe how the formulation of Pegs and Rubber Bands, though in a primitive way, captures the problem of multiple RNA interaction. We think of each level as an RNA and each peg as one base of the RNA. The weight $w(l, i, j)$ corresponds to the negative of the energy contributed by the binding of the $i$-th base of RNA $l$ to the $j$-th base of RNA $l + 1$. It should be clear, therefore, that an optimal solution for Pegs and Rubber Bands represents the lowest energy conformation in a base-pair energy model. This holds when a pseudoknot-like restriction is imposed on the RNA interaction (rubber bands cannot intersect). In doing so, we obviously assume that an order on the RNAs is given, with alternating sense and antisense. We also assume that the first RNA interacts with the second RNA, which in turn interacts with the third RNA, and so on. We later relax this ordering and the condition on the interaction pattern of the RNAs. While a simple base-pairing model is not likely to give realistic results, our goal for the moment is simply to establish a correspondence between the two problems.

$$
W(i_1, i_2, \ldots, i_m) = \max \begin{cases}
W(i_1 - 1, i_2, \ldots, i_m) \\
W(i_1, i_2 - 1, i_3, \ldots, i_m) \\
\quad \vdots \\
W(i_1, \ldots, i_{m-1}, i_m - 1) \\
W(i_1 - 1, i_2 - 1, i_3, \ldots, i_m) + w(1, i_1, i_2) \\
W(i_1, i_2 - 1, i_3 - 1, i_4, \ldots, i_m) + w(2, i_2, i_3) \\
\quad \vdots \\
W(i_1, \ldots, i_{m-2}, i_{m-1} - 1, i_m - 1) + w(m - 1, i_{m-1}, i_m)
\end{cases}
$$
where $W(0, 0, \ldots, 0) = 0$.
Figure 2: Dynamic programming algorithm for Pegs and Rubber Bands.
2.2 Complexity of the Problem and Approximations
With the above correspondence in mind, the problem of Pegs and Rubber Bands can be viewed as an instance of classical RNA-RNA interaction that involves only two RNAs. The two RNAs are constructed as shown in Figure 3:
- The first RNA is RNA 1, followed by RNA 4 reversed, followed by RNA 5, followed by RNA 8 reversed, and so on.
- The second RNA is RNA 2, followed by RNA 3 reversed, followed by RNA 6, followed by RNA 7 reversed, and so on.
[Figure 3 diagram: the first RNA is formed by levels 1, 4, 5, 8, ...; the second RNA by levels 2, 3, 6, 7, ...]
Figure 3: Pegs and Rubber Bands as a special instance of RNA-RNA interaction; vertical and curved lines indicate the binding/folding pattern of interaction.
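The level-to-RNA assignment of Figure 3 follows a period-4 pattern. The sketch below reproduces only that concatenation order; it does not build the interaction itself. It uses a hypothetical helper, `as_two_rnas`, that takes one string per level.

```python
from typing import List, Tuple


def as_two_rnas(levels: List[str]) -> Tuple[str, str]:
    """Concatenate per-level sequences into two RNAs in the order of Figure 3.

    Level l (1-based) goes to the first RNA when l % 4 is 1 (forward) or 0
    (reversed), and to the second RNA when l % 4 is 2 (forward) or 3 (reversed).
    """
    first: List[str] = []
    second: List[str] = []
    for l, seq in enumerate(levels, start=1):
        r = l % 4
        if r == 1:
            first.append(seq)
        elif r == 0:
            first.append(seq[::-1])
        elif r == 2:
            second.append(seq)
        else:
            second.append(seq[::-1])
    return "".join(first), "".join(second)
```

Bands between levels 1 and 2, or 3 and 4, become intermolecular pairs. Bands between levels 2 and 3, or 4 and 5, become folding within one of the two RNAs.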
Therefore, Pegs and Rubber Bands can be solved as an RNA-RNA interaction problem. Although this RNA-RNA interaction is a restricted instance of the more general NP-hard problem, it is still NP-hard. In fact, Pegs and Rubber Bands itself is NP-hard.
Theorem 1. Pegs and Rubber Bands is NP-hard.
Proof: We make a reduction from the longest common subsequence (LCS) problem for a set of binary strings, which is NP-hard. In this reduction, pegs are labeled, and $w(l, i, j)$ depends only on the label of peg $[l, i]$ and the label of peg $[l + 1, j]$. We describe this weight as a function of labels shortly. Each binary string is modified by adding the symbol b between every two consecutive bits. A string of original length $n$ is then transformed into two consecutive (identical) levels of $2n - 1$ pegs each, where each peg is labeled by the corresponding symbol in {0, 1, b}. For any given integer $k$, the first and last levels consist of $k$ pegs labeled *. We now define the weight as a function of labels: $w(0,0) = w(1,1) = w(b,b) = w(*,0) = w(*,1) = w(0,*) = w(1,*) = 1$, and $w(x,y) = -\infty$ otherwise. It is easy to verify that the strings have a common subsequence of length $k$ if and only if the optimal solution has a weight of
$$\sum_i (2n_i - 1) + k = 2\sum_i n_i - m + k,$$
where $n_i$ is the original length of string $i$ and $m$ is the number of strings. This is the weight obtained when every peg has a rubber band around it: there are $2k + 2\sum_i (2n_i - 1)$ pegs in total, and every rubber band covers two of them.
* * * *
| | | |
0b0b1b0b1b1b1
|| | | | ||||
0b0b1b0b1b1b1
| | | |
0b1b0b1b0
| | | ||
0b1b0b1b0
| | | |
1b0b0b1b0b1
|||| | | |
1b0b0b1b0b1
| | | |
* * * *
Figure 4: Reduction from LCS for {0010111, 01010, 100101} to Pegs and Rubber Bands (the symbol | denotes a rubber band). The optimal solution, with weight 2(7 + 5 + 6) − 3 + 4 = 37, corresponds to a common subsequence of length 4, namely 0101.
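The construction in the proof of Theorem 1 is small enough to check on a toy input. The sketch below builds the levels, runs an exact exponential-time version of the Figure 2 recurrence on them, and compares the result with $2\sum_i n_i - m + k$. The helper names `reduction_levels`, `max_weight`, and `target` are ours. This is a sanity check on a tiny instance, not part of the proof.

```python
import itertools
import math
from typing import List

# Label pairs (upper peg, lower peg) with weight 1; every other pair has weight -inf.
ALLOWED = {("0", "0"), ("1", "1"), ("b", "b"),
           ("*", "0"), ("*", "1"), ("0", "*"), ("1", "*")}


def reduction_levels(strings: List[str], k: int) -> List[str]:
    """k stars, then two identical copies of each string with 'b' between bits, then k stars."""
    levels = ["*" * k]
    for s in strings:
        padded = "b".join(s)  # "0010111" -> "0b0b1b0b1b1b1"
        levels += [padded, padded]
    levels.append("*" * k)
    return levels


def max_weight(levels: List[str]) -> float:
    """Exact Figure 2 recurrence on labeled pegs (exponential; tiny inputs only)."""
    n = [len(s) for s in levels]
    W = {}
    for idx in itertools.product(*(range(x + 1) for x in n)):
        best = 0.0 if not any(idx) else -math.inf
        for t in range(len(n)):
            if idx[t]:  # the last peg of level t + 1 stays unused
                prev = list(idx)
                prev[t] -= 1
                best = max(best, W[tuple(prev)])
        for t in range(len(n) - 1):
            if idx[t] and idx[t + 1]:  # band between the two last pegs
                prev = list(idx)
                prev[t] -= 1
                prev[t + 1] -= 1
                pair = (levels[t][idx[t] - 1], levels[t + 1][idx[t + 1] - 1])
                gain = 1.0 if pair in ALLOWED else -math.inf
                best = max(best, W[tuple(prev)] + gain)
        W[idx] = best
    return W[tuple(n)]


def target(strings: List[str], k: int) -> int:
    """Weight reached exactly when every peg carries a band: 2 * sum(n_i) - m + k."""
    return 2 * sum(len(s) for s in strings) - len(strings) + k
```

Consider the strings 01 and 10, whose longest common subsequence has length 1. The optimum reaches the bound for k = 1 (weight 7) but falls short of it for k = 2. Running this naive DP on the instance of Figure 4 would be impractical.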
While our problem is NP-hard, we can show that the same formulation can be adapted to obtain a polynomial time approximation. A maximization problem admits a polynomial time approximation scheme (PTAS) iff, for every fixed ε > 0, there is an algorithm that finds a solution within (1 − ε) of optimal in time polynomial in the size of the input (Cormen et al.). We show below that we can find a solution within (1 − ε) of optimal in time $O\!\left(\frac{m}{\varepsilon}\, n^{1/\varepsilon}\right)$, where $m$ is the number of levels and each level has $O(n)$ pegs.
Theorem 2. Pegs and Rubber Bands admits a PTAS.
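Proof: Let OPT be the weight of the optimal solution, and denote by $W[i \ldots j]$ the weight of the optimal solution when the problem is restricted to levels $i, i + 1, \ldots, j$ (a sub-problem). For a given ε > 0, let $k = 1/\varepsilon$. Consider the following $k$ solutions (weights), each obtained by a concatenation of optimal solutions for sub-problems consisting of at most $k$ levels:
$$
\begin{aligned}
W_1 &= W[1 \ldots 1] + W[2 \ldots k+1] + W[k+2 \ldots 2k+1] + \cdots \\
W_2 &= W[1 \ldots 2] + W[3 \ldots k+2] + W[k+3 \ldots 2k+2] + \cdots \\
&\;\;\vdots \\
W_k &= W[1 \ldots k] + W[k+1 \ldots 2k] + W[2k+1 \ldots 3k] + \cdots
\end{aligned}
$$
It is easy to verify that every pair of consecutive levels lies within a single sub-problem in exactly $k - 1$ of the above $k$ solutions: levels $l$ and $l + 1$ are separated only in the solution $W_i$ with $i \equiv l \pmod{k}$. Restricting the optimal solution to the sub-problems of any $W_i$ yields feasible solutions for those sub-problems. Hence each rubber band of the optimal solution is retained in exactly $k - 1$ of these restrictions. Therefore,
$$\sum_{i=1}^{k} W_i \ge (k-1)\,\mathrm{OPT} \quad\Longrightarrow\quad \max_i W_i \ge \frac{k-1}{k}\,\mathrm{OPT} \ge (1-\varepsilon)\,\mathrm{OPT}.$$
Let $m$ be the total number of levels. Then there are $O(m)$ sub-problems of at most $k$ levels each. Therefore, when every level has $O(n)$ pegs, the running time required to find $\max_i W_i$ is $O(mkn^k) = O\!\left(\frac{m}{\varepsilon}\, n^{1/\varepsilon}\right)$.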
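The shifting argument translates directly into code. In the sketch below, `exact` is the Figure 2 recurrence applied to a contiguous block of levels, and `shifted_best` returns $\max_i W_i$. The names, the 1-based level convention, and the use of a weight callable are our assumptions. For clarity, the sketch recomputes each block for every shift instead of caching the $O(m)$ distinct sub-problems counted in the proof.

```python
import itertools
import math
import random
from typing import Callable, List

Weight = Callable[[int, int, int], float]  # w(l, i, j) with 1-based levels


def exact(n: List[int], w: Weight, first: int = 1) -> float:
    """Figure 2 DP on levels first, first + 1, ...; n[t] pegs on level first + t."""
    W = {}
    for idx in itertools.product(*(range(x + 1) for x in n)):
        best = 0.0 if not any(idx) else -math.inf
        for t in range(len(n)):
            if idx[t]:  # last peg of this level unused
                prev = list(idx)
                prev[t] -= 1
                best = max(best, W[tuple(prev)])
        for t in range(len(n) - 1):
            if idx[t] and idx[t + 1]:  # band between two last pegs
                prev = list(idx)
                prev[t] -= 1
                prev[t + 1] -= 1
                best = max(best, W[tuple(prev)] + w(first + t, idx[t], idx[t + 1]))
        W[idx] = best
    return W[tuple(n)]


def shifted_best(n: List[int], w: Weight, k: int) -> float:
    """max over i of W_i = W[1..i] + W[i+1..i+k] + W[i+k+1..i+2k] + ..."""
    m = len(n)
    best = -math.inf
    for i in range(1, k + 1):
        blocks = [(1, min(i, m))]
        blocks += [(s, min(s + k - 1, m)) for s in range(i + 1, m + 1, k)]
        total = sum(exact(n[a - 1:b], w, a) for a, b in blocks)
        best = max(best, total)
    return best
```

On any instance, the value returned by `shifted_best(n, w, k)` should lie between $\frac{k-1}{k}\,\mathrm{OPT}$ and OPT, where OPT is `exact(n, w)`.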
For a given integer $k$, the $(1 - 1/k)$-factor approximation algorithm simply chooses the best $W_i = W[1 \ldots i] + W[i+1 \ldots i+k] + W[i+k+1 \ldots i+2k] + \cdots$ as a solution. Here $W[i \ldots j]$ denotes the weight of the optimal solution for the sub-problem consisting of levels $i, i + 1, \ldots, j$. As a practical step, however, we do not compare the $W_i$'s directly. Instead, for each $W_i$ we fill in some additional rubber bands (interactions) between levels (RNAs) $i$ and $i + 1$, between levels $i + k$ and $i + k + 1$, and so on. We do this by identifying the pegs of these levels (regions of RNAs) that are not part of the solution. This does not affect the theoretical guarantee but gives a larger weight to the solution. We call it gap filling.
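The paper does not spell out how the additional rubber bands are found, so the following is only one natural reading. In $W_i$, no band joins the two levels at a cut. New bands there therefore only need to use free pegs and not cross each other, which makes the task a two-level Pegs and Rubber Bands instance on the free pegs (an LCS-style DP). The sketch below ignores windows and gaps. The name `fill_gap` and the weight callable `w(i, j)` are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def fill_gap(free_top: List[int], free_bottom: List[int],
             w: Callable[[int, int], float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Heaviest non-crossing set of bands between free pegs of two adjacent levels.

    free_top / free_bottom: increasing peg indices left unused by the current solution.
    w(i, j): weight of a band between top peg i and bottom peg j (-inf if forbidden).
    """
    a, b = len(free_top), len(free_bottom)
    # F[x][y]: best weight using only the first x free top pegs and first y free bottom pegs.
    F = [[0.0] * (b + 1) for _ in range(a + 1)]
    for x in range(1, a + 1):
        for y in range(1, b + 1):
            gain = w(free_top[x - 1], free_bottom[y - 1])
            pair = (F[x - 1][y - 1] + gain) if gain > 0 else -math.inf
            F[x][y] = max(F[x - 1][y], F[x][y - 1], pair)
    # Backtrack to recover the chosen bands, then list them from left to right.
    bands: List[Tuple[int, int]] = []
    x, y = a, b
    while x > 0 and y > 0:
        if F[x][y] == F[x - 1][y]:
            x -= 1
        elif F[x][y] == F[x][y - 1]:
            y -= 1
        else:
            bands.append((free_top[x - 1], free_bottom[y - 1]))
            x -= 1
            y -= 1
    bands.reverse()
    return F[a][b], bands
```

The returned weight is added to $W_i$ before the comparison. Because the new bands only touch previously free pegs, the resulting placement remains feasible.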
3 WINDOWS AND GAPS: A BETTER FORMULATION FOR RNA INTERACTION
In the previous section, we described our initial attempt to view the interaction of $m$ RNAs as a Pegs and Rubber Bands problem with $m$ levels. There, the first RNA interacts with the second RNA, the second with the third, and so on (so they alternate in sense and antisense). This used a simple base-pair energy model, which is not very realistic. We now address this issue, and leave the issues of the ordering and the interaction pattern to the next section. A better model for RNA interaction considers windows of interaction instead of single bases. In terms of our Pegs and Rubber Bands problem, this translates to placing rubber bands around a stretch of contiguous pegs in two consecutive levels. For example, a rubber band may be placed around pegs $[l, i_1]$ through $[l, i_2]$ and $[l + 1, j_1]$ through $[l + 1, j_2]$, where $i_2 \ge i_1$ and $j_2 \ge j_1$. The weight contribution of placing such a rubber band is now given by $w(l, i_2, j_2, u, v)$. Here $i_2$ and $j_2$ are the last pegs covered by the rubber band in level $l$ and level $l + 1$, respectively. The lengths of the two covered windows are $u = i_2 - i_1 + 1$ in level $l$ and $v = j_2 - j_1 + 1$ in level $l + 1$.
[Figure 5 diagram: a window spanning pegs i_1 to i_2 on one level and pegs j_1 to j_2 on the next]
Figure 5: A rubber band can now be placed around a window of pegs; here u = 3 and v = 2 in the big window.
As a heuristic, we also allow for the possibility of imposing a gap $g \ge 0$ between windows. It may be energetically favorable for interaction regions on a single RNA not to be too close, which is not captured by the maximization of total weight. This gap is also taken into consideration when we perform the gap filling procedure described at the end of Section 2. The modified algorithm is shown in Figure 6. If we set $u = v = 1$ and $g = 0$, we retrieve the original algorithm of Figure 2.
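To illustrate how windows and the gap enter the recurrence, the sketch below specializes Figure 6 to two levels ($m = 2$), where the state is just $(i_1, i_2)$. The weight callable `w(i, j, u, v)` and the parameter names `max_win` and `gap` are our assumptions. With `max_win = 1` and `gap = 0`, it coincides with the two-level case of the original recurrence.

```python
import math
from typing import Callable


def windows_two_levels(n1: int, n2: int, w: Callable[[int, int, int, int], float],
                       max_win: int, gap: int) -> float:
    """Figure 6 recurrence specialized to m = 2 levels.

    w(i, j, u, v): weight of a band covering pegs i-u+1..i on level 1 and
    j-v+1..j on level 2 (return -math.inf if the window is not allowed).
    """
    W = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if i == 0 and j == 0:
                continue  # W(0, 0) = 0
            best = -math.inf
            if i > 0:
                best = max(best, W[i - 1][j])  # peg i of level 1 unused
            if j > 0:
                best = max(best, W[i][j - 1])  # peg j of level 2 unused
            # A window ending at (i, j); the loops keep u <= i and v <= j.
            for u in range(1, min(max_win, i) + 1):
                for v in range(1, min(max_win, j) + 1):
                    # The previous window must end at least `gap` pegs earlier: (x)^+.
                    prev = W[max(0, i - u - gap)][max(0, j - v - gap)]
                    best = max(best, prev + w(i, j, u, v))
            W[i][j] = best
    return W[n1][n2]
```

For example, three single-peg windows of weight 1 on two levels of three pegs give 3 with `gap=0` but only 2 with `gap=1`. A single 2-by-2 window of weight 5 beats two single-peg windows of weight 1.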
The running time of the modified algorithm is $O(mw^2 n^m)$, and $O\!\left(\frac{mw^2}{\varepsilon}\, n^{1/\varepsilon}\right)$ for the approximation scheme, where $w$ is the maximum window length. If we impose that $u = v$, then those running times become $O(mwn^m)$ and $O\!\left(\frac{mw}{\varepsilon}\, n^{1/\varepsilon}\right)$, respectively.
For the correctness of the algorithm, we now have to assume that windows are sub-additive. In other words, we require the following condition:
$$w(l, i, j, u_1, v_1) + w(l, i - u_1, j - v_1, u_2, v_2) \le w(l, i, j, u_1 + u_2, v_1 + v_2)$$
Otherwise, the algorithm may compute an incorrect optimum, because the same window could be achieved by two or more smaller ones with a higher total weight.

$$
W(i_1, i_2, \ldots, i_m) = \max \begin{cases}
W(i_1 - 1, i_2, \ldots, i_m) \\
W(i_1, i_2 - 1, i_3, \ldots, i_m) \\
\quad \vdots \\
W(i_1, \ldots, i_{m-1}, i_m - 1) \\
W\big((i_1 - u - g)^+, (i_2 - v - g)^+, i_3, \ldots, i_m\big) + w(1, i_1, i_2, u, v) \\
W\big(i_1, (i_2 - u - g)^+, (i_3 - v - g)^+, i_4, \ldots, i_m\big) + w(2, i_2, i_3, u, v) \\
\quad \vdots \\
W\big(i_1, \ldots, i_{m-2}, (i_{m-1} - u - g)^+, (i_m - v - g)^+\big) + w(m - 1, i_{m-1}, i_m, u, v)
\end{cases}
$$
In this recurrence:
- $x^+$ denotes $\max(0, x)$;
- $w(l, i, j, u, v) = -\infty$ if $u > i$ or $v > j$;
- $0 < u, v \le w$ (the maximum window size), and the maximum is also taken over all such $u$ and $v$;
- $g \ge 0$ is the gap;
- $W(0, 0, \ldots, 0) = 0$.
Figure 6: Modified dynamic programming algorithm for Pegs and Rubber Bands with the windows and gaps formulation.
In our experience, most existing RNA-RNA interaction algorithms produce weights (the negatives of the energy values) for RNA interaction windows that largely conform to the above condition. In the rare cases where they do not, we filter the windows to eliminate those that are not sub-additive. That is, if the above condition is not met, we set $w(l, i, j, u_1, v_1) = w(l, i - u_1, j - v_1, u_2, v_2) = -\infty$.
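The filtering step can be sketched as follows. The paper does not specify the order in which violations are resolved, so this version makes two assumptions of its own. First, every pair is tested against the original (unfiltered) weights. Second, a pair is only tested when the merged window $(u_1 + u_2, v_1 + v_2)$ actually has a weight. Windows are stored in a hypothetical dictionary keyed by $(l, i, j, u, v)$.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[int, int, int, int, int]  # (l, i, j, u, v): window ending at pegs i and j


def filter_subadditive(w: Dict[Key, float]) -> Dict[Key, float]:
    """Copy of w in which both members of every violating pair are set to -inf.

    A pair violates the condition when
        w(l, i, j, u1, v1) + w(l, i - u1, j - v1, u2, v2) > w(l, i, j, u1 + u2, v1 + v2).
    Assumptions (ours): pairs are checked against the original weights, and a pair
    is only checked when the merged window is itself present in w.
    """
    # Index windows by where they end, so the adjacent window can be found directly.
    by_end: Dict[Tuple[int, int, int], List[Key]] = {}
    for key in w:
        by_end.setdefault(key[:3], []).append(key)
    out = dict(w)
    for (l, i, j, u1, v1), first_weight in w.items():
        for second in by_end.get((l, i - u1, j - v1), []):
            merged = (l, i, j, u1 + second[3], v1 + second[4])
            if merged in w and first_weight + w[second] > w[merged]:
                out[(l, i, j, u1, v1)] = -math.inf
                out[second] = -math.inf
    return out
```

For example, two adjacent single-base windows of weight 3 each are removed when the merged 2-by-2 window only has weight 4. They are kept when the merged window has weight 7.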
4 INTERACTION PATTERN AND PERMUTATIONS: A HEURISTIC
We now describe how to relax the ordering and the condition on the interaction pattern of the RNAs. We first identify each RNA as being even (sense) or odd (antisense); this convention can obviously be switched. Given $m$ RNAs and a permutation on the set {1, …, m}, we map the RNAs onto the levels of a Pegs and Rubber Bands problem as follows. We place RNAs on the same level, in the order in which they appear in the permutation, as long as they have the same parity (they are either all even or all odd). We then increase the number of levels by one, and repeat. RNAs that end up on the same level are virtually considered as one RNA that is the concatenation of all of them. However, in the corresponding Pegs and Rubber Bands problem, we do not allow windows to span multiple RNAs, nor do we enforce a gap between two windows in different RNAs. For example, consider the permutation {1, 3, 4, 7, 5, 8, 2, 6} of RNAs, where the RNA number also indicates its parity (for the sake of illustration). We then end up with the following placement:
- the first level holds RNA 1 and RNA 3, in that order;
- the second level holds RNA 4;
- the third level holds RNA 7 and RNA 5, in that order;
- the fourth level holds RNA 8, RNA 2, and RNA 6, in that order.
This results in four virtual RNAs on four levels of pegs, as shown in Figure 7.
---RNA 1---RNA 3---
---RNA 4---
---RNA 7---RNA 5---
---RNA 8---RNA 2---RNA 6---
Figure 7: Placement of the permutation {1, 3, 4, 7, 5, 8, 2, 6}, where the RNA number also indicates its parity. The interaction pattern is less restrictive than before; for instance, RNA 7 can interact with RNA 2, RNA 4, RNA 6, and RNA 8.
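The placement rule is a single left-to-right scan of the permutation. The sketch below mirrors the example of Figure 7. The `parity` callable is our stand-in for the sense/antisense label of each RNA; following the illustration, the test uses the RNA number itself. `partners` is a hypothetical helper that lists the RNAs on adjacent levels, i.e., those an RNA may interact with.

```python
from typing import Callable, List


def place(permutation: List[int], parity: Callable[[int], int]) -> List[List[int]]:
    """Map RNAs onto levels: consecutive RNAs of equal parity share a level."""
    levels: List[List[int]] = []
    for rna in permutation:
        if levels and parity(levels[-1][-1]) == parity(rna):
            levels[-1].append(rna)  # same parity: extend the current level
        else:
            levels.append([rna])    # parity changes: open a new level
    return levels


def partners(levels: List[List[int]], rna: int) -> List[int]:
    """RNAs on the levels directly above and below the level holding `rna`."""
    for l, level in enumerate(levels):
        if rna in level:
            above = levels[l - 1] if l > 0 else []
            below = levels[l + 1] if l + 1 < len(levels) else []
            return sorted(above + below)
    raise ValueError(f"RNA {rna} is not placed")
```

Applied to the permutation of Figure 7 with `parity=lambda r: r % 2`, `place` yields the four levels [[1, 3], [4], [7, 5], [8, 2, 6]]. `partners` then lists RNAs 2, 4, 6, and 8 for RNA 7.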
But what is the best placement as a Pegs and Rubber Bands problem for a given set of RNAs? Figure 8 shows a possible (greedy) heuristic that attempts to answer this question by finding the best permutation (recall that the permutation uniquely determines the placement).
Given ε = 1/k and m RNAs
  produce a random permutation π on {1, ..., m}
  let W be the weight of the (1 − ε)-optimal solution given π
  repeat
    better ← false
    generate a set Π of neighboring permutations for π
    for every π′ ∈ Π (in any order)
      let W′ be the weight of the (1 − ε)-optimal solution given π′
      if W′ > W then
        W ← W′
        π ← π′
        better ← true
  until not better
Figure 8: A heuristic for multiple RNA interaction using the PTAS algorithm.
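The loop of Figure 8 is a greedy local search (hill climbing) over permutations. In the minimal sketch below, `score` is a placeholder for the weight of the (1 − ε)-optimal PTAS solution for the placement induced by a permutation. `neighbors` is any neighborhood generator. Both are our assumptions. As in the figure, the neighbor set is generated once per pass, and the search stops after a pass with no improvement. The returned permutation is therefore a local optimum of the given neighborhood.

```python
import random
from typing import Callable, Iterable, List, Tuple

Perm = List[int]


def hill_climb(m: int, score: Callable[[Perm], float],
               neighbors: Callable[[Perm], Iterable[Perm]],
               rng: random.Random) -> Tuple[Perm, float]:
    """Greedy local search in the style of Figure 8.

    score(pi) stands in for the weight of the (1 - eps)-optimal PTAS solution
    for the placement given by pi; neighbors(pi) yields candidate permutations.
    """
    pi = list(range(1, m + 1))
    rng.shuffle(pi)  # random starting permutation
    best = score(pi)
    better = True
    while better:  # repeat ... until not better
        better = False
        for cand in list(neighbors(pi)):  # the set of neighbors is fixed before the scan
            s = score(cand)
            if s > best:
                best, pi, better = s, cand, True
    return pi, best
```

Each accepted move strictly increases the score, so the loop terminates. For example, consider the toy score `lambda p: -sum(abs(x - y) for x, y in zip(p, p[1:]))` with adjacent swaps as neighbors. The permutation returned is one that no single adjacent swap can improve.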
To generate neighboring permutations for this heuristic algorithm, one could adapt the standard 2-opt method used for the Traveling Salesman Problem (or other techniques). For instance, given a permutation π, a neighboring permutation π′ can be obtained as follows. Divide π into three parts, and make π′ the concatenation of the first part, the reverse of the second part, and the third part. In other words, if π = (α, β, γ), then π′ = (α, β^R, γ) is a neighbor of π, where β^R is the reverse of β.

OxyS 5’ ...CCCUUG... ...GUG... ...UCCAG... 3’
|||||| ||| |||||
fhlA 3’ ...GGGAAC... ...CAC... ...AGGUC... 5’
CopA 5’ ...CGGUUUAAGUGGG... ...UUUCGUACUCGCCAAAGUUGAAGA... ...UUUUGCUU 3’
||||||||||||| |||||||||||||||||||||||| ||||||||
CopT 3’ GCCAAAUUCACCC... ...AAAGCAUGAGCGGUUUCAACUUCU... ...AAAACGAA 5’
MicA 5’ ...GCGCA... ...CUGUUUUC... ...CGU... 3’
||||| |||||||| |||
lamB 3’ ...CGCGU... ...GAUAGAGG... ...GCA... 5’
Figure 9: Known pairs of interacting RNAs.
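The 2-opt neighborhood described above can be enumerated as in the sketch below (the function name is ours). Taking β to have at least two elements ensures that each neighbor differs from π. This gives m(m − 1)/2 neighbors for a permutation of length m.

```python
from typing import Iterator, List


def two_opt_neighbors(pi: List[int]) -> Iterator[List[int]]:
    """Yield every permutation (alpha, reversed(beta), gamma) with len(beta) >= 2."""
    n = len(pi)
    for i in range(n - 1):             # beta starts at index i
        for j in range(i + 2, n + 1):  # ... and ends just before index j
            yield pi[:i] + pi[i:j][::-1] + pi[j:]
```

For π = [1, 2, 3], the neighbors are [2, 1, 3], [3, 2, 1], and [1, 3, 2].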
5 EXPERIMENTAL RESULTS
We apply the algorithm of Section 4 using the 2-opt method. The PTAS is based on the Windows and Gaps formulation of Section 3, with the following settings:
- windows satisfy $2 \le u, v \le w = 26$ (RNAup's default (Muckstein et al., 2006));
- the gap is $g = 4$;
- the weights $w(l, i, j, u, v)$ are obtained as the negatives of the energy values produced by RNAup;
- the windows are filtered for sub-additivity as described in Section 3.
In addition, we impose that $u = v$ for every window. We also compress RNAs on level $l$: a base $i$ is removed whenever $w(l, i, j, u, u)$ is less than some threshold for every $j$ (a threshold of 0 is used). However, peg $[l, i]$ can still be part of some window, e.g. if $w(l, i + x, j, x + y, x + y)$ is added to the solution, where $x, y > 0$. The purpose of the last two conditions ($u = v$ and compression) is to speed up the algorithm. We pick the largest-weight solution among several runs of the algorithm. The value of $k$ and the gap filling criterion depend on the scenario, as described below.
5.1 Fishing for Pairs
Six RNAs, of which three pairs are known to interact, are used (Chitsaz et al., 2009a). We are interested in identifying the three pairs. For this purpose, it suffices to set k = 2 and to ignore gap filling. Furthermore, we only consider solutions in which each RNA interacts with at most one other RNA. The solution with the largest weight identifies the three pairs correctly (Figure 9). In addition, the interacting sites in each pair are consistent with the predictions of existing RNA-RNA interaction algorithms, e.g. (Salari et al., 2010).
5.2 Structure Prediction
The human snRNA complex U2-U6 is necessary for the splicing of a specific mRNA intron (Zhao et al.). Only the preserved regions of the intron are considered. These consist of two structurally autonomous parts, resulting in an instance with a total of four RNAs. The algorithm is run with k = 2, 3, 4 and gap filling. In all three cases, the solution with the largest weight consistently finds the structure shown in Figure 10. This structure reveals a correct pattern described in (Sun and Manley, 1995; Zhao et al.), and it cannot be easily predicted by considering the RNAs in pairs. For instance, AUAC in U6 will bind to UAUG in both U2 and I1. Without a global view, it is not immediately obvious which of the two to break, e.g. without knowing that AUGAU in U2 also binds with UACUA in I2. This is a typical issue of local versus global optimality.
5.3 Structural Separation
Six RNAs are used: CopA, CopT, and the four RNAs of the previous scenario. The algorithm is run with k = 3 and gap filling. The solution with the largest weight results in a successful prediction that separates the RNA complex CopA-CopT of Figure 9 from the RNA structure shown in Figure 10.
I1 3’ UGUAUG 5’
||||
U6 5’ AUACA.....GAUUa... ...cGUGAAGCGU 3’
|||| |||||||||
U2 3’ UAUGAUg....CUAGAAu..........gCACUUCGCA 5’
|||||
I2 5’ UACUAAc 3’
Figure 10: A modified human snRNA U2-U6 complex in the splicing of an intron, as reported in (Zhao et al.). Bases indicated by lowercase letters are missing from the interaction. From left to right: g-c and a-u are missing due to the condition 2 ≤ u = v ≤ 26, but also due to the added instability of a bulge loop when this condition is relaxed; c-g ends up not being favored by RNAup. I1 is shifted (UGU should interact with ACA instead), but this is a computational artifact of optimization that is hard to avoid. Overall, the structure is accurate and cannot be predicted by a pairwise handling of the RNAs.
6 CONCLUSIONS
While RNA-RNA interaction algorithms exist, they are not suitable for predicting RNA structures in which more than two RNA molecules interact. For instance, the interaction pattern may not be known, in contrast to the case of two RNAs, where one must interact with the other. Moreover, even with some existing knowledge of the pattern of interaction, treating the RNAs pairwise may not lead to the best global structure. In this work, we formulate multiple RNA interaction as an optimization problem, prove that it is NP-hard, and provide approximation and heuristic algorithms. We explore three scenarios:
1) fishing for pairs: given a pool of RNAs, we identify the pairs that are known to interact;
2) structure prediction: we predict a correct complex of two snRNAs (modified human U2 and U6) and two structurally autonomous parts of an intron, a total of four RNAs;
3) structural separation: we successfully divide the RNAs into independent groups of multiple interacting RNAs.
REFERENCES
Alkan, C., Karakoc, E., Nadeau, J. H., Sahinalp, S. C., and Zhang, K. (2006). RNA-RNA interaction prediction and antisense RNA target search. In Journal of Computational Biology 13(2).
Argaman, L. and Altuvia, S. (2000). fhlA repression by OxyS: Kissing complex formation at two sites results in a stable antisense-target RNA complex. In Journal of Molecular Biology 300.
Chitsaz, H., Backofen, R., and Sahinalp, S. C. (2009a). biRNA: Fast RNA-RNA binding sites prediction. In 9th International Conference on Algorithms in Bioinformatics.
Chitsaz, H., Salari, R., Sahinalp, S. C., and Backofen, R. (2009b). A partition function algorithm for interacting nucleic acid strands. In Journal of Bioinformatics.
Cormen, T., Leiserson, C. E., Rivest, R. L., and Stein, C. Approximation Algorithms. In Introduction to Algorithms. MIT Press.
Huang, F. W. D., Qin, J., Reidys, C. M., and Stadler, P. F. (2009). Partition function and base pairing probabilities for RNA-RNA interaction prediction. In Journal of Bioinformatics 25(20).
Kolb, F. A., Malmgren, C., Westhof, E., Ehresmann, C., Ehresmann, B., Wagner, E. G. H., and Romby, P. (2000). An unusual structure formed by antisense-target RNA binding involves an extended kissing complex with a four-way junction and a side-by-side helical alignment. In RNA Society.
Li, A. X., Marz, M., Qin, J., and Reidys, C. M. (2010). RNA-RNA interaction prediction based on multiple sequence alignments. In Journal of Bioinformatics.
Meyer, I. M. (2008). Predicting novel RNA-RNA interactions. In Current Opinion in Structural Biology 18.
Mneimneh, S. (2009). On the approximation of optimal structures for RNA-RNA interaction. In IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Muckstein, U., Tafer, H., Hackermuller, J., Bernhart, S. H., Stadler, P. F., and Hofacker, I. L. (2006). Thermodynamics of RNA-RNA binding. In Journal of Bioinformatics.
Pervouchine, D. D. (2004). IRIS: Intermolecular RNA interaction search. In 15th International Conference on Genome Informatics.
Salari, R., Backofen, R., and Sahinalp, S. C. (2010). Fast prediction of RNA-RNA interaction. In Algorithms for Molecular Biology 5(5).
Sun, J. S. and Manley, J. L. (1995). A novel U2-U6 snRNA structure is necessary for mammalian mRNA splicing. In Genes and Development 9.
Zhao, C., Bachu, R., Popovic, M., Devany, M., Brenowitz, M., Schlatterer, J. C., and Greenbaum, N. L. Conformational heterogeneity of the protein-free human spliceosomal U2-U6 snRNA complex. Under revision at RNA. The first two authors contributed equally to the work.
BIOINFORMATICS2013-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms
248
APPENDIX: RNA SEQUENCES
MicA (even)
5’ GAAAGACGCGCAUUUGUUAUCAUCAUCCCUGUUUUCAGC
GAUGAAAUUUUGGCCACUCCGUGAGUGGCCUUUU 3’
lamB (odd)
5’ GGCAGCGCAUGUCGUCGUCUGCAUCAAGAGCCGGGUGUU
UAAGGCCUCCAUAAAAAAACGAAACGCAAAACCAUUCGC
AGUUUUAGAAGGUGGCAGCGUUUAAAGAAAAGCAAUGAU
CUCAGGAGAUAGAAUGAUGAUUACUCUGCGCAAACUCCC
ACUGGCGGUUGCUGUCGCAGCGG 3’
CopA (even)
5’ AUAGCUGAAUUCUUGGCUAUACGGUUUAAGUGGGCCCCG
GUAAUCUUUUCGUACUCGCCAAAGUUGAAGAAGAUUAUC
GGGGUUUUUGCUU 3’
CopT (odd)
5’ AAGCAAAAACCCCGAUAAUCUUCUUCAACUUUGGCGAGU
ACGAAAAGAUUACCGGGGCCCACUUAAACCG 3’
OxyS (even)
5’ GAAACGGAGCGGCACCUCUUUUAACCCUUGAAGUCACUG
CCCGUUUCGAGAGUUUCUCAACUCGAAUAACUAAAGCCA
ACGUGAACUUUUGCGGAUCUCCAGGAUCCGCU 3’
fhlA (odd)
5’ AGUUAGUCAAUGACCUUUUGCACCGCUUUGCGGUGCUUU
CCUGGAAGAACAAAAUGUCAUAUACACCGAUGAGUGAUC
UCGGACAACAAGGGUUGUUCGACAUCACUCGGACA 3’
I1 (odd)
5’ NNNNNNNNNNGUAUGUNNNNNNNNNN 3’
U6 (even)
5’ AUACAGAGAAGAUUAGCAUGGCCCCUGCGCAAGGAUGAC
ACGCAAAUUCGUGAAGCGU 3’
U2 (odd)
5’ ACGCUUCACGGCCUUUUGGCUAAGAUCAAGUGUAGUAU 3’
I2 (even)
5’ NNNNNNNNNNUACUAACNNNNNNNNNN 3’
MultipleRNAInteraction-Formulations,Approximations,andHeuristics
249