to the category of reactions that require strand sep-
aration, we will model cruciform formation depen-
dent on the number of superhelical twists that exist or
could exist in the molecule at a given moment. When
linear DNA is converted into cruciform DNA, T de-
creases by one for every 10.5 bp participating in cru-
ciform formation (Sinden, 1994). Here we assume
that the part of the molecule that forms the new secondary
structure continues to exist in a superhelically
relaxed state. At the same time, we assume that the
rest of the DNA molecule remains at the previous superhelical
density, corrected for the difference caused by
the newly formed cruciform. If the superhelical density
before the formation of the cruciform was $\sigma_{old}$, the molecule
will become divided by a four-way junction into two
domains: i) the cruciform of length l with intrastrand
basepairs and ii) the rest of the molecule. Based on
Eqs. 1 and 3, the superhelical density of the rest of
the molecule will now be
$$\sigma_{new} = \frac{\sigma_{old} \cdot N + l}{N - l} \qquad (6)$$
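The steps leading to Eq. 6 can be retraced as follows. This is only a sketch: it assumes that Eqs. 1 and 3 (not reproduced in this section) correspond to the usual definition of superhelical density, $\sigma = (Lk - Lk_0)/Lk_0$ with $Lk_0 = N/10.5$. The symbols $Lk$, $Lk_0$ and $Lk_0'$ are introduced here for illustration only.

```latex
\begin{align*}
Lk - Lk_0 &= \sigma_{old}\,\frac{N}{10.5}
  && \text{(before extrusion, with } Lk_0 = N/10.5\text{)} \\
Lk_0' &= \frac{N - l}{10.5}
  && \text{(relaxed value for the remaining } N - l \text{ bp)} \\
Lk - Lk_0' &= (Lk - Lk_0) + (Lk_0 - Lk_0') = \frac{\sigma_{old} N + l}{10.5}
  && \text{(} Lk \text{ is unchanged by extrusion)} \\
\sigma_{new} &= \frac{Lk - Lk_0'}{Lk_0'} = \frac{\sigma_{old} N + l}{N - l}
\end{align*}
```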
Using Eq. 6 for a hypothetical 1000 bp plasmid with a
superhelical density of −0.065 harboring a 20 bp palindrome,
after cruciform formation we obtain the following superhelical
density for the rest of the molecule:
$$\frac{-0.065 \cdot 1000 + 20}{1000 - 20} = \frac{-45}{980} \approx -0.046 \qquad (7)$$
This clearly indicates that at the given length and su-
perhelical density, the plasmid in this example has the
potential to form only one secondary structure (cru-
ciform). Let us now suppose the plasmid contains
several palindromes capable of forming a cruciform
structure. Which of them will actually form in vivo
will most likely depend on chance. Once a cruci-
form is formed, it is very difficult for it to return to
the linear form. However, if we have a population of
plasmids, as is usually the case in populations of bac-
teria or isolated molecules used in experiments, each
of them may be in a different state in terms of which
of the existing palindromes actually formed a cruci-
form. This clearly shows the need for a probabilistic
model of cruciform formation, which will not only
take into account stability of cruciform structures that
can be formed (each palindrome is a potential cruci-
form), but also the presence of other cruciforms in the
same topological domain (which will affect the resid-
ual superhelical density calculated from Eq. 6) and
the melting/folding path by which cruciforms form in
linear DNA.
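The residual superhelical density of Eq. 6 is simple to evaluate numerically. The following minimal sketch reproduces the worked example of Eq. 7; the function and parameter names are hypothetical and are not taken from the authors' MATLAB code.

```python
def residual_superhelical_density(sigma_old: float, n_bp: int, cruciform_bp: int) -> float:
    """Superhelical density of the rest of the molecule after a cruciform
    of length cruciform_bp has been extruded (Eq. 6).

    sigma_old    -- superhelical density before cruciform formation
    n_bp         -- length N of the topological domain in bp
    cruciform_bp -- length l of the cruciform in bp
    """
    if not 0 <= cruciform_bp < n_bp:
        raise ValueError("cruciform length must satisfy 0 <= l < N")
    return (sigma_old * n_bp + cruciform_bp) / (n_bp - cruciform_bp)


# Worked example of Eq. 7: 1000 bp plasmid, sigma = -0.065, 20 bp palindrome.
sigma_new = residual_superhelical_density(-0.065, 1000, 20)
print(round(sigma_new, 3))  # -0.046
```

With a cruciform length of zero the function returns the original density unchanged, which is a convenient sanity check.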
The path by which a linear segment of B-DNA becomes
cruciform DNA has been discussed intensively
in the literature. C-type formation has been described in
low-salt solutions and S-type formation at physiological
salt levels (Lilley, 1989). Since our model aspires
to describe conditions in vivo, we uniformly assume
S-type formation. Under this scenario,
the segment containing the palindrome has to
partially melt over a region of about 10 bp in the area
of the future loop. Upon this partial melting, the free
strands can either return to their original configuration or form
a four-way junction initiating cruciform extrusion at
the given location. Once the cruciform is formed, ex-
tra energy is required for its transformation back to
linear B-DNA.
In our model, we divide the sequence of the palindrome
into three zones: i) the melting zone, located in the
loop area, which extends to the first two bases that will
form a pair in the stem of the cruciform as predicted
by UNAFold but is forced to be at least 9 bp
long; ii) the cruciform stem zone, formed by the bulk
of the paired bases; and iii) the nucleation zone, formed
by two paired bases at the boundary of the stem and
the loop.
Based on the available knowledge on cruciform
formation, we can now formulate a stochastic math-
ematical model of cruciform formation in a topolog-
ically isolated segment of DNA. A schematic draw-
ing of the model is given in Figure 1. We start by
identifying a set of palindromes in the DNA sequence
by computational methods. Let us designate them
$p_1, p_2, \ldots, p_k$. Each palindrome will exist in one of
three states: L (linear B-DNA), M (melted loop zone)
or C (cruciform). The transitions will be modelled
based on calculated free energies or melting temper-
atures of the appropriate duplex state. The lower the
energy (or higher the melting temperature), the less
likely the molecule will be to leave that state. The
best way to calculate the relative probabilities of tran-
sition for each palindrome remains to be determined.
Currently, we use a formula that takes into account
the exponential relationship between free energy or
melting temperature on one side and rate constants of
the underlying reactions on the other. An exponen-
tial function of free energy or temperature difference
from melting temperature is calculated for each palin-
drome and the resulting values are then normalized to
a sum of 100% to obtain the relative probabilities of
transition (refer to the MATLAB code (see Url 1) for the
exact formula). The whole system can now be simu-
lated by iterating through a predefined number of cy-
cles (event time). In each of the cycles the state of a
given molecule will be changed based on the calcu-
lated transition probability distribution. If calculated
on a population of molecules (or cells), we obtain per-
centages of palindromes present as linear B-DNA or
cruciform for each of the palindromes.
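The simulation loop described above can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: the Boltzmann-like weighting exp(−ΔG/RT), the per-transition barrier values, the value of RT and all function names are assumptions made for illustration. The coupling between palindromes through the residual superhelical density (Eq. 6) is deliberately left out.

```python
import math
import random

RT = 0.616  # kcal/mol near 37 °C (assumed value, R*T with T = 310 K)

# Allowed transitions between L (linear), M (melted loop) and C (cruciform).
TRANSITIONS = {"L": ("M",), "M": ("L", "C"), "C": ("M",)}


def relative_probabilities(barriers, rt=RT):
    """Exponential weights of hypothetical energy barriers, normalized to 1."""
    lowest = min(barriers)  # shift by the minimum for numerical stability
    weights = [math.exp(-(b - lowest) / rt) for b in barriers]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_molecule(barriers, cycles, rng):
    """Run one molecule for a number of cycles (one transition per cycle).

    barriers -- one dict per palindrome mapping (from_state, to_state)
                to a hypothetical barrier in kcal/mol
    """
    states = ["L"] * len(barriers)
    for _ in range(cycles):
        # Enumerate every transition currently available in the molecule.
        moves = [(i, new) for i, s in enumerate(states) for new in TRANSITIONS[s]]
        energies = [barriers[i][(states[i], new)] for i, new in moves]
        probs = relative_probabilities(energies)
        i, new = rng.choices(moves, weights=probs, k=1)[0]
        states[i] = new
    return states


def simulate_population(barriers, cycles, molecules, seed=0):
    """Percentage of molecules with each palindrome in state L, M or C."""
    rng = random.Random(seed)
    counts = [{"L": 0, "M": 0, "C": 0} for _ in barriers]
    for _ in range(molecules):
        for i, s in enumerate(simulate_molecule(barriers, cycles, rng)):
            counts[i][s] += 1
    return [{s: 100.0 * c / molecules for s, c in per.items()} for per in counts]


# Two palindromes with made-up barriers; leaving the cruciform state is costly.
example_barriers = [
    {("L", "M"): 3.0, ("M", "L"): 1.0, ("M", "C"): 1.5, ("C", "M"): 6.0},
    {("L", "M"): 4.0, ("M", "L"): 1.0, ("M", "C"): 2.5, ("C", "M"): 5.0},
]
percentages = simulate_population(example_barriers, cycles=50, molecules=200)
for index, per_state in enumerate(percentages, start=1):
    print(f"p{index}: {per_state}")
```

In this sketch, exactly one transition is drawn per cycle, which is one possible reading of "event time". A fuller version would recompute the barriers after each cruciform extrusion using the residual superhelical density from Eq. 6.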
BIOINFORMATICS 2012 - International Conference on Bioinformatics Models, Methods and Algorithms