Otherwise, DCJ(a b) first creates a new cycle that is
then destroyed by DCJ(x y).
We denote by I (G) the set of intervals of all the
adjacencies of G that do not contain marker ◦.
Remark 4.5. Note that, if G contains n distinct mark-
ers, then there are 2n − 1 adjacencies in G that do not
contain marker ◦, defining 2n − 1 intervals in I (G).
Definition 4.6. Two intervals I(a b) and I(x y) of
I (G) are said to be compatible if they are overlapping
and x ≠ a and y ≠ b.
In the following, we prove the BI halving distance
formula by showing that if genome G contains more
than three distinct markers, n > 3, then there exist
two compatible intervals in I (G), and if n = 2 or n = 3
then $d^t_{BI}(G) = 1$ and $2 \le d^p_{DCJ}(G) \le 3$. This means
that there exists a BI halving scenario S such that all
BI operations in S, possibly excluding the last one, are
equivalent to two successive sorting DCJ operations.
From now on, until the end of the section, (a b)
is an adjacency of G that is not a double-adjacency, and
A is the genome, consisting of a linear chromosome L
and a circular chromosome C, obtained by applying the
sorting DCJ, DCJ(a b), on G.
If there exists an interval I(x y) in I (G) compatible
with I(a b), then applying DCJ(x y) on A amounts to
integrating the circular chromosome C into the
linear chromosome L such that the adjacency (x y)
is formed. Such an integration can only be performed
by cutting an adjacency (x u) in C and an adjacency
(v y) in L (or vice versa) to produce adjacencies (x y)
and (v u). This means that there must be an adjacency
(x y) in either C or L such that x is in C and y is in L, or
vice versa. Hence, we have the following property:
Property 4.7. C cannot be reintegrated into L by ap-
plying a sorting DCJ, DCJ(x y), on A if and only if
either:
(1) for any adjacency (x y) in C (resp. L), markers x
and y are in L (resp. C), or
(2) for any adjacency (x y) in C (resp. L), markers x
and y are also in C (resp. L).
Proof. If there exists no adjacency (x y) in A such that
x is in C and y in L or inversely, then A necessarily
satisfies either (1), or (2).
Definition 4.8. An interval I(a b) in I (G) is called an
interval of type 1 (resp. interval of type 2) if DCJ(a b)
produces a genome A satisfying configuration (1)
(resp. configuration (2)) described in Property 4.7.
For example, in genome (◦ 2 1 1 3 2 3 ◦),
I(1 3) is of type 1 as DCJ(1 3) produces genome
(◦ 2 1 3 ◦) (1 3 2); I(2 3) is of type 2 as DCJ(2 3)
produces genome (◦ 2 3 2 3 ◦) (1 1).
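The two genomes in this example can be reproduced with a short sketch that models a sorting DCJ as the excision of a segment of the linear chromosome, closed into a circle. This is an illustration under assumptions: the helper name excise is hypothetical, marker orientation is ignored, "o" stands for ◦, and the cut positions are chosen by hand to match the example rather than computed from the interval definition.

```python
# Hypothetical encoding: "o" stands for marker ◦; orientation is ignored.
CAP = "o"

def excise(genome, i, j):
    """Cut genome[i:j] out of a linear chromosome and close it into a circle.

    Two adjacencies are cut and two are created, which is one DCJ operation.
    Returns (linear, circular); the circular chromosome is a list read
    cyclically, so any rotation of it denotes the same chromosome.
    """
    linear = genome[:i] + genome[j:]
    circular = genome[i:j]
    return linear, circular

G = [CAP, 2, 1, 1, 3, 2, 3, CAP]

# DCJ(1 3): excising the segment 1 3 2 joins markers 1 and 3 in L.
L1, C1 = excise(G, 3, 6)

# DCJ(2 3): excising the segment 1 1 joins markers 2 and 3 in L.
L2, C2 = excise(G, 2, 4)
```

Here L1, C1 correspond to (◦ 2 1 3 ◦) (1 3 2) and L2, C2 to (◦ 2 3 2 3 ◦) (1 1).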
Now we give the maximum numbers of intervals of
type 1 and type 2 that can be contained in genome G.
Lemma 4.9. The maximum number of intervals of type
1 in I (G) is 2.
Proof. First, note that there cannot be two intervals I
and J of I (G) such that I ≠ J and both I and J are of
type 1. Now, if I is an interval of type 1, there can be
at most two different adjacencies (x y) and (u v) such
that I(x y) = I(u v) = I. In this case G necessarily has
a chromosome of the form (... x v ... u y ...) or
(... u y ... x v ...). Therefore, there are at most
two intervals of type 1 in I (G).
Lemma 4.10. The maximum number of intervals of
type 2 in I (G) is n.
Proof. First, note that for two adjacencies (x y) and
(x z) in G that do not contain marker ◦, if (x y) is of
type 2, then (x z) cannot be of type 2. Now, there is
only one marker u such that (u ◦) is an adjacency of
G. Let (u v) be the adjacency of G having u as first
marker. Then at most half of the intervals in I (G) −
{I(u v)} can be of type 2. Therefore, there are at most
n intervals of type 2 in I (G).
Theorem 4.11. If NG(G) contains C cycles, then the
BI halving distance of G is given by:
$$d^t_{BI}(G) = \frac{n - C}{2}$$
Proof. There are 2n − 1 intervals in I (G), of which at
most n + 2 are of type 1 or 2. If G contains more than
three distinct markers (n > 3), then 2n − 1 > n + 2, so
at least one interval is of neither type, and there exist
two compatible intervals in I (G) inducing a BI operation
that decreases the DCJ distance by 2.
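The counting step of this argument can be made explicit with a small check. The sketch below only restates the bounds of Remark 4.5 and Lemmas 4.9 and 4.10; the function name untyped_lower_bound is a hypothetical helper introduced for illustration.

```python
def untyped_lower_bound(n):
    """Lower bound on the number of intervals that are of neither type."""
    total = 2 * n - 1        # number of intervals in I(G), Remark 4.5
    typed_max = 2 + n        # at most 2 of type 1 and at most n of type 2
    return total - typed_max  # simplifies to n - 3

# The bound is positive exactly when n > 3, the case treated above.
for n in range(2, 50):
    assert (untyped_lower_bound(n) > 0) == (n > 3)
```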
Next, we show that if n = 2 or n = 3, then
$d^t_{BI}(G) = 1$ and $2 \le d^p_{DCJ}(G) \le 3$.
If n = 2, then the genome can be written either
as (◦ a b b a ◦), in which case a BI can swap a
and b to produce a tandem-duplicated genome, or as
(◦ a a b b ◦), in which case a BI can swap the blocks
a and a b to produce a tandem-duplicated genome.
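Both cases for n = 2 can be verified mechanically. The following sketch assumes a genome is a Python list with "o" standing for ◦, ignores marker orientation, and takes a tandem-duplicated genome to be one whose markers between the two ◦ form the same sequence twice in a row. The names block_interchange and is_tandem_duplicated are hypothetical helpers.

```python
# Hypothetical encoding: "o" stands for marker ◦; orientation is ignored.
CAP = "o"

def block_interchange(genome, i, j, k, l):
    """Swap the disjoint blocks genome[i:j] and genome[k:l], i < j <= k < l."""
    assert 0 <= i < j <= k < l <= len(genome)
    return genome[:i] + genome[k:l] + genome[j:k] + genome[i:j] + genome[l:]

def is_tandem_duplicated(genome):
    """True if the markers between the two ◦ read as one sequence repeated twice."""
    body = genome[1:-1]
    half = len(body) // 2
    return len(body) % 2 == 0 and body[:half] == body[half:]

# Case (◦ a b b a ◦): swap the first a with the first b.
g1 = [CAP, "a", "b", "b", "a", CAP]
r1 = block_interchange(g1, 1, 2, 2, 3)

# Case (◦ a a b b ◦): swap the block a with the block a b.
g2 = [CAP, "a", "a", "b", "b", CAP]
r2 = block_interchange(g2, 1, 2, 2, 4)
```

In this encoding r1 is (◦ b a b a ◦) and r2 is (◦ a b a b ◦), each obtained with a single block interchange.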
If n = 3, then the genome has two double-
adjacencies to be constructed, of the form (a b), (x y),
with (a b) and (x y) being two adjacencies already
present in the genome such that b = x or b = x and a
and y are distinct markers. One can rewrite (a b) and
(x y) as single markers since they will not be split,
which yields a genome with 4 markers such that
at most 2 are misplaced. Then, a single BI can produce
a tandem-duplicated genome.
Now, it is easy to see that if n = 2 or n = 3,
then $d^p_{DCJ}(G) = n - C \le 3$. Finally, if n = 2 or n = 3,