Evolution of Bacterial Genome under Changing Mutational Pressure

Computer Simulation Studies

Paweł Bła

zej, Paweł Mackiewicz, Małgorzata Wa

nczyk and Stanisław Cebrat

Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Przybyszewskiego 63/77, Wrocław, Poland

Keywords:

DNA Asymmetry, Genome Rearrangements, Monte Carlo Simulation, Mutational Pressure.

Abstract:

The main force shaping the structure of bacterial chromosomes is the replication-associated mutational pres-

sure which is characterized by distinct nucleotide substitution patterns acting on differently replicated DNA

strands (leading and lagging). Therefore, the composition of DNA strands is asymmetric and it is important

at which strand a gene is located and into which strand it could be translocated. Thus, the mutational pressure

restricts also intragenomic translocations. To analyze this effect, we have elaborated a simulation model of

bacterial genome evolution assuming translocation of protein coding genes and different types of selection act-

ing on their sequences. The ’negative’ selection eliminated individuals if the coding signal of any gene in its

genome dropped below the acceptable range, whereas the ’stabilizing’ selection did not allow for the decrease

in the coding signal of any gene below its original value. Under the ’negative’ selection more genes stayed

or were translocated to the lagging strand, whereas under the ’stabilizing’ selection more genes preferred the

leading strand. The ’stabilizing’ selection eliminated more individuals because of the coding signal loss and

slightly fewer because of the stop codon generation. The ’stabilizing’ selection allowed also for much less

gene translocations between strands than the ’negative’ selection.

1 INTRODUCTION

The conserved position of genes on bacterial chro-

mosomes is observed only between closely related

species or strains and it disappears very quickly dur-

ing divergence of the bacterial genomes (Mushegian

and Koonin, 1996; Watanabe et al., 1997; Bellgard

et al., 1999; Itoh et al., 1999; Hughes, 2000; Rocha,

2006). The main force shaping the structure of bac-

terial chromosomes is the mutational pressure associ-

ated with DNA replication. Since the mechanisms of

DNA synthesis are different for leading and lagging

strands, the probabilities of speciﬁc nucleotide substi-

tutions are different for those strands (Frank and Lo-

bry, 1999; Kowalczuk et al., 001a; Kowalczuk et al.,

001b; Rocha and Danchin, 2001; Rocha et al., 2006).

The direct result of this phenomenon is compositional

bias between the differently replicated DNA strands,

which is called DNA asymmetry. (Lobry, 1996; Grig-

oriev, 1998; McLean et al., 1998; Mrazek and Karlin,

1998; Mackiewicz et al., 1999; Tillier and Collins,

000a; Lobry and Sueoka, 2002).

The genome rearrangements and genes transloca-

tions are strongly related to the asymmetric structure

of bacterial chromosomes (Achaz et al., 2003; Mack-

iewicz et al., 2003). The large inversions observed in

closely related bacterial genomes are symmetrical in

relation to the origin of replication as a result of

higher frequency of recombination events at the repli-

cation forks and/or selection for the maintenance of:

(i) highly expressed genes near the replication ori-

gin in the proper copy number, (ii) the same length

of two replichores and (iii) position of genes on the

same type of DNA strand subjected to the stable muta-

tional pressure (Eisen et al., 2000; Tillier and Collins,

000b; Mackiewicz et al., 001a). However, compar-

isons of more distantly related genomes revealed that

the number of orthologs which changed DNA strand

increases quickly with the phylogenetic distance leav-

ing only a small fraction of highly conserved genes

for ribosomal proteins on the leading strand in ana-

lyzed genomes (Mackiewicz et al., 2003). It proba-

bly results from selection for location of genes essen-

tial for cell functioning in the leading strand (Rocha

and Danchin, 003a; Rocha and Danchin, 003b). Such

requirements may also explain the observed higher

frequency of gene translocations from the lagging to

the leading strand rather than in the opposite direc-

tion (Mackiewicz et al., 001b). The last observation

should result in a very strong bias in coding capacity

of the two DNA strands. To keep the coding capac-

ity of DNA strands more balanced, the other selection

272

Bła

zej P., Mackiewicz P., Wa

nczyk M. and Cebrat S..

Evolution of Bacterial Genome under Changing Mutational Pressure - Computer Simulation Studies.

DOI: 10.5220/0004192802720277

In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS-2013), pages 272-277

ISBN: 978-989-8565-35-8

 2013 SCITEPRESS (Science and Technology Publications, Lda.)

force should be expected. It could be the different se-

lection pressure for gene surviving under the different

mutational pressure.

The translocation of genes between differently

replicated DNA strands and the inﬂuence of chang-

ing mutational pressure associated with replication

on such genes was studied in computer simulations

(Mackiewicz et al., 2004; Dudkiewicz et al., 2005;

Mackiewicz and Cebrat, 2009). They showed that

the most advantageous strategy for the majority of

genes is to switch their position between the DNA

strands from time to time. The translocation involves

a change in the direction of the mutational pressure,

which introduces intragenic suppression mutations

complementing the former ones which occurred in

the same gene. Those previous simulations used the

amino acid composition of particular gene products as

selection parameter. In the approach presented here,

we applied selection at the nucleotide level according

to the algorithm for prediction of protein coding se-

quences (Bła

zej et al., 2010; Bła

zej et al., 2011).

2 METHODS

In computer simulations, we applied the modiﬁed

model of prokaryotic genome evolution proposed in

(Bła

zej et al., 2012). We considered the popula-

tion of 72 individuals. Such number was optimal

for the simulation program to perform computer cal-

culations. Each individual (genome) were repre-

sented by 475 protein coding sequences of Borrelia

burgdorferi genome, which is characterized by very

strong asymmetry in DNA composition. The gene se-

quences were downloaded from the NCBI database

(www.ncbi.nlm.nih.gov). This set of genes was di-

vided into subsets: 333 genes located on the leading

strand and 142 genes located on the lagging strand.

Nucleotides from the gene sequences were chosen

for mutation according to the Poisson process assum-

ing one mutation per genome. Then the nucleotide

was substituted by another one with probability in the

mutational matrices for the leading or lagging strands

(Tab. 1) accordingly to the current location of the

gene. However, in contrast to (Bła

zej et al., 2012), we

also applied translocation of genes to the other DNA

strand with the probability 10

−3

, which is close to the

rearrangement rate per generation expected in bacte-

ria without selection (Rocha, 2006). In the next step

of simulation, such gene was subjected to the substitu-

tion matrix from this new strand. For comparison, we

also carried out simulations without translocations.

After the mutation stage, genes were eliminated

if they obtained a stop codon or their coding signal

Table 1: The substitution matrix describing mutational pres-

sure acting on the leading DNA strand for the B. burgdorferi

genome (Kowalczuk et al., 001b). A nucleotide in the ﬁrst

column is substituted by a nucleotide in the ﬁrst row. Be-

cause of strands’ complementarity, the substitution from the

lagging strand is complementary to the corresponding one

from the leading strand. For example, the lagging strand

substitution from C to T corresponds to the substitution

from G to A from the leading strand.

A T G C

A 0.81 0.10 0.07 0.02

T 0.07 0.87 0.03 0.03

G 0.16 0.12 0.71 0.01

C 0.07 0.26 0.05 0.62

was destroyed according to the algorithm for predic-

tion of protein coding sequences (Bła

zej et al., 2010;

Bła

zej et al., 2011). This algorithm calculates av-

erage coding signal uses three independent homoge-

neous Markov chains to describe occurrence of nu-

cleotides for each of three codon positions in a given

DNA sequence, separately. However, unlike (Bła

zej

et al., 2012) we applied separate selection pressures

for genes according to their location on the leading or

lagging strands. Genes from the given DNA strand

were checked with the appropriate gene recognition

model based on the learning sets of protein coding

genes from the suitable strand.

We considered two types of selection of individu-

als according to the coding signal. An individual was

eliminated when at least one of its mutated gene se-

quences accumulated so many mutations that its cod-

ing signal became lower than: (i) the coding signal

from an alternative reading frame (’negative’ selec-

tion), or (ii) the original value at the beginning of the

simulation (’stabilizing’ selection). Simulations were

run over 10 million Monte Carlo steps (MCS).

3 RESULTS AND DISCUSSION

3.1 Coding Signal for Different Gene

Sets and Models

In Fig. 1, we compared the average coding sig-

nal calculated separately for the leading and lagging

strand genes based on two gene recognition models

trained on: (i) genes from the DNA strand on which

the analyzed genes are located (right model) and (ii)

genes from the opposite strand (wrong model). The

maximum of the coding signal for alternative read-

ing frames of the analyzed genes was also shown for

comparison. As expected, the application of the right

model for every gene set gives the higher coding sig-

EvolutionofBacterialGenomeunderChangingMutationalPressure-ComputerSimulationStudies

273

nal than in the case of the wrong model. However, the

difference is smallest for the lagging strand genes than

for the leading strand genes although the coding sig-

nal for the leading strand genes with the right model

shows the widest range and overlaps with the range of

the wrong model for the leading strand genes. Gen-

erally, the coding signal of correct reading frames is

much stronger than of the non-coding sequences for

all gene sets and models combinations. The small-

est difference between coding signals of the correct

and incorrect reading frames is if the coding signal of

lagging strand genes is calculated based on the learn-

ing set from the leading strand. It suggests that the

genes from the lagging strand should be more fre-

quently eliminated when they were subjected to the

selection typical of the leading strand genes. How-

ever, these expectation can be also inﬂuenced by the

applied mutational and selection pressures acting on

gene sequences.

Figure 1: Average coding signal for the leading and lagging

strand genes (lead g and lagg g, respectively) calculated ac-

cording to the gene recognition model trained on learning

sets from two DNA strands (lead m and lagg m, respec-

tively). Non-coding sequences represented by alternative

reading frames of genes from two DNA strands were also

included for comparison (lead n and lagg n, respectively).

The thick horizontal line in the boxplots indicates median,

the rectangle means quartile range and the whiskers show

the range without outliers.

3.2 Stability of Gene Location on DNA

Strands during Simulation

The simulations were started with 70% of genes com-

ing from the leading strand and 30% of genes from

the lagging strand (like in the real B. burgdorferi

genome). These proportions were almost unchanged

under the ’stabilizing’ selection while they changed

signiﬁcantly under negative selection and the percent-

age of the genes staying on the leading strand dropped

to about 40% after 10 million MCS of simulation

(Fig. 2).

Figure 2: Fraction of genes occupying the leading strand

during simulation.

To check what kind of translocation led to this

distribution, we studied changes in fraction of genes

from DNA strands separately (Fig. 3 and Fig. 4).

These analyses showed that till 10, 000 MCS, about

the half of genes both from the leading and the lag-

ging strand were translocated to the opposite strand

under the ’negative’ selection. However, during the

simulation under this selection many lagging strand

genes returned to their proper strand reaching at the

end of the simulation about 60% (Fig. 3). In contrast

to that, more of leading strand genes left their own

strand and about 40% occupied the leading strand at

the end of the simulation for the ’negative’ selection

(Fig. 4). Interestingly, the ’stabilizing’ selection was

more conserving for positions of the leading than lag-

ging strand genes because it kept more than 90% of

the genes on the leading strand and about 80% of the

genes on the lagging strand.

This results indicate that for the ’negative’ selec-

tion, it is more advantageous for most of genes from

both DNA strands to stay under mutational and se-

lection pressures characteristic of the lagging strand.

However, for the more conserved ’stabilizing’ selec-

tion, more genes prefer the leading strand conditions

as it is observed in the real genomes (Mackiewicz

et al., 001b; Mackiewicz et al., 2003; Rocha and

Danchin, 003a; Rocha and Danchin, 003b).

BIOINFORMATICS2013-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

274

Figure 3: Fraction of lagging strand genes staying on their

own strand during simulation.

Figure 4: Fraction of leading strand genes staying on their

own strand during simulation.

3.3 Comparison of Two Selection Types

and Simulation Models

The used two selection types differ in their restric-

tions. The ’negative’ selection eliminated individu-

als when the coding signal of genes was smaller than

their alternative reading frames, whereas the ’stabiliz-

ing’ selection was more restrictive because a mutated

gene was not allowed to decrease its original coding

signal. Then, the ’stabilizing’ selection eliminated

more genomes than the ’negative’ one for the model

with translocations (Fig. 5). Moreover, the simula-

tion with the latter selection begun with a long de-

lay in which none or very few individuals were elimi-

nated until their gene sequences accumulated enough

number of mutations that decreased their coding sig-

nal. Therefore, the ’stabilizing’ selection was more

restrictive for the maintenance of the coding signal

during the whole simulation, whereas the ’negative’

selection allowed for its decrease (Fig. 6). The model

with gene translocations eliminated more individuals

than the model without them under the ’negative’ se-

lection (Fig. 5). This stronger selection resulted in

a slightly higher average coding signal of survived

genes for the model with translocations when the sim-

ulations reached equilibrium (Fig. 6).

Figure 5: Cumulated number of individuals eliminated be-

cause of the coding signal loss in their genes.

Figure 6: Change of the coding signal of gene sequences

during simulation.

Differences between two types of selections and

simulation models were much weaker when individ-

EvolutionofBacterialGenomeunderChangingMutationalPressure-ComputerSimulationStudies

275

Figure 7: Cumulated number of individuals eliminated be-

cause of the codon stop.

Figure 8: Cumulated number of substitutions accepted in

gene sequences.

uals were eliminated by generation of the stop codon

inside their genes (Fig. 7). Then, the ’stabilizing’ se-

lection turned out less restrictive than the ’negative’

one for the model with translocations. Moreover, the

admission of gene movements between DNA strands

eliminated only slightly more individuals than the

model without translocations (Fig. 7).

Generally, the number of eliminated individuals

by stop codons was much bigger than the number of

individuals eliminated because of the coding signal

loss for models with and without translocations using

the ’negative’ selection (Fig. 5). The opposite was

for the ’stabilizing’ selection model with transloca-

tions. It seems that translocation effect on individual

elimination depends on the applied selection type.

Figure 9: Cumulated number of gene translocation between

differently replicated DNA strands.

The applied selections also differed in the num-

ber of accepted substitutions (Fig. 8) and the number

of translocated genes (Fig. 9). The ’negative’ selec-

tion allowed for only slightly more substitutions but

signiﬁcantly more gene movements than the ’stabiliz-

ing’ one. The model with translocations caused accu-

mulation of more substitutions than the model with-

out them. It agrees with studies of bacterial genomes

showing higher divergence between homologs lying

on different than the same DNA strands (Rocha and

Danchin, 2001; Mackiewicz et al., 001b; Mackiewicz

et al., 2003). It may result from weaker selection on

translocated genes or higher susceptibility of their se-

quences to the opposite mutational pressure.

The presented simulations were based on Borre-

lia burgdorferi genes but similar results are expected

for other bacteria also showing compositional bias be-

tween differently replicated DNA strands. However,

pure mutational pressures should be found for these

genomes to perform comparative studies.

REFERENCES

Achaz, G., Coissac, E., Netter, P., and Rocha, E. (2003). As-

sociations between inverted repeats and the structural

evolution of bacterial genomes. Genetics, 164:1279–

1289.

Bellgard, M., Itoh, T., Watanabe, H., Imanishi, T., and Go-

jobori, T. (1999). Dynamic evolution of genomes and

the concept of genome space. Ann. N. Y. Acad. Sci.,

870:293–300.

Bła

zej, P., Mackiewicz, P., and Cebrat, S. (2010). Using the

genetic code wisdom for recognizing protein coding

sequences. In Proceedings of the 2010 International

BIOINFORMATICS2013-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

276

Conference on Bioinformatics & Computational Biol-

ogy (BIOCOMP 2010), pages 302–305.

Bła

zej, P., Mackiewicz, P., and Cebrat, S. (2011). Algo-

rithm for ﬁnding coding signal using homogeneous

markov chains independently for three codon posi-

tions. In Proceedings of the 2011 International Con-

ference on Bioinformatics and Computational Biology

(ICBCB 2011), pages 20–24.

Bła

zej, P., Mackiewicz, P., and Cebrat, S. (2012). Simula-

tion of bacterial genome evolution under replicational

mutational pressures. In Proceedings of 5th Interna-

tional Joint Conference on Biomedical Engineering

Systems and Technologies (BIOSTEC 2012), Interna-

tional Conference on Bioinformatics Models, Methods

and Algorithms (Bioinformatics 2012), pages 51–57.

Dudkiewicz, M., Mackiewicz, P., Mackiewicz, D., Kowal-

czuk, M., Nowicka, A., Polak, N., Smolarczyk, K.,

Kiraga, J., Dudek, M., and Cebrat, S. (2005). Higher

mutation rate helps to rescue genes from the elimina-

tion by selection. Biosystems, 80:192–199.

Eisen, J., Heidelberg, J., White, O., and Salzberg, S.

(2000). Evidence for symmetric chromosomal in-

versions around the replication origin in bacteria.

Genome Biol., 1:11.11–11.19.

Frank, A. and Lobry, J. (1999). Asymmetric substitution

patterns: a review of possible underlying mutational

or selective mechanisms. Gene, 238:65–77.

Grigoriev, A. (1998). Analysing genomes with cumulative

skew diagrams. Nucleic Acids Res., 26:2286–2290.

Hughes, D. (2000). Evaluating genome dynamics: the con-

straints on rearrangements within bacterial genomes.

Genome Biol., 1:REVIEWS0006.

Itoh, T., Takemoto, K., Mori, H., and Gojobori, T. (1999).

Evolutionary instability of operon structures disclosed

by sequence comparisions of complete microbial

genomes. Mol. Biol. Evol., 16:332–346.

Kowalczuk, M., Mackiewicz, P., Mackiewicz, D., Now-

icka, A., Dudkiewicz, M., Dudek, M., and Cebrat, S.

(2001a). DNA asymmetry and the replicational muta-

tional pressure. J. Appl. Genet., 42:553–577.

Kowalczuk, M., Mackiewicz, P., Mackiewicz, D., Now-

icka, A., Dudkiewicz, M., Dudek, M., and Cebrat,

S. (2001b). High correlation between the turnover of

nucleotides under mutational pressure and the DNA

composition. BMC Evol. Biol., 1:13.

Lobry, J. (1996). Asymmetric substitution patterns in the

two DNA strands of bacteria. Mol. Biol. Evol., 13:,

660–665.

Lobry, J. and Sueoka, N. (2002). Asymmetric directional

mutation pressures in bacteria. Genome Biol., 3:58.

Mackiewicz, D. and Cebrat, S. (2009). To understand nature

- computer modelling between genetics and evolution.

In J. Miekisz and M. Lachowicz (eds), From Genetics

to Mathematics, Series on Advances in Mathematics

for Applied Sciences 79, pages 1–33. World Scientiﬁc.

Mackiewicz, D., Mackiewicz, P., Kowalczuk, M., Dud-

kiewicz, M., Dudek, M., and Cebrat, S. (2003). Re-

arrangements between differently replicating DNA

strands in asymmetric bacterial genomes. Acta Mi-

crobiol. Pol., 52:245–261.

Mackiewicz, P., Dudkiewicz, M., Kowalczuk, M., Mack-

iewicz, D., Kiraga, J., Polak, N., Smolarczyk, K.,

Nowicka, A., Dudek, M., and Cebrat, S. (2004). Dif-

ferential gene survival under asymmetric directional

mutational pressure. LNCS, 3039:687–693.

Mackiewicz, P., Gierlik, A., Kowalczuk, M., Dudek, M.,

and Cebrat, S. (1999). Asymmetry of nucleotide com-

position of prokaryotic chromosomes. J. Appl. Genet.,

40:1–14.

Mackiewicz, P., Mackiewicz, D., Kowalczuk, M., and Ce-

brat, S. (2001a). Flip-ﬂop around the origin and ter-

minus of replication in prokaryotic genomes. Genome

Biol., 2:1004.1–1004.4.

Mackiewicz, P., Szczepanik, D., Gierlik, A., Kowalczuk,

M., Nowicka, A., Dudkiewicz, M., Dudek, M., and

Cebrat, S. (2001b). The differential killing of genes

by inversions in prokaryotic genomes. J. Mol. Evol.,

53:615–621.

McLean, M., Wolfe, K., and Devine, K. (1998). Base com-

position skews, replication orientation, and gene ori-

entation in 12 prokaryote genomes. J. Mol. Evol.,

47:691–696.

Mrazek, J. and Karlin, S. (1998). Strand compositional

asymmetry in bacterial and large viral genomes. Proc.

Natl. Acad. Sci. U.S.A., 95:3720–3725.

Mushegian, A. and Koonin, E. (1996). Gene order is

not conserved in bacterial evolution. Trends Genet.,

12:289–290.

Rocha, E. (2006). Inference and analysis of the relative

stability of bacterial chromosomes. Mol. Biol. Evol.,

23:513–522.

Rocha, E. and Danchin, A. (2001). Ongoing evolution of

strand composition in bacterial genomes. Mol. Biol.

Evol., 18:1789–1799.

Rocha, E. and Danchin, A. (2003a). Gene essentiality deter-

mines chromosome organisation in bacteria. Nucleic

Acids Res., 31:5202–5211.

Rocha, E. and Danchin, A. (2003b). Essentiality, not ex-

pressiveness, drives gene strand bias in bacteria. Na-

ture Genet., 34:377–378.

Rocha, E., Touchon, M., and Feil, E. (2006). Similar com-

positional biases are caused by very different muta-

tional effects. Genome Res., 16:1537–1547.

Tillier, E. and Collins, R. (2000a). The contributions of

replication orientation, gene direction, and signal se-

quences to base composition asymmetries in bacterial

genomes. J. Mol. Evol., 50:249–257.

Tillier, E. and Collins, R. (2000b). Genome rearrangement

by replication-directed translocation. Nature Genet.,

26:195–197.

Watanabe, H., Mori, H., Itoh, T., and Gojobori, T. (1997).

Genome plasticity as a paradigm of eubacterial evolu-

tion. J. Mol. Evol., 44 (Suppl. 1):57–64.

EvolutionofBacterialGenomeunderChangingMutationalPressure-ComputerSimulationStudies

277