Studies of Mutation Accumulation in Three Codon Positions using Monte

Carlo Simulations and Metropolis-Hastings Algorithm

Małgorzata Grabi

nska, Paweł Bła

zej and Paweł Mackiewicz

Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland

Keywords:

Codon, Evolution, Metropolis-Hastings Algorithm, Monte Carlo Simulations, Mutation, Nucleotide Compo-

sition, Protein Coding Sequence, Rate Matrix, Selection, Substitution, Transition, Transversion.

Abstract:

Protein coding sequences are characterized by speciﬁc nucleotide composition in three codon positions as

a result of mutational and selection pressures. To analyse the impact of mutations and different transi-

tion/transversion ratio on three codon position in protein coding sequences, we elaborated a model of genome

evolution based Monte Carlo simulation. Selection was applied against stop translation codons and modi-

ﬁed Metropolis-Hastings algorithm to maintain typical nucleotide composition of particular codon positions.

The simulations were performed on genomes consisting of bacterial gene sequences. We used a series of

nucleotide substitution matrices assuming different transition/transversion ratio and nucleotide stationary dis-

tribution characteristic of the real mutational pressure. The simulations showed exponential decrease in the

number of eliminated genomes with the growth of the transition/transversion ratio. The same trend was also

observed both for accepted and to lesser extent for rejected mutations. The third codon positions much more

mutations accepted than rejected because of very similar composition to the mutational stationary distribution,

whereas the ﬁrst positions accumulated the smallest number of mutations and rejected the most as a result of

strong selection on its nucleotide composition. The obtained results showed different response of three codon

positions on mutational pressure related with their characteristic nucleotide composition.

1 INTRODUCTION

One of characteristic features of protein coding se-

quences resulting from their coding and functional re-

quirements is their triplet (codon) structure, which is

related to a speciﬁc nucleotide composition of three

codon positions (Wong and Cedergren, 1986; An-

derson and Kurland, 1990; Zhang and Zhang, 1991;

Gutierrez et al., 1996; Cebrat et al., 1997a; Cebrat

et al., 1998; Wang, 1998). There are two forces, mu-

tation pressure and selection constraints, which can

change or maintain this composition (Frank and Lo-

bry, 1999).

The ﬁrst two codon positions are usually sub-

jected to strong selective constraints although some

inﬂuence of replication-associated mutational pres-

sure was also observed (McLean et al., 1998; Ce-

brat et al., 1999; Mackiewicz et al., 1999a; Tillier

and Collins, 2000; Kowalczuk et al., 2001b). Gen-

erally, the ﬁrst codon positions of protein coding se-

quences are rich in purines, guanine (G) and adenine

(A), whereas the second positions are poor in guanine

and contain more cytosine (C) and adenine. The dom-

inance of purines, and particularly guanine in the ﬁrst

codon position and their deﬁciency in the second po-

sition can ensure the correct reading frame of tran-

scripts during translation by interaction of nucleotides

in the ﬁrst codon positions of mRNA with also period-

ically distributed cytosines in rRNA (Trifonov, 1987;

Lagunez-Otero and Trifonov, 1992; Trifonov, 1992).

In support of this, highly expressed genes are char-

acterized by increased usage of codons starting from

guanine and to lesser extent from adenine, which does

not depend on the overall G+C content in the genome

(Gutierrez et al., 1996; Pan et al., 1998; Akashi,

2003; Das et al., 2005). This composition reﬂects

also the frequent usage of acidic amino acids coded

by GAN codons (Karlin and Mrazek, 1996) as well as

glycine, alanine and valine in coded proteins (Karlin

et al., 1992). It was found that these amino acids are

very common in products of highly transcribed genes

(Jansen and Gerstein 2000, Akashi 2003, Marin et al.

2003). The excess of purines in the coding sequences

(Shepherd, 1981; Smithies et al., 1981; Karlin and

Burge, 1995; Cebrat et al., 1997b; Freeman et al.,

1998) was also explained by their less susceptibility to

245

Grabi

nska M., Blazej P. and Mackiewicz P..

Studies of Mutation Accumulation in Three Codon Positions using Monte Carlo Simulations and Metropolis-Hastings Algorithm.

DOI: 10.5220/0004911502450252

In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS-2014), pages 245-252

ISBN: 978-989-758-012-3

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

mutations than pyrimidines (Hutchinson, 1996). Dur-

ing transcription process, the sense strand of genes

stays longer in the single-stranded state. Therefore,

it is more exposed than the antisense strand, which

is preferably repaired and protected by proteins (Mel-

lon and Hanawalt, 1989; Hanawalt, 1991). Changes

in the second position in codons are generally more

conserved than in the ﬁrst one because mutations in

the former more often lead to changes in hydropho-

bicity and polarity of coded amino acid residues.

On the other hand, the third codon positions are

most of all subjected to accumulations of mutations

because most nucleotide substitutions in these sites

usually do not change coded amino acid residues or

their properties. However, not all substitutions are

necessarily neutral. Some preferences in usage of

synonymous codons (i.e. coded the same amino acid)

were observed in highly expressed genes, which is

positively correlated with tRNA content in cells and

the rate of translation (Ikemura, 1981; Ikemura, 1985;

Bennetzen and Hall, 1982; Sharp and Cowe, 1991;

Kanaya et al., 1999). The third codon position are

usually rich in pyrimidines, particulary in thymine

(T), probably as a result of the most frequent point

mutation, deamination of cytosine and its homologue

5-methylcytosine to uracile, which ﬁnally leads to

substitution C to T (Echols and Goodman, 1991; Lin-

dahl, 1993; Kreutzer and Essigmann, 1998).

There are two types of point mutations happening

in protein coding sequences: transitions (substitution

between the same chemical types of nucleotides, be-

tween purines and between pyrimidines) and transver-

sions (substitution between the different types of nu-

cleotides, between purines and pyrimidines). Transi-

tions are usually several times more often observed

in real sequences than transversions although the ex-

pected ratio is 1:2 if all substitutions are equally likely

(Wakeley, 1996). This bias results from higher rate of

chemical changes between nucleotides with the sim-

ilar structure and more common transition substitu-

tions introduced during replication of genetic mate-

rial. Moreover, transitions more rarely cause changes

in coded amino acids or their properties, therefore are

more often accepted than transversions.

To study the inﬂuence of mutations and differ-

ent transition/transversion rate on accumulation of

substitutions in three codon position of protein cod-

ing sequences, we elaborated Monte Carlo simulation

model of genome evolution. As a selection module,

we applied selection against occurrence of stop trans-

lation codons and a modiﬁed Metropolis-Hastings al-

gorithm to keep nucleotide composition characteristic

of a given codon position by acceptance or rejection

of introduced mutations.

2 MATERIALS AND METHODS

The simulations were carried out for two million steps

on the population of 72 individuals that represented

protein coding sequences from bacterial genome of

Borrelia burgdorferi. This genome is very suitable for

mutation simulation studies (Kowalczuk et al., 1999;

Bła

zej et al., 2012) because shows very strong com-

positional bias related to differently replicated lead-

ing/lagging DNA strands (McInerney, 1998; Mack-

iewicz et al., 1999b). Moreover, it has the de-

termined mutational pressure associated with DNA

replication (Kowalczuk et al., 2001a). In our sim-

ulation, each individual consisted of 333 gene se-

quences, with the total length of 353, 035 bp, ly-

ing on the leading strand. The sequences and their

annotations were downloaded from NCBI database

(http://www.ncbi.nlm.nih.gov).

Table 1: The substitution rate matrix P corresponding to

HKY85 model, used in simulations. A nucleotide in the

column is substituted by a nucleotide in the row. π

is the

stationary frequency of a given nucleotide, whereas α cor-

responds to the transition rate.

A C G T

A - π

απ

C π

- π

απ

G απ

- π

T π

απ

The applied Monte Carlo simulations consisted of

two stages: mutation of gene sequences and selec-

tion of individuals in population. The mutations were

introduced into the sequences according to the Pois-

son process with average equal to one mutation per

genome. Nucleotide substitutions (mutations) were

generated by a probability matrix P (Table 1) de-

scribed by the HKY85 model (Hasegawa et al., 1985).

This model distinguished transversion and transition

rates as well as assumed that a given substitution was

proportional to the stationary frequency of nucleotide

that was created by this substitution. The station-

ary distribution of nucleotides was the same as for the

empirical matrix describing mutational pressure for

the leading strand in B. burgdoferi genome (Table 2),

which was also used in these simulation for compari-

son.

We decided to use the modiﬁed model of nu-

cleotide substitution because it enabled easy imple-

mentation of various transition rates α and, simulta-

neously, inclusion of the assumed stationary distribu-

tion. We tested different values of α from 0.1 to 10

with the step of 0.1. For all cases, transversion rate

was ﬁxed to 1 and deﬁned only from frequencies of

nucleotides under the stationary distribution π.

BIOINFORMATICS2014-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

246

Table 2: The uniformized substitution matrix describing

real mutational pressure for the leading DNA strand in the

B. burgdorferi genome (Kowalczuk et al., 2001a). A nu-

cleotide in the column changes to a nucleotide in the row

with the given probability.

A C G T

A 0.81 0.02 0.07 0.10

C 0.07 0.62 0.05 0.26

G 0.16 0.01 0.71 0.12

T 0.07 0.03 0.03 0.87

Every substitution rate matrix was transformed

to jump probability matrix using uniformization

method (Tijms, 2003), see Table 3 as an example.

This approach is generally used to change the original

continuous in time Markov process with non-identical

leaving rates into an equivalent of stochastic process

where transition between each states are generated by

Poisson process with the same ﬁxed rate. This method

is very useful in the simulation of multidimensional

Markov processes.

Table 3: The uniformized substitution matrix for the

HKY85 model assuming the transition/transversion ratio

α=1.1 as for the real mutational matrix. A nucleotide in the

column changes to a nucleotide in the row with the given

probability.

A C G T

A 0.79 0.02 0.07 0.12

C 0.07 0.64 0.03 0.26

G 0.17 0.01 0.70 0.12

T 0.08 0.03 0.03 0.86

Two types of selection were applied. One was

against occurrence of termination translation codons.

If one of three possible stop codons occurred inside

a given protein coding sequence then the individual

was removed from the population and replaced by an-

other. The second type of selection was for main-

tenance of characteristic nucleotide composition in

each of three positions in codon. To do so, we ap-

plied for every codon position a modiﬁed acceptance-

rejection method based on Metropolis-Hastings (MH)

algorithm (Chib and Greenberg, 1995). In contrast

to the original MH algorithm that generates a se-

quence of random samples from a stationary distribu-

tion π, we computed acceptance probability for each

nucleotide substitution a

using proposal transition

probabilities and the assumed stationary distribution

of nucleotides in three codon positions, separately:

= min(

, 1), π

> 0,

where π

, x ∈ {A,C, T,G} is frequency of nucleotides

in particular codon positions of protein coding se-

quences, whereas q

are transition probabilities from

matrix P which generates mutation process (see Ta-

ble 4 as an example).

Table 4: The acceptance-rejection matrices for three codon

positions based on the HKY85 matrix assuming the transi-

tion/transversion ratio α=1.1.

the ﬁrst codon position

A C G T

A 1 1 1 0.39

C 0.62 1 1 0.25

G 0.52 0.84 1 0.21

T 1 1 1 1

the second codon position

A C G T

A 1 1 0.91 0.64

C 0.39 1 0.36 0.25

G 1 1 1 0.70

T 1 1 1 1

the third codon position

A C G T

A 1 1 1 1

C 0.84 1 0.88 0.85

G 0.96 1 1 0.96

T 0.99 1 1 1

Additionally, the acceptance of substitution was

determined by a random variable U with uniform dis-

tribution with the range [0, 1]. If U > a

, the substitu-

tion of x by y was rejected, otherwise it was accepted.

It allowed to keep characteristic nucleotide composi-

tion in three codon position during simulations. For

instance, if π

> π

, it indicated that changes

from nucleotide y to x were too often than from nu-

cleotide x to y, then the move from nucleotide x to

new state y was accepted. An individual in which the

substitution was rejected, was ’killed’ and replaced by

another from the population.

3 RESULTS AND DISCUSSION

3.1 Nucleotide Composition in Three

Codon Positions

Analysed protein coding sequences show character-

istic nucleotide composition in three codon positions

(Table 5). The ﬁrst position is signiﬁcantly rich in

purines, adenine and guanine. However, it should be

noticed that guanine is two times more frequent in

this position than in others. The second position has

generally more adenine and thymine with compara-

ble frequencies although cytosine reaches the highest

StudiesofMutationAccumulationinThreeCodonPositionsusingMonteCarloSimulationsandMetropolis-Hastings

Algorithm

247

usage just in this position. The third position is also

AT-rich but thymine signiﬁcantly dominates. Inter-

estingly, the composition of the third position is strik-

ingly similar to the stationary distribution of empiri-

cal mutational matrix, whereas the composition of the

ﬁrst position signiﬁcantly differs. It strongly prefers

purines. It indicates that the third codon position is

subjected to the weakest selection pressure then freely

accumulates nucleotide substitutions resulting from

mutations. On the other hand, the global composi-

tion of the ﬁrst position is the least susceptible to the

mutational pressure and is under the strongest selec-

tion.

Table 5: Nucleotide frequency for three positions in codon

and stationary distribution for empirical mutational matrix.

A C G T

stationary 0.32 0.06 0.14 0.48

1st position 0.37 0.11 0.30 0.22

2nd position 0.35 0.17 0.14 0.34

3rd position 0.31 0.07 0.14 0.48

Figure 1: DNA walks, a graphical representation of nu-

cleotide composition in three codon position of protein cod-

ing sequence from B. burgdorferi. The walk starts at the ﬁrst

nucleotide in the ﬁxed codon position and jumps every third

nucleotide to the last one. Every jump begins at the origin of

a Cartesian plane and is associated with a unit shift, which

depends on the nucleotide visited during the walk. The shift

is (0; 1) for guanine, (1; 0) for adenine, (0;-1) for cytosine,

and (-1; 0) for thymine. The vector indicates the stationary

distribution of empirical mutational matrix.

The compositional trends are very well visualised

by DNA walks (Figure 1), which are graphical repre-

sentation of nucleotide composition in an analysed se-

quence (Cebrat and Dudek, 1998; Cebrat et al., 1998).

The longest walk is clearly visible in the ﬁrst codon

position, which indicates the strongest compositional

trend, i.e. strong preference of some nucleotides (here

purines) than other. The clear trend is also in the third

position, which very well matches the stationary com-

position generated by the empirical mutational ma-

trix and shows excess of guanine over cytosine and

thymine over adenine. On the other hand, the weak-

est trend is in the second position, which indicates

that there are no special preferences in nucleotide oc-

currence in this position. It means that this position

has more balanced frequency of complementary nu-

cleotides, adenine vs. thymine and guanine vs. cyto-

sine.

3.2 Simulations of Mutation and

Selection Processes

Simulations for different transition/transversion ratios

(calculated from elements of uniformised probability

matrices) showed that the mean number of individuals

eliminated from populations decreased in exponential

manner with the increase of the ratio (Figure 2). It

indicates a positive effect of the excess of transitions

over transversions on genome survival.

●

0 1 2 3 4 5

115000 125000 135000

transition/transversion ratio

mean number of killed individuals

Figure 2: Mean number of eliminated individuals from pop-

ulation in the relationship to transition/transversion ratio.

The black diamond indicates the empirical mutational ma-

trix.

It is in agreement with the fact that transversions

are more harmful by changing of coded amino acid

than transition in protein coding sequences. The em-

pirical matrix appeared very similar in the number of

eliminated genomes to the HKY85 matrix assuming

the same transition/transversion ratio 1.1. It seems

that the applied HKY85 model is very good approxi-

mation of the real mutational matrix (please compare

BIOINFORMATICS2014-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

248

Table 3 and Table 2).

The mean number of accepted mutations in all

codon positions was more than two times higher

than the rejected ones (Figure 3). The number of

both mutation types decreased with growth of tran-

sition/transversion ratio although the fall was larger

for the accepted mutations. However, in both cases,

the decrease became weaker and ﬁnally stabilized for

higher transition/transversion ratios. It indicates that

the increase in the ratio is not necessary to signiﬁ-

cantly diminish the rejected mutations. Interestingly,

the values obtained for empirical matrix were very

similar to the HKY85 model with the similar transi-

tion/transversion ratio. Genomes in the simulations

with the real matrix accepted only slightly less muta-

tions than in the case of the HKY85 model.

●

0 1 2 3 4 5

100000 150000 200000 250000 300000 350000

transition/transversion ratio

mean number of mutations

●

rejected

accepted

Figure 3: Mean number of rejected and accepted mutations

in the relationship to transition/transversion ratio. The black

circle and diamond indicate the empirical mutational ma-

trix.

The mean of accepted mutations exceeded the

number of rejected ones in three codon positions

for tested values of transition/transversion ratio (Fig-

ure 4). The greatest difference between these num-

bers was for the third codon positions. The muta-

tions were most frequently accepted and most rarely

rejected in these positions. It results from very high

similarity between nucleotide composition of these

positions with the stationary distribution of the ap-

plied mutational matrix (Table 5, Figure 1). It indi-

cates that the third codon positions are subjected to

the weakest selection for the nucleotide composition

and can quite freely accumulated mutations. Actually,

they very well reﬂect mutational pressure associated

with replication (McLean et al., 1998; Cebrat et al.,

1999). On the other hand, the ﬁrst codon positions

accumulated the smallest number of mutations and re-

jected the most in comparison to other positions. In-

terestingly, the number of accepted and rejected mu-

tations in the ﬁrst codon positions became very simi-

lar when transition/transversion ratio declined. These

strong restrictions on mutation accumulation in the

ﬁrst positions in our simulations result from the sub-

stantial compositional trend in these positions (Fig-

ure 1), which signiﬁcantly deviates from the compo-

sition generated by the applied mutational pressure.

As it was reviewed in the Introduction, this spe-

ciﬁc composition is strongly related with various se-

lection constraints on coding function of protein gene

sequences. The number of mutations for the second

positions had intermediate values between the ﬁrst

and third positions. In real sequences the second po-

sition usually is more conserved than the ﬁrst one be-

cause substitutions in it always change coded amino

acid and very often its physicochemical properties.

However, our simulation considered only the effect

of selection on nucleotide compositions but not re-

strictions on amino acid substitution. Thus our results

suggest that the selection on nucleotide composition

is weaker in the second codon position than the ﬁrst

one.

The relationship between the mean number

of accepted or rejected mutations and transi-

tion/transversion ratio appeared different for three

codon positions (Figure 4). Similarly to the case

of mutations calculated for all positions (Figure 3),

the number of accepted mutations for third and sec-

ond codon positions declined rapidly with transi-

tion/transversion ratio and then begun stabilised for

large values of ratio. However, the number of ac-

cepted mutations for ﬁrst positions was stable and did

not depend on the ratio. On the other hand, the ex-

ponential decrease was observed for the number of

rejected mutations in these positions. In the remain-

ing cases, the number of rejected mutations did not

seem to depend from the transition/transversion ratio.

Only a small increase in the number of rejected muta-

tions was observed for the second and third codon po-

sitions. All the results about the number of accepted

and rejected mutations suggest that excess of transi-

tions over transversions has positive effect on mainte-

nance of nucleotide composition characteristic of the

ﬁrst codon position, whereas negative in the case of

other codon positions.

The differences between numbers of accepted and

rejected mutations for the empirical matrix and the

corresponding HKY85 model were generally very

small (Table 6). The largest deviations were observed

for the third codon positions especially for the number

of rejected mutations. The empirical matrix rejected

StudiesofMutationAccumulationinThreeCodonPositionsusingMonteCarloSimulationsandMetropolis-Hastings

Algorithm

249

accepted

mutations

rejected

mutations

1st codon position

2nd codon position

3rd codon position

Figure 4: Mean number of rejected and accepted mutations in the relationship to transition/transversion ratio for three codon

positions. The diamond, triangle and circle symbols indicate the empirical mutational matrix.

two times more mutations than the HKY85. Never-

theless, the number of these mutations for the third

codon positions were much smaller (ﬁve to almost

twenty times) in comparison to other positions.

Table 6: The number of accepted and rejected mutations

in three codon positions for the empirical matrix and the

HKY85 matrix assuming the transition/transversion ratio

α=1.1.

accepted rejected

empirical HKY85 empirical HKY85

1st 81398 82757 66631 69663

2nd 96715 99078 42260 42627

3rd 113363 122957 8238 3580

3.3 CONCLUSIONS

The obtained results showed that excess of transi-

tion over transversion in mutational pressure is gener-

ally proﬁtable for the studied genome because mean

number of eliminated individuals decreased exponen-

tially with the growth of transition/transversion ra-

tio (Figure 2). It is well-known that transversions

are more harmful than transitions because they more

frequently change coded amino acid. However, the

presented results are not trivial because the simula-

tions did not consider any selection on coded amino

acids. Instead of that, we applied independent selec-

tion on nucleotide composition in three codon posi-

tions. The results indicate that not only genetic code

and amino acid composition but also nucleotide com-

position typical of the ﬁrst codon positions are op-

timized for high transition/transversion ratio. More-

over, these codon positions appeared most conserved

because accepted the least and rejected the largest

number of mutations. Interestingly, the second codon

positions usually considered conserved according to

the effect on amino acid substitution were more tol-

erant on mutation accumulation in our simulations.

Because the applied model considered selection on

nucleotide composition, it seems that maintenance of

this composition by selection is more important for

the ﬁrst codon positions than for the second ones. It

would be interesting to check if these conclusions are

universal and valid for other genomes.

BIOINFORMATICS2014-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

250

REFERENCES

Akashi, H. (2003). Translational selection and yeast pro-

teome evolution. Genetics, 164:1291–1303.

Anderson, S. G. E. and Kurland, C. G. (1990). Codon

preferences in free-living microorganisms. Microbiol.

Rev., 54:198–210.

Bennetzen, J. L. and Hall, B. D. (1982). Codon selection in

yeast. J Biol Chem, 257(6):3026–3031.

Bła

zej, P., Mackiewicz, P., and Cebrat, S. (2012). Simula-

tion of bacterial genome evolution under replicational

mutational pressures. In Proceedings of the BIOSTEC

2012, 5th International Joint Conference on Biomedi-

cal Engineering Systems and Technologies Bioinfor-

matics 2012, International Conference on Bioinfor-

matics Models, Methods and Algorithms, Vilamoura,

Algarve, Portugal, 1-4 February, pages 51–57.

Cebrat, S. and Dudek, M. (1998). The effect of DNA

phase structure on DNA walks. The European Phys-

ical Journal B-Condensed Matter and Complex Sys-

tems, 3(2):271–276.

Cebrat, S., Dudek, M. R., Gierlik, A., Kowalczuk, M., and

Mackiewicz, P. (1999). Effect of replication on the

third base of codons. Physica A, 265:78–84.

Cebrat, S., Dudek, M. R., and Mackiewicz, P. (1998). Se-

quence asymmetry as a parameter indicating coding

sequence in Saccharomyces cerevisiae genome. The-

ory in Biosciences, 117:78–89.

Cebrat, S., Dudek, M. R., Mackiewicz, P., Kowalczuk, M.,

and Fita, M. (1997a). Asymmetry of coding ver-

sus non-coding strands in coding sequences of differ-

ent genomes. Microbial & Comparative Genomics,

2:259–268.

Cebrat, S., Dudek, M. R., and Rogowska, A. (1997b).

Asymmetry in nucleotide composition of sense and

antisense strands as a parameter for discriminating

open reading frames as protein coding sequences. J.

Appl. Genet., 38:1–9.

Chib, S. and Greenberg, E. (1995). Understanding the

Metropolis-Hastings Algorithm. The American Statis-

tician, 49(4):327–335.

Das, S., Ghosh, S., Pan, A., and Dutta, C. (2005). Compo-

sitional variation in bacterial genes and proteins with

potential expression level. FEBS Letters, 579:5205–

5210.

Echols, H. and Goodman, M. F. (1991). Fidelity mech-

anisms in DNA replication. Annu Rev Biochem,

60:477–511.

Frank, A. and Lobry, J. (1999). Asymmetric substitution

patterns: a review of possible underlying mutational

or selective mechanisms. Gene, 238:65–77.

Freeman, J., Plasterer, T., Smith, T., and Mohr, S. (1998).

Patterns of genome organization in bacteria. Science,

279:1827.

Gutierrez, G., Marquez, L., and Martin, A. (1996). Pref-

erence for guanosine at ﬁrst codon position in highly

expressed Escherichia coli genes. a relationship with

translation efﬁciency. Nucleic Acids Res., 24:2525–

2528.

Hanawalt, P. C. (1991). Heterogeneity of dna repair at

the gene level. Mutation Research/Fundamental and

Molecular Mechanisms of Mutagenesis, 247(2):203–

211.

Hasegawa, M., Kishino, H., and Yano, T. (1985). Dating of

the human-ape splitting by a molecular clock of mito-

chondrial dna. J Mol Evol, 22(2):160–174.

Hutchinson, F. (1996). Mutagenesis. In Neidhardt,

F. C., editor, Escherichia coli and Salmonella. Cel-

lular and molecular biology, pages 749–763. Asm.

Press, Washington D.C.

Ikemura, T. (1981). Correlation between the abundance of

Escherichia coli transfer RNAs and the occurrence of

the respective codons in its protein genes. J Mol Biol,

146(1):1–21.

Ikemura, T. (1985). Codon usage and tRNA content in uni-

cellular and multicellular organisms. Mol. Biol. Evol.,

2:1334.

Kanaya, S., Yamada, Y., Kudo, Y., and Ikemura, T. (1999).

Studies of codon usage and trna genes of 18 unicel-

lular organisms and quantiﬁcation of Bacillus subtilis

trnas: gene expression level and species-speciﬁc di-

versity of codon usage based on multivariate analysis.

Gene, 238(1):143–155.

Karlin, S., Blaisdell, B. E., and Bucher, P. (1992). Quantile

distributions of amino acid usage in protein classes.

Protein Eng, 5(8):729–738.

Karlin, S. and Burge, C. (1995). Dinucleotide relative abun-

dance extremes: a genomic signature. Trends Genet,

11(7):283–290.

Karlin, S. and Mrazek, J. (1996). What drives codon choices

in human genes? J Mol Biol, 262(4):459–472.

Kowalczuk, M., Gierlik, A., Mackiewicz, P., Cebrat, S.,

and Dudek, M. (1999). Optimization of gene se-

quences under constant mutational pressure and selec-

tion. Physica A: Statistical Mechanics and its Appli-

cations, 273(1):116–131.

Kowalczuk, M., Mackiewicz, P., Mackiewicz, D., Now-

icka, A., Dudkiewicz, M., Dudek, M., and Cebrat,

S. (2001a). High correlation between the turnover of

nucleotides under mutational pressure and the DNA

composition. BMC Evol. Biol., 1:13.

Kowalczuk, M., Mackiewicz, P., Mackiewicz, D., Nowicka,

A., Dudkiewicz, M., Dudek, M. R., and Cebrat, S.

(2001b). DNA asymmetry and the replicational muta-

tional pressure. J. Appl. Genet., 42(4):553–577.

Kreutzer, D. A. and Essigmann, J. M. (1998). Oxidized,

deaminated cytosines are a source of C → T transi-

tions in vivo. Proc Natl Acad Sci U S A, 95(7):3578–

3582.

Lagunez-Otero, J. and Trifonov, E. N. (1992). mRNA pe-

riodocal infrastructure complementary to the proof-

reading site in the ribosome. J. Biomol. Struct. Dyn.,

10:455–464.

Lindahl, T. (1993). Instability and decay of the primary

structure of DNA. Nature, 362(6422):709–715.

Mackiewicz, P., Gierlik, A., Kowalczuk, M., Dudek, M.,

and Cebrat, S. (1999a). Asymmetry of nucleotide

composition of prokaryotic chromosomes. J. Appl.

Genet., 40:1–14.

Mackiewicz, P., Gierlik, A., Kowalczuk, M., Szczepanik,

D., Dudek, M., and Cebrat, S. (1999b). Mechanisms

StudiesofMutationAccumulationinThreeCodonPositionsusingMonteCarloSimulationsandMetropolis-Hastings

Algorithm

251

generating long-range correlation in nucleotide com-

position of the Borrelia burgdorferi genome. Physica

A, 273:103–115.

McInerney, J. (1998). Replicational and transcriptional se-

lection on codon usage in Borrelia burgdorferi. Proc.

Natl. Acad. Sci. U.S.A., 95:10698–10703.

McLean, M., Wolfe, K., and Devine, K. (1998). Base com-

position skews, replication orientation, and gene ori-

entation in 12 prokaryote genomes. J. Mol. Evol.,

47:691–696.

Mellon, I. and Hanawalt, P. C. (1989). Induction of

the Escherichia coli lactose operon selectively in-

creases repair of its transcribed DNA strand. Nature,

342(6245):95–98.

Pan, A., Dutta, C., and Das, J. (1998). Codon usage in

highly expressed genes of Haemophillus inﬂuenzae

and Mycobacterium tuberculosis: translational selec-

tion versus mutational bias. Gene, 215:405–413.

Sharp, P. M. and Cowe, E. (1991). Synonymous codon

usage in Saccharomyces cerevisiae. Yeast, 7(7):657–

678.

Shepherd, J. C. (1981). Method to determine the reading

frame of a protein from the purine/pyrimidine genome

sequence and its possible evolutionary justiﬁcation.

Proc. Natl. Acad. Sci. USA, 78:1596–1600.

Smithies, O., Engels, W. R., Devereux, J. R., Slightom,

J. L., and Shen, S. (1981). Base substitutions, length

differences and DNA strand asymmetries in the hu-

man g gamma and a gamma fetal globin gene region.

Cell, 26:345–353.

Tijms, H. (2003). A ﬁrst course in stochastic processes.

John Wiley & Sons LTD.

Tillier, E. and Collins, R. (2000). The contributions of

replication orientation, gene direction, and signal se-

quences to base composition asymmetries in bacterial

genomes. J. Mol. Evol., 50:249–257.

Trifonov, E. N. (1987). Translation framing code and frame-

monitoring mechanism as suggested by the analysis of

mRNA and 16S rRNA nucleotide sequences. J. Mol.

Biol., 194:643–652.

Trifonov, E. N. (1992). Recognition of correct reading

frame by the ribosome. Biochimie, 74:357–362.

Wakeley, J. (1996). The excess of transitions among nu-

cleotide substitutions: new methods of estimating

transition bias underscore its signiﬁcance. Trends in

Ecology and Evolution, 11:158–163.

Wang, J. (1998). The base contents of A, C, G, or U for

three codon positions and the total coding sequences

show positive correlation. J. Biomol. Struct. Dyn.,

16:51–57.

Wong, J. T. and Cedergren, R. (1986). Natural selection ver-

sus primitive gene structure as determinant of codon

usage. Eur. J. Biochem., 159:175–180.

Zhang, C. T. and Zhang, R. (1991). Analysis of distribution

of bases in codon in the coding sequences by a dia-

grammatic technique. Nucleic Acids Res., 19:6313–

6317.

BIOINFORMATICS2014-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

252