CO-EVOLUTION IN HIV ENZYMES

P. Boba, P. Weil, F. Hoffgaard and K. Hamacher

Bioinformatics & Theoretical Biology Group, Technische Universit

at Darmstadt, Germany

Keywords:

Sequence analysis, Human immunodeﬁciency virus, HIV, Co-evolution, Mutual information, Data mining,

Machine learning.

Abstract:

Proteins as molecular phenotypes need to maintain their stability, fold, and the functionality throughout their

individual and collective evolution. Such important properties are maintained by a selective pressure that

reveals itself in sequence data sets. Small adaptive changes are usually possible, but in general the conservation

of structure and function implies the co-evolution of amino acids within the molecule. We analyze two most

important enzymes in the progression of viral infection by the human immunodeﬁciency virus (HIV) – namely

the reverse transcriptase and the protease – under an information theoretical framework to derive insight into

the selective pressure acting locally and globally on the enzymes. To this end we computed mutual information

inside the proteins and between the proteins for some 40,000 sequences. We discuss the results of intra- and

inter-protein co-evolution of residues in these enzymes and ﬁnally annotate important structural-evolutionary

correlations. In particular we focus on the reverse transcriptase and a small signal indicating a potential co-

evolution between the protease and the reverse transcriptase. We convinced ourselves that our sampling is

sufﬁciently large and that no normalization schemes needs to be applied. We conclude with a short outlook

into potential implications for drug resistance development.

1 INTRODUCTION

The acquired immunodeﬁciency syndrome (AIDS)

is induced by the human immunodeﬁciency virus

(HIV). Its viral replication cycle depends on the virus

own protease and several other enzymes such as the

reverse transcriptase. Currently the anti-HIV drugs

target these two enzymes to prevent the maturation of

new virions (Tsygankov, 2009; Wlodawer and Erick-

son, 1993).

Neutral evolution and drug resistance develop-

ment have been under investigation for a long time:

1) the high mutation rate of HIV makes the virus

an interesting evolutionary object in itself as it per-

forms a large-scale mutagenesis study (Perelson et al.,

1996); 2) a deeper knowledge of potential evolution-

ary barriers might lead to new therapeutics besides the

HAART-procedure (Richman et al., 2009).

The theoretical understanding of the viral evolu-

tion has greatly improved over the recent years (Rong

et al., 2007; Chen and Lee, 2006; Trylska et al., 2007;

Hamacher and McCammon, 2006), even the biophys-

ical annotation based on in silico models of the molec-

ular dynamics is under way (Hamacher, 2008).

At the same time the wealth of information on

HIV - in particular the large data sets of sequences

- prompt for a deeper analysis on the sequence level

alone. Here we leverage an available data set of

45,161 mutant sequences of the HIV-1 protease (PR)

and reverse transcriptase (RT).

2 MATERIALS AND METHODS

2.1 Sequence Data

The 45,161 positive selection mutant sequences have

been collected by the Lee lab (Pan et al., 2007; Chen

et al., 2004) and were made available on the net. The

data set contains the genomic, nucleotide sequences

from treated and untreated patients under various drug

regimes. The individual entries are, however, not an-

notated by the drug treatment regime of the particu-

lar patient. We therefore ﬁnd in this data set the di-

verse evolutionary dynamics, including effects such

as neutral drift, drug resistance development, and

other selective pressures on the two enzymes. Wher-

ever a codon could not be mapped unequivocally to

an amino acid we used a wild-card character, treating

these cases similar to gaps.

For a comparison on the quality and potential

Boba P., Weil P., Hoffgaard F. and Hamacher K. (2010).

CO-EVOLUTION IN HIV ENZYMES.

In Proceedings of the First International Conference on Bioinformatics, pages 39-47

DOI: 10.5220/0002731800390047

 SciTePress

ﬁnite-size effects we created also sequence align-

ments with CLUSTALW (Thompson et al., 1994;

Higgins and Sharp, 1988) and standard parameters on

1. BLAST (Altschul et al., 1990) hits on a sequence

of viral ion channel Kcv, and 2. ribosomal proteins

from bacterial genomes extracted by Pfam Hidden-

Markov-Models (Finn et al., 2008).

2.2 Information Theoretical Measures

on (Co-)Evolution

The evolution of an amino acid at a position i means

a change in the symbols S

over time within a set of

acceptable values S . One way to quantify the infor-

mation content of such collections of symbols is the

Shannon entropy (Shannon, 1951)

:= −

∑

∈S

p(S

) · log



p(S

)



(1)

where p(S

) is the probability of the occurrence of the

symbol S

within the empirical or theoretical data set

under investigation. For empirical data sets one usu-

ally sets this probability to the frequency of the sym-

bol within the data set. Positions (in e.g. a sequence

alignment) with high entropy are then amino acids

with high variability during evolutionary times. Our

choice of S comprised the 20 standard amino acids

and the above mentioned wild-card character.

The correlated change in the amino acid com-

position within a molecule or between molecules is

now based on empirical found two-point probabilities

p(S

) for the co-evolution of positions i and j. We

can deﬁne the Mutual Information (MI) between these

positions as a relative entropy as follows (Lund et al.,

2005):

i, j

∑

∈S

p(S

) · log



p(S

)

p(S

) · p(S

)



= H

i, j

− H

(2)

The value of the MI gives the amount of informa-

tion that one position i conveys about the other po-

sition j. The MI can be derived from the Kullback-

Leibler divergence as a relative entropy, which has

- besides sequence based approaches - also attracted

attention as a measure in in silico drug design and

molecular biophysics (Hamacher, 2007).

In addition we applied two normalization proce-

dures to the MI in the following form, of which one

was suggested earlier (Gloor et al., 2005) to account

for potential sampling artefacts:

i, j

(2)

:= MI

i, j

(1,1)

:= MI

i, j

/(H

· H

) (3)

0 100 200 300 400 500 600 700

0.0

0.5

1.0

1.5

median MI

KCV

HIV1−Pr

S20

Figure 1: The median of the Mutual Information according

to equation 2 for sequence homologs of the ion channel Kcv

of plant viruses, bacterial ribosomal proteins S20 and S6,

and the HIV-1 protease. We display the mutual information

as a function of the number of sequences N included in the

computation.

These normalizations are principally necessary to

cope with ﬁnite data sets and potential absolute con-

servation of individual positions.

3 RESULTS

3.1 Finite Size Effects

To estimate size effects due to ﬁnite data we applied

an established protocol: in ﬁgure 1 we show the MI as

computed from randomized alignments of the listed

molecules. The randomization was performed inde-

pendently in each column by shufﬂing the characters.

A perfect randomized sample would provide for an

independence between columns i and j, and thus to

a vanishing two-point-distribution function p(S

This leads in equation 2 also to a vanishing MI

i, j

=0.

This allows to achieve an understanding of the statis-

tical signiﬁcance of a data set.

Clearly the data set for the viral enzymes under

consideration is sufﬁcient in comparison to the se-

quence collections for Kcv and ribosomal proteins.

We analyzed this further and found the reason for this

in the high gap content [data not shown] of the non-

viral proteins chosen for comparison. The sequences

for HIV-1 PR, on the other hand, do not contain any

gap character at all, while the gap/wild-card character

content for the RT is also negligible.

BIOINFORMATICS 2010 - International Conference on Bioinformatics

3.2 Effect of Normalization

We applied the two variants of the normalization pro-

cedure to the sequences of HIV-1 PR and RT. In ﬁg-

ure 2 we show the change of the computed correlation

measures under those normalization procedures. We

applied a non-linear correlation measure - namely the

Spearman ranking coefﬁcient - that is superior in re-

gard to the insight one gains into the relations between

the two data sets.

We note in passing that the results for the two nor-

malized MI variants MI

(2)

and MI

(1,1)

showed similar

distributions.

We draw three conclusions from the observations

in sections 3.1 and 3.2:

• effects to the ﬁnite-size of our data set are negli-

gible

• MI

(2)

was shown (Gloor et al., 2005) to take ﬁnite-

size effects most reliable into account

• the high Spearman correlation between MI

(2)

and

MI indicate an equivalence

We therefore will in the subsequent parts of the dis-

cussion in this manuscript focus solely on the MI as

there is no additional gain in using any normalization

in this particular case.

3.3 Comparing Intra- and Inter-Protein

Co-Evolution of Residues

In ﬁgure 3 we show distributions of the naked MI-

values from our study on both, the HIV-1 PR and the

HIV-1 RT, as well as the inter-MI for a potential co-

evolution of residues in these enzymes.

We observe similarity of MI results for the intra-

co-evolution within the individual, isolated enzymes.

Obviously the evolutionary dynamics gave rise to the

same overall “mutual information picture”.

The dissimilarity of the MI-distributions for the

RT/PR and the one for the inter-molecular MI comes

as no surprise: within a molecule the evolutionary

pressure on the co-evolving dynamics of amino acids

can be regarded as quite different in the evolution-

ary dynamics between residues in different molecules,

despite potential protein-protein-interactions or other

implicit interdependencies resulting from cell biolog-

ical effects or drug combinations.

Although the RT consists of four domains –

namely the ﬁnger, palm, thumb, connection domains

– the potential for co-evolution between sites dis-

tributed over the four domains runs approximately in

parallel to the scenario of the protease, both of which

- in turn - are constructed as a molecular phenotype in

form of homodimers.

Figure 2: Scatter plots of the natural logarithm of the MI

values, comparing normalization procedures of eqs. 3.

a) MI

i, j

and MI

i, j

(2)

; b) MI

i, j

and MI

i, j

(1,1)

; c) MI

i, j

(1,1)

and

i, j

(2)

. In each ﬁgure we give the Pearson correlation r

and the Spearman ranking coefﬁcient r

(W.H. Press et al,

1995) between the data points. Note that MI values smaller

than 10

−8

were omitted for numerical reasons.

In ﬁgure 4 we show a graph for the MI of the HIV-

1 PR that indicates structural as well as dynamical

features as discussed in (Hamacher, 2008). We omit a

picture for the RT as the molecule is too large to dis-

play single MI entries on single-pixel basis. The raw

data, however, is available from our web-site (Boba

and Hamacher, 2009) for future analysis

Work is underway to construct a software-package to

visualize such voluminous matrices as for the RT (Schreck

CO-EVOLUTION IN HIV ENZYMES

●

● ●

●

● ●

●

● ●

−5 −4 −3 −2 −1 0

0.0 0.2 0.4 0.6 0.8 1.0 1.2

log(MI)

Density

●

PR−RT

Figure 3: Comparison of the MI-values for the intra-protein

co-evolution within the HIV-1 Protease (black) and the

HIV-1 Reverse Transcriptase (blue). We compare to the

inter-MI for the co-evolution between residues of the HIV-1

Protease on the one hand and the HIV-1 Reverse Transcrip-

tase on the other (red).

20 40 60 80

Figure 4: The logarithm of the MI within the HIV-1 PR for

all pairings of residue numbers. Clearly we reproduce fea-

tures already found in (Hamacher, 2008) and have therefore

veriﬁed our analysis protocol.

To analyze our MI results further and to overlay

these with structural knowledge, we went on with

a spectral decomposition of the MI matrices for the

HIV-1 PR and HIV-1 RT. For the inter-MI values,

that would indicate potential co-evolution between

residues of different molecules, the MI matrix is,

however, non-quadratic as the protein lengths are in

general different. We therefore applied a singular

et al., 2009).

●

0 10 20 30 40 50

0.0 0.5 1.0 1.5

λλ

●

● ●

●

● ●

●

PR−RT

Figure 5: The leading eigenvalues λ

of a spectral decom-

position of the MI-matrices. In black we show the ones

for the (99 × 99) MI-matrix of HIV-1 PR, in blue for the

(348× 348) MI-matrix of HIV-1 RT (we restricted our anal-

ysis to the ﬁrst 348 residues to cope with bad sequence res-

olution towards the C-terminus of the RT in the underly-

ing data set). The red points give the singular values of the

singular value decomposition (W.H. Press et al, 1995) of

the inter-co-evolution matrix, which reﬂects MI-values be-

tween residues in the PR on the one hand and the RT on

the other. The fast decay of the λ

justiﬁes a reconstruction

of the respective MI-matrices by just a few, even only one

eigenvector.

value decomposition (W.H. Press et al, 1995) to ob-

tain a pseudo-spectral decomposition with respect to

the singular values of the inter-MI matrix. The fast

decay of the eigenvalues/singular values as shown in

ﬁgure 5 indicates that a reconstruction of the whole

MI matrices can be achieved by just a few eigenvec-

tors, thus this small set of eigenvectors contains most,

if not all, of the mutual information.

If we now overlay these eigenvectors onto the

structures of the molecules, we can immediately con-

nect structural and evolutionary information. This is

done in ﬁgures 6 and 7.

Figures 6 and 7 both show high mutual informa-

tion for secondary structure elements. In particular

the β-sheet in the PR needs to be maintained as a

structural basis of the fold of this protein. This is

achieved by co-evolution of the residues within this

element. In the RT the β-sheet close to the reactive

center, as well as the α-helices forming the “ﬁnger”

of the RT are structurally maintained by co-evolving

the residues without giving them the freedom to inde-

pendently mutate.

In ﬁgure 7 we show in addition relevant residues

as small spheres. The colors indicate: yellow=three

catalytic aspartic acids; green=residues that enhance

BIOINFORMATICS 2010 - International Conference on Bioinformatics

Figure 6: a) sequence entropy of the HIV-1 PR as in eq. 1;

b) absolute values of the entries of the 1

eigenvector for

the MI matrix of HIV-1 PR; c) absolute values of the en-

tries of the 2

eigenvector. We rescaled all values so that

blue=maximum value, red=minimum value.

the excision reaction; red=non-nucleoside inhibitor

binding pocket; black=residues involved in NRTI-

resistance. This mapping was done in accordance

with previous work (Saraﬁanos et al., 2004).

We note in passing that the level of conservation

needs to be taken into account: an absolutely con-

served position shows a MI of zero, always, as the

knowledge about the identity of a residue here does

not convey any information on any other position. We

therefore decided to also display the sequence entropy

of equation 1 as a measure of local sequence conser-

vation in the ﬁgures 6 and 7. We return to this issue

Figure 7: I) a) sequence entropy of the HIV-1 RT as in eq. 1;

b) absolute values of the entries of the 1

eigenvector for

the MI of HIV-1 RT. II) viewed from orthogonal projection

& rescaled as in ﬁg. 6. The black part is the C-terminus for

which we had only insufﬁcient statistics; we omitted it from

our analysis. The small spheres indicate functional sites as

discussed in the text. Domains are indicated as follows:

light blue=”ﬁngers”, green=”thumb”, gray=”palm”.

CO-EVOLUTION IN HIV ENZYMES

in the disucssion section.

Furthermore one can see the “correlation” of low

sequence entropy and therefore the necessarily low

mutual information in ﬁgure 8. In this ﬁgure we mo-

tivate the classiﬁcation of an amino acid by its two

evolutionary/co-evolutionary measures, that is the se-

quence variability as expressed by the entropy of eq. 1

and the contribution to mutual information correlation

expressed by the respective entry in the leading eigen-

vector.

Class I (low sequence entropy, high MI) must be

empty. Classes II and IV are the important ones for

evolution: amino acids found in class IV are subject

to extensive selective pressure to maintain their iden-

tity (low sequence entropy, thus small sequence vari-

ability). Evolution acts here locally to force sequence

stabilization. Amino Acids in class II on the other

hand can vary extensively (large sequence entropy),

but at the same time convey information about other

amino acids, thus show correlation with other sites

within the protein. Accordingly the high MI reﬂects a

selective pressure to maintain not a particular amino

acid character, but instead to maintain some “interac-

tion”. This “interaction” might be a physical, direct

interaction such as steric repulsion or charge inter-

actions, but could also reﬂect, e.g. folding properties

of a monomer or recognition capabilities in protein-

protein-binding mechanisms. One might hypothesize

about the origin of this correlation or “interaction”,

but a high MI indicates always a selective pressure to

connect residues.

Class III on the other hand consists of those amino

acids, which are highly variable (high sequence en-

tropy), but at the same time show low dependence on

other sites within the protein, and thus low connection

via MI to these other positions.

3.4 Co-Evolution between Residues in

the PR and the RT?

In ﬁgure 3 we have seen that the inter-MI is some

two orders of magnitude smaller than the overall MI

of the intra-MI. Combining this insight with the typ-

ical values found in MI studies as shown in ﬁgure 1

we nevertheless ﬁnd a basal co-evolution between the

two enzymes under investigation.

Although such a co-evolution is counter-intuitive

at ﬁrst sight, their might be some small or even un-

known interdependencies between the two molecules.

For example one effect might be due to the packing of

RNA in the viral capsid and the genes coding for the

enzymes are in close vicinity of the RNA. Such pack-

ing is highly susceptible to local charges and balances

thereof - probably leading to long-range correlations

0.1 1

Sequence Entropy [bit]

0.01

0.1

|entry of first eigenvector of MI|

III

0.01 0.1 1

Sequence Entropy [bit]

0.001

0.01

0.1

|entry of first eigenvector of MI|

III

Figure 8: a) The correlation of MI contributions as found

in the ﬁrst eigenvector of the MI-matrix of PR vs. the re-

spective sequence entropy of the amino aids in the PR. We

applied an intuition driven classiﬁcation scheme to decom-

pose the results into four classes, numbered by Roman let-

ters and illustrated by the red and green line. b) same as a),

but for the RT.

along the genomic sequence. Additional potential ef-

fects are discussed in the discussion section 4.

In ﬁgure 9 we show our results for the pseudo-

spectral reconstruction of the inter-MI between the

PR and the RT.

4 DISCUSSION & SUMMARY

In this paper we have analyzed two of the most impor-

tant enzymes for the progression of viral infection by

the human immunodeﬁciency virus (HIV-1 protease

and the HIV-1 reverse transcriptase) in an information

theoretical setting to investigate evolutionary dynam-

ics and extract positions under exceptional selective

pressure.

A ﬁrst insight is possible by looking solely at the

sequence variability, which reveals selective pressure

to maintain local properties within the molecule - lo-

cal is meant here in the sense of an individual posi-

tion. Sites of enzymatic action are prone examples of

BIOINFORMATICS 2010 - International Conference on Bioinformatics

Figure 9: Absolute values of the entries of the 1

left-

and right-singular vector for a) the HIV-1 PR and b)+c)

for the HIV-1 RT. Again we rescaled all values so that

blue=maximum value, red=minimum value, and show the

non-analyzed parts of the RT in black.

such ﬁndings.

Nevertheless molecular evolution provides for an

extended selective pressure, which we label as non-

local as it involves several amino acids at the same

time. Despite individual amino acids being variable,

pairs of residues are connected or correlated. This is

revealed by the mutual information they carry.

We have shown that our sampling statistics is suf-

ﬁcient and a standard normalization procedure usu-

ally applied is not necessary in our case - due to large

sample size and absence of gaps in aligned sequences.

A particularly interesting result is the high se-

quences variability in the β-sheets of the PR, as shown

in ﬁgure 6 a). At the same time we ﬁnd these residues

also to be relevant for the high MI (parts b & c of the

same ﬁgure). This was recently discussed and anno-

tated in a biophysical simulation setting (Hamacher,

2008).

At the same time, we ﬁnd one residue (I54 in the

wild-type) in the ﬂaps to be highly variable and well

correlated to other parts of the PR, see the blue residue

in the upper strand of the β-sheet forming the ﬂaps in

ﬁgure 6.

Interestingly in ﬁgure 6 the dimerization interface

of the PR in the lower part of the molecule shows

over a larger range high sequence variability as well

as large contributions to the mutual information. This

indicates HIV-1’s ability to vary the composition of

the binding interface to dimerize the PR-monomers to

become the PR-homodimer. Obviously maintaining

recognition capabilities for binding is of paramount

importance for the virus, revealing itself in the high

MI.

The implications of relating sequence variabil-

ity and mutual information can be seen in ﬁgure 8.

An intuitive classiﬁcation scheme can be justiﬁed on

grounds of selective pressure induced by the ongoing

evolution of these molecular phenotypes and divided

in accordance with this classiﬁcation procedure.

In table 1 we extracted the most pronounced

residues under these classiﬁcation schemes - the ones

that correspond the most to the three existing classes.

To this end we have chosen visually those residues

most distant from the intersection of the red and green

lines in ﬁgure 8.

We ﬁnd in table 1 the L10 and M46 for the pro-

tease to be of class II. Correlated mutations in these

positions are known to reduce binding of well-known

protease inhibitors, such as JE-2147 by an order of

magnitude or even more (Yoshimura et al., 1999;

Reiling et al., 2002). This makes the acquisition of

mutations relatively easy: these amino acids are not

to be preserved, they only need to maintain “interac-

tions” or correlations, thus opening the path to change

the sequence locally in a correlated fashion to reduce

drug efﬁciency while maintaining the structure, func-

tion, and thus the infectious outcome of the protease.

In class IV we found some of the amino acids

building the ﬂaps of the PR (res. no. 52-58 are usu-

CO-EVOLUTION IN HIV ENZYMES

HIV-1 PR HIV-1 RT

I II III IV

63 41 49

10 69 78

71 57 29

12 60 56

none 20 70 28

7 61 27

90 16 86

82 39 52

46 67 98

54 92 51

I II III IV

334 102 349

335 49 348

329 108 347

333 106 344

none 339 249 346

338 48 343

324 165 345

322 100 342

326 90 341

311 4 152

Table 1: The most pronounced members of the classes as

introduced in ﬁgure 8 for both enzymes. For class I no

points exist that fulﬁll the particular requirement. The enu-

meration is in accordance with p66 monomer of the RT

dimer. We used the numbering convention of (Prajapati

et al., 2009; Saraﬁanos et al., 2004) for RT.

ally labeled to be part of the ﬂaps). As is known

from extensive simulations (Perryman et al., 2006)

the ﬂaps need to be most ﬂexible to embrace the sub-

strate of the PR. This - as indicated by our ﬁndings - is

achieved evolutionary to strictly conserve the overall

sequence composition of the ﬂaps.

For the RT we ﬁnd in table 1 the class III very in-

teresting: residues of the binding pocket for the non-

nucleoside inhibitors are to be found here. Class III

contains, however, those positions that vary a lot, but

do not show high correlation to other positions in the

molecule. This implies that the amino acids bind-

ing the inhibitor can more or less freely mutate, be-

cause they are not correlated to other positions and

thus there is not need for correlated mutations, which

turned out to be necessary for the resistance develop-

ment of the protease (see above).

We found some indications for a potential co-

evolution between the PR and the RT. We can think

of three reasons to this end:

The weak co-evolution between the proteins

might be – as speculated in the results section – in-

duced by implicit interactions of the coding genes

during packing of the viral RNA into the capsid. Ob-

viously charge distributions play a prominent role

during these events and that might correlate (slightly)

nucleotides and therefore also the coded amino acids.

It is, however, reasonable to assume this effect to be

distributed all over the proteins and not localized on

particular residues.

Another selective pressure on both proteins is

collectively induced by application of protease and

reverse transcriptase inhibitors at the same time or

in temporal proximity, as in e.g. HAART treatment

(Richman et al., 2009), combining both types of in-

hibitors, for example lopinavir, ritonavir, tenofovir

and emtricitabine. We note that we at least in the RT

structure no contribution from portions of the com-

plex that bind RT inhibitors can be observed, making

this explanation less likely.

And ﬁnally one cannot completely neglect the

possibility of functional protein-protein-interactions

between the RT and the PR. Although there are cur-

rently no indications to this effect and we doubt that

they exist, we mention this possibility for the sake of

completeness here.

ACKNOWLEDGEMENTS

KH gratefully acknowledges ﬁnancial support by the

Fonds der Chemischen Industrie through the program

Sachkostenzuschuß f

ur den Hochschullehrernach-

wuchs.

Molecular Structures were visualized by VMD

(Stone, 1998; Humphrey et al., 1996). VMD was de-

veloped by the Theoretical and Computational Bio-

physics Group in the Beckman Institute for Advanced

Science and Technology at the University of Illinois

at Urbana-Champaign.

REFERENCES

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and

Lipman, D. J. (1990). Basic local alignment search

tool. J Mol Biol, (3):403–410.

Boba, P. and Hamacher, K. (2009).

http://bioserver.bio.tu-darmstadt.de/HIV.

Chen, L. and Lee, C. (2006). Distinguishing HIV-1 drug

resistance, accessory, and viral ﬁtness mutations using

conditional selection pressure analysis of treated ver-

sus untreated patient samples. Biology Direct, 1(1):14.

Chen, L., Perlina, A., and Lee, C. J. (2004). Positive Selec-

tion Detection in 40,000 Human Immunodeﬁciency

Virus (HIV) Type 1 Sequences Automatically Iden-

tiﬁes Drug Resistance and Positive Fitness Mutations

in HIV Protease and Reverse Transcriptase. J. Virol.,

78(7):3722–3732.

Finn, R. D., Tate, J., Mistry, J., Coggill, P. C., Sammut,

S. J., Hotz, H.-R., Ceric, G., Forslund, K., Eddy, S. R.,

Sonnhammer, E. L. L., and Bateman, A. (2008). The

Pfam protein families database. Nucl. Acids Res.,

36:D281–D288.

Gloor, G., Martin, L., Wahl, L., and Dunn, S. (2005). Mu-

tual information in protein multiple sequence align-

ments reveals two classes of coevolving positions.

Biochemistry, 44(19):7156–7165.

Hamacher, K. (2007). Information theoretical measures to

analyze trajectories in rational molecular design. J.

Comp. Chem., 28(16):2576–2580.

BIOINFORMATICS 2010 - International Conference on Bioinformatics

Hamacher, K. (2008). Relating sequence evolution of

HIV1-protease to its underlying molecular mechanics.

Gene, 422:30–36.

Hamacher, K. and McCammon, J. A. (2006). Computing

the amino acid speciﬁcity of ﬂuctuations in biomolec-

ular systems. J. Chem. Theory Comput., 2(3):873–

878.

Higgins, D. G. and Sharp, P. M. (1988). Clustal: a pack-

age for performing multiple sequence alignment on a

microcomputer. Gene, 73(1):237–244.

Humphrey, W., Dalke, A., and Schulten, K. (1996). VMD

– Visual Molecular Dynamics. Journal of Molecular

Graphics, 14:33–38.

Lund, O., Nielsen, M., Lundegaard, C., and Brunak, C.

K. S. (2005). Immunological Bioinformatics. MIT

Press, Cambridge.

Pan, C., Kim, J., Chen, L., Wang, Q., and Lee, C. (2007).

The hiv positive selection mutation database. Nuc.

Acids Res., 35:D371–D375(1).

Perelson, A. S., Neumann, A. U., Markowitz, M., Leonard,

J., and Ho, D. (1996). HIV-1 dynamics in vivo: virion

clearance rate, infected cell life-span, and viral gener-

ation time. Science, 271:1582–1586.

Perryman, A. L., Lin, J.-H., and McCammon, J. A. (2006).

Restrained molecular dynamics simulations of hiv-1

protease: The ﬁrst step in validating a new target for

drug design. Biopolymers, 82(3):272–284.

Prajapati, D. G., Ramajayam, R., Yadav, M. R., and Girid-

har, R. (2009). The search for potent, small molecule

nnrtis: A review. Bioorganic & Medicinal Chemistry,

17(16):5744–5762.

Reiling, K., Endres, N., Dauber, D., Craik, C., and Stroud,

R. (2002). Anisotropic dynamics of the JE-2147-

HIV protease complex: Drug resistance and thermo-

dynamic binding mode examined in a 1.09 a structure.

Biochemistry, 41:4582.

Richman, D., Margolis, D., Delaney, M., Greene, W. C.,

Hazuda, D., and Pomerantz, R. J. (2009). The chal-

lenge of ﬁnding a cure for HIV infection. Science,

323:1304–1307.

Rong, L., Gilchrist, M. A., Feng, Z., and Perelson, A. S.

(2007). Modeling within-host HIV-1 dynamics and

the evolution of drug resistance: Trade-offs between

viral enzyme function and drug susceptibility. J. Theo.

Biol., 247:804–818.

Saraﬁanos, S. G., Das, K., Hughes, S. H., and Arnold,

E. (2004). Taking aim at a moving target: design-

ing drugs to inhibit drug-resistant hiv-1 reverse tran-

scriptases. Current Opinion in Structural Biology,

14(6):716–30.

Schreck, T., Bremm, S., Held, S., and Hamacher, K. (2009).

to be published.

Shannon, C. E. (1951). Prediction and entropy of printed

english. The Bell System Technical Journal, 30:50–

64.

Stone, J. (1998). An Efﬁcient Library for Parallel Ray Trac-

ing and Animation. Master’s thesis, Computer Science

Department, University of Missouri-Rolla.

Thompson, J., Higgins, D., and Gibson, T. (1994).

CLUSTAL W: improving the sensitivity of progres-

sive multiple sequence alignment through sequence

weighting, position-speciﬁc gap penalties and weight

matrix choice. Nucleic Acids Res., 22:4673–4680.

Trylska, J., Tozzini, V., Chang, C., and McCammon, J. A.

(2007). HIV-1 protease substrate binding and product

release pathways explored with coarse-grained molec-

ular dynamics. Biophys. J., 92:4179–4187.

Tsygankov, A. Y. (2009). Current developments in anti-

HIV/AIDS gene therapy. Curr Opin Investig Drugs,

10(2):137–149.

W.H. Press et al (1995). Numerical Recipies in C. Cam-

bridge University Press, Cambridge.

Wlodawer, A. and Erickson, J. (1993). Structure-based

inhibitors of HIV-1 protease. Annu. Rev. Biochem.,

62(1):543–585.

Yoshimura, K., Kato, R., Yusa, K., Kavlick, M. F., Maroun,

V.and Nguyen, A., Mimoto, T., Ueno, T., Shintani, M.,

Falloon, J., Masur, H., Hayashi, H., Erickson, J., and

Mitsuya, H. (1999). JE-2147: A dipeptide protease

inhibitor (PI) that potently inhibits multi-PI-resistant

HIV-1. Proc. Natl. Acad. Sci., 96:8675–8680.

CO-EVOLUTION IN HIV ENZYMES