function relationship is elucidated with this
collection of mutants by comparing their residual
scores (measures of relative changes to sequence-
structure compatibility) with their relative activity
changes (measures of relative functional changes).
More generally, residual scores are also useful for
naturally clustering IL-3 amino acid positions based
on their polarity, as well as for distinguishing
residue groups based on their functional or structural
roles. Each experimental IL-3 mutant is then
represented as a feature vector whose input
attributes comprise the residual score, the ordered
perturbation scores for the six structurally closest
positions in the local neighborhood of the mutated
position, and additional components based on
sequence and structure; the output attribute is the
activity category (unaffected / affected). This
collection of feature vectors is used to train a
random forest classifier, which displays up to 80%
accuracy for mutant IL-3 activity prediction and
outperforms other well-known methods. To assist
researchers in prioritizing future IL-3 mutagenesis
experiments, activity predictions based on the
trained model are provided for all 1498 unexplored
single residue IL-3 mutants.
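The feature-vector layout described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Mutant` record, the `feature_vector` helper, the reading of "ordered" as "sorted by increasing distance from the mutated position", and all numeric values are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

N_NEIGHBORS = 6  # the six structurally closest positions, as described in the text


@dataclass
class Mutant:
    """Hypothetical record for one single-residue IL-3 mutant."""
    name: str
    residual_score: float
    # One (distance to mutated position, perturbation score) pair per neighbor.
    neighbors: Sequence[Tuple[float, float]]
    # Placeholder for the additional sequence/structure-based components.
    extra: Sequence[float]
    # Activity category (the output attribute).
    affected: bool


def feature_vector(m: Mutant) -> Tuple[List[float], str]:
    """Return (input attributes, class label) for one mutant."""
    if len(m.neighbors) < N_NEIGHBORS:
        raise ValueError(f"{m.name}: need at least {N_NEIGHBORS} neighbors")
    # Assumption: "ordered" means sorted by increasing distance.
    closest = sorted(m.neighbors, key=lambda pair: pair[0])[:N_NEIGHBORS]
    inputs = [m.residual_score] + [score for _, score in closest] + list(m.extra)
    label = "affected" if m.affected else "unaffected"
    return inputs, label


# Made-up numbers, used only to show the shape of the vector.
example = Mutant(
    name="hypothetical_mutant",
    residual_score=-1.2,
    neighbors=[(6.1, 0.3), (4.0, -0.8), (5.2, 0.1), (7.5, 0.0),
               (4.9, -0.4), (6.8, 0.2), (9.3, 0.5)],
    extra=[1.0, 0.0],
    affected=True,
)
inputs, label = feature_vector(example)
print(inputs, label)
```

A list of such (inputs, label) pairs is the kind of table a random forest implementation would take as training data. The classifier itself is left out here to keep the sketch free of third-party dependencies.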
MODELING CELL PROLIFERATION ACTIVITY OF HUMAN INTERLEUKIN-3 UPON SINGLE RESIDUE
REPLACEMENTS