the two algorithms can be seen in Figures 2(a) and 2(b). Figure 2(a) shows the RMSD values of our resampled structures for each algorithm: all of our GA resampled structures have a lower RMSD value than our MC resampled structures. In Figure 2(b), our GA approach has better TM-Score values for 13 structures and is on par with our MC approach for two. The lower the RMSD, the closer the structure is to the native conformation; likewise, the higher the TM-Score, the closer the structure is to the native. This means that, on average, our GA approach creates structures that are closer to the native conformation. The main reason for this is that it has a larger feature space to work with: the GA's population can hold a library of low-energy features, and its crossover and mutation operators allow much more of the conformational landscape to be explored. In contrast, our MC algorithm uses only one Rosetta decoy as a starting point and therefore has fewer features at its disposal. This makes it highly unlikely to find structures that are < 15 Å from the native conformation.
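To make the two metrics concrete, the following is a minimal Python sketch of RMSD and TM-Score for a model and a native C-alpha trace of equal length. It assumes the two structures are already superimposed under a known residue-to-residue alignment; the full TM-Score (and RMSD as usually reported) also searches for the optimal superposition, which this sketch omits. The function names are hypothetical, and the d0 scale follows the standard TM-Score definition, clamped to 0.5 Å for short chains.

```python
import math

Coord = tuple[float, float, float]


def rmsd(model: list[Coord], native: list[Coord]) -> float:
    """RMSD (in Å) between two equal-length, already superimposed C-alpha traces."""
    if not native or len(model) != len(native):
        raise ValueError("model and native must be non-empty and the same length")
    squared = sum(math.dist(m, n) ** 2 for m, n in zip(model, native))
    return math.sqrt(squared / len(native))


def tm_score(model: list[Coord], native: list[Coord]) -> float:
    """TM-Score for a fixed superposition; lies in (0, 1], where 1 means identical."""
    if not native or len(model) != len(native):
        raise ValueError("model and native must be non-empty and the same length")
    length = len(native)
    # Length-dependent distance scale d0 from the TM-Score definition, clamped for short chains.
    d0 = max(0.5, 1.24 * max(length - 15, 0) ** (1 / 3) - 1.8)
    return sum(1.0 / (1.0 + (math.dist(m, n) / d0) ** 2) for m, n in zip(model, native)) / length


# Usage: a straight toy "native" chain and a model shifted 1 Å along x.
native = [(3.8 * i, 0.0, 0.0) for i in range(30)]
model = [(x + 1.0, y, z) for x, y, z in native]
print(rmsd(model, native), tm_score(model, native))
```

Lower RMSD and higher TM-Score both indicate a model closer to the native; TM-Score is less sensitive to a few badly placed residues because each residue's contribution is bounded by 1.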
Based on the above analysis, we have shown that an evolutionary approach can give better results than other popular algorithms such as the Monte Carlo (MC) method. The analysis also indicates that combining feature-based resampling with Genetic Algorithms can create structures with more native-like features, which is supported by the lower RMSD values we obtained compared with our MC approach. From this we can infer that more correct features are being added to the search space, thus guiding our search towards more accurate structures.
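The feature-space argument can be illustrated with a toy genetic algorithm. In this sketch a decoy is a list of discrete features (for example, one secondary-structure or fragment label per segment), the feature pool at each position is the union of the features seen there across the starting population, and new decoys are produced by one-point crossover, mutation drawn from that pool, and truncation selection with elitism. The representation, the operators, and `toy_energy` (a stand-in for a real energy function such as Rosetta's) are all hypothetical; this is not the authors' implementation.

```python
import random


def feature_pool(decoys):
    """For each position, every distinct feature observed there across the decoys."""
    return [sorted({d[i] for d in decoys}) for i in range(len(decoys[0]))]


def one_point_crossover(a, b, rng):
    """Child takes a prefix from parent a and the matching suffix from parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(decoy, pool, rate, rng):
    """With probability `rate`, resample a position from the features seen there."""
    return [rng.choice(pool[i]) if rng.random() < rate else f for i, f in enumerate(decoy)]


def ga_resample(start_decoys, energy, generations=50, rate=0.1, seed=0):
    """Toy GA over feature vectors; assumes >= 2 decoys, each with >= 2 positions."""
    rng = random.Random(seed)
    population = [list(d) for d in start_decoys]
    pool = feature_pool(population)
    size = len(population)
    for _ in range(generations):
        population.sort(key=energy)
        # Truncation selection: the lower-energy half survives unchanged (elitism).
        parents = population[: max(2, size // 2)]
        children = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(one_point_crossover(a, b, rng), pool, rate, rng))
        population = parents + children
    return min(population, key=energy)


# Usage with a hypothetical energy: number of positions that differ from a preferred pattern.
PREFERRED = "HHHHLLEEEE"


def toy_energy(decoy):
    return sum(f != p for f, p in zip(decoy, PREFERRED))


starts = [list("HHLLLLEEEL"), list("LHHHLLLEEE"), list("HHHHEEEELL"), list("EHHHLLEEHE")]
best = ga_resample(starts, toy_energy)
print("".join(best), toy_energy(best))
```

Because every feature comes from the population's pool, crossover can combine low-energy segments that occur in different decoys, and elitism guarantees the best energy never gets worse from one generation to the next.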
5 CONCLUSIONS
The Critical Assessment of Techniques for Protein Structure Prediction (CASP) provides a reliable indication of how far we have come in solving the protein structure prediction problem. In this paper, using 17 randomly selected CASP 8 sequences, we have demonstrated the capabilities of our GA feature-resampling approach. We have also compared it with Rosetta, a state-of-the-art PSP suite, and with another MC algorithm that we developed, to demonstrate the potential advantages evolutionary algorithms have over other non-deterministic search algorithms.
Both algorithms were run on a set of 17 randomly chosen sequences that were used in the CASP 8 experiment. Our results showed that our GA performed well overall, obtaining good improvements in both RMSD and TM-Score. This indicates that most of the overall topology of the protein was forming during our GA search. We have also shown that evolutionary algorithms have the potential to be more successful than other non-deterministic search algorithms such as MC approaches. Our GA performed very similarly to our MC approach in resampling Rosetta starting points; however, because it has a larger feature space, our GA approach produced more accurate predictions than our MC method.
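For contrast, a generic Metropolis Monte Carlo search that follows a single trajectory from one starting decoy can be sketched as below. To isolate the effect described above, the sketch assumes, hypothetically, that the move set at each position is limited to the features found in the decoys supplied to it: with a single starting decoy it has nothing to change, whereas a pool built from several decoys lets it move. This is a textbook Metropolis scheme, not the authors' MC algorithm; the energy function and decoys are illustrative only.

```python
import math
import random


def feature_pool(decoys):
    """For each position, every distinct feature observed there across the decoys."""
    return [sorted({d[i] for d in decoys}) for i in range(len(decoys[0]))]


def metropolis(start, energy, pool, steps=2000, temperature=0.5, seed=0):
    """Single-trajectory Metropolis search; returns the lowest-energy decoy visited."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] = rng.choice(pool[i])  # propose a single-position feature swap
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill, accept uphill with prob exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


PREFERRED = "HHHHLLEEEE"


def toy_energy(decoy):  # hypothetical stand-in for a real energy function
    return sum(f != p for f, p in zip(decoy, PREFERRED))


starts = [list("HHLLLLEEEL"), list("LHHHLLLEEE"), list("HHHHEEEELL"), list("EHHHLLEEHE")]
single = metropolis(starts[0], toy_energy, feature_pool(starts[:1]))
pooled = metropolis(starts[0], toy_energy, feature_pool(starts))
print(single[1], pooled[1])
```

Under this assumption the size of the feature pool, not the acceptance rule, limits how far the search can move from its starting point.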
As future work, it would be interesting to model energy preferences in the fitness function in order to bias the search towards certain features, or arrangements of features, that are observed in native conformations. This could increase the accuracy of our search and hence produce better predictions.
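One hypothetical way to realise such a bias is to add a knowledge-based term to the fitness function that penalises arrangements of features rarely seen in native structures. The sketch below scores adjacent feature pairs by their negative log-probability under a table of native pair counts (with add-one smoothing) and adds this, scaled by a weight, to the energy. The `NATIVE_PAIRS` table, the weight, and all function names are illustrative assumptions, not data or results from this work.

```python
import math


def native_pair_penalty(decoy, pair_counts, n_feature_types):
    """Sum of -log P(pair) over adjacent feature pairs, with add-one (Laplace) smoothing."""
    total = sum(pair_counts.values())
    possible_pairs = n_feature_types ** 2
    return sum(
        -math.log((pair_counts.get(pair, 0) + 1) / (total + possible_pairs))
        for pair in zip(decoy, decoy[1:])
    )


def biased_fitness(decoy, energy, pair_counts, n_feature_types, weight=0.5):
    """Lower is better: energy plus a weighted penalty for non-native-like arrangements."""
    return energy(decoy) + weight * native_pair_penalty(decoy, pair_counts, n_feature_types)


# Hypothetical counts of adjacent (helix H, strand E, loop L) pairs in native structures.
NATIVE_PAIRS = {("H", "H"): 50, ("E", "E"): 40, ("H", "L"): 10, ("L", "E"): 10}


def flat_energy(decoy):  # placeholder energy so that only the bias term differs
    return 0.0


print(biased_fitness("HHHH", flat_energy, NATIVE_PAIRS, 3))
print(biased_fitness("HEHE", flat_energy, NATIVE_PAIRS, 3))
```

The weight controls the trade-off: a large value lets native statistics dominate the physical energy, while a small one only breaks ties between decoys of similar energy.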
REFERENCES
Arunachalam, J., Kanagasabai, V., and Gautham, N. (2006). Protein structure prediction using mutually orthogonal Latin squares and a genetic algorithm. Biochemical and Biophysical Research Communications, 342:424–433.
Baker, D. (2006). Prediction and design of macromolecular structures and interactions. Philosophical Transactions of the Royal Society B, 361:459–463.
Blum, B. (2008). Resampling Methods for Protein Structure Prediction. PhD thesis, Electrical Engineering and Computer Sciences, University of California at Berkeley.
Bornberg-Bauer, E. (1997). Chain growth algorithms for HP-type lattice proteins. In Research in Computational Molecular Biology (RECOMB), pages 47–55.
Bradley, P., Chivian, D., Meiler, J., Misura, K., Rohl, C., Schief, W., Wedemeyer, W., Schueler-Furman, O., Murphy, P., Schonbrun, J., Strauss, C., and Baker, D. (2003). Rosetta predictions in CASP5: Success, failure, and prospects for complete automation. PROTEINS: Structure, Function, and Genetics, 53:457–468.
Brunette, T. and Brock, O. (2005). Improving protein structure prediction with model-based search. Bioinformatics, 21(Suppl. 1):i66–i74.
Higgs, T., Stantic, B., Hoque, T., and Sattar, A. (2010). Genetic algorithm feature-based resampling for protein structure prediction. In IEEE World Congress on Computational Intelligence, pages 2665–2672.
Hoque, T., Chetty, M., and Sattar, A. (2007). Protein folding prediction in 3D FCC HP lattice model using genetic algorithm. In IEEE Congress on Evolutionary Computation, pages 4138–4145.
Hoque, T., Chetty, M., and Sattar, A. (2009). Extended HP model for protein structure prediction. Journal of Computational Biology, 16:85–103.
Jiang, T., Cui, Q., Shi, G., and Ma, S. (2003). Protein folding simulations of hydrophobic-hydrophilic model by combining tabu search with genetic algorithms. Journal of Chemical Physics, 119(8):4592–4596.
BENEFITS OF GENETIC ALGORITHM FEATURE-BASED RESAMPLING FOR PROTEIN STRUCTURE
PREDICTION