3 METHOD
Given a chimera and its parent domains, we calculate the MIRs of each sequence and determine whether fusion significantly changes the interactions within the fused domains. A large discrepancy between the MIR distribution of the parent domains and that of the fused protein may indicate that the chimera does not fold correctly. We also compute structural models from the sequences and superpose the models of the parent domains onto the model of the chimera.
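The MIR comparison can be sketched as follows. This is a minimal illustration, not the MIR 2.2beta implementation. It assumes three things: the MIR output has already been reduced to one interaction count per residue (a hypothetical input format), a residue counts as an MIR when its count reaches the threshold of seven used in the Results, and each component's offset within the chimera is known. The helper names `mir_positions`, `mir_changes` and `discrepancy` are hypothetical.

```python
from typing import List, Sequence, Set

# Assumption: a residue is called an MIR when its interaction count
# reaches this threshold (seven, as in the Results section).
MIR_THRESHOLD = 7


def mir_positions(counts: Sequence[int], threshold: int = MIR_THRESHOLD) -> Set[int]:
    """Return the 0-based positions whose interaction count meets the threshold."""
    return {i for i, c in enumerate(counts) if c >= threshold}


def mir_changes(component_counts: Sequence[int],
                chimera_counts: Sequence[int],
                offset: int) -> List[int]:
    """List chimera positions where the MIR call differs after fusion.

    `offset` is the hypothetical 0-based start of the component in the
    chimera; only the chimera region covered by the component is compared.
    """
    component = {p + offset for p in mir_positions(component_counts)}
    region = range(offset, offset + len(component_counts))
    chimera = {p for p in mir_positions(chimera_counts) if p in region}
    # Symmetric difference: MIRs gained or lost in the fused protein.
    return sorted(component ^ chimera)


def discrepancy(component_counts: Sequence[int],
                chimera_counts: Sequence[int],
                offset: int) -> float:
    """Fraction of the component's residues whose MIR call changes."""
    changed = mir_changes(component_counts, chimera_counts, offset)
    return len(changed) / len(component_counts)


# Toy counts, not data from this study.
print(mir_changes([2, 8, 7, 1], [0, 0, 3, 9, 2, 7, 5], offset=2))  # [4, 5]
print(discrepancy([2, 8, 7, 1], [0, 0, 3, 9, 2, 7, 5], offset=2))  # 0.5
```

Comparing only the region of the chimera that the component occupies keeps MIRs from the other parent domain out of the comparison.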
In the simplest fusion protein, one sequence is appended directly to another to produce a larger protein containing both. This organization describes engineered chimeras, but chimeric proteins also form naturally (e.g. through chromosomal translocation). Figure 1 shows the more general case, in which a spacer (or ligation scar) lies between the two fused domains. During folding, a spacer orients and separates the two fused domains, making it easier for each to fold independently.
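The fusion layout can be represented by recording where each part starts and ends in the fused sequence. The following is a minimal sketch. The `Fusion` class and `fuse` function are hypothetical helpers, and the sequences are toy examples. An empty spacer gives the simplest, direct fusion.

```python
from dataclasses import dataclass


@dataclass
class Fusion:
    """A fused sequence plus 0-based, half-open spans of its parts."""
    sequence: str
    first: range
    spacer: range
    second: range


def fuse(first: str, second: str, spacer: str = "") -> Fusion:
    """Append `second` to `first`, optionally separated by a spacer."""
    first_end = len(first)
    spacer_end = first_end + len(spacer)
    return Fusion(
        sequence=first + spacer + second,
        first=range(0, first_end),
        spacer=range(first_end, spacer_end),
        second=range(spacer_end, spacer_end + len(second)),
    )


# Toy sequences, not proteins from the dataset.
direct = fuse("MKV", "GAL")               # simplest case: no spacer
linked = fuse("MKV", "GAL", spacer="GS")  # general case with a spacer
print(direct.sequence, len(direct.spacer))  # MKVGAL 0
print(linked.sequence, linked.second)       # MKVGSGAL range(5, 8)
```

Keeping the spans alongside the sequence makes it easy to map positions in a parent domain onto positions in the chimera.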
Our dataset consists of two groups of sequences: a) chimeric products in which the individual parent domains are known to retain their folds, and b) chimeric products of oncogenes, which are known to fold incorrectly. Proteins were selected according to the following criteria: 1) atomic coordinates are determined for all residues, 2) the sequence is relatively short, and 3) the spacer is minimal. We assume that each sequence is derived from cDNA. See Table 1. We split each chimeric protein sequence into its parent sequences using BLAST, assuming that each chimera is the result of appending exactly two parent domains. For the whole chimeric protein to fold correctly, any spacer must not interfere with the domain it is attached to. We therefore treat a component domain together with its spacer as a single protein; this leaves two components to fuse, which fits our methodology.
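Once BLAST has located both parents in the chimera, the split and the merging of the spacer into one component can be sketched as below. This assumes that the two hits are given as 1-based, inclusive query coordinates (as in BLAST tabular output) and do not overlap. The text does not say which component receives the spacer, so that choice is a parameter here. `split_chimera` is a hypothetical helper.

```python
from typing import Tuple

Hit = Tuple[int, int]  # (qstart, qend), 1-based and inclusive


def split_chimera(chimera: str, first_hit: Hit, second_hit: Hit,
                  spacer_with: str = "first") -> Tuple[str, str]:
    """Split a chimera into two components at the junction between two hits.

    Residues between the hits (the spacer) are merged into the component
    named by `spacer_with`. Residues before the first hit stay with the
    first component and residues after the second hit with the second.
    """
    (s1, e1), (s2, e2) = first_hit, second_hit
    if not (1 <= s1 <= e1 < s2 <= e2 <= len(chimera)):
        raise ValueError("expected two ordered, non-overlapping hits")
    if spacer_with == "first":
        cut = s2 - 1  # 0-based index of the first residue of the second hit
    elif spacer_with == "second":
        cut = e1      # 0-based index just after the first hit
    else:
        raise ValueError("spacer_with must be 'first' or 'second'")
    return chimera[:cut], chimera[cut:]


# Toy sequence: first parent AAAA, spacer GG, second parent BBBB.
print(split_chimera("AAAAGGBBBB", (1, 4), (7, 10)))
# ('AAAAGG', 'BBBB')
print(split_chimera("AAAAGGBBBB", (1, 4), (7, 10), spacer_with="second"))
# ('AAAA', 'GGBBBB')
```

Overlapping hits are rejected rather than resolved, because any rule for assigning shared residues would be a further assumption.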
The primary structures of the target proteins were used to produce MIR predictions. For our computations, we used an implementation called MIR 2.2beta (Papandreou, et al., 2004). QUARK was selected as our ab initio modeler based on its performance in CASP9 (Protein Structure Prediction Center, 2010), and I-TASSER was selected for its association with QUARK. Phyre2 was selected for its accuracy among fold recognition tools. Superposition was performed with GANGSTA+. We expect the fraction of each component that superposes onto its chimeric protein to be much greater for the chimeras known to fold correctly than for the oncogenic chimeras.
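The expected outcome can be checked with a simple comparison of group means. This sketch assumes that each component's superposed fraction is available as a number between 0 and 1, with `None` for components that have no superposition result. The function names and the plain comparison of means are assumptions, since the text does not specify a statistical test.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_superposition(fractions: Sequence[Optional[float]]) -> Optional[float]:
    """Mean superposed fraction, skipping components without a result (None)."""
    values = [f for f in fractions if f is not None]
    return mean(values) if values else None


def matches_expectation(folding: Sequence[Optional[float]],
                        oncogenic: Sequence[Optional[float]]) -> bool:
    """True if correctly folding chimeras superpose better on average."""
    a, b = mean_superposition(folding), mean_superposition(oncogenic)
    return a is not None and b is not None and a > b


# Toy fractions, not results from this study.
print(mean_superposition([0.75, 0.25, None]))                  # 0.5
print(matches_expectation([0.75, 0.25, None], [0.25, 0.125]))  # True
```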
4 RESULTS
For the MIR prediction, we first used a threshold of seven interactions (Papandreou, et al., 2004) to identify MIRs. We list the sequence positions at which the MIR assignment differs between the computation for an individual component and that for the entire fused protein. Figures 2 and 3 show these results for the two extreme cases: the most divergent and the most alike. The results of the structural alignments are shown in Tables 2 and 3. We define the maximum alignment as the length of the component sequence divided by the length of the chimeric sequence. The superposition column gives the portion of the component model that can be superposed onto the chimera. For each alignment, we also give the RMSD produced by GANGSTA+ (Guerler and Knapp, 2008). In three cases, GANGSTA+ could not calculate a result because of a lack of secondary structure. In another case, no model could be computed for use with GANGSTA+ because CBL is a peptide rather than a protein. When more than one model was produced, we picked the model with the highest reported confidence (Xu and Zhang, unpublished; Roy, Kucukural and Zhang, 2010).
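The maximum alignment and the choice among several models are both small computations, sketched below. The `(name, confidence)` pairs are a hypothetical representation of a modeler's output. QUARK, I-TASSER and Phyre2 may report confidence on different scales, so the sketch assumes that all candidates come from a single tool.

```python
from typing import Sequence, Tuple


def maximum_alignment(component: str, chimera: str) -> float:
    """Length of the component sequence divided by that of the chimera."""
    return len(component) / len(chimera)


def best_model(models: Sequence[Tuple[str, float]]) -> str:
    """Name of the model with the highest reported confidence.

    Assumes all scores come from the same tool, so they are comparable.
    """
    name, _score = max(models, key=lambda model: model[1])
    return name


# Toy inputs, not values from this study.
print(maximum_alignment("A" * 100, "A" * 250))  # 0.4
print(best_model([("model1", -1.2), ("model2", 0.5), ("model3", 0.1)]))  # model2
```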
Figure 2: Changes in MIR distribution for GST-EGFP.
Figure 3: Changes in MIRs for BRD4-NUT. Only 12
residues on either side of the point of fusion are shown.
5 DISCUSSION
Ideally, the MIRs computed for the parent domains and for the fused protein would be similar; a change in MIRs might indicate a disruption during folding. In general, however, the MIR results are noisy because of the Monte Carlo algorithm that MIR uses.
BIOINFORMATICS 2012 - International Conference on Bioinformatics Models, Methods and Algorithms