BRIDGING THE GAP BETWEEN DESIGN AND REALITY
A Dual Evolutionary Strategy for the Design of Synthetic Genetic Circuits
J. S. Hallinan, S. Park and A. Wipat
School of Computing Science, Newcastle University, NE7 4RU, Newcastle upon Tyne, U.K.
Keywords: Synthetic biology, Evolutionary computation, Directed evolution, Genome-scale design.
Abstract: Computational design is essential to the field of synthetic biology, particularly as its practitioners become
more ambitious, and system designs become larger and more complex. However, computational models
derived from abstract designs are unlikely to behave in the same way as organisms engineered from those
same designs. We propose an automated, iterative strategy involving evolution both in silico and in vivo,
with feedback between strands as necessary, combined with automated reasoning. This system can help
bridge the gap between the behaviour of computational models and that of engineered organisms in as rapid
and cost-effective a manner as possible.
1 INTRODUCTION
The nascent field of synthetic biology aims to
produce engineered organisms with novel, desirable
behaviour. To date, synthetic genetic circuits have
primarily been designed manually, by a domain
expert with an in-depth knowledge of the biological
system of interest. This approach has been
moderately successful; bacteria, and even plants,
have been engineered to perform tasks as diverse as
the detection of arsenic in well water, the
identification of explosive residues in soil, and the
performance of a range of computational tasks such
as the operation of logic gates and mathematical
functions (Khalil and Collins, 2010).
However, the ultimate aim of synthetic biology is
the large-scale engineering of entire genomes.
Important strides in this direction have been made
(Cello et al., 2002); (Smith et al. 2003); (Tumpy et
al., 2005). In 2010 Gibson and colleagues
announced the synthesis of a completely synthetic
genome, and its insertion into a living bacterium
which had previously been denuded of its genome
(Gibson et al., 2010). However, all of the work done
in this area to date has focussed upon the re-creation,
with slight modifications, of existing genomes. To
date the design of entire genomes with appreciable
novel functionality has not been achieved.
It is becoming increasingly apparent that the
design of novel, genome-scale biological systems
will require computer-aided design (CAD) and
computational simulation prior to implementation
(Cohen 2008). Several CAD systems (Chandran et
al., 2009); (Czar et al., 2009); (Pedersen, 2009);
(Beal et al., 2011), including a data and workflow
management system (www.clothocad.org) have been
designed specifically for synthetic biology. In
addition, a synthetic biology-specific ontology,
SBOL, (http://hdl.handle.net/1721.1/66172) is under
active development.
However, manually-oriented CAD systems will
almost certainly not scale to the genome level. In
order to design large-scale synthetic biological
systems the complex process of genetic circuit
design, implementation, evaluation, modification
and iterative refinement will have to be automated as
fully as possible.
A design for a synthetic genetic circuit is usually
initially in the form of a conceptual diagram, which
can be converted into a simulateable model in a
standard modelling language. However, converting
such a model into a DNA sequence which can be
inserted into a living organism is not so
straightforward; there is a gap between design and
successful implementation, which must be
addressed.
Natural systems have arisen via the process of
evolution, and there has been considerable interest in
the application of evolutionary approaches to the
design of novel genetic circuits. In this paper we
briefly review the application of both computational
263
S. Hallinan J., Park S. and Wipat A..
BRIDGING THE GAP BETWEEN DESIGN AND REALITY - A Dual Evolutionary Strategy for the Design of Synthetic Genetic Circuits.
DOI: 10.5220/0003887002630268
In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS-2012), pages 263-268
ISBN: 978-989-8425-90-4
Copyright
c
2012 SCITEPRESS (Science and Technology Publications, Lda.)
and directed evolution to the design of biological
systems, and present our vision of a dual
evolutionary strategy to bridge the gap between in
silico design and in vivo reality.
2 BACKGROUND
2.1 Evolutionary Computation for
Genetic Circuit Design
The manual design of circuits has the advantage of
producing simple, well-understood circuit layouts.
However, this approach relies heavily upon domain
expertise; a biologist with extensive knowledge of
the circuit to be engineered, and any extraneous
components to be incorporated, is essential. An
alternative approach is to incorporate techniques
inspired by the only process yet known to have
successfully produced life – evolution.
Evolutionary computation (EC) has been around
almost since computers became a consumer item
(Box 1957). Based on biological evolution, EC
attempts to use random changes in a problem
solution, together with a fitness function and fitness-
proportional selection, to generate solutions to
complex problems. EC is therefore ideally suited to
problems in complex, poorly-understood domains,
where a good, but not necessarily optimal, solution
is essential, but the precise nature of the solution is
not. There are many variants of EC (Hallinan and
Wiles, 2002)but the basic principles are common to
all.
EC has been applied to metabolic engineering,
for tasks such as identifying the appropriate genes to
knock out in order to maximize the production of
biochemicals (Patil et al., 2005) and to optimize
parameters for allosteric regulation of enzymes
(Gilman and Ross, 1995). Some of the results have
been interestingly counter-intuitive (Patil et al.,
2005).
The applicability of EC to the design of genetic
circuits is clear. Multiple runs of an algorithm will
produce different, equally fit, solutions which can be
compared for efficiency, cost and practicality of
implementation, among other factors. Since the
detailed workings of many genetic circuits are
poorly-understood, EC is a promising approach to
the generation of new circuit designs.
2.2 Directed Evolution in vivo
The relationship between a DNA sequence and the
structure and function of the protein it encodes is
indirect. Many factors affect the relationship,
including post-transcriptional and –translational
modifications to DNA, RNA and proteins; the
presence or absence of protein chaperones; protein
folding; and the cellular context. It is therefore non-
trivial to design a protein with a required
functionality, such as a transcription factor with a
given binding strength. An extremely successful way
to overcome this problem is to extend, or completely
replace, the rational design approach with directed
evolution (Romero and Arnold, 2008).
Directed evolution involves the application, to a
population of cells, of iterative rounds of mutation
and artificial selection. With each round of selection
the desired behaviour is more closely approximated,
and the process can be ended when the protein
function is deemed to be close enough to the target
behaviour. Directed evolution has been shown
repeatedly to be both powerful and flexible in its
outcomes (Aharoni et al., 2005).
There are two ways in which directed evolution
is generally used. In the biotechnology industry the
output of a particular biological pathway is often of
primary interest; companies need to optimise the
production of a specific compound (Lee et al.,
2008). In this case directed evolution has the effect
of optimising entire pathways. Alternatively, the
evolutionary process can be aimed at manipulating
individual proteins, developing, for example,
specific enzymes (Brustad and Arnold, 2011).
Originally, much of the work in this area was
performed in large-scale chemostats. The use of
smaller volumes then made it possible to automate
much of the directed evolution process using liquid-
handling robots (Felton, 2003). Such robots,
however, still work with relatively large numbers of
cells at a time. At this scale the stochasticity inherent
in biological systems is averaged out when
measurements are made over whole populations of
cells, prohibiting analysis of the behaviour of single
cells.
Recently, however, there has been considerable
interest in the use of microfluidic technologies in
synthetic biology (Gulati et al., 2009); (Szita et al.,
2010); (Ferry et al., 2011); (Vinuselvi et al., 2011).
Operating at micrometre scales, microfluidic devices
allow the manipulation and observation of single
cells or small groups of cells. Biological
stochasticity can thus be explored in detail.
Importantly, microfluidics devices can be fully
automated, with tasks such as the input of fresh
media, removal of waste, selection of individual
cells and control of cellular environment completely
controlled by an attached computer. Microfluidics
BIOINFORMATICS 2012 - International Conference on Bioinformatics Models, Methods and Algorithms
264
provides an ideal environment for the directed
evolution of cells for synthetic biology.
3 A DUAL EVOLUTIONARY
STRATEGY FOR SYNTHETIC
BIOLOGY
3.1 Design in Synthetic Biology
Synthetic genetic circuits tend to be designed in
isolation, and often incorporate a number of
simplifying assumptions. However, a designed
circuit in vivo is operating in a complex genetic and
environmental context, and an engineered microbe
may not behave in the same way as the in silico
model upon which it is based.
The creations of synthetic biologists must
operate predictably in a complex, noisy environment
in which they are subject to global selection
pressures as yet poorly understood. However, a
number of important issues have been identified.
Sprinzak and Elowitz (Sprinzak and Elowitz, 2005)
nominate “parameter sensitivity, the lack of effective
rules to simplify complex circuits, and the difficulty
of incorporating extrinsic noise". To this list we add:
nonlinearity; crosstalk; scalability; evolvability; and
genetic context.
Rather than attempting to eliminate this noise
and complexity, we believe that it should be possible
to harness the incredibly powerful forces that have
shaped life on this planet for the past 3.8 billion
years for the controlled design of organisms with
novel, valuable behaviours.
3.2 A Dual Evolutionary Strategy
Our proposed dual evolutionary strategy involves in
silico and in vivo experimentation and evolution
carried out in serial, with results from each strand
feeding back to the other strand as required.
The process starts, as does all formal engineering
design, with requirements gathering, leading to a
formal functional specification of the desired
system. Data from a variety of sources are then
integrated to inform automated reasoning, leading to
an initial design for the system.
The field of data integration is increasingly being
recognised as important to bioinformatics, whose
practitioners routinely deal with the large datasets
produced by high-throughput technologies such as
microarrays and proteomics. Although much of this
data is freely available in the over 1300 online
databases currently available (Galperin and
Cochrane, 2011), the sheer scale of data generation
means that much of this data does not make it into
the literature. It is not feasible to manually trawl
databases for more than a small number of genes.
Data, and thus information, can effectively be lost to
the research community.
Tools such as the Ondex data integration
platform (Kohler et al., 2006) rescue this data by
bringing it together in a common format and
integrating diverse datasets into a single resource,
which can be viewed as a network, or accessed and
manipulated computationally. Ondex incorporates an
underlying ontology, so individual concepts, which
can be of any type (gene, protein, publication,
protein family, etc.) and their interactions are
annotated in a structured manner. Ondex graphs are
therefore well suited to the application of automated
reasoning algorithms, which have already been
applied to good effect in bioinformatics (King et al.,
2004). An Ondex knowledgebase has recently been
produced for the model Gram positive bacterium
Bacillus subtilis (Misirli et al., 2011).
The initial reasoning / design process is iterative,
as individual designs are scrutinised for genetic
components and their desired interactions, which are
then reasoned over to predict the systems behaviour
and to suggest modifications to the design. Once the
initial design is determined it is translated into a
computational model in a standard modelling
language such as SBML (Hucka et al., 2003) or
CellML (Cooling et al.m 2008).
Simulation of the model and experiments on the
engineered microbe are conducted in serial. Intially,
the model is run, to determine whether it behaves as
predicted. Simulation modelling can also establish
factors such as sensitivity to variations in parameter
values, and to determine which model elements are
most important to the generation of the desired
behaviour, observations which can be used to guide
measurements made on the in vivo system. Models
may be run multiple times using stochastic
algorithms to investigate the range of behaviours
possible from a single design (Hallinan et al., 2010).
If the behaviour generated by the model is not
sufficiently close to the target, an evolutionary
algorithm is used to modify the circuit until is
behaves as desired. The modified model is analyzed
in the same way as the original model.
Once the modelling results are satisfactory the
design is converted into a synthesizable DNA
sequence using a tool such as MoSeC (Misirli et al.,
2011), an approach which preserves the automated
nature of the process. Alternatively, if standard
BRIDGING THE GAP BETWEEN DESIGN AND REALITY - A Dual Evolutionary Strategy for the Design of Synthetic
Genetic Circuits
265
cloning approaches are too be used, the design may
be manually translated into a set of DNA
components and the strategy with which to
manipulate them.
The in vivo evolutionary cycle is then executed,
with laboratory experimentation replacing model
simulation, measurements made as indicated by the
results of the modelling, and directed evolution
replacing EC.
The end result of the in silico and in vivo
experimentation is the amassing of large amounts of
new data about the construct and its behaviour. This
data is added to that in the original integrated dataset
to form the basis for further computational
reasoning.
Computational evolution of the model will
produce multiple, variant designs for genetic circuits
with the same functionality, many of which will
never have existed in nature. Similarly, directed
evolution of the engineered microbes will almost
certainly produce a number of organisms which have
behaviour closer to that desired. Sequencing of their
genomes post-evolution will permit comparison with
the original sequence, and thus facilitate the
generation of testable hypotheses about the
significance of any mutations observed.
All of the data generated by experiments on both
the original design and the evolved variants then
feed back into the reasoning / design loop, and the
process can continue iteratively until an organism is
achieved with behaviour close enough to the target.
Figure 1: A dual evolutionary strategy for synthetic
biology.
4 DISCUSSION
Currently most design in synthetic biology is done
on a small scale, in close consultation with a domain
expert. Although it may never be possible to move
completely away from specialised expertise, the
design of large scale genetic systems will clearly
require a high level of automation, including
automated reasoning over large amounts of data.
Synthetic biology builds upon molecular and
systems biology (Church 2005), but has a different
aim from either of those disciplines: to engineer
entirely novel biological systems, performing tasks
which are not within the scope of existing
organisms. In order to achieve these aims we
contend that large scale systems must be engineered;
such systems will be of of a size and complexity of
which the human brain cannot maintain a complete
overview.
Large scale synthetic biology therefore requires
sophisticated computation and extensive automation.
The algorithms and hardware required to achieve
this task are rapidly becoming available. New
technologies promise to extend the capabilities of
laboratories in many different directions. DNA
synthesis technology is increasing in speed, while
decreasing in cost (details). Cloud and Grid
computing make available enormous amounts of
CPU time cheaply (Craddock et al., 2008), and,
because these technologies are highly parallel,
quickly. Microfluidics provides an exciting, albeit
challenging, approach to the manipulation and
measurement of cells, either wild type or engineered,
in very small numbers.
The development of these technologies permits
approaches such as directed evolution at the single-
cell level in time scales which are not very different
from those required to run multiple computational
simulations. Computational and in vivo experiments
provide different, but overlapping, windows onto the
biology of synthetic genetic systems. We therefore
propose a bipartite strategy for engineering synthetic
genetic circuits, involving both in silico and in vivo
experiments.
One important component of our approach is
computational reasoning. The amount of data which
can be collected from a single experiment is vast. At
present, it is usually the task of the human
experimenter to decide which parameters should be
measured, and how those measurements should be
used in the development of new experiments.
Automated computational reasoning has been
applied with success to the generation of new
testable hypotheses, and appropriate experimental
protocols, as in the case of the Robot Scientist (King
et al., 2009). There is clearly considerable scope for
the application of this approach to automated
decisions about which aspects of an experiment to
BIOINFORMATICS 2012 - International Conference on Bioinformatics Models, Methods and Algorithms
266
measure, and which experiments to conduct, in the
field of synthetic biology.
The other fundamental aspect of our approach is
the use of evolution to refine the designs arrived at
by humans or machines. The uncertainty inherent in
biological systems—whether arising from inherent
stochasticity or our lack of knowledge about the
structure and function of many biomolecules—
means that a completely rational design strategy in
synthetic biology, as espoused by hard-core
engineers, is simply not practical at this point in
time. By harnessing evolution to refine our design,
and then comparing the products of evolution with
our original designs, we have the potential to learn
not only how to better engineer the organisms in
which we are interested, but also how these
organisms work in the absence of engineering.
Molecular and systems biology form the basis for
synthetic biology; but synthetic biology also
promises to provide unique insights into the
fundamental workings of the cell.
A highly automated approach, incorporating
computational intelligence wherever possible, and
operating at the level of one or a few cells, appears
to us to offer the best prospects for designing,
implementing and testing large-scale novel genetic
systems, thus bridging the gap between design and
reality in synthetic biology. Although there are still
many technical hurdles to be overcome in the
construction of such a system, all of the individual
technologies are currently in place, and the
construction of such a synthetic biology factory is a
realistic goal in the near future.
REFERENCES
Aharoni, A., L. Gaiducov, et al. (2005). "The 'evolvability'
of promiscuous protein functions." Nature Genetics
37(1): 73 - 76.
Beal, J., T. Lu, et al. (2011). "Automatic compilation from
high-level biologically-oriented programming
language to genetic regulatory networks." PLoS ONE
6(8): e22490.
Box, G. E. P. (1957). "Evolutionary operation: A method
for increasing industrial productivity." Applied
Statistics 6(2): 81 - 101.
Brustad, E. M. and F. H. Arnold (2011). "Optimizing non-
natural protein function with directed evolution."
Current Opinion in Chemical Biology 15(2): 201 -
210.
Cello, J., A. V. Paul, et al. (2002). "Chemical synthesis of
poliovirus cDNA: Generation of infectious virus in the
absence of natural template." Science 297(5583): 1016
- 1018.
Chandran, D., F. T. Bergmann, et al. (2009) "TinkerCell:
modular CAD tool for synthetic biology." Journal of
Biological Engineering 3 DOI: doi:10.1186/1754-
1611-3-19.
Church, G. M. (2005). "From systems biology to synthetic
biology." Molecular Systems Biology 1: 0032.
Cohen, J. (2008). "The crucial role of CS in systems and
synthetic biology." Transactions of the ACM 51(5): 15
- 18.
Cooling, M. T., P. Hunter, et al. (2008). "Modelling
biological modularity with CellML." Systems Biology,
IET 2(2): 73-79.
Craddock, T., C. R. Harwood, et al. (2008). "e-Science:
Relieving bottlenecks in large-scale genomic
analyses." Nature Reviews Microbiology 6: 948 - 954.
Czar, M., Y. Cai, et al. (2009). "Writing DNA with
GenoCAD." Nucleic Acids Research 37(W40-7).
Felton, M. J. (2003). "Product review: Liquid handling:
Dispensing reliability." Analytical Chemistry 75(17):
397A - 399A.
Ferry, M. S., I. A. Razinkov, et al. (2011). "Microfluidics
for synthetic biology: From design to execution."
Methods in Enzymology 497: 295 - 372.
Galperin, M. Y. and G. R. Cochrane (2011). "The 2011
Nucleic Acids Research database issue and the online
molecular biology database collection." Nucleic Acids
Research 39(Suppl 1).
Gibson, D. G., J. I. Glass, et al. (2010). "Creation of a
bacterial cell controlled by a chemically synthesized
genome." Science 329(5987): 52 - 56.
Gilman, A. and J. Ross (1995). "Genetic-algorithm
selection of a regulatory structure that directs flux in a
simple metabolic model." Biophysical Journal 69(4):
1321 - 1333.
Gulati, S., V. Rouilly, et al. (2009). "Opportunities for
microfluidic technologies in synthetic biology."
Journal of the Royal Society Interface 6(Suppl_4):
S493-S506.
Hallinan, J., G. Misirli, et al. (2010). Evolutionary
computation for the design of a stochastic switch for
synthetic genetic circuits. 32nd Annual International
Conference of the IEEE Engineering in Medicine and
Biology Society (EMBC 2010), Buenos Aires,
Argentina.
Hallinan, J. and J. Wiles (2002). Evolutionary algorithms.
The Encyclopedia of Cognitive Sciences. L. Nadel.
New York, Palgrave Macmillan.
Hucka, M., A. Finney, et al. (2003). "The systems biology
markup language (SBML): A medium for
representation and exchange of biochemical network
models." Bioinformatics 19: 524-531.
Khalil, A. S. and J. J. Collins (2010). "Synthetic biology:
Applications come of age." Nature Reviews Genetics
11(5): 367 - 379.
King, R. D., J. Rowland, et al. (2009). "The Automation of
Science." Science 324(5923): 85-89.
King, R. D., K. E. Whelan, et al. (2004). "Functional
genomic hypothesis generation and experimentation
by a robot scientist." Nature 427(6971): 247-252.
BRIDGING THE GAP BETWEEN DESIGN AND REALITY - A Dual Evolutionary Strategy for the Design of Synthetic
Genetic Circuits
267
Kohler, J., J. Baumbach, et al. (2006). "Graph-based
analysis and visualization of experimental results with
ONDEX." Bioinformatics 22(11): 1383-1390.
Lee, S. K., H. Chou, et al. (2008). "Metabolic engineering
of microorganisms for biofuels production: From bugs
to synthetic biology to fuels." Current Opinion in
Biotechnology 19(6): 556-563.
Misirli, G., J. Hallinan, et al. (2011). CS-TR-1237.
Technical Reports, Newcastle University.
Misirli, G., J. S. Hallinan, et al. (2011). "Model annotation
for synthetic biology: automating model to nucleotide
sequence conversion." Bioinformatics 27(7): 973-979.
Patil, K., I. Rocha, et al. (2005). “Evolutionary
programming as a platform for in silico metabolic
engineering.” BMC Bioinformatics 6(1): 308.
Pedersen, M. P., A.; (2009). "Towards programming
languages for genetic engineering of living cells."
Journal of the Royal Society Interface.
Romero, P. A. and F. H. Arnold (2008). "Exploring
protein fitness landscapes by directed evolution."
Nature Reviews Molecular Cell Biology 10: 866 - 876.
Smith, H. O., C. A. Hutchison, et al. (2003). "Generating a
synthetic genome by whole genome assembly:
phiX174 bacteriophage from synthetic
oligonucleotides." Proceedings of the National
Academy of Sciences 100(26): 15440 - 15445.
Sprinzak, D. and M. Elowitz (2005). "Reconstruction of
genetic circuits." Nature 438: 443 - 448.
Szita, N., K. Polizzi, et al. (2010). "Microfluidic
approaches for systems and synthetic biology."
Current Opinion in Biotechnology 21(4): 517-523.
Tumpy, T. M., C. F. Basler, et al. (2005).
"Characterisation of the reconstructed 1918 Spanish
Influenza pandemic virus." Science 310(5745): 77 -
80.
Vinuselvi, P., S. Park, et al. (2011). "Microfluidic
technologies for synthetic biology." International
Journal of Molecular Sciences 12(3576 - 3593).
BIOINFORMATICS 2012 - International Conference on Bioinformatics Models, Methods and Algorithms
268