maintaining the gaplessness of the ancestral networks.
Thus, the algorithm can be used both to elucidate net-
work structures in ancestral nodes and to fill in gaps
in draft metabolic networks in an evolutionarily
plausible manner.
We argue that such an approach, in which the
reconstructed networks are required to be gapless,
improves prediction performance over the unconstrained
case. This work extends the method of Pitkänen
et al. (2008), where gapless reconstructions of
individual metabolic networks were inferred from
sequence data, to take into account the phylogenetic
context of the reconstructed network.
The proposed algorithm was found to perform
well in practice despite the computational complexity
of the underlying problem. In experiments with ran-
dom data, the algorithm was able to recover the orig-
inal data from perturbed input more accurately than
the baseline method that did not enforce gaplessness
in reconstructed networks. Moreover, the algorithm
explains why a given reaction (enzyme) appears in an
ancestral network by suggesting the reactions required
to render that reaction gapless. This is especially
important when we attempt to uncover the evolutionary
history leading to the observed networks.
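The notion of a reaction being rendered gapless by other reactions can be illustrated with a minimal sketch. It assumes, as a simplification not spelled out in this section, that a reaction is gapless when all of its substrates can be produced from a set of nutrients using reactions in the same network. The names `producible_metabolites`, `gapless_reactions` and `missing_substrates`, as well as the toy network, are hypothetical and serve illustration only; this is not the algorithm proposed in the paper.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A reaction is modelled as (substrates, products); this encoding is an assumption.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def producible_metabolites(reactions: Dict[str, Reaction],
                           nutrients: Set[str]) -> Set[str]:
    """Forward-propagate from the nutrients until no new metabolite appears."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # A reaction can fire once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def gapless_reactions(reactions: Dict[str, Reaction],
                      nutrients: Set[str]) -> Set[str]:
    """Names of reactions whose substrates are all producible (assumed definition)."""
    available = producible_metabolites(reactions, nutrients)
    return {name for name, (subs, _) in reactions.items() if subs <= available}


def missing_substrates(reactions: Dict[str, Reaction],
                       nutrients: Set[str], name: str) -> Set[str]:
    """Substrates of one reaction that the network cannot currently supply."""
    available = producible_metabolites(reactions, nutrients)
    return set(reactions[name][0]) - available


# Hypothetical toy network: r3 has a gap because nothing produces "x".
toy = {
    "r1": (frozenset({"glucose"}), frozenset({"g6p"})),
    "r2": (frozenset({"g6p"}), frozenset({"f6p"})),
    "r3": (frozenset({"x"}), frozenset({"y"})),
}
print(gapless_reactions(toy, {"glucose"}))
print(missing_substrates(toy, {"glucose"}, "r3"))
```

In this toy case, the metabolites reported as missing for `r3` indicate what a supporting reaction would have to produce before `r3` could appear in a gapless network.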
While we experimented only with a simple ran-
dom model of evolution, the framework introduced
here lends itself to more realistic models.
Exploring this direction is left as future work,
though we note the importance of combining sequence
data with the joint metabolic network and phylogenetic
tree topology driven approach presented here. In that
setting, dealing with inaccuracies and omissions in
the underlying metabolic reaction databases presents
an additional challenge.
ACKNOWLEDGEMENTS
We would like to thank Pasi Rastas and Esko Ukkonen
for insightful discussions. This work was financially
supported by the Academy of Finland grant 118653
(ALGODAN), in part by the IST Programme of the
European Community, under the PASCAL2 Network
of Excellence, ICT-216886-PASCAL2, and by the
Academy of Finland postdoctoral researcher’s fellow-
ship 127715 (as part of the Finnish Centre of Ex-
cellence in White Biotechnology - Green Chemistry,
Project No. 118573). This publication only reflects
the authors’ views.
REFERENCES
Alon, N., Moshkovitz, D., and Safra, S. (2006). Algorith-
mic construction of sets for k-restrictions. ACM Trans.
Algorithms, 2(2):153–177.
Arvas, M., Kivioja, T., Mitchell, A., Saloheimo, M., Ussery,
D., Penttilä, M., and Oliver, S. (2007). Comparison of
protein coding gene contents of the fungal phyla Pez-
izomycotina and Saccharomycotina. BMC Genomics,
8(1):325.
Borenstein, E., Kupiec, M., Feldman, M. W., and Rup-
pin, E. (2008). Large-scale reconstruction and phy-
logenetic analysis of metabolic environments. PNAS,
105(38):14482–14487.
Bourque, G. and Sankoff, D. (2004). Improving gene net-
work inference by comparing expression time-series
across species, developmental stages or tissues. J
Bioinform Comput Biol, 2(4):765–783.
Caetano-Anollés, G., Yafremava, L., Gee, H.,
Caetano-Anollés, D., Kim, H., and Mittenthal, J. (2009). The
origin and evolution of modern metabolism. The In-
ternational Journal of Biochemistry & Cell Biology,
41(2):285–297.
Clemente, J., Satou, K., and Valiente, G. (2007). Phyloge-
netic reconstruction from non-genomic data. Bioin-
formatics, 23(2):e110.
Clemente, J. C., Ikeo, K., Valiente, G., and Gojobori, T.
(2009). Optimized ancestral state reconstruction using
Sankoff parsimony. BMC Bioinformatics, 10(51).
Dandekar, T., Schuster, S., Snel, B., Huynen, M., and Bork,
P. (1999). Pathway alignment: application to the com-
parative analysis of glycolytic enzymes. Biochem J.,
343(Pt 1):115–124.
Deacon, J. (2006). Fungal biology. Wiley-Blackwell.
Fitch, W. M. (1971). Toward defining the course of evo-
lution: minimum change for a specific tree topology.
Syst. Zool., 20:406–416.
Fitzpatrick, D., Logue, M., Stajich, J., and Butler, G.
(2006). A fungal phylogeny based on 42 complete
genomes derived from supertree and combined gene
analysis. BMC Evolutionary Biology, 6(1):99.
Garey, M. R. and Johnson, D. S. (1979). Computers
and Intractability: A Guide to the Theory of NP-
Completeness. W. H. Freeman.
Gusfield, D. (1997). Algorithms on Strings, Trees, and Se-
quences. Cambridge University Press.
Handorf, T., Christian, N., Ebenhöh, O., and Kahn, D.
(2008). An environmental perspective on metabolism.
Journal of Theoretical Biology, 252(3):530–537.
Jamshidi, N. and Palsson, B. O. (2007). Investigating the
metabolic capabilities of Mycobacterium tuberculosis
H37Rv using the in silico strain iNJ661 and proposing
alternative drug targets. BMC Systems Biology, 1(26).
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa,
M., Itoh, M., Katayama, T., Kawashima, S., Okuda,
S., Tokimatsu, T., and Yamanishi, Y. (2008). KEGG for
linking genomes to life and the environment. Nucleic
Acids Res., 36:D480–D484.
MINIMUM MUTATION ALGORITHM FOR GAPLESS METABOLIC NETWORK EVOLUTION