for novel prediction, brute-force enumeration of reactions,
although sufficient for the purposes of this paper, is not a
satisfactory approach for a practical system. Because simpler
output representations may admit more efficient preimage
algorithms, it is tempting to simplify the representations.
However, in our view this should not be done at the expense of
predictive accuracy.
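To make the cost of the brute-force approach concrete, the following is a minimal sketch of preimage search by exhaustive enumeration. It is an illustration under stated assumptions, not the implementation used in this paper: the names `brute_force_preimage`, `feature_map` and `toy_features`, the explicit feature vectors, and the cosine-style score are all hypothetical stand-ins for whatever output kernel is actually used.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def brute_force_preimage(predicted, candidates, feature_map):
    """Return the candidate whose normalized features best match `predicted`.

    Every candidate is scored once, so the running time grows linearly
    with the number of enumerated reactions.
    """
    target = normalize(predicted)
    best, best_score = None, -math.inf
    for cand in candidates:
        phi = normalize(feature_map(cand))
        score = sum(a * b for a, b in zip(target, phi))
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


# Hypothetical toy data: three candidate reactions with made-up features.
toy_features = {
    "R1": [1.0, 0.0, 0.0],
    "R2": [0.0, 1.0, 1.0],
    "R3": [1.0, 1.0, 0.0],
}

# A made-up predicted point in the output feature space.
best, score = brute_force_preimage([0.1, 0.9, 0.8], toy_features, toy_features.get)
print(best, round(score, 3))
```

The single loop over `candidates` is the bottleneck referred to above: it is acceptable for a fixed, modest set of known reactions, but it does not scale to a large space of novel candidate reactions.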
REACTION KERNELS - Structured Output Prediction Approaches for Novel Enzyme Function