# EFFICIENT PATH KERNELS FOR REACTION FUNCTION PREDICTION

### Markus Heinonen, Niko Välimäki, Veli Mäkinen, Juho Rousu

#### Abstract

Kernels for structured data are rapidly becoming an essential part of the machine learning toolbox. Graph kernels provide similarity measures for complex relational objects, such as molecules and enzymes. Graph kernels based on walks are popular due to their fast computation, but their predictive performance is often unsatisfactory, while kernels based on subgraphs suffer from high computational cost and are limited to small substructures. Kernels based on paths offer a promising middle ground between these two extremes. However, computing path kernels has so far been assumed to be too computationally challenging. In this paper we introduce an effective method for computing path-based kernels: we employ a Burrows-Wheeler transform based compressed path index for fast and space-efficient enumeration of paths. Unlike many kernel algorithms, the index representation retains fast access to individual features. In our experiments with chemical reaction graphs, path-based kernels surpass state-of-the-art graph kernels in prediction accuracy.
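
To make the notion of a path kernel concrete, the following is a minimal sketch that counts the label sequences of all simple paths up to a fixed length in two labeled graphs and takes the inner product of the two count vectors. It enumerates paths naively and explicitly; it is not the compressed Burrows-Wheeler index described in the abstract, and the names `path_features`, `path_kernel`, and `max_len`, as well as the toy graphs, are assumptions made purely for illustration.

```python
from collections import Counter


def path_features(adj, labels, max_len):
    """Count label sequences of all simple paths with at most max_len edges.

    adj maps each vertex to a list of neighbours; labels maps each vertex
    to its label. Paths are directed, so an undirected path is counted
    once in each direction.
    """
    counts = Counter()

    def extend(path):
        # Record the label sequence of the current path as one feature.
        counts[tuple(labels[v] for v in path)] += 1
        if len(path) - 1 == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only: no repeated vertices
                extend(path + [nxt])

    for start in adj:
        extend([start])
    return counts


def path_kernel(g1, g2, max_len=3):
    """Inner product of the path-count feature vectors of two graphs."""
    f1 = path_features(*g1, max_len)
    f2 = path_features(*g2, max_len)
    return sum(c * f2[p] for p, c in f1.items())


# Toy graphs (hypothetical): a C-C-O chain and a C-O pair.
g1 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
g2 = ({0: [1], 1: [0]}, {0: "C", 1: "O"})

print(path_kernel(g1, g2, max_len=2))
```

Explicit enumeration like this grows quickly with path length and graph size, which is the cost the abstract refers to; the sketch only illustrates what the features are, not how to index them efficiently.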

#### Paper Citation

#### in Harvard Style

Heinonen M., Välimäki N., Mäkinen V. and Rousu J. (2012). **EFFICIENT PATH KERNELS FOR REACTION FUNCTION PREDICTION**. In *Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)*, ISBN 978-989-8425-90-4, pages 202-207. DOI: 10.5220/0003779402020207

#### in Bibtex Style

@conference{bioinformatics12,

author={Markus Heinonen and Niko Välimäki and Veli Mäkinen and Juho Rousu},

title={EFFICIENT PATH KERNELS FOR REACTION FUNCTION PREDICTION},

booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},

year={2012},

pages={202-207},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0003779402020207},

isbn={978-989-8425-90-4},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)

TI - EFFICIENT PATH KERNELS FOR REACTION FUNCTION PREDICTION

SN - 978-989-8425-90-4

AU - Heinonen M.

AU - Välimäki N.

AU - Mäkinen V.

AU - Rousu J.

PY - 2012

SP - 202

EP - 207

DO - 10.5220/0003779402020207