[Figure 4 (bar chart): test error (%), axis range 0 to 15, per EC class (EC1 to EC6); one series per kernel: Walks, Shortest paths, RGK, Core paths, All paths.]
Figure 4: Prediction results for the six main EC classes. The first EC class (oxidoreductases) is most difficult, while the last (ligases) is well predicted with any method.
5 DISCUSSION AND CONCLUSIONS
The path index can also be used for on-line path frequency queries: given any path P of length m ≤ h, where h is the maximum path length enumerated, we can output the frequency of the path P in all graphs in just O(m log σ + output) time, where output is the size of the output, i.e. the number of distinct graphs in which P has a frequency greater than zero. Details are beyond the scope of this paper, but this type of on-line feature query might be useful when computing kernels in an iterative manner. Another interesting direction would be to implement feature selection or ℓ1-regularized learning methods for graph data that make use of the efficient access to features.
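The query semantics described above can be illustrated with a minimal Python sketch. It is not the compressed path index of this paper: a plain hash map stands in for the index, so it shows only what a query returns, not the O(m log σ + output) bound or the space savings. The function names (`enumerate_paths`, `build_index`, `path_frequency`), the toy graphs, and the conventions that path length counts edges and that each direction of traversal is counted separately are all assumptions made for illustration.

```python
from collections import defaultdict

def enumerate_paths(adj, labels, h):
    """Yield the node-label sequence of every simple path with at most h edges.

    Assumption: a path visits each vertex at most once, and both traversal
    directions of an undirected path are counted separately.
    """
    def extend(path):
        yield tuple(labels[v] for v in path)
        if len(path) - 1 < h:  # path currently has len(path) - 1 edges
            for w in adj[path[-1]]:
                if w not in path:
                    yield from extend(path + [w])

    for v in adj:
        yield from extend([v])

def build_index(graphs, h):
    """Map each label sequence to {graph id: frequency}. Hash map, not compressed."""
    index = defaultdict(lambda: defaultdict(int))
    for gid, (adj, labels) in graphs.items():
        for seq in enumerate_paths(adj, labels, h):
            index[seq][gid] += 1
    return index

def path_frequency(index, seq):
    """Return only the graphs in which the queried path has nonzero frequency."""
    return dict(index.get(tuple(seq), {}))

# Hypothetical toy data: a C-O-C chain and a C-C pair.
graphs = {
    "g1": ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"}),
    "g2": ({0: [1], 1: [0]}, {0: "C", 1: "C"}),
}
index = build_index(graphs, h=2)
print(path_frequency(index, ["C", "O"]))
print(path_frequency(index, ["C"]))
```

As in the text, the size of the answer is the number of distinct graphs containing the path; a path that occurs nowhere yields an empty result.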
We presented a method for efficiently computing all-paths kernels for graph data. Our approach relies on computing and storing a single compressed path index of all graphs, which can subsequently be queried efficiently for graph kernel or feature vector computation. We demonstrated the computational feasibility of the approach by computing a path index for graph representations of KEGG reactions. Our experiments show that path kernels give significant improvements over walk kernels in the reaction mechanism prediction task.
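To make the step from path features to a kernel concrete, the following Python sketch computes a kernel matrix from sparse path-count feature vectors. It assumes that the kernel value is the plain inner product of path-count vectors, without any weighting or normalization; the function names (`path_kernel`, `kernel_matrix`) and the toy counts are hypothetical and do not come from the experiments.

```python
from collections import Counter

def path_kernel(phi_a, phi_b):
    """Inner product of two sparse path-count vectors (path -> count)."""
    # Iterate over the smaller vector; only shared paths contribute.
    if len(phi_a) > len(phi_b):
        phi_a, phi_b = phi_b, phi_a
    return sum(count * phi_b[path] for path, count in phi_a.items() if path in phi_b)

def kernel_matrix(features):
    """Symmetric kernel matrix for a list of path-count vectors."""
    n = len(features)
    K = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = path_kernel(features[i], features[j])
    return K

# Hypothetical toy counts: label paths in a C-O-C chain and in a C-C pair.
phi_g1 = Counter({("C",): 2, ("O",): 1, ("C", "O"): 2, ("O", "C"): 2, ("C", "O", "C"): 2})
phi_g2 = Counter({("C",): 2, ("C", "C"): 2})

print(kernel_matrix([phi_g1, phi_g2]))  # [[17, 4], [4, 8]]
```

Only the path `("C",)` is shared by the two toy graphs, so the off-diagonal entry is 2 × 2 = 4. Because the vectors are sparse, the cost of one kernel evaluation depends on the number of distinct paths present rather than on the size of the full path vocabulary.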
ACKNOWLEDGEMENTS
We are grateful to Petteri Kaski, Mikko Koivisto and Craig Saunders for insightful discussions. This work was funded by the Academy of Finland grants 1140727 and 118653, the Graduate School Hecse, and PASCAL2 (IST grant-2007-216886).
EFFICIENT PATH KERNELS FOR REACTION FUNCTION PREDICTION