REACTION KERNELS
Structured Output Prediction Approaches for Novel Enzyme Function

Katja Astikainen, Esa Pitkänen, Juho Rousu
Department of Computer Science, University of Helsinki, PO Box 68, Helsinki, Finland

Liisa Holm
Institute of Biotechnology, University of Helsinki, PO Box 56, Helsinki, Finland

Sándor Szedmák
Electronics and Computer Science, University of Southampton, SO17 1BJ, Southampton, U.K.

Keywords: Bioinformatics, Machine learning, Kernel methods, Enzyme function prediction.
Abstract: The enzyme function prediction problem is usually solved using annotation transfer methods. These methods are suitable when the function of the new protein has previously been characterized and included in a taxonomy such as the EC hierarchy. However, given a new function that has not previously been described, these approaches arguably do not offer adequate support for the human expert.
In this paper, we explore a structured output learning approach in which enzyme function, that is, an enzymatic reaction, is described in a fine-grained fashion with so-called reaction kernels, which allow interpolation and extrapolation in the output (reaction) space. Two structured output models are learned via Kernel Density Estimation and Maximum Margin Regression to predict enzymatic reactions from sequence motifs. We put forward two choices for constructing reaction kernels and experiment with them in the remote homology case, where the functions in the test set have not been seen in the training phase. Our experiments demonstrate the viability of our approach.
1 INTRODUCTION
Enzymes are the workhorses of living cells, producing energy and building blocks for cell growth as well as participating in the maintenance and regulation of the metabolic states of the cells. Reliable assignment of enzyme function, that is, of the biochemical reactions catalyzed by the enzymes, is a prerequisite of high-quality metabolic reconstruction (Palsson, 2006).
In the literature, the enzyme function prediction problem comes in two general formulations: annotation transfer or classification by machine learning. In the first approach, given an unannotated protein, a similar annotated protein with experimentally verified function is searched for in databases, and the annotation is transferred to the new protein. In the second approach, a model is trained to classify the new protein into one of the predefined functional classes, such as the four-level hierarchical EC classification of enzymatic functions.
The success of the above approaches depends on the set of previously characterized and catalogued enzymatic functions. If the new protein belongs to the existing function classes, annotation transfer or classification learning may work. If the new protein, however, possesses a function that is not pre-existing, the correct function cannot be predicted even in principle.
Given the diversity of the tree of life, it is likely that completely new functions will be encountered as sequencing and annotation efforts widen. Tools that can accurately predict what the new functions might be could expedite these efforts. In this paper, we develop a structured output prediction approach that, to our knowledge, is the first enzyme function prediction tool capable of predicting previously unseen functions. The key component of our method is the representation of enzyme function in a fine-grained fashion with so-called reaction
kernels, which allow interpolation and extrapolation in the space of enzymatic functions.
The organization of the paper is the following. In Section 2, we briefly describe the main existing approaches in enzyme function prediction. In Section 3, we review structured output prediction approaches, in particular Kernel Density Estimation and Maximum Margin Regression, which are applied in the subsequent sections. In Section 4, we describe representations for structured output prediction of enzyme function, and put forward two reaction kernel variants that allow us to interpolate and extrapolate in the space of enzymatic reactions. Section 5 describes experiments validating our approach. Section 6 discusses the relative merits of the current and competing methods, and outlines directions for future work.
2 ENZYME FUNCTION PREDICTION
Protein function prediction is recognized as one of the
key problems in bioinformatics, and hence there is a
large number of approaches to tackle this problem.
Most enzyme function prediction methods are instan-
tiations of the more general protein function predic-
tion problem. Here we give a brief overview of protein function prediction approaches. For more information, we refer the interested reader to the recent survey by Punta and Ofran (2008).
2.1 Annotation Transfer Approaches
The most widely used function prediction approach is
still annotation transfer based on sequence similarity:
given an unannotated protein, using a sequence com-
parison tool such as BLAST, search for an annotated
sequence homolog with an experimentally verified
function, and transfer the annotation to the new pro-
tein. This approach has well-known pitfalls: sequence similarity does not equate to homology; function is typically determined by a small group of residues whose contribution to the overall similarity may fail to be detected; and there is a danger of propagating annotation errors.
Sequence motifs or signatures are used to over-
come shortcomings of overall sequence similarity. As
the protein function is typically dependent on a small
region of the sequence (e.g. for enzymes the residues
forming the active center), a significant amount of re-
search has been conducted to derive sequence mo-
tifs that are predictive of the function (Henikoff and
Henikoff, 1996; Falquet et al., 2002; Mulder et al.,
2002). In this paper, we apply the Global Trace Graph
(Heger et al., 2007) features that can be interpreted
as predicted conserved residues. The GTG features
are derived from a global alignment of all known pro-
tein sequences. In this alignment, GTG features cor-
respond to residues that align consistently within a
group of proteins.
Information about the 3D structure is known to
be a powerful aid in function prediction, due to the
fact that it is ultimately the three-dimensional struc-
ture that determines the protein function. Structural
similarity of two proteins may indicate common evo-
lutionary origin even in the absence of significant
sequence similarity. Numerous structural alignment
methods (e.g. (Krissinel and Henrick, 2004; Ye and
Godzik, 2004; Holm and Sander, 1996)) have been
developed to make use of the 3D structures. Structural
motifs are a concept analogous to sequence motifs: a local constellation of residues in the active center of an enzyme may be highly predictive of the function. In
this paper, we do not apply 3D information, but leave
this as future work.
2.2 Machine Learning Approaches
Machine learning methods are potentially useful in
cases where the new protein does not possess sig-
nificant sequence (or structure) similarity to existing
proteins. Given large enough data, machine learning
methods are able to distill non-trivial associations be-
tween the input features and the function.
In the machine learning setting, enzyme function
prediction has been generally defined as a classifica-
tion problem. The works by Lanckriet et al. (Lanck-
riet et al., 2004) and Borgwardt et al. (Borgwardt
et al., 2005) use the kernel method to predict the main
categories in MIPS and EC taxonomies, respectively.
Other works aim to predict the membership in the
whole taxonomy. These include the work by Clare
and King (Clare and King, 2002) who use decision
trees to predict the membership in the MIPS taxon-
omy. Barutcuoglu et al. (Barutcuoglu et al., 2006)
combine Bayesian networks with a hierarchy of sup-
port vector machines to predict Gene Ontology clas-
sification. Blockeel et al. (Blockeel et al., 2006) use
multilabel decision tree approaches to functional class
classification according to the MIPS FunCat taxon-
omy.
Structured output approaches (see below) for hi-
erarchical multilabel classification (c.f. (Rousu et al.,
2006)) have been applied to enzyme function predic-
tion by Astikainen et al. (Astikainen et al., 2008) and
Sokolov and Ben-Hur (2008). In this paper, we take the hierarchical classification against the EC hierarchy (Astikainen et al., 2008) as
one of the comparison methods to the reaction kernel
approach.
3 STRUCTURED OUTPUT LEARNING
Our objective is to learn a function that, given (a fea-
ture representation of) a sequence, can predict (a fea-
ture representation of) an enzymatic reaction.
Many learning algorithms have been designed for structured prediction tasks like the above. We concentrate on kernel methods, which let us utilize high-dimensional feature spaces without computing the feature maps explicitly. Structured SVM (Tsochantaridis et al., 2004), Max-Margin Markov networks (Taskar et al., 2004; Rousu et al., 2006), Kernel Density Estimation (KDE) and Maximum-Margin Regression (MMR) (Szedmak et al., 2005) are learning methods falling into this category.
We consider a training set of (sequence, reaction) pairs $D_m = \{(x_i, y_i) \mid x_i \in \mathcal{X}, y_i \in \mathcal{Y}\}_{i=1}^{m}$ drawn from an unknown joint distribution $P(\mathcal{X}, \mathcal{Y})$.
For sequences and reactions, respectively, we assume feature mappings $\phi: \mathcal{X} \mapsto \mathcal{F}_X$ and $\psi: \mathcal{Y} \mapsto \mathcal{F}_Y$, mapping the input and output objects into associated inner product spaces $\mathcal{F}_X$ and $\mathcal{F}_Y$. The kernels $K_X(x, x') = \langle \phi(x), \phi(x') \rangle$ and $K_Y(y, y') = \langle \psi(y), \psi(y') \rangle$ defined by the feature maps are called the input and output kernel, respectively. Above, $\langle \cdot, \cdot \rangle$ denotes the inner product. Subsequently, we discuss particular choices for the feature mappings and the kernels suitable for the enzyme function prediction task.
3.1 Joint Kernels
In structured prediction models based on kernels, the associations between the inputs and outputs are typically represented by a joint kernel, defined by a feature map that is joint for inputs and outputs. In this paper, we use a joint feature map

$\varphi(x, y): \mathcal{X} \times \mathcal{Y} \mapsto \mathcal{F}_{XY},$

where $\varphi(x, y) = \phi(x) \otimes \psi(y)$ is the tensor product of the input and output feature maps, thus consisting of all pairwise products $\phi_j(x) \psi_k(y)$ between input and output features. This choice gives us the joint kernel representation as the elementwise product of the input and output kernels:

$K_{XY}(x, y; x', y') = K_X(x, x') K_Y(y, y').$
The tensor product kernel is suitable in situations
where there is no prior alignment information of in-
put and output features available, but the learning ma-
chine is expected to learn the alignments. This is the
case in our enzyme function prediction setup.
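On precomputed Gram matrices over paired training examples, the tensor product joint kernel reduces to an elementwise product. The following is a minimal sketch of this identity; the example matrices are hypothetical placeholders, not data from the paper.

```python
import numpy as np

def joint_kernel(K_X: np.ndarray, K_Y: np.ndarray) -> np.ndarray:
    """Tensor product joint kernel on paired examples:
    K_XY[i, j] = K_X(x_i, x_j) * K_Y(y_i, y_j)."""
    assert K_X.shape == K_Y.shape
    return K_X * K_Y  # elementwise (Hadamard) product

# Hypothetical 3-example Gram matrices, for illustration only.
K_X = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
K_Y = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.4], [0.1, 0.4, 1.0]])
K_XY = joint_kernel(K_X, K_Y)
```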
3.2 Learning Task
Most structured prediction models (Taskar et al.,
2004; Tsochantaridis et al., 2004; Szedmak et al.,
2005; Rousu et al., 2006) take the form of a linear
score function
$F_w(x, y) = \langle w, \varphi(x, y) \rangle = \langle w, \phi(x) \otimes \psi(y) \rangle$

in the joint feature space. The model's prediction $\hat{y}(x)$ corresponds to the highest scoring output $y$:

$\hat{y}(x) = \operatorname{argmax}_y F_w(x, y).$
For the model learning we use two computational
methods. The first method is Kernel Density Estima-
tion (KDE) which uses the joint kernel density func-
tion
$F_w(x, y) = \sum_i K_{XY}(x, y; x_i, y_i) \qquad (1)$
for scoring. This is the simplest model we use for
prediction, since there is no weighting vector w for
the training examples and all the datapoints are thus
equally important.
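A minimal sketch of the KDE score of Equation (1), assuming precomputed kernel evaluations between a test instance and the training examples; the function and argument names are our own illustration.

```python
import numpy as np

def kde_score(k_x: np.ndarray, k_y: np.ndarray) -> float:
    """KDE joint kernel score: F(x, y) = sum_i K_X(x, x_i) * K_Y(y, y_i).

    k_x[i] = K_X(x, x_i): input kernel between test input x and training input i.
    k_y[i] = K_Y(y, y_i): output kernel between candidate output y and training output i.
    """
    return float(np.dot(k_x, k_y))
```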
The second method, Max-Margin Regression (MMR) (Szedmak et al., 2005), aims to separate the training data $\varphi(x_i, y_i)$ from the origin of the joint feature space with maximum margin; thus it can be seen as analogous to the one-class SVM (Schölkopf et al., 2001). The primal form of the MMR optimization problem can be written as
problem can be written as
$\min_{w, \xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$

$\text{s.t.}\ \langle w, \varphi(x_i, y_i) \rangle \geq 1 - \xi_i,\quad \xi_i \geq 0,\quad i = 1, \dots, m.$
The dual form of the MMR problem can be expressed as

$\max_{\alpha} \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j K_X(x_i, x_j) K_Y(y_i, y_j)$

$\text{s.t.}\ 0 \leq \alpha_i \leq C,\quad i = 1, \dots, m. \qquad (2)$
MMR, due to its simple form, can be optimized very efficiently, which makes, for example, the optimization of kernel parameters a feasible task on medium-sized datasets ($10^3$–$10^4$ examples); this is not true for most competing approaches (Taskar et al., 2004; Tsochantaridis et al., 2004; Rousu et al., 2006).
Furthermore, as the output representation is ker-
nelized, it is possible to learn in very complex output
spaces, as we will demonstrate subsequently.
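To make the dual (2) concrete, the sketch below optimizes it with simple projected gradient ascent onto the box constraints. This is our own illustrative solver, not the optimizer used in the paper; the step size and iteration count are arbitrary choices.

```python
import numpy as np

def mmr_dual(K_X: np.ndarray, K_Y: np.ndarray, C: float = 1.0,
             lr: float = 0.01, n_iter: int = 1000) -> np.ndarray:
    """Projected gradient ascent on the MMR dual (2):
    max sum_i a_i - 0.5 * sum_ij a_i a_j K_X[i,j] K_Y[i,j], s.t. 0 <= a_i <= C."""
    K = K_X * K_Y                 # joint kernel on the training pairs
    alpha = np.zeros(K.shape[0])
    for _ in range(n_iter):
        grad = 1.0 - K @ alpha    # gradient of the dual objective
        alpha = np.clip(alpha + lr * grad, 0.0, C)  # project onto the box
    return alpha
```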
3.3 Preimage Problem
In all structured output prediction approaches, the prediction of the model needs to be extracted by solving the preimage problem

$\hat{y}(x) = \operatorname{argmax}_{y \in \mathcal{Y}} F_w(x, y).$
Depending on the output space, solving the preim-
age exactly can be computationally challenging or in-
tractable.
Using kernelized outputs, as in the case of the dual MMR (2), the preimage takes an even more challenging form

$\hat{y}(x) = \operatorname{argmax}_{y \in \mathcal{Y}} \sum_i \alpha_i K_X(x, x_i) K_Y(y, y_i),$

for which efficient algorithms are hard to come by.
However, a difference between MMR and most structured output prediction methods is that the preimage problem needs to be solved only during prediction, not as part of training. Thus, the computational complexity of the preimage is not as crucial an issue.
In the experiments reported in this paper, we use a trivial preimage algorithm: we enumerate the set of outputs contained in our whole dataset (training and test examples included), $\mathcal{Y}_n = \{y \mid (x, y) \in D_n\}$. This approach gives an approximate solution to the preimage problem; that is, the globally best scoring prediction may lie outside the set $\mathcal{Y}_n$. This approach is sufficient for a first evaluation of the proposed prediction methods. We leave the development of better preimage algorithms as future work.
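A minimal sketch of this enumeration-based preimage for the dual MMR score, reusing the dual variables from the solver sketch above; all names are illustrative.

```python
import numpy as np

def preimage_by_enumeration(alpha: np.ndarray, k_x: np.ndarray,
                            K_Y_cand: np.ndarray) -> np.ndarray:
    """Score every enumerated candidate output and return the indices of the
    top-scoring ones (the prediction set Y_hat(x)).

    alpha:    dual variables, shape (m,).
    k_x:      K_X(x, x_i) between the test input and training inputs, shape (m,).
    K_Y_cand: K_Y(y_c, y_i) between candidates and training outputs, shape (n_cand, m).
    """
    scores = K_Y_cand @ (alpha * k_x)   # F(x, y_c) for every candidate
    return np.flatnonzero(np.isclose(scores, scores.max()))
```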
4 KERNELS FOR CHEMICAL REACTIONS
In this section, we consider how to build kernels for
chemical reactions, using molecule graph kernels as
the building blocks.
Let us first introduce some notation used in this section. We denote a basic set of reactions $\mathcal{R}$, where a reaction $\rho = (S(\rho), P(\rho)) \in \mathcal{R}$ is given by a set of substrates $S(\rho) \subseteq \mathcal{M}$ and products $P(\rho) \subseteq \mathcal{M}$.¹ The set of reactants is simply the union of substrates and products, $R(\rho) = S(\rho) \cup P(\rho)$. A feature vector describing a reaction $\rho$ is denoted by $\psi(\rho)$, and the feature vector describing a molecule $M$ is denoted by $\phi(M)$.

¹ To fully represent chemical reaction equations, we would also need to consider the stoichiometric coefficients for each reactant; however, we ignore this modelling aspect here.
For illustration, consider a chemical reaction $\rho = (\{S_1\}, \{P_1, P_2\})$ converting a substrate molecule $S_1$ into two product molecules $P_1$ and $P_2$, thus defined by the reaction equation

$\rho: S_1 \rightarrow P_1 + P_2.$

Consider now a second reaction $\rho' = (\{S'_1, S'_2\}, \{P'_1, P'_2\})$, converting substrates $S'_1, S'_2$ into products $P'_1, P'_2$, and back, expressed as

$\rho': S'_1 + S'_2 \rightleftharpoons P'_1 + P'_2.$
How can we measure the similarity of these reactions via kernels? The approach in this paper is to consider pairwise similarities of the constituent molecules and to compute an aggregate over them. While there are many ways this could be done in principle, two important considerations arise from the (bio)chemical reality:
Similarity of Reaction Events vs. Reactants.
We should make a distinction between the simi-
larity of the reaction events versus the similarity
of the reactant molecules. For example, enzymes
belonging to the amino-transferase group are sim-
ilar to each other in that they transfer a certain
functional group (the amino group) from a reac-
tant molecule to another. However, the reactant
molecules need not be similar.
Conversely, there are many different transfor-
mations which can be performed on the same
molecule. For example, pyruvate, an important
hub metabolite in the central metabolism of all
living cells, participates in many reactions. The
transformations applied by the reactions may be
very different from each other, although they work
on the same substrate molecule pyruvate.
Thus, depending on the application, our kernel
should be designed to measure one of these sim-
ilarity notions, or measure both of them in some
proportion.
Directionality of Reactions. Reactions may be defined as unidirectional or bidirectional. As the direction of a reaction depends on thermodynamic conditions, this may or may not be a relevant issue. For example, most enzymatic reactions are bidirectional in principle, but the conditions inside a living cell force unidirectionality. When the directionality of reactions is of importance, each bidirectional reaction can be divided into forward and backward reactions. In our example, we would obtain
$\rho'_{fwd}: S'_1 + S'_2 \rightarrow P'_1 + P'_2,$

and

$\rho'_{bwd}: P'_1 + P'_2 \rightarrow S'_1 + S'_2.$
In this case we would like our kernel to be sensi-
tive to the direction so that forward and backward
directions of the same reaction can be discrimi-
nated in the feature space.
However, when reaction direction is of no impor-
tance, the forward and backward directions of a
bidirectional reaction should be treated the same
by our kernel.
Below, we will describe a molecule graph kernel matrix $K_M$ which constitutes the basic component of the two alternative reaction kernels described next. For both reaction kernels we also show the underlying feature map, which will suffice to show that both of the reaction kernels below are valid Mercer kernels if the underlying molecule kernel is a valid Mercer kernel.

Both of the reaction kernels described below are very fast to compute, given that the molecule kernel $K_M$ is pre-computed: the time complexity of the reaction kernel computation is then linear in the number of elements in the kernel matrix.
4.1 Kernels for Molecule Graphs
As the molecule kernel $K_M$ underlying the reaction kernels, we use a subgraph kernel restricted to small subgraphs (10 nodes or less). The kernel computes the product graph of the two molecule graphs and counts its connected subgraphs. The kernel constructed in this way may in general not be a valid Mercer kernel; however, on our dataset, the kernel matrix was observed to be positive semidefinite.

Enumerating the subgraphs up to the maximum subgraph size $d$ takes $O(m^d)$ time, where $m$ is the number of edges in the product graph. Thus the kernel is quite time-consuming to compute. In practice, we were able to compute the common connected subgraphs of 1767 KEGG LIGAND (Goto et al., 2002) molecules up to subgraph size 10 in a week with approximately 50 Pentium 4 class computers. Considering the computational resources available nowadays in research labs, and the time available to solve a typical problem involving molecular data, the computational complexity hardly presents a prohibitive constraint.
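The following is a minimal sketch of this construction, assuming molecules are given as networkx graphs with `label` attributes on atoms (nodes) and bonds (edges). It is our own illustration of the counting scheme, not the paper's implementation; the brute-force enumeration scales exponentially in the subgraph size, consistent with the $O(m^d)$ bound above.

```python
import itertools
import networkx as nx

def product_graph(g1: nx.Graph, g2: nx.Graph) -> nx.Graph:
    """Product graph: nodes are pairs of equally labeled atoms; edges connect
    pairs whose members are bonded, with equal bond labels, in both molecules."""
    gp = nx.Graph()
    for u, v in itertools.product(g1.nodes, g2.nodes):
        if g1.nodes[u]["label"] == g2.nodes[v]["label"]:
            gp.add_node((u, v))
    for (u1, v1), (u2, v2) in itertools.combinations(gp.nodes, 2):
        if (g1.has_edge(u1, u2) and g2.has_edge(v1, v2)
                and g1.edges[u1, u2]["label"] == g2.edges[v1, v2]["label"]):
            gp.add_edge((u1, v1), (u2, v2))
    return gp

def count_connected_subgraphs(g: nx.Graph, max_size: int) -> int:
    """Count connected node subsets of g with at most max_size nodes,
    grown by extending smaller connected subsets one neighbor at a time."""
    seen = {frozenset([n]) for n in g.nodes}
    frontier, total = set(seen), len(seen)
    for _ in range(max_size - 1):
        grown = set()
        for subset in frontier:
            for node in subset:
                for nb in g.neighbors(node):
                    if nb not in subset:
                        grown.add(subset | {nb})
        grown -= seen
        seen |= grown
        total += len(grown)
        frontier = grown
    return total

def subgraph_kernel(g1: nx.Graph, g2: nx.Graph, max_size: int = 10) -> int:
    return count_connected_subgraphs(product_graph(g1, g2), max_size)
```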
We note that it would also be possible to use a more quickly computable graph kernel based on common walks (Gärtner, 2003), that is, sequences of labeled atoms and bonds, which can be thought of as approximating common subgraphs (each common subgraph induces a set of common walks). However, we leave exploring this direction as future work.
4.2 Sum-of-Reactants Kernel
A simple kernel, called the Sum-of-Reactants (SoR) kernel, is obtained by defining

$K_{SoR}(\rho, \rho') = m(\rho)^T K_M m(\rho'),$

where the vector $m(\rho)$ consists of indicators $m_j(\rho) = \mathbf{1}\{M_j \in R_\rho\}$ for the presence or absence of a molecule $M_j$ in the set of reactants of $\rho$. The corresponding feature vector is simply the sum of the feature vectors of the molecule graphs in $R_\rho$:

$\psi(\rho) = \sum_{M \in R_\rho} \phi(M).$
Intuitively, the kernel measures the similarity of
reactions in terms of how similar the molecules ma-
nipulated by the reactions are on average, rather than
the similarity of reaction events. The reaction rep-
resentation and the kernel can be considered bidirec-
tional as the different roles of reactant molecules are
not considered.
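A sketch of the SoR kernel on top of a precomputed molecule kernel matrix, assuming the reactant sets are given as index sets into that matrix; all names are illustrative.

```python
import numpy as np

def sor_kernel(reactants1, reactants2, K_M: np.ndarray) -> float:
    """Sum-of-Reactants kernel: the sum of molecule kernel values over all
    reactant pairs, equivalent to m(rho)^T K_M m(rho')."""
    idx1, idx2 = list(reactants1), list(reactants2)
    return float(K_M[np.ix_(idx1, idx2)].sum())
```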
4.3 Reactant-Matching Kernels
In the SoR kernel there is an underlying all-against-all matching between the substrate sets $(S_\rho, S_{\rho'})$, the product sets $(P_\rho, P_{\rho'})$ and the cross-pairs $(S_\rho, P_{\rho'})$ and $(P_\rho, S_{\rho'})$. This measure implicitly contains spurious matches where one substrate $s_1 \in S_\rho$ is matched against a substrate $s' \in S_{\rho'}$ while another $s_2 \in S_\rho$ is matched against a product $p' \in P_{\rho'}$. Considering such matches has no biological significance. We can filter out the above spurious matches by defining a feature map via the tensor product

$\psi(\rho) = \sum_{M \in S_\rho} \phi(M) \otimes \sum_{M \in P_\rho} \phi(M),$

which gives us the Reactant-Matching (RM) kernel

$K(\rho, \rho') = K(S_\rho, S_{\rho'}) K(P_\rho, P_{\rho'}),$

where we use the shorthand

$K(S, S') = \sum_{M \in S} \sum_{M' \in S'} K_M(M, M').$
The above kernel is obviously unidirectional, as it matches the reactions in the forward direction. To obtain a bidirectional kernel, we compute the backward direction by taking the cross terms:

$K(\rho, \rho') = \frac{1}{2}\left(K(S_\rho, S_{\rho'}) K(P_\rho, P_{\rho'}) + K(S_\rho, P_{\rho'}) K(P_\rho, S_{\rho'})\right).$
We note that the bidirectional kernel still filters out the above-mentioned spurious matches; in the second term, one of the reactions is simply flipped around.
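A sketch of the reactant-matching kernels on a precomputed molecule kernel matrix, with substrate and product sets given as index sets, under the same assumptions as the SoR sketch above.

```python
import numpy as np

def set_kernel(A, B, K_M: np.ndarray) -> float:
    """K(S, S') = sum over all molecule pairs of the molecule kernel."""
    return float(K_M[np.ix_(list(A), list(B))].sum())

def rm_kernel(sub1, prod1, sub2, prod2, K_M: np.ndarray,
              bidirectional: bool = True) -> float:
    """Reactant-Matching kernel: substrates are matched against substrates
    and products against products; the bidirectional variant averages in
    the flipped (backward) match."""
    fwd = set_kernel(sub1, sub2, K_M) * set_kernel(prod1, prod2, K_M)
    if not bidirectional:
        return fwd
    bwd = set_kernel(sub1, prod2, K_M) * set_kernel(prod1, sub2, K_M)
    return 0.5 * (fwd + bwd)
```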
5 EXPERIMENTS
5.1 Data
The dataset is a sample of (sequence, reaction) pairs from the KEGG LIGAND database (Goto et al., 2002). As the input (sequence) representation, we use Global Trace Graph (GTG, (Heger et al., 2007)) features that can be interpreted as predicted conserved amino acids.
We have two separate datasets: a parameter validation set of 1481 enzymes and a testing set of 8112 enzymes, which do not have overlapping EC numbers. The parameter validation set is further divided into two folds, a training set of 930 and a test set of 551 enzymes. The testing dataset is divided into five folds with an average of 1622 enzymes each. Members of the folds are chosen such that each distinct EC number exists in only one of the folds, so that no enzyme with a test set EC number appears in the training sets. This simulates a setting where previously unseen functions are to be predicted.
Both the input (GTG) kernel and the output (reaction) kernels are fed to a polynomial kernel $K_{poly}(x, z) = (K(x, z) + 1)^d$ and normalized. The restricted-size subgraph kernel is used as the molecule kernel underlying all the reaction kernel variants.
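A sketch of this kernel post-processing on a Gram matrix. The paper does not specify the normalization; we assume the standard cosine normalization $K(x, z)/\sqrt{K(x, x) K(z, z)}$.

```python
import numpy as np

def poly_kernel(K: np.ndarray, d: int) -> np.ndarray:
    """Polynomial transform of a base Gram matrix: (K + 1)^d, elementwise."""
    return (K + 1.0) ** d

def normalize(K: np.ndarray) -> np.ndarray:
    """Cosine-normalize a Gram matrix so that its diagonal entries equal 1."""
    diag = np.sqrt(np.diag(K))
    return K / np.outer(diag, diag)

# e.g. degree-2 input and degree-8 output kernels, as found best in Section 5.4.1:
# K_in = normalize(poly_kernel(K_GTG, 2)); K_out = normalize(poly_kernel(K_RM, 8))
```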
5.2 Compared Methods
We compare the following methods:
NN(BLAST): This is the baseline annotation
transfer method: given a test sequence, find the
nearest sequence neighbor in the training set and
transfer the annotation to the new protein. Se-
quence similarity is taken from pre-computed
Blast scores from the Pairs-DB server (Heger
et al., 2008).
NN(GTG): This is the annotation transfer method using the GTG data. Given a test sequence, find the training sequence sharing the most GTG features with the test sequence, and transfer the annotation.
MMR(GTG,Hierarchical): The hierarchical
structured output prediction from (Astikainen
et al., 2008). The method predicts the mem-
bership of the new protein in the EC hierarchy;
generally the prediction is a root-to-leaf path in
the EC hierarchy.
MMR(GTG,RM): MMR with GTG as input ker-
nel and Reactant-Matching as output kernel.
KDE(GTG,RM): KDE with GTG as input kernel
and Reactant-Matching as output kernel.
In a preliminary experiment, we compared the function prediction accuracy of the two reaction kernels using a degree-6 polynomial kernel over the inputs and a degree-20 polynomial kernel over the outputs. The F1 score for RM was 27.9% and for SoR it was 25.9%. Since RM outperformed SoR, we use RM as the output kernel in all the following experiments.
5.3 Measure of Success
To measure the accuracy of prediction, for each test instance $(x, y)$, we first compute the set of top-scoring functions $\hat{Y}(x) = \{y_i \in \mathcal{Y}_n \mid F(\alpha, x, y_i) \geq F(\alpha, x, y'),\ \forall y' \in \mathcal{Y}_n\}$, that is, the reactions that the prediction model considers the (equally) best. This set is considered as the prediction of the model.
For each function $y' \in \hat{Y}(x)$, we check how many consecutive digits, starting from the left, of the EC number associated with $y'$ coincide with the digits of the EC number associated with the reference function $y$. Each such correctly predicted EC digit counts as a true positive; the rest of the EC digits count as false positives. For example, if the reference function $y$ is 3.1.1.1 and the prediction set $\hat{Y}(x)$ contains the two members 3.1.2.1 and 3.1.1.10, there are five true positives (the correct prefixes 3.1 and 3.1.1) and three false positives out of 8 EC digits. The EC digit F1 is then the F1 score taken over all EC digit predictions in the test set.
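A sketch of the per-prediction true/false positive counting described above; the EC digit F1 then follows from the counts aggregated over the test set. Names are illustrative.

```python
def ec_digit_counts(reference: str, predicted: str):
    """Count true/false positive EC digits for one predicted EC number.

    Digits matching the reference as a left prefix are true positives;
    the remaining digits of the prediction are false positives.
    """
    ref, pred = reference.split("."), predicted.split(".")
    tp = 0
    for r, p in zip(ref, pred):
        if r != p:
            break
        tp += 1
    return tp, len(pred) - tp

# The example from the text: reference 3.1.1.1, predictions 3.1.2.1 and 3.1.1.10.
assert ec_digit_counts("3.1.1.1", "3.1.2.1") == (2, 2)
assert ec_digit_counts("3.1.1.1", "3.1.1.10") == (3, 1)  # 5 TP, 3 FP in total
```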
5.4 Results
5.4.1 Effect of Polynomial Kernel Degree
In the first experiment, we illustrate the behaviour of structured output learning with MMR in a very high-dimensional joint feature space. We use the GTG kernel (predicted conserved residues) as the base input kernel and the RM kernel as the base output kernel. In this experiment we use two sets: one for training and one for testing. Figure 1 shows a heat map of the EC digit F1 score. The F1 score improves when the degree of either the input, the output, or both polynomial kernels increases. The optimum reaches a plateau at input degrees 1-4 and output degrees 8-16, indicating robustness with respect to changes in the parameter values.
Applying a high-degree polynomial to the base kernel makes the resulting output kernel sparser, which suggests that the reactant-matching kernel alone is too smooth for optimal performance. We note that optimizing the input and output kernels independently can be useful in other structured prediction settings as well.
[Figure 1 heat map omitted; axes: degree of polynomial input kernel (1-30) vs. degree of polynomial output kernel (1-30); cell values: EC digit F1 (%).]
Figure 1: The EC digit F1 score plotted as a function of the degrees of the input and output kernels. The best results are obtained with a degree 2 polynomial over the inputs and degree 8 or higher over the outputs.
5.4.2 Prediction under Remote Homology
In the final experiment, we demonstrate the generalization ability of the structured output prediction methods. We measure how many EC digits are correctly predicted in testing over a five-fold set of enzyme families where the four-digit EC numbers do not overlap between folds. Thus the training set contains no enzyme with exactly the same EC number as a test enzyme, but families sharing three EC digits typically appear in the training set.
In this setup it should be clear that the nearest neighbor classifiers and the hierarchical classifier can never predict the four-digit EC number correctly, as these methods have not seen any examples of that particular family. The reaction kernel approach, however, does not suffer from this limitation: as all possible reactions can be represented in the output space, it is in principle possible to predict the correct function.
Figure 2 shows the results of this experiment. Here, we used a degree 8 polynomial kernel over the RM kernel and a degree 2 polynomial kernel over the inputs. The bottom chart of the figure is a cumulative chart depicting the number of enzyme families that have at least a certain number of correctly predicted EC digits. It can be seen that the methods relying on the GTG features (NN(GTG), KDE(GTG,RM) and both MMR methods) are more effective in predicting more than one EC digit correctly. The KDE reaction kernel and the MMR hierarchical approach are slightly better at predicting two or more EC digits correctly than the competing approaches. Finally, we note that the reaction kernel approach is the only method that, at times, can get the whole EC number correct; in other words, the set of top-ranking reactions $\hat{Y}(x)$ can contain reactions that possess exactly the correct EC number.

[Figure 2 bar chart omitted; x-axis: number of correctly predicted EC digits (=4, >=3, >=2, >=1), y-axis: percent of enzyme families (0-0.7); series: BLAST, NN GTG, MMR GTG hierarchical, MMR GTG RM, KDE GTG RM.]
Figure 2: The cumulative distribution of correctly predicted EC digits in the test set (bottom chart). Each member of the top-ranking predictions $\hat{Y}(x)$ contributes one item to the distribution.
6 DISCUSSION
The present experiments show the potential of structured output prediction using reaction kernels: given a novel, previously unseen enzymatic function, the reaction kernel approach is significantly more accurate than the annotation transfer approach, and it is also comparable to a hierarchical classifier trained with structured output learning.
We also note that the reaction kernel approach is an enabling technique: it makes it possible, albeit not easy, to predict a new function exactly correctly. Interestingly, the best results are obtained with a highly complex output representation: a high-degree polynomial kernel over the reactant-matching kernel.
As the results show, using reaction kernel methods for enzyme function prediction is an encouraging direction, even if the prediction accuracy is still very low for all of the methods used. There are many areas where the methods can be improved. First, we only used predicted conserved residues (GTG) as inputs. Although they work well, augmenting them with other types of data, e.g. structural information, should be helpful. Second, the presented reaction kernels can certainly be improved, and completely different kinds of encodings of enzyme function can be imagined.
Third, a better preimage algorithm will be needed for predicting novel functions; brute-force enumeration of reactions, although sufficient for the purposes of this paper, is not a satisfactory approach for a practical system. As simpler output representations may admit more efficient preimage algorithms, it would be tempting to simplify the representations. However, in our view this should not be done at the expense of predictive accuracy.
REFERENCES
Astikainen, K., Holm, L., Pitkänen, E., Szedmak, S., and Rousu, J. (2008). Towards structured output prediction of enzyme function. BMC Proceedings, 2(S4):S2.
Barutcuoglu, Z., Schapire, R., and Troyanskaya, O. (2006).
Hierarchical multi-label prediction of gene function.
Bioinformatics, 22(7):830–836.
Blockeel, H., Schietgat, L., Struyf, J., et al. (2006). Deci-
sion trees for hierarchical multilabel classification: A
case study in functional genomics. In PKDD.
Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S. V. N., Smola, A. J., and Kriegel, H.-P. (2005). Protein function prediction via graph kernels. Bioinformatics, 21(1):47–56.
Clare, A. and King, R. (2002). Machine learning of func-
tional class from phenotype data. Bioinformatics,
18(1):160–166.
Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C.,
Hofmann, K., and Bairoch, A. (2002). The prosite
database, its status in 2002. Nucleic Acids Research,
30(1):235.
Gärtner, T. (2003). A survey of kernels for structured data. SIGKDD Explorations, 5.
Goto, S., Okuno, Y., Hattori, M., Nishioka, T., and Kanehisa, M. (2002). LIGAND: database of chemical compounds and reactions in biological pathways. Nucleic Acids Research, 30(1):402.
Heger, A., Korpelainen, E., Hupponen, T., Mattila, K., Ollikainen, V., and Holm, L. (2008). PairsDB atlas of protein sequence space. Nucl. Acids Res., 36:D276–D280.
Heger, A., Mallick, S., Wilton, C., and Holm, L. (2007).
The global trace graph, a novel paradigm for searching
protein sequence databases. Bioinformatics, 23(18).
Henikoff, J. and Henikoff, S. (1996). Blocks database and its applications. Methods in Enzymology, pages 88–104.
Holm, L. and Sander, C. (1996). Dali/FSSP classification of three-dimensional protein folds. Nucleic Acids Research, 25(1):231–234.
Krissinel, E. and Henrick, K. (2004). Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallographica D Biol Crystallogr, 60(Part 12):2256–2268.
Lanckriet, G., Deng, M., Cristianini, N., et al. (2004).
Kernel-based data fusion and its application to protein
function prediction in yeast. PSB, 2004.
Mulder, N., Apweiler, R., Attwood, T., Bairoch, A., Bateman, A., Binns, D., Biswas, M., Bradley, P., Bork, P., Bucher, P., et al. (2002). InterPro: an integrated documentation resource for protein families, domains and functional sites. Briefings in Bioinformatics, 3(3):225–235.
Palsson, B. (2006). Systems Biology: Properties of Recon-
structed Networks. Cambridge University Press.
Punta, M. and Ofran, Y. (2008). The rough guide to in silico
function prediction, or how to use sequence and struc-
ture information to predict protein function. PLoS
Computational Biology, 4(10).
Rousu, J., Saunders, C., Szedmak, S., and Shawe-Taylor, J.
(2006). Kernel-based learning of hierarchical multil-
abel classification models. JMLR, 7.
Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., and Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471.
Sokolov, A. and Ben-Hur, A. (2008). A structured-outputs
method for prediction of protein function. In Proceed-
ings of the 3rd International Workshop on Machine
Learning in Systems Biology.
Szedmak, S., Shawe-Taylor, J., and Parrado-Hernandez, E. (2005). Learning via linear operators: Maximum margin regression. Technical report, PASCAL.
Taskar, B., Guestrin, C., and Koller, D. (2004). Max-margin
markov networks. In NIPS 2003.
Tsochantaridis, I., Hofmann, T., Joachims, T., and Altun, Y.
(2004). Support vector machine learning for interde-
pendent and structured output spaces. In ICML.
Ye, Y. and Godzik, A. (2004). FATCAT: a web server for flexible structure comparison and structure similarity searching. Nucleic Acids Research, 32(Web Server Issue):W582.