Hadamard Code Graph Kernels for Classifying Graphs

Tetsuya Kataoka and Akihiro Inokuchi

School of Science and Technology, Kwansei Gakuin University, 2-1 Gakuen, Sanda, Hyogo, Japan

Keywords:

Graph Classiﬁcation, Support Vector Machine, Graph Kernel, Hadamard Code.

Abstract:

Kernel methods such as Support Vector Machines (SVMs) are becoming increasingly popular because of their

high performance on graph classiﬁcation problems. In this paper, we propose two novel graph kernels called

the Hadamard Code Kernel (HCK) and the Shortened HCK (SHCK). These kernels are based on the Hadamard

code, which is used in spread spectrum-based communication technologies to spread message signals. The

proposed graph kernels are equivalent in time and space complexity to the Neighborhood Hash Kernel (NHK), one of the fastest graph kernels, and comparable in expressiveness to the Weisfeiler-Lehman Subtree Kernel (WLSK), one of the most accurate graph kernels. The fundamental performance and practicality of the proposed graph kernels are evaluated using three real-world datasets.

1 INTRODUCTION

A natural way of representing structured data is to use graphs (Vinh et al., 2010). As an example,

the structural formula of a chemical compound is a

graph, where each vertex corresponds to an atom in

the compound and each edge corresponds to a bond

between the two atoms therein. Using such graph rep-

resentations, a new research ﬁeld called graph min-

ing has emerged from data mining with the objective

of mining information from a database consisting of

graphs. With the potential to ﬁnd meaningful infor-

mation, graph mining has raised great interest, and

research in the ﬁeld has increased rapidly in recent

years. Furthermore, because the need for classifying

graphs has increased in many real-world applications,

e.g., the analysis of proteins in bioinformatics and chemical compounds in cheminformatics (Schölkopf et al., 2004), graph classification has also been widely researched worldwide. The main objective of graph classification is to classify graphs of similar structures into the same classes. This originates from the fact that instances represented by graphs usually have similar properties if their graph representations have high structural similarity.

Kernel methods such as Support Vector Machines (SVMs) are becoming increasingly popular because of their high performance on graph classification problems (Kashima et al., 2003). Most graph kernels

are based on the decomposition of a graph into sub-

structures and a feature vector containing counts of

these substructures. Because the dimensionality of

these feature vectors is typically very high and this ap-

proach includes the subgraph isomorphism matching

problem that is known to be NP-complete (Garey and

Johnson, 1979), kernels deliberately avoid the explicit

computation of feature values and instead employ ef-

ﬁcient procedures.

One representative graph kernel is the Random Walk Kernel (RWK) (Schölkopf and Smola, 2002; Kashima et al., 2003), which computes $k(g_i, g_j)$ in $O(|V(g)|^3)$ for graphs $g_i$ and $g_j$, where $|V(g)|$ is the number of vertices in $g_i$ and $g_j$. The kernel returns

a high value if the random walk on the graph gen-

erates many sequences with the same labels for ver-

tices and edges, i.e., the graphs are similar to each

other. The Neighborhood Hash Kernel (NHK) (Hido and Kashima, 2009) and the Weisfeiler-Lehman Subtree Kernel (WLSK) (Shervashidze et al., 2011) are two other recently proposed kernels that compute $k(g_i, g_j)$ faster than RWK. The NHK uses logical operations such as exclusive-OR on the label set of adjacent vertices, while the WLSK uses a concatenation of label strings of the adjacent vertices to compute $k(g_i, g_j)$. The labels updated by repeating the hash or concatenation propagate the label information over the graph and uniquely represent the higher-order structures around the vertices beyond the vertex or edge level. An SVM with either of these two graph kernels works very well with benchmark data consisting of graphs.

Kataoka, T. and Inokuchi, A.
Hadamard Code Graph Kernels for Classifying Graphs.
DOI: 10.5220/0005634700240032
In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2016), pages 24-32
ISBN: 978-989-758-173-1

The computation of NHK is very efficient because it is a logical operation between fixed-length bit strings and does not require any string sorting. However, its drawback is hash collision,

which occurs when different induced subgraphs have

an identical hash value. Although WLSK must sort the vertex labels, it has high expressiveness because the label of each vertex v holds the distribution of vertex labels within i steps from v. To overcome these drawbacks, in this paper, we propose a novel graph kernel that is equivalent to NHK in terms of time and space complexities and comparable to WLSK in terms of expressiveness. The graph kernel proposed in this paper is based

on the Hadamard code. The Hadamard code is used in

spread spectrum-based communication technologies

such as Code Division Multiple Access (CDMA) to

spread message signals. Because 1 and −1 occur with equal probability in each column of the Hadamard matrix except for the first column, labels assigned by our graph kernel follow the binomial distribution with zero mean under a certain assumption. Therefore, the expected value of each label element is 0, and such labels do not require a large memory space. This characteristic is used to compress vertex labels in graphs, enabling the proposed graph kernel to be computed quickly.

The rest of this paper is organized as follows. In

Section 2, we deﬁne the graph classiﬁcation prob-

lem and explain the framework of the existing graph

kernels. In Section 3, we propose the Hadamard

Code Kernel (HCK), based on the Hadamard code,

and another graph kernel called the Shortened HCK

(SHCK), which is a version of HCK that compresses

vertex labels in graphs. In Section 4, we provide a

theoretical discussion of the effect of overﬂow on the

proposed graph kernel. In Section 5, the fundamental

performance and practicality of the proposed method

are demonstrated through experiments. Finally, we

conclude the paper in Section 6.

2 GRAPH KERNELS

2.1 Framework of Representative Graph Kernels

This paper tackles the classiﬁcation problem of

graphs. A graph is represented as g = (V,E,Σ,ℓ),

where V is a set of vertices, E ⊆ V × V is a set of

edges, Σ is a set of vertex labels, and ℓ : V → Σ is

a function that assigns a label to each vertex in the

graph. Additionally, the set of vertices in graph g

is represented as V(g). Although we assume that

only the vertices in the graphs have labels in this

paper, the methods in this paper can be applied to

graphs where both the vertices and edges have labels.

The vertices adjacent to vertex v are represented as

N(v) = {u | (v,u) ∈ E}. A sequence of vertices from

v to u is called a path, and its step refers to the number of edges on that path. A path is called simple if and only if the path does not have repeating vertices. Paths in this paper are not always simple. Given two graphs $g = (V, E, \Sigma, \ell)$ and $g' = (V', E', \Sigma', \ell')$, $g'$ is called a subgraph of $g$ if there exists an injective function $\varphi : V' \to V$ that satisfies the following three conditions for all $v, v_1, v_2 \in V'$:

1. $(\varphi(v_1), \varphi(v_2)) \in E$, if $(v_1, v_2) \in E'$,
2. $\ell'(v) = \ell(\varphi(v))$,
3. $\ell'((v_1, v_2)) = \ell((\varphi(v_1), \varphi(v_2)))$.

Additionally, a subgraph $g'$ of $g$ is an “induced subgraph” if $\varphi(v_1)$ and $\varphi(v_2)$ are adjacent in $g$ if and only if $v_1$ and $v_2$ in $V(g')$ are adjacent in $g'$.

The graph classiﬁcation problem is deﬁned as

follows. Given a set of $n$ training examples $D = \{(g_i, y_i)\}$ $(i = 1, \cdots, n)$, where each example is a pair consisting of a labeled graph $g_i$ and the class $y_i \in \{+1, -1\}$ to which it belongs, the objective is to learn a function $f$ that correctly predicts the classes of the test examples.

In this paper, graphs are classified by an SVM that uses graph kernels. Let $\Sigma = \{\sigma_1, \sigma_2, \cdots, \sigma_{|\Sigma|}\}$ and $c(g, \sigma) = |\{v \in V(g) \mid \ell(v) = \sigma\}|$. A function $\phi$ that converts a graph $g$ to a vector is defined as

$$\phi(g) = \left( c(g, \sigma_1), c(g, \sigma_2), \cdots, c(g, \sigma_{|\Sigma|}) \right)^T.$$

Function $k'(g_i, g_j)$, defined as $\phi(g_i)^T \phi(g_j)$, is a positive semidefinite kernel. This function is calculated as follows.

$$k'(g_i, g_j) = \phi(g_i)^T \phi(g_j) = \sum_{v_i \in V(g_i)} \sum_{v_j \in V(g_j)} \delta(\ell(v_i), \ell(v_j)),$$

where δ is the Kronecker delta.
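The equality between the inner product of count vectors and the double sum of Kronecker deltas can be checked on a small case. The following is a minimal sketch, assuming each graph is given only by the list of its vertex labels; the function names and the toy label lists are hypothetical and not taken from the authors' implementation.

```python
from collections import Counter


def base_kernel(labels_i, labels_j):
    """Inner product of the label-count vectors phi(g_i) and phi(g_j)."""
    counts_i = Counter(labels_i)
    counts_j = Counter(labels_j)
    # A Counter returns 0 for labels that do not occur in the other graph.
    return sum(n * counts_j[label] for label, n in counts_i.items())


def base_kernel_pairwise(labels_i, labels_j):
    """Same value computed literally as the double sum of Kronecker deltas."""
    return sum(1 for a in labels_i for b in labels_j if a == b)


# Hypothetical toy graphs, described only by their vertex labels.
labels_i = ["A", "A", "B", "C"]
labels_j = ["A", "B", "B"]
print(base_kernel(labels_i, labels_j), base_kernel_pairwise(labels_i, labels_j))
```

The histogram form needs time linear in the number of vertices, whereas the literal double sum is quadratic, which is why the count-vector view is the practical one.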

Given a graph $g^{(h)} = (V, E, \Sigma, \ell^{(h)})$, a procedure to convert $g^{(h)}$ to another graph $g^{(h+1)} = (V, E, \Sigma', \ell^{(h+1)})$ is called a relabel. Although the relabel function $\ell^{(h+1)}$ is defined later in detail, the label of a vertex $v$ in $g^{(h+1)}$ is defined using the labels of $v$ and $N(v)$ in $g^{(h)}$ and is denoted as $\ell^{(h+1)}(v) = r(v, N(v), \ell^{(h)})$. Let $\{g^{(0)}, g^{(1)}, \cdots, g^{(h)}\}$ be a series of graphs obtained by iteratively applying a relabel $h$ times, where $g^{(0)}$ is a graph contained in $D$. Given two graphs $g_i$ and $g_j$, a graph kernel is defined using $k'$ as

$$k(g_i, g_j) = k'(g_i^{(0)}, g_j^{(0)}) + k'(g_i^{(1)}, g_j^{(1)}) + \cdots + k'(g_i^{(h)}, g_j^{(h)}).$$

Because $k$ is a summation of positive semidefinite kernels, $k$ is also positive semidefinite (Cristianini and Shawe-Taylor, 2000).
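The relabel-and-sum framework can be sketched generically, with the relabel function passed in as a parameter. This is an illustration under stated assumptions: graphs are represented as an adjacency dictionary plus a label dictionary, and `toy_relabel` is a hypothetical stand-in for the relabel function r, not one of the kernels defined in this paper.

```python
from collections import Counter


def base_kernel(labels_a, labels_b):
    """k': number of vertex pairs across the two graphs with equal labels."""
    count_a = Counter(labels_a.values())
    count_b = Counter(labels_b.values())
    return sum(n * count_b[lab] for lab, n in count_a.items())


def graph_kernel(adj_a, labels_a, adj_b, labels_b, relabel, h):
    """k = k'(g^(0)) + k'(g^(1)) + ... + k'(g^(h)) under the given relabel."""
    total = base_kernel(labels_a, labels_b)
    for _ in range(h):
        # Every new label depends only on the labels of the previous round.
        labels_a = {v: relabel(v, adj_a[v], labels_a) for v in adj_a}
        labels_b = {v: relabel(v, adj_b[v], labels_b) for v in adj_b}
        total += base_kernel(labels_a, labels_b)
    return total


def toy_relabel(v, neighbors, labels):
    """Hypothetical relabel: own label plus the sorted neighbor labels."""
    return (labels[v], tuple(sorted(labels[u] for u in neighbors)))


# Hypothetical graphs: a path A-B-A and a single edge A-B.
adj_a = {0: [1], 1: [0, 2], 2: [1]}
labels_a = {0: "A", 1: "B", 2: "A"}
adj_b = {0: [1], 1: [0]}
labels_b = {0: "A", 1: "B"}
print(graph_kernel(adj_a, labels_a, adj_b, labels_b, toy_relabel, 1))
```

Passing the relabel as a function keeps the summation logic identical for every kernel that follows this framework; only the label type changes.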


Recently, various graph kernels have been applied

to the graph classiﬁcation problem. Representative

graph kernels such as the NHK and WLSK follow

the above framework, where graphs contained in D

are iteratively relabeled. In these kernels, $\ell^{(h)}(v) = r(v, N(v), \ell^{(h-1)})$ characterizes a subgraph induced by the vertices that are reachable from $v$ within $h$ steps in $g^{(0)}$. Therefore, given $v_i \in V(g_i)$ and $v_j \in V(g_j)$, if the subgraphs induced by the vertices reachable from $v_i$ and $v_j$ within $h$ steps are identical, the relabel assigns an identical label to them. Additionally, it is desirable for a graph kernel to fulfill the converse of this condition, that is, to assign identical labels only when the induced subgraphs are identical. However, it is not an easy task to design such a graph kernel.

We now review the representative graph kernels,

NHK and WLSK.

NHK: Given a fixed-length bit string $\ell^{(0)}(v)$ of length $L$, $\ell^{(h)}(v)$ is defined as follows.

$$\ell^{(h)}(v) = \mathrm{ROT}(\ell^{(h-1)}(v)) \oplus \left( \bigoplus_{u \in N(v)} \ell^{(h-1)}(u) \right),$$

where ROT is bit rotation to the left and ⊕ is the exclusive OR of the bit strings. NHK is efficient in terms of computation and space complexities because the relabel of NHK is computable in $O(L|N(v)|)$ for each vertex and its space complexity is $O(L)$.

Figure 1 shows an example of an NHK relabel and its detailed calculation for one vertex $v$, assuming that $L = 3$. First, the label $\ell^{(0)}(v)$ = #011 of the vertex itself is rotated to return #110. We then obtain #001 by the exclusive OR of #110 and the labels of the four vertices adjacent to $v$, which are #011, #001, #001, and #100. In this computation, we do not require sorted bit strings because the exclusive OR is commutative. Three bits are required for $\ell^{(0)}(v)$ in this example, and $\ell^{(h)}(v)$ also requires three bits, even if $h$ is increased.
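The calculation in this example can be reproduced in a few lines. The sketch below assumes labels are stored as Python integers holding $L$-bit strings; `rot_left` and `nhk_relabel` are illustrative names, not functions from the authors' code.

```python
def rot_left(bits, length):
    """Rotate a bit string of the given length one position to the left."""
    mask = (1 << length) - 1
    return ((bits << 1) | (bits >> (length - 1))) & mask


def nhk_relabel(own_label, neighbor_labels, length):
    """ROT(own label) XOR-ed with the labels of all adjacent vertices."""
    result = rot_left(own_label, length)
    for label in neighbor_labels:
        # XOR is commutative, so the neighbors need not be sorted.
        result ^= label
    return result


# Values from the worked example: L = 3, own label #011,
# neighbor labels #011, #001, #001, and #100.
new_label = nhk_relabel(0b011, [0b011, 0b001, 0b001, 0b100], 3)
print(format(new_label, "03b"))
```

Because the result is always masked to $L$ bits, the label size stays constant however many relabels are applied.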

NHK has a drawback with respect to accidental hash collisions. For example, three vertices of $g^{(1)}$ in Fig. 1 have an identical label after the relabel. Two of these vertices have identical labels and the same number of adjacent vertices in $g^{(0)}$, so their agreement is expected. However, another pair among them has different labels and different numbers of adjacent vertices in $g^{(0)}$ and still receives the same vertex label in $g^{(1)}$, leading to low graph expressiveness and low classification accuracy.

We next describe the WLSK, which is based on

the Weisfeiler-Lehman algorithm, an algorithm that

determines graph isomorphism.

WLSK: When $\ell^{(0)}(v)$ returns a string of characters, $\ell^{(h)}(v)$ is defined as

$$\ell^{(h)}(v) = \ell^{(h-1)}(v) \cdot \left( \bigodot_{u \in N(v)} \ell^{(h-1)}(u) \right),$$

where $\cdot$ and $\bigodot$ are string concatenation operators. Because concatenation is not commutative, $u$ is an iterator that obtains the vertices $N(v)$ adjacent to $v$ in alphabetical order of their labels. Because $\ell^{(h)}(v)$ has information on the distribution of labels for $h$ steps from $v$, it has high graph expressiveness. If the labels are sorted using bucket sort, the time complexity of WLSK is $O(|\Sigma||N(v)|)$ for each vertex.

Figure 1: Relabeling $g^{(0)}$ to $g^{(1)}$ in NHK.

Figure 2 shows an example of a relabel using WLSK. The five vertices in $g^{(0)}$ have labels A, A, B, B, and C. For each vertex, WLSK sorts the labels of the vertices adjacent to the vertex, then concatenates these labels. In $g^{(1)}$, one vertex has label BAC, meaning that it has label B in $g^{(0)}$ and two adjacent vertices whose labels are A and C.
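A WLSK relabel step can be sketched as follows. The graph below is a hypothetical five-vertex graph with labels A, A, B, B, and C; its edges are an assumption made for illustration and are not claimed to match Fig. 2. The `compress` helper maps each long string to a short identifier through a dictionary, using integers in place of new one-character strings.

```python
def wlsk_relabel(adj, labels):
    """Own label followed by the sorted labels of the adjacent vertices."""
    return {
        v: labels[v] + "".join(sorted(labels[u] for u in adj[v]))
        for v in adj
    }


def compress(labels, table):
    """Replace each long label string with a short integer identifier."""
    # setdefault assigns the next free identifier to unseen strings only.
    return {v: table.setdefault(s, len(table)) for v, s in labels.items()}


# Hypothetical graph: edges 1-3, 3-5, 2-4, and 4-5.
adj = {1: [3], 2: [4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
labels = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}

relabeled = wlsk_relabel(adj, labels)
table = {}
compressed = compress(relabeled, table)
print(relabeled[3], compressed)
```

The same `table` would have to be shared by all graphs in the dataset so that equal strings receive equal identifiers across graphs.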

In addition to NHK and WLSK, we deﬁne the La-

bel Aggregate Kernel (LAK) to facilitate the under-

standing of the other kernels proposed in this paper.

LAK: In this kernel, $\ell^{(0)}(v)$ is a vector in $|\Sigma|$-dimensional space. In concrete terms, if a vertex in a graph has a label $\sigma_i$ among $\Sigma = \{\sigma_1, \sigma_2, \cdots, \sigma_{|\Sigma|}\}$, the $i$-th element in the vector is 1. Otherwise, it is 0. In LAK, $\ell^{(h)}(v)$ is defined as

$$\ell^{(h)}(v) = \ell^{(h-1)}(v) + \sum_{u \in N(v)} \ell^{(h-1)}(u).$$

Footnote (WLSK): When $\ell^{(0)}(v)$ is a string of length 1, $\ell^{(1)}(v)$ is a string of length $|N(v)| + 1$. By replacing the latter string with a new string of length 1, both the computation time and memory space that WLSK requires are reduced.

ICPRAM 2016 - International Conference on Pattern Recognition Applications and Methods

Figure 2: Relabeling $g^{(0)}$ to $g^{(1)}$ in WLSK.

Figure 3: Relabeling $g^{(0)}$ to $g^{(1)}$ in LAK.

Figure 4: Relabeling $g^{(3)}$ to $g^{(4)}$ in LAK.

The $i$-th element in $\ell^{(h)}(v)$ is the frequency of occurrence of character $\sigma_i$ in the string $\ell^{(h)}(v)$ concatenated by WLSK. Therefore, $\ell^{(h)}(v)$ has information on the distribution of labels within $h$ steps from $v$, and LAK has high graph expressiveness. However, when $h$ is increased, the number of paths from $v$ that reach vertices labeled $\sigma_i$ increases exponentially. Thus, elements in $\ell^{(h)}(v)$ also increase exponentially. For example, if the average degree of vertices is $d$, there are $(d + 1)^h$ vertices reachable from $v$ within $h$ steps. Thus, LAK requires a large amount of memory space.

Table 1: Graph Kernel Characteristics.

Kernel   Advantages                           Drawbacks
NHK      computation time                     hash collision
WLSK     expressiveness                       computation time
LAK      expressiveness & computation time    memory space

Figures 3 and 4 show an example of a relabel using LAK, assuming that $|\Sigma| = 3$. The label of one vertex $v$ in $g^{(1)}$ is $(1, 2, 1)$, which means that there are one, two, and one vertices reachable from $v$ within one step that have labels $\sigma_1$, $\sigma_2$, and $\sigma_3$, respectively. Compared with relabeling $g^{(0)}$ to $g^{(1)}$, the increase in the values in $\ell^{(h)}(v)$ when relabeling $g^{(3)}$ to $g^{(4)}$ is large.
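The LAK relabel and the growth of its elements can be illustrated with a small sketch. The star graph used here is a hypothetical example chosen so that the center vertex receives the label $(1, 2, 1)$ after one relabel; it is not claimed to be the graph of Fig. 3.

```python
def one_hot(index, size):
    """Initial LAK label: 1 at the position of the vertex label, 0 elsewhere."""
    return [1 if i == index else 0 for i in range(size)]


def lak_relabel(adj, labels):
    """Add the label vectors of all adjacent vertices to the vertex's own."""
    new_labels = {}
    for v, neighbors in adj.items():
        vec = list(labels[v])
        for u in neighbors:
            vec = [a + b for a, b in zip(vec, labels[u])]
        new_labels[v] = vec
    return new_labels


# Hypothetical star graph: center 0 has label sigma_2; the leaves 1, 2, 3
# have labels sigma_1, sigma_2, sigma_3 (stored as 0-based indices).
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
labels = {0: one_hot(1, 3), 1: one_hot(0, 3), 2: one_hot(1, 3), 3: one_hot(2, 3)}

step1 = lak_relabel(adj, labels)
step2 = lak_relabel(adj, step1)
print(step1[0], sum(step1[0]), sum(step2[0]))
```

The sum of the elements of the center label grows from 1 to 4 to 10 in two relabels, which hints at the exponential growth that motivates compressing the labels.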

2.2 Existing Graph Kernel Drawbacks

We here summarize the characteristics of the above

three graph kernels. NHK is efﬁcient because its

computation is a logical operation between ﬁxed-

length bit strings and does not require string sorting.

However, its drawback is a tendency for hash collision, where different induced subgraphs have identical hash values. Although WLSK requires vertex label sorting, it has high expressiveness because $\ell^{(h)}(v)$ contains the distribution of the vertex labels within $h'$ steps $(0 \le h' \le h)$ from $v$. LAK requires a large amount of memory space to store vectors for high $h$, although it does not require label sorting. To overcome these drawbacks, in this paper, we propose a novel graph kernel that is equivalent to NHK in terms of time and space complexities and equivalent to LAK in terms of expressiveness.

3 GRAPH KERNELS BASED ON THE HADAMARD CODE

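In this section, we propose a novel graph kernel with the Hadamard code to overcome the aforementioned drawbacks. A Hadamard matrix is a square $(-1, 1)$-matrix in which any two row vectors are orthogonal, defined as follows:

$$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad (1)$$

$$H_{2^k} = \begin{pmatrix} H_{2^{k-1}} & H_{2^{k-1}} \\ H_{2^{k-1}} & -H_{2^{k-1}} \end{pmatrix}. \qquad (2)$$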
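The recursive definition can be turned into a short construction routine. The following is a minimal sketch using plain nested lists; the function names are illustrative, and the orthogonality check simply verifies the defining property on the generated matrix.

```python
def hadamard(order):
    """Build the Hadamard matrix of the given order (a power of two)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    matrix = [[1]]
    while len(matrix) < order:
        # Doubling step: [[H, H], [H, -H]].
        top = [row + row for row in matrix]
        bottom = [row + [-x for x in row] for row in matrix]
        matrix = top + bottom
    return matrix


def rows_are_orthogonal(matrix):
    """Distinct rows must have inner product 0; equal rows give the order."""
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(matrix[i], matrix[j]))
            if dot != (n if i == j else 0):
                return False
    return True


h4 = hadamard(4)
print(h4, rows_are_orthogonal(h4))
```

Each row of the returned matrix is one Hadamard code, so a matrix of order $2^k$ yields $2^k$ codes of $2^k$ elements each.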


A Hadamard code is a row vector of the Hadamard matrix. Given a Hadamard matrix of order $2^k$, $2^k$ Hadamard codes having $2^k$ elements are generated from this matrix. Using the Hadamard codes, we propose the HCK as follows.

HCK: Let $H$ be a Hadamard matrix of order $2^{\lceil \log_2 |\Sigma| \rceil}$ and $\ell^{(0)}(v)$ be a Hadamard code of order $|H|$. If a vertex $v$ has label $\sigma_i$, the $i$-th row in the Hadamard matrix of order $|H|$ is assigned to the vertex. Then $\ell^{(h)}(v)$ is defined as follows.

$$\ell^{(h)}(v) = \ell^{(h-1)}(v) + \sum_{u \in N(v)} \ell^{(h-1)}(u).$$

When $\ell_{\sigma_i}$ is a Hadamard code for a vertex label $\sigma_i$, $\ell_{\sigma_i}^T \ell^{(h)}(v) / |H|$ is the occurrence of $\sigma_i$ in a string $\ell^{(h)}(v)$ generated by WLSK. Therefore, HCK has the same expressiveness as LAK.

Figure 5 shows an example of a relabel using HCK. Each vertex $v$ in $g^{(1)}$ is represented as a vector produced by the summation of the vectors for $v$ and the vertices adjacent to $v$ in $g^{(0)}$. Additionally, after the relabel, we can obtain the distribution of the vertex labels within one step of a vertex $v$ using the following calculation:

$$\frac{1}{|H|} H \ell^{(1)}(v) = \frac{1}{4} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 4 \\ 0 \\ 2 \\ -2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}.$$

That is, there are one $\sigma_1$, two $\sigma_2$, and one $\sigma_3$ labels within one step of $v$. Furthermore, the result is equivalent to the LAK label of the same vertex, as shown in Fig. 3. The reason why we divide $H \ell^{(h)}(v)$ by four is that the order of the Hadamard matrix used is $|H| = 4$.
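The relabel and the decoding step can be checked numerically. The sketch below assumes a hypothetical star graph whose center has label $\sigma_2$ and whose three leaves have labels $\sigma_1$, $\sigma_2$, and $\sigma_3$; the graph and all function names are illustrative and are not taken from Fig. 5 or from the authors' implementation.

```python
def hadamard(order):
    """Hadamard matrix of a power-of-two order, built by repeated doubling."""
    matrix = [[1]]
    while len(matrix) < order:
        matrix = ([row + row for row in matrix]
                  + [row + [-x for x in row] for row in matrix])
    return matrix


def hck_order(num_labels):
    """2 ** ceil(log2(num_labels)), computed with integer arithmetic."""
    return 1 << (num_labels - 1).bit_length()


def hck_relabel(adj, labels):
    """Element-wise sum of the vertex's code and its neighbors' codes."""
    return {
        v: [sum(col) for col in zip(labels[v], *(labels[u] for u in adj[v]))]
        for v in adj
    }


def decode(label, matrix):
    """Recover label counts as (1/|H|) * H * label."""
    n = len(matrix)
    return [sum(r * x for r, x in zip(row, label)) // n for row in matrix]


H = hadamard(hck_order(3))
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
label_index = {0: 1, 1: 0, 2: 1, 3: 2}   # 0-based index i of sigma_(i+1)
labels = {v: list(H[i]) for v, i in label_index.items()}

step1 = hck_relabel(adj, labels)
print(step1[0], decode(step1[0], H))
```

Decoding works because the rows of $H$ are mutually orthogonal, so multiplying by $H$ isolates how many times each code was added.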

Figure 5: Relabeling $g^{(0)}$ to $g^{(1)}$ in HCK.

If each element in $\ell^{(h)}(v)$ is stored in four bytes (the commonly used size of integers in C, Java, and other languages), the space complexity of HCK is equivalent to that of LAK. Therefore, we have not overcome the drawback of LAK yet. In this paper, we assume that each vertex label is assigned to a vertex with equal probability. Because 1 and −1 occur with equal probability in each column of the Hadamard matrix except for the first column, the $i$-th element $(1 < i \le |\Sigma|)$ in $\ell^{(h)}(v)$ follows a binomial distribution with zero mean under this assumption. Therefore, the expected value of the element in $\ell^{(h)}(v)$ is 0, and a large memory space is not required for these elements. For example, Tables 2 and 3 represent values of the elements in the label $\ell^{(h)}(v)$ of a vertex under LAK and HCK, respectively, in a graph $g^{(h)}$, when $g^{(0)}$ (shown in Fig. 6) is relabeled iteratively $h$ times. Under this assumption of vertex label probability, the expected value of all elements in $\ell^{(h)}(v)$ except for the first element becomes 0. The first element represents the number of paths from $v$ to the vertices reachable within $h$ steps. Based on this observation, we assign bit arrays of length $\rho$ in the $L$-bit array to the elements as follows.

SHCK: Similar to NHK, $\ell^{(0)}(v)$ is a fixed-length bit array of length $L$. The bit array is divided into $|H|$ fragments, one of which is a bit array of length $L - \rho(|H| - 1)$ and the rest are bit arrays of length $\rho$. The first fragment of length $L - \rho(|H| - 1)$ is assigned to store the first element of $\ell^{(0)}(v)$, the next fragment of length $\rho$ is assigned to store the second element, and so on. Here, $\rho$ is a positive integer fulfilling $\rho(|H| - 1) = \rho(2^{\lceil \log_2 |\Sigma| \rceil} - 1) \le L$. Additionally, each element of $\ell^{(0)}(v)$ is represented by its one's complement in $\ell^{(0)}(v)$ for the purpose of the following summation, which defines $\ell^{(h)}(v)$.

$$\ell^{(h)}(v) = \ell^{(h-1)}(v) + \sum_{u \in N(v)} \ell^{(h-1)}(u).$$

Because $\ell^{(h)}(v)$ is a fixed-length binary bit string and $\ell^{(h)}(v)$ is the summation of the values represented as bit strings, both the time and space complexities of SHCK are equivalent to those of NHK. Additionally, the expressiveness of SHCK is equivalent to that of LAK if overflow of the fixed-length bit array does not occur.
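The fragment layout can be modelled with a single Python integer acting as the $L$-bit array. This is only a sketch under explicit assumptions: the signed fragments wrap around in the range $-2^{\rho-1}$ to $2^{\rho-1} - 1$ (a two's-complement style encoding chosen here for simplicity), and labels are added fragment by fragment rather than as whole machine words. Neither choice is confirmed as the authors' implementation, and all names are hypothetical.

```python
def pack(elements, rho, total_bits):
    """Store the first element unsigned and the others in rho-bit fragments."""
    first_bits = total_bits - rho * (len(elements) - 1)
    if first_bits < 0:
        raise ValueError("rho * (|H| - 1) must not exceed L")
    word = elements[0] & ((1 << first_bits) - 1)
    for x in elements[1:]:
        # Masking wraps values that do not fit into rho bits.
        word = (word << rho) | (x & ((1 << rho) - 1))
    return word


def unpack(word, count, rho):
    """Inverse of pack: read the signed fragments back from the bit array."""
    fields = []
    for _ in range(count - 1):
        raw = word & ((1 << rho) - 1)
        if raw >= 1 << (rho - 1):
            raw -= 1 << rho          # interpret the top bit as the sign
        fields.append(raw)
        word >>= rho
    fields.append(word)              # remaining bits hold the first element
    return fields[::-1]


def add_labels(a, b, count, rho, total_bits):
    """Fragment-wise addition with wraparound inside each fragment."""
    summed = [x + y for x, y in zip(unpack(a, count, rho), unpack(b, count, rho))]
    return pack(summed, rho, total_bits)


L_BITS, RHO, COUNT = 16, 4, 4
code = pack([1, -1, -1, 1], RHO, L_BITS)
three = add_labels(add_labels(code, code, COUNT, RHO, L_BITS), code, COUNT, RHO, L_BITS)

ones = pack([1, 1, 1, 1], RHO, L_BITS)
acc = ones
for _ in range(7):                   # eight copies in total
    acc = add_labels(acc, ones, COUNT, RHO, L_BITS)
print(unpack(three, COUNT, RHO), unpack(acc, COUNT, RHO))
```

With $\rho = 4$, the value 8 no longer fits into a signed fragment and wraps to −8, which is the overflow situation analyzed in the next section.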


Table 2: Elements in a label in LAK.

h     Label $\ell^{(h)}(v)$
0     (0, 1, 0, 0)
1     (1, 1, 1, 0)
2     (2, 3, 2, 2)
3     (7, 7, 7, 6)
4     (20, 21, 20, 20)
5     (61, 61, 61, 60)
6     (182, 183, 182, 182)
7     (547, 547, 547, 546)
8     (1640, 1641, 1640, 1640)
9     (4921, 4921, 4921, 4920)
10    (14762, 14763, 14762, 14762)

Figure 6: Relabeled graphs.

Table 3: Elements in a label in HCK.

h     Label $\ell^{(h)}(v)$
0     (1, -1, -1, 1)
1     (3, -1, -1, -1)
2     (9, -1, -1, 1)
3     (27, -1, -1, -1)
4     (81, -1, -1, 1)
5     (243, -1, -1, -1)
6     (729, -1, -1, 1)
7     (2187, -1, -1, -1)
8     (6561, -1, -1, 1)
9     (19683, -1, -1, -1)
10    (59049, -1, -1, 1)

4 SHCK OVERFLOW

As explained in the previous section, in SHCK, the fixed-length bit array of length $L$ is divided into small fragments, each of which corresponds to an element in $\ell^{(h)}(v)$. We sum such bit arrays to relabel vertices. Because all elements in $\ell^{(h)}(v)$ except for the first element are represented as a bit array of length $\rho$, we face the possibility of overflow when iteratively summing up these bit arrays. In this section, we theoretically discuss the probability of overflow in SHCK.

Let $x_k$ be the $i$-th element in $\ell^{(h)}(v)$, which is the label of vertex $v$ and is a value generated by summing up the base Hadamard codes $k$ times. If $i = 1$, $x_k = k$. For $i \ne 1$, if $-2^{\rho-1} \le x_k \le 2^{\rho-1} - 1$, $x_k$ fits in a fragment of length $\rho$ without overflowing. Let $p(k, j)$ be the probability that the value of $x_k$ is $j$ and $x_k$ fits in a bit fragment of length $\rho$ without overflowing. Under the assumption that the probability of any label existing on a vertex is uniform, when $k = 1$,

$$p(k, j) = \begin{cases} 1/2 & \text{if } j = 1, \\ 1/2 & \text{if } j = -1, \text{ and} \\ 0 & \text{otherwise,} \end{cases}$$

because an element in the Hadamard matrix is either 1 or −1. If $x_k$ fits in a bit array of length $\rho$ without overflowing, $x_{k-1}$ also fits in the array. In contrast, if $x_k$ cannot fit in a bit array of length $\rho$ without overflowing, $x_{k+1}$ also cannot fit in the array. Overflow occurs when $x_{k-1}$ is $2^{\rho-1} - 1$ and $+1$ is added to $x_{k-1}$, or when $x_{k-1}$ is $-2^{\rho-1}$ and $-1$ is added to $x_{k-1}$. Therefore, $p(k, j)$ is given by the following recurrence formula.

$$p(k, j) = \begin{cases} \frac{1}{2} p(k-1, j-1) & \text{if } j = 2^{\rho-1} - 1, \\ \frac{1}{2} p(k-1, j+1) & \text{else if } j = -2^{\rho-1}, \\ \frac{1}{2} p(k-1, j+1) + \frac{1}{2} p(k-1, j-1) & \text{else if } -2^{\rho-1} < j < 2^{\rho-1} - 1, \\ 0 & \text{otherwise.} \end{cases}$$

Accordingly, $p(k)$, which is the probability that $x_k$ fits in a bit array of length $\rho$ without overflowing, is

$$p(k) = \sum_{j=-2^{\rho-1}}^{2^{\rho-1}-1} p(k, j).$$

After $h$ relabels of a graph in which the average degree is $d$, $x_k$ is a value that is a summation of $k = (d + 1)^h$ binary values. The probability $p(\rho, d, h)$ that overflow does not occur for $\rho$, $d$, and $h$ is

$$p(\rho, d, h) = \sum_{j=-2^{\rho-1}}^{2^{\rho-1}-1} p\left( (d + 1)^h, j \right). \qquad (3)$$

When $h$ increases, $p(\rho, d, h)$ becomes very small. Nevertheless, in the next section, we demonstrate that the proposed graph kernel SHCK has the ability to classify graphs with high accuracy.
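The overflow analysis can be evaluated numerically with a small dynamic program. The sketch below treats the element as a random walk that moves by $+1$ or $-1$ with probability $1/2$ at each of the $k$ additions and discards any walk that leaves the range $-2^{\rho-1}$ to $2^{\rho-1} - 1$. Starting the walk from 0 before the first addition and rounding $(d + 1)^h$ to an integer are assumptions of this sketch; the function names are hypothetical.

```python
from fractions import Fraction


def no_overflow_probability(rho, k):
    """Probability that a sum of k random +1/-1 values never overflows."""
    low, high = -(1 << (rho - 1)), (1 << (rho - 1)) - 1
    half = Fraction(1, 2)
    dist = {0: Fraction(1)}          # value before any code is added
    for _ in range(k):
        nxt = {}
        for value, prob in dist.items():
            for step in (1, -1):
                target = value + step
                # Walks that leave the representable range are dropped.
                if low <= target <= high:
                    nxt[target] = nxt.get(target, 0) + prob * half
        dist = nxt
    return sum(dist.values(), Fraction(0))


def p_rho_d_h(rho, d, h):
    """Probability of no overflow after h relabels at average degree d."""
    return no_overflow_probability(rho, round((d + 1) ** h))


for h in range(4):
    print(h, float(p_rho_d_h(4, 2, h)))
```

Exact fractions are used so that small probabilities are not distorted by floating-point rounding; the printed values decrease as $h$ grows.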

5 EXPERIMENTAL EVALUATION

The proposed method was implemented in Java. All experiments were done on an Intel Xeon X5670 2.93 GHz computer with 48 GB memory running Microsoft Windows 8. We compared the computation time and accuracy of the prediction performance of HCK and SHCK with those of NHK and WLSK. To learn from the kernel matrices generated by the above graph kernels, we used the LIBSVM package (Chang and Lin, 2001) using 10-fold cross validation.

Table 4: Summary of evaluation datasets.

                              MUTAG       ENZYMES                      D&D
Number of graphs |D|          188         600                          1178
Maximum graph size            84          126                          5748
Average graph size            53.9        32.6                         284.3
Number of labels |Σ|          12          3                            82
Number of classes             2           6                            2
(class distribution)          (126,63)    (100,100,100,100,100,100)    (487, 691)
Average degree of vertices    2.1         3.8                          5.0

Figure 7: Conversion of a graph.

We used three real-world datasets. The first dataset, MUTAG (Debnath et al., 1991), contains information on 188 chemical compounds and their class labels. The class labels are binary values that indicate the mutagenicity of chemical compounds. The second dataset, ENZYMES, contains information on 600 proteins and their class labels. The class labels are one of six labels showing the six EC top-level classes (Schomburg et al., 2004). The third dataset, D&D, contains information on 1178 protein structures, in which each amino acid corresponds to a vertex and two vertices are connected by an edge if they are less than 6 Angstroms apart (Dobson and Doig, 2003). Each chemical compound is represented as an undirected graph where each vertex, edge, vertex label, and edge label corresponds to an atom, chemical bond, atom type, and bond type, respectively. Because we assume that only vertices in graphs have labels, the chemical graphs are converted following Hido and Kashima (2009); that is, an edge labeled with ℓ that is adjacent to vertices v and u in a chemical graph is replaced with a vertex labeled with ℓ that is adjacent to v and u with unlabeled edges, as shown in Fig. 7. Table 4 summarizes the datasets.
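The conversion of labeled edges into labeled vertices can be sketched as follows. The representation (a label dictionary plus a list of labeled edges) and the function name are assumptions for illustration; the toy molecule fragment is hypothetical and is not taken from the datasets.

```python
def edges_to_vertices(vertex_labels, labeled_edges):
    """Replace each labeled edge (u, v, label) with a new labeled vertex."""
    new_labels = dict(vertex_labels)
    adjacency = {v: [] for v in vertex_labels}
    for index, (u, v, edge_label) in enumerate(labeled_edges):
        edge_vertex = ("edge", index)      # fresh identifier for the new vertex
        new_labels[edge_vertex] = edge_label
        # The new vertex is joined to both endpoints by unlabeled edges.
        adjacency[edge_vertex] = [u, v]
        adjacency[u].append(edge_vertex)
        adjacency[v].append(edge_vertex)
    return new_labels, adjacency


# Hypothetical fragment: C=O with a single bond from C to H.
vertex_labels = {0: "C", 1: "O", 2: "H"}
labeled_edges = [(0, 1, "double"), (0, 2, "single")]

new_labels, adjacency = edges_to_vertices(vertex_labels, labeled_edges)
print(new_labels, adjacency)
```

After the conversion, bond types and atom types share one label set $\Sigma$, so the bond labels should be chosen so that they cannot be confused with atom labels.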

5.1 Scalability

Figure 8: Computation time for various h (MUTAG). [Plot: computation time (msec) versus h from 0 to 20 for NHK, WLSK, HCK, and SHCK.]

Figure 9: Computation time for various h (ENZYMES). [Plot: computation time (msec) versus h from 0 to 20 for NHK, WLSK, HCK, and SHCK.]

Footnote (LIBSVM): http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Figures 8, 9, and 10 show the computation time required to obtain a graph $g^{(h)}$ from a graph $g^{(0)}$ for NHK, WLSK, HCK, and SHCK for various h for

the MUTAG, ENZYMES, and D&D datasets, respec-

tively. As shown in the ﬁgures, NHK and SHCK are

faster than HCK, and much faster than WLSK. Ad-

ditionally, the computation time of NHK, HCK, and

SHCK increases linearly when h is increased. The

reason why WLSK requires such a large amount of

computation time is that WLSK must sort the labels

of adjacent vertices and replace a string of length

|N(v)| + 1 with a string of length 1. This is espe-

cially true when h = 11 or 15 for the MUTAG dataset,

h = 8 or 14 for the ENZYMES dataset, and h = 10

or 20 for the D&D dataset. In our implementation, this replacement is done with Java's HashMap class, where a string of length $|N(v)| + 1$ is the hash key and a string of length 1 is a value corresponding to that key. Although the average degree in the evaluated datasets is small, WLSK requires further computation time when the average degree of the data increases. HCK requires a large amount of computation time for the D&D dataset because the number of labels in the dataset is large and its computation time is proportional to the number of labels.

Figure 10: Computation time for various h (D&D). [Plot: computation time (msec) versus h from 0 to 20 for NHK, WLSK, HCK, and SHCK.]

Figure 11: Classification accuracy for various h and ρ (MUTAG). [Plot: classification accuracy (%) versus h from 0 to 20 for NHK, WLSK, HCK, SHCK (ρ = 1), and SHCK (1 < ρ < 5).]

5.2 Classiﬁcation Accuracy

Figure 11 shows the classiﬁcation accuracy of NHK,

WLSK, HCK, and SHCK for various h and ρ for the

MUTAG dataset. Their maximum accuracies for var-

ious h are almost the same. When h = 0, the accuracy

for SHCK (ρ = 1) is very low, because 1 or −1 (the

values in the Hadamard matrix) cannot be stored as

a one’s complement consisting of one bit. The ac-

curacy of HCK is exactly the same as that of SHCK

(1 < ρ < 5), which means that although overﬂow may

occur in SHCK, the kernel can assign identical vertex

labels to the identical subgraphs induced by a vertex

v and the vertices within h steps from v.

Figure 12: Classification accuracy for various h and ρ (ENZYMES). [Plot: classification accuracy (%) versus h from 0 to 20 for NHK, WLSK, HCK, SHCK (ρ = 1), SHCK (ρ = 2), SHCK (ρ = 3), and SHCK (7 < ρ < 17).]

Figure 13: Classification accuracy for various h and ρ (D&D). [Plot: classification accuracy (%) versus h from 0 to 20 for NHK, WLSK, HCK, SHCK (ρ = 1), and SHCK (ρ = 2).]

Figure 12 shows the classification accuracy of NHK, WLSK, HCK, and SHCK for various h and ρ for the ENZYMES dataset. The accuracy of WLSK is slightly

superior to those of HCK and SHCK (ρ = 2, ρ = 3,

and 7 < ρ < 17), and their accuracies are much supe-

rior to those of NHK and SHCK (ρ = 1). The perfor-

mance of HCK is exactly the same as that of SHCK

for high ρ (7 < ρ < 17) and almost the same as that of SHCK for low ρ (ρ = 2 and ρ = 3). The maximum accuracy of WLSK is 53.0%, while the maximum accuracy of both HCK and SHCK (ρ = 3, 4, and 7 < ρ < 17) is 51.3%. The reason why the accuracy of WLSK is slightly superior to that of HCK is that the WLSK label $\ell^{(h)}(v)$ contains information on the distribution of labels at $h$ steps from $v$, while the HCK label $\ell^{(h)}(v)$ contains information on the distribution of all labels within $h$ steps from $v$. Although the latter distribution can be obtained from the former distribution, the former distribution cannot be obtained from the latter distribution. Therefore, WLSK is more expressive than HCK and SHCK. When ρ is increased up to 16, the length of a bit string to store the first element of $\ell^{(h)}(v)$ is $L - \rho \times 2^{\lceil \log_2 |\Sigma| \rceil} = 64 - 16 \times 2^{\lceil \log_2 3 \rceil} = 0$. Even in this case, the accuracy of SHCK is equivalent to that of HCK, which means that the overflow of the first element of $\ell^{(h)}(v)$ has no impact on classification accuracy in this experiment. Figure 13 shows the classification accuracy of NHK, WLSK, HCK, and SHCK for various $h$ and ρ for the D&D dataset. All accuracies except for that of SHCK (ρ = 1) are almost equivalent.

Figure 14: Qualitative performance of evaluated graph kernels. [Diagram: NHK, WLSK, HCK, and SHCK placed along two axes labeled "fast" and "accurate".]

6 CONCLUSION

In this paper, we proposed graph kernels based on the

Hadamard code to classify graphs. Figure 14 presents

a qualitative description of the performance of graph

kernels in terms of computation time and classiﬁca-

tion accuracy. These experimental results show that

the proposed graph kernel SHCK is fast and accurate.

REFERENCES

Borgwardt, Karsten M., Ong, Cheng Soon, Schönauer, Stefan, Vishwanathan, S. V. N., Smola, Alex J., and Kriegel, Hans-Peter. 2005. Protein Function Prediction via Graph Kernels. Bioinformatics 21 (suppl 1): 47–56.

Chang, Chih-Chung, and Lin, Chih-Jen. 2001. LIBSVM: A library for support vector machines. Available online at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Cristianini, Nello, and Shawe-Taylor, John. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.

Debnath, Asim Kumar, Lopez de Compadre, Rosa L., Debnath, Gargi, Shusterman, Alan J., and Hansch, Corwin. 1991. Structure-Activity Relationship of Mutagenic Aromatic and Heteroaromatic Nitro Compounds. Correlation with Molecular Orbital Energies and Hydrophobicity. Journal of Medicinal Chemistry 34: 786–797.

Dobson, Paul D., and Doig, Andrew J. 2003. Distinguishing Enzyme Structures from Non-enzymes Without Alignments. Journal of Molecular Biology 330(4): 771–783.

Garey, Michael R., and Johnson, David S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman.

Hido, Shohei, and Kashima, Hisashi. 2009. A Linear-Time Graph Kernel. In Proc. of the International Conference on Data Mining (ICDM). 179–188.

Kashima, Hisashi, Tsuda, Koji, and Inokuchi, Akihiro. 2003. Marginalized Kernels Between Labeled Graphs. In Proc. of the International Conference on Machine Learning (ICML). 321–328.

Shervashidze, Nino, Schweitzer, Pascal, van Leeuwen, Erik Jan, Mehlhorn, Kurt, and Borgwardt, Karsten M. 2011. Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research (JMLR): 2539–2561.

Schölkopf, Bernhard, and Smola, Alexander J. 2002. Learning with Kernels. MIT Press.

Schölkopf, Bernhard, Tsuda, Koji, and Vert, Jean-Philippe. 2004. Kernel Methods in Computational Biology. MIT Press.

Schomburg, Ida, Chang, Antje, Ebeling, Christian, Gremse, Marion, Heldt, Christian, Huhn, Gregor, and Schomburg, Dietmar. 2004. BRENDA, the Enzyme Database: Updates and Major New Developments. Nucleic Acids Research 32D: 431–433.

Vinh, Nguyen Duy, Inokuchi, Akihiro, and Washio, Takashi. 2010. Graph Classification Based on Optimizing Graph Spectra. In Proc. of the International Conference on Discovery Science. 205–220.
