ON CERTAIN GROUP INVARIANT MERCER KERNELS
Bernd-Jürgen Falkowski
University of Applied Sciences Stralsund, Department of Economics
Zur Schwedenschanze 15, D-18435 Stralsund, Germany
Key words: Mercer Kernels, Group Invariance, Radial Functions
Abstract: For the construction of support vector machines, Mercer kernels are of considerable importance. Since the conditions of Mercer’s theorem are hard to verify in general, a systematic (constructive) description of Mercer kernels which are invariant under a transitive group action is provided. As an example, kernels on Euclidean space invariant under the Euclidean motion group are treated. En passant, a minor but confusing error in a seminal paper due to Gangolli is rectified. In addition, an interesting relation to radial basis functions is exhibited.
1 INTRODUCTION
In recent years support vector machines (SVMs) have received much attention; see e.g. (Cristianini et al., 2000; Vapnik, 1998) for the theoretical background and (Shashua and Levin, 2002; Shashua and Levin, 2003; Vapnik, 1998) for practical applications. In this context Mercer kernels (cf. e.g. (Cristianini et al., 2000), p. 35 for Mercer’s theorem), which are important building blocks of such machines, have frequently been used. These kernels (implicitly) determine the feature maps of SVMs and hence their separation capability, cf. (Cover, 1965). It therefore seems somewhat surprising that, apart from three basic types of kernels, cf. e.g. (Haykin, 1999), p. 333, and some construction rules, cf. e.g. (Cristianini et al., 2000), pp. 42-44, and (Shawe-Taylor and Cristianini, 2004), pp. 75-76, very little appears to be known about such kernels amongst Neural Network researchers. This is all the more surprising since the conditions of Mercer’s theorem are not easily verifiable in general.
Hence it seems worthwhile to apply some mathematical results which have, in essence, been known for quite some time to provide a complete description of all Mercer kernels which are invariant under a transitive group action. This is all the more the case since the group of proper rigid motions of Euclidean space acts transitively on that space (this, of course, being of interest for practical applications, cf. (Schölkopf et al., 1999), p. 339 and p. 349). As an interesting consequence, the important role of radial basis functions is seen to result from the invariance property of the kernels. En passant, a minor but confusing error in (Gangolli, 1967) is rectified.
The basic mathematical results that are of interest here mainly stem from the seminal works of (Gangolli, 1967) and (Parthasarathy and Schmidt, 1972) and also from (Falkowski, 1986). Amongst the Neural Network community they seem to be little known; even Wahba in (Wahba, 1999) does not mention them.
Being intimately connected to an abstract version of the Lévy-Khinchine formula, these results have previously chiefly been employed to derive the structure of Mercer kernels that are described by infinitely divisible positive definite functions, cf. (Gangolli, 1967; Falkowski, 2001; Falkowski, 2003). In this paper, however, the transitivity of the group action is essential. The reader is invited to consult Minsky’s corresponding results in (Minsky and Papert, 1990) (the group invariance theorem, p. 48, and theorem 2.4, p. 53), where finite groups are considered.
Note that only continuous kernels will be treated here. For details and further background information the reader is referred to (Bishop, 2006; Falkowski, 1986; Gangolli, 1967; Parthasarathy and Schmidt, 1972). For technical convenience all kernels considered here will in general be complex-valued, since this greatly simplifies the technical problems concerning unitary representations of the symmetry groups.
Falkowski B. (2009).
ON CERTAIN GROUP INVARIANT MERCER KERNELS.
In Proceedings of the International Joint Conference on Computational Intelligence, pages 517-521
DOI: 10.5220/0002277805170521
Copyright © SciTePress
2 BASIC FACTS
In order to proceed with the relevant computations, some basic definitions are needed.
2.1 Definition
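Given a topological space X and a continuous function $K: X \times X \to \mathbb{C}$ (the complex numbers), K is called a positive definite (p.d.) kernel on $X \times X$ if it satisfies
a) $K(x,y) = \overline{K(y,x)}$ for arbitrary $x, y \in X$;
b) given $x_1, x_2, \dots, x_n \in X$, the matrix $[K(x_i,x_j)]$ is p.d., i.e. $\sum_{i,j=1}^{n} c_i \overline{c_j} K(x_i,x_j) \ge 0$ for all $c_1, \dots, c_n \in \mathbb{C}$.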
Hence, by remark 3.7 in (Cristianini et al., 2000), p.
35, a Mercer kernel is just a real-valued positive
definite kernel.
2.1.1 Example
If X is a complex Hilbert space with inner product denoted by $\langle \cdot, \cdot \rangle$, then the kernel K defined by
$K(x,y) := \langle x, y \rangle$ for arbitrary $x, y \in X$
is positive definite. Note, though, that in general the space X is not required to carry a vector space structure (although for many interesting examples it does).
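As a small numerical illustration of definition 2.1 and example 2.1.1, the following sketch (assuming NumPy; the helper names `is_pd_kernel_matrix` and `inner_product_gram` are hypothetical) checks conditions a) and b) on a finite Gram matrix. Such a check only involves finitely many points, so it can refute positive definiteness of a kernel but never prove it.

```python
import numpy as np


def is_pd_kernel_matrix(K, tol=1e-10):
    """Check conditions a) and b) of definition 2.1 for one Gram matrix K = [K(x_i, x_j)].

    a) Hermitian symmetry K(x, y) = conj(K(y, x));
    b) sum_ij c_i conj(c_j) K(x_i, x_j) >= 0, i.e. no negative eigenvalues (up to tol).
    """
    K = np.asarray(K, dtype=complex)
    if not np.allclose(K, K.conj().T, atol=tol):
        return False
    # eigvalsh assumes a Hermitian matrix and returns its real eigenvalues
    return bool(np.linalg.eigvalsh(K).min() >= -tol)


def inner_product_gram(points):
    """Gram matrix [<x_i, x_j>] for points in C^d, with <x, y> = sum_k x_k * conj(y_k)."""
    P = np.asarray(points, dtype=complex)
    return P @ P.conj().T


rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3)) + 1j * rng.normal(size=(6, 3))  # six points in C^3
G = inner_product_gram(points)
print(is_pd_kernel_matrix(G))                          # inner-product kernel
print(is_pd_kernel_matrix([[1.0, 2.0], [2.0, 1.0]]))  # has the eigenvalue -1
```

The tolerance absorbs rounding errors; the inner-product Gram matrix of six points in $\mathbb{C}^3$ has rank at most 3, so some of its eigenvalues are zero up to rounding.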
2.1.2 Example
Suppose that a p.d. kernel K and a polynomial p
with positive coefficients are given. Then p(K) is
also a p.d. kernel.
Proof (sketch): Linear combinations of p.d. kernels with positive coefficients are obviously p.d. (the constant term of p corresponds to a positive constant kernel, which is clearly p.d.). It remains to show that products of p.d. kernels are again p.d. This may be achieved by considering p.d. matrices $A := [a_{ij}]$ and $B := [b_{ij}]$ as covariance matrices of two independent normally distributed random vectors $X := [X_1, X_2, \dots, X_n]$ and $Y := [Y_1, Y_2, \dots, Y_n]$ with mean vector zero. Then the matrix $C := [a_{ij} b_{ij}]$ is the covariance matrix of $Z := [X_1 Y_1, X_2 Y_2, \dots, X_n Y_n]$ and hence p.d. Q.E.D.
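The covariance argument of the proof sketch can be checked numerically. The sketch below (assuming NumPy; the random correlation matrices, the sample size and the polynomial are arbitrary illustrative choices) forms the entrywise product $C = [a_{ij} b_{ij}]$, compares it with the empirical covariance of $Z = [X_1 Y_1, \dots, X_n Y_n]$, and applies a polynomial with positive coefficients entrywise to a p.d. matrix.

```python
import numpy as np


def random_correlation(n, rank, rng):
    """Hypothetical test data: a p.d. n x n matrix of the given rank with unit diagonal."""
    F = rng.normal(size=(n, rank))
    S = F @ F.T
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)


rng = np.random.default_rng(1)
n = 4
A = random_correlation(n, 4, rng)
B = random_correlation(n, 2, rng)
C = A * B   # entrywise (Schur) product [a_ij * b_ij]

# Proof sketch: C is the covariance matrix of Z = (X_1 Y_1, ..., X_n Y_n)
N = 200_000
X = rng.multivariate_normal(np.zeros(n), A, size=N)
Y = rng.multivariate_normal(np.zeros(n), B, size=N)   # drawn independently of X
Z = X * Y
C_emp = Z.T @ Z / N   # Z has mean zero, so this estimates Cov(Z)
print(np.abs(C_emp - C).max())   # only sampling error remains


def p(t):
    """A polynomial with positive coefficients, applied entrywise."""
    return 1.0 + 2.0 * t + 0.5 * t**3


print(np.linalg.eigvalsh(C).min(), np.linalg.eigvalsh(p(A)).min())
```

Unit diagonals keep the variance of each product $Z_i Z_j$ bounded (by 9), so the sampling error of the Monte Carlo estimate is of order $3/\sqrt{N}$.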
2.1.3 Example
Suppose that f is the characteristic function (Fourier transform) of a probability measure on the real line. Then the kernel $K(t_1,t_2) := f(t_2 - t_1)$ is well known to be positive definite, and f is called a positive definite function.
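Example 2.1.3 can be probed numerically. In the sketch below (assuming NumPy; all function names are illustrative), Gram matrices built from the characteristic functions of $N(0,1)$ and of the uniform distribution on $(-1,1)$ show no negative eigenvalues beyond rounding, while the indicator function of $(-1,1)$, which is not a characteristic function, already violates condition b) of definition 2.1 on three points.

```python
import numpy as np


def gram_from_function(f, ts):
    """Matrix [K(t_i, t_j)] with K(t1, t2) := f(t2 - t1), as in example 2.1.3."""
    ts = np.asarray(ts, dtype=float)
    return f(ts[None, :] - ts[:, None])   # entry (i, j) is f(t_j - t_i)


def min_eigenvalue(K):
    return np.linalg.eigvalsh(K).min()


def gauss_cf(t):
    return np.exp(-t**2 / 2)   # characteristic function of N(0, 1)


def uniform_cf(t):
    return np.sinc(t / np.pi)   # sin(t)/t: characteristic function of U(-1, 1)


def box(t):
    return (np.abs(t) < 1).astype(float)   # indicator of (-1, 1): not a characteristic function


ts = np.linspace(-5.0, 5.0, 40)
print(min_eigenvalue(gram_from_function(gauss_cf, ts)))
print(min_eigenvalue(gram_from_function(uniform_cf, ts)))
print(min_eigenvalue(gram_from_function(box, [0.0, 0.6, 1.2])))   # 1 - sqrt(2)
```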
For further examples and the explicit relation of kernels to feature maps the reader is referred to (Shawe-Taylor and Cristianini, 2004), pp. 47-84. Clearly, however, in this way no systematic description is obtained.
To obtain more explicit information, the invariance condition under a transitive group action is now introduced.
2.2 Definition
Let G be a topological group with identity e and let X be a topological space, as before. G is said to act continuously on X if
1. for every fixed $g \in G$, the map $x \mapsto gx$ is a bijection;
2. $ex = x$ for all $x \in X$;
3. $g_1(g_2x) = (g_1g_2)x$ for all $g_1, g_2 \in G$, $x \in X$;
4. $(g, x) \mapsto gx$ is continuous;
5. for every fixed $g \in G$, the map $x \mapsto gx$ is a homeomorphism of X.
The action is transitive if for every $x, y \in X$ there exists a $g \in G$ such that $gx = y$.
A p.d. kernel K is said to be invariant under G if $K(gx,gy) = K(x,y)$ for all $g \in G$ and all $x, y \in X$.
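The following sketch (assuming NumPy; all names are illustrative) checks the axioms of definition 2.2 for the rotation group SO(2) acting on the unit circle, an action that is transitive, and verifies the invariance of the kernel $K(x,y) = \exp(\langle x, y \rangle)$. This kernel is a pointwise limit of polynomials with positive coefficients in the p.d. kernel $\langle x, y \rangle$, hence p.d. by example 2.1.2.

```python
import numpy as np


def rotation(theta):
    """Matrix of the rotation by the angle theta (an element of SO(2))."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def act(theta, x):
    """Action of SO(2) on the unit circle X: x -> g x."""
    return rotation(theta) @ x


def point(alpha):
    """The point of the unit circle at angle alpha."""
    return np.array([np.cos(alpha), np.sin(alpha)])


def transporter(x, y):
    """An angle theta with act(theta, x) == y; its existence means the action is transitive."""
    return np.arctan2(y[1], y[0]) - np.arctan2(x[1], x[0])


def kernel(x, y):
    """K(x, y) = exp(<x, y>)."""
    return np.exp(x @ y)


x, y = point(0.3), point(2.1)
theta = 0.7
print(np.allclose(act(transporter(x, y), x), y))
print(np.isclose(kernel(act(theta, x), act(theta, y)), kernel(x, y)))
```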
The following theorem describes G invariant
kernels in the sense of definition 2.2 in terms of
unitary representations of G.
2.3 Theorem
Let X be a topological space and let G be a group acting continuously on it. Suppose that K is a p.d. kernel on $X \times X$ invariant under G.
Then there exist a complex Hilbert space H, a weakly continuous unitary representation $g \mapsto U(g)$ of G in H (i.e. $U(g_1g_2) = U(g_1)U(g_2)$ and the map $g \mapsto \langle U(g)v_1, v_2 \rangle$ is continuous for all $v_1, v_2 \in H$), and a continuous map $v: X \to H$ such that the vectors v(x) span a dense subspace of H and
1. $K(x,y) = \langle v(x), v(y) \rangle$,
2. $v(gx) = U(g)v(x)$.
Proof (rough sketch): This is essentially a consequence of the Kolmogorov consistency theorem. A detailed proof is provided in (Parthasarathy and Schmidt, 1972), theorems 1.2 and 2.7 as well as remark 2.8. Q.E.D.
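For a finite cyclic group the objects in theorem 2.3 can be written down explicitly. In the sketch below (assuming NumPy; the weights $w_k$ are hypothetical), $G = X = \mathbb{Z}_n$ acts on itself by translation, $H = \mathbb{C}^n$, and the invariant kernel $K(x,y) = f(x - y)$ is factored as $\langle v(x), v(y) \rangle$ with $v(x + g) = U(g)v(x)$. The construction uses the discrete Fourier transform, not the Kolmogorov argument cited above.

```python
import numpy as np

n = 8
k = np.arange(n)
# Hypothetical positive weights with w_k == w_{n-k}, so that f below is real-valued
w = np.array([3.0, 1.0, 0.5, 0.1, 0.2, 0.1, 0.5, 1.0])


def f(m):
    """Positive definite function on Z_n: f(m) = sum_k w_k exp(2 pi i k m / n)."""
    return np.sum(w * np.exp(2j * np.pi * k * m / n))


def K(x, y):
    """Kernel on X = Z_n, invariant under the translations x -> (x + g) mod n."""
    return f((x - y) % n)


def v(x):
    """Feature vector v(x) in H = C^n."""
    return np.sqrt(w) * np.exp(2j * np.pi * k * x / n)


def U(g):
    """Unitary representation of g in Z_n: a diagonal matrix of characters."""
    return np.diag(np.exp(2j * np.pi * k * g / n))


def inner(a, b):
    """<a, b> = sum_k a_k conj(b_k), linear in the first argument."""
    return np.sum(a * np.conj(b))


print(np.isclose(K(2, 5), inner(v(2), v(5))), np.allclose(U(3) @ v(6), v((6 + 3) % n)))
```

Because all weights are positive, the vectors $v(x)$ span $\mathbb{C}^n$; with some $w_k = 0$ one would restrict H to the coordinates with $w_k > 0$.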
3 INVARIANT KERNELS
Suppose now that G acts transitively on X and choose a fixed $x_0 \in X$. Further let
$G(x_0) := \{g \in G \mid gx_0 = x_0\}$
be the stability subgroup of $x_0$ in G.
IJCCI 2009 - International Joint Conference on Computational Intelligence
3.1 Lemma
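The map $gx_0 \mapsto gG(x_0)$ defines a bijection between X and the space of left cosets $G/G(x_0)$.
Proof: Simple computation. By transitivity every $x \in X$ is of the form $gx_0$, and $gx_0 = hx_0$ holds if and only if $h^{-1}g \in G(x_0)$, i.e. if and only if $gG(x_0) = hG(x_0)$. Hence the map is well defined and injective, and it is clearly surjective. Q.E.D.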
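Lemma 3.1 can be checked mechanically on a small example. The sketch below (plain Python; the symmetric group $S_3$ acting on $\{0, 1, 2\}$ is an illustrative choice) verifies that $gx_0 = hx_0$ holds exactly when $gG(x_0) = hG(x_0)$, so that points of X correspond one-to-one to left cosets.

```python
from itertools import permutations

G = list(permutations(range(3)))   # S_3; a tuple g encodes the map x -> g[x]
x0 = 0
stab = [g for g in G if g[x0] == x0]   # stability subgroup G(x0)


def compose(g, h):
    """Group product: (g h)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in range(3))


def left_coset(g):
    """The left coset g G(x0) as a frozenset of permutations."""
    return frozenset(compose(g, k) for k in stab)


# Lemma 3.1: g x0 = h x0 exactly when g G(x0) = h G(x0)
same_point_same_coset = all(
    (g[x0] == h[x0]) == (left_coset(g) == left_coset(h)) for g in G for h in G
)
point_to_coset = {g[x0]: left_coset(g) for g in G}
print(same_point_same_coset, len(point_to_coset), len(set(point_to_coset.values())))
```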
In addition, the G-action on X corresponds to the following G-action on $G/G(x_0)$:
$g_1(gG(x_0)) := (g_1g)G(x_0)$.
Thus, if the G-action is transitive, an invariant kernel K on X may equally well be considered as an invariant kernel K’ on $G/G(x_0)$. Note also that any function on $G/G(x_0)$ may be considered as a function on G that is constant on left cosets (by simply assigning the value of a left coset to all the elements in that coset) and vice versa. Thus one obtains the following theorem.
3.2 Theorem
Suppose that f is a positive definite function on G (i.e. the kernel $H(g_1,g_2) := f(g_2^{-1}g_1)$ is positive definite) that is bi-invariant under $G(x_0)$ (i.e. $f(k_1gk_2) = f(g)$ for all $g \in G$ and $k_1, k_2 \in G(x_0)$). Then the kernel K defined by
$K(g_1x_0, g_2x_0) := f(g_2^{-1}g_1)$
is positive definite and invariant under G. Moreover, every positive definite kernel on X invariant under G is of this form.
Proof: (First part)
(i) K is well defined.
Suppose $g_1x_0 = g_1'x_0$ and $g_2x_0 = g_2'x_0$. Then $g_2' = g_2k_1$ and $g_1' = g_1k_2$ for some $k_1, k_2 \in G(x_0)$. Hence
$f(g_2'^{-1}g_1') = f(k_1^{-1}g_2^{-1}g_1k_2) = f(g_2^{-1}g_1)$
by bi-invariance of f under $G(x_0)$.
(ii) The positive definiteness of K follows from the positive definiteness of f: for points $x_i = g_ix_0$ the matrix $[K(x_i,x_j)] = [f(g_j^{-1}g_i)] = [H(g_i,g_j)]$ is p.d.
(iii) The G-invariance of K is immediate from the definitions, since $K(gg_1x_0, gg_2x_0) = f((gg_2)^{-1}gg_1) = f(g_2^{-1}g_1)$.
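(Second part)
From theorem 2.3 the following holds:
$$\begin{aligned}
K(x,y) &= K(g_1x_0, g_2x_0) \quad \text{for suitable } g_1, g_2 \text{ by transitivity}\\
&= \langle v(g_1x_0), v(g_2x_0) \rangle = \langle U(g_1)v(x_0), U(g_2)v(x_0) \rangle\\
&= \langle U(g_2^{-1}g_1)v(x_0), v(x_0) \rangle\\
&= f_1(g_2^{-1}g_1), \text{ say.}
\end{aligned}$$
Now clearly $f_1$ is a positive definite function on G, and it is bi-invariant under $G(x_0)$ because $U(k)v(x_0) = v(kx_0) = v(x_0)$ for all $k \in G(x_0)$. Q.E.D.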
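The following sketch illustrates theorem 3.2 for the same toy action of $S_3$ on $\{0, 1, 2\}$ (assuming NumPy; `make_f` and `kernel_from_f` are hypothetical helpers). A function taking one value on $G(x_0)$ and another on its complement is constant on the double cosets of $G(x_0)$, hence bi-invariant; then $K(g_1x_0, g_2x_0) := f(g_2^{-1}g_1)$ is well defined and G-invariant, whereas a function that is not bi-invariant makes the definition of K inconsistent.

```python
from itertools import permutations

import numpy as np

G = list(permutations(range(3)))   # S_3 acting on X = {0, 1, 2}; g[x] is the image of x
x0 = 0
stab = [g for g in G if g[x0] == x0]   # stability subgroup G(x0)


def compose(g, h):
    """Group product: (g h)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in range(3))


def inverse(g):
    inv = [0] * 3
    for x, gx in enumerate(g):
        inv[gx] = x
    return tuple(inv)


def make_f(a, b):
    """Hypothetical bi-invariant function on G: a on G(x0), b on the complement."""
    return lambda g: a if g[x0] == x0 else b


def kernel_from_f(f):
    """K(g1 x0, g2 x0) := f(g2^{-1} g1) as a 3 x 3 matrix over X.

    Raises ValueError if the value depends on the chosen representatives g1, g2.
    """
    K = {}
    for g1 in G:
        for g2 in G:
            key = (g1[x0], g2[x0])
            value = f(compose(inverse(g2), g1))
            if K.setdefault(key, value) != value:
                raise ValueError("f is not bi-invariant under G(x0)")
    return np.array([[K[(x, y)] for y in range(3)] for x in range(3)], dtype=float)


f = make_f(1.0, 0.3)
K = kernel_from_f(f)
bi_invariant = all(
    f(compose(compose(k1, g), k2)) == f(g) for k1 in stab for g in G for k2 in stab
)
invariant = all(K[g[x], g[y]] == K[x, y] for g in G for x in range(3) for y in range(3))
print(bi_invariant, invariant, np.linalg.eigvalsh(K).min())
```

Here $K(x,y) = a$ for $x = y$ and $b$ otherwise, which is p.d. exactly when $a \ge b$ and $a + 2b \ge 0$.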
Corollary to 3.2
If $G(x_0)$ is a normal subgroup of G, then every G-invariant p.d. kernel on X is given via a p.d. function on $G/G(x_0)$.
Proof: Since $G(x_0)$ is normal, $G/G(x_0)$ carries a group structure with multiplication $\circ$ given by
$g_1G(x_0) \circ g_2G(x_0) := g_1g_2G(x_0)$.
By lemma 3.1, X may be identified with the group $G/G(x_0)$, and under this correspondence $x_0$ is mapped to the coset $G(x_0)$, i.e. the identity element of the group $G/G(x_0)$. So from the second part of 3.2 the following holds (denoting by $\pi: G \to G/G(x_0)$ the natural projection):
$$\begin{aligned}
K(x,y) &= K'(g_1G(x_0), g_2G(x_0)) \quad \text{for suitable } g_1, g_2 \text{ by transitivity}\\
&= \langle v(\pi(g_1)), v(\pi(g_2)) \rangle\\
&= \langle U(\pi(g_2)^{-1}\pi(g_1))v(G(x_0)), v(G(x_0)) \rangle\\
&= f_2(\pi(g_2)^{-1}\pi(g_1)), \text{ say.}
\end{aligned}$$
Note that in the above calculation the kernel corresponding to K after identifying X with $G/G(x_0)$ has been denoted by K’, and U describes a unitary representation of $G/G(x_0)$. Moreover, $f_2$ may also be considered as a bi-invariant p.d. function on G by assigning constant values to the cosets (since $G(x_0)$ is assumed to be normal, left cosets are also right cosets, so there is no ambiguity).
4 EXAMPLES
In order to underline the importance of the above
results some examples are provided below.
4.1 Example (G Acting on Itself)
(i) Suppose X = G and G acts on itself by left multiplication. Further let $x_0 := e$, the identity element of G. Then $G(x_0) = \{e\}$, and every p.d. kernel on G that is invariant under left multiplication is clearly of the form
$K(g_1,g_2) = f(g_2^{-1}g_1) = \langle U(g_2^{-1}g_1)v(e), v(e) \rangle$.
This is, of course, a classical result due to Gelfand and Raikov.
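Example 4.1 (i) can be illustrated on a finite, non-abelian group. The sketch below (assuming NumPy; the vector `v_e` is hypothetical) uses the permutation representation of $S_3$ on $\mathbb{C}^3$, defines $f(g) = \langle U(g)v(e), v(e) \rangle$ and checks that $K(g_1,g_2) = f(g_2^{-1}g_1)$ is Hermitian, p.d. and invariant under left multiplication.

```python
from itertools import permutations

import numpy as np

G = list(permutations(range(3)))   # the symmetric group S_3


def compose(g, h):
    """Group product: (g h)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in range(3))


def inverse(g):
    inv = [0] * 3
    for x, gx in enumerate(g):
        inv[gx] = x
    return tuple(inv)


def U(g):
    """Permutation representation of S_3 on C^3: U(g) e_x = e_{g(x)}."""
    M = np.zeros((3, 3))
    for x in range(3):
        M[g[x], x] = 1.0
    return M


v_e = np.array([1.0, 2.0j, -0.5])   # hypothetical vector playing the role of v(e)


def f(g):
    """f(g) = <U(g) v(e), v(e)>; np.vdot conjugates its first argument."""
    return np.vdot(v_e, U(g) @ v_e)


# K(g1, g2) = f(g2^{-1} g1), rows indexed by g1 and columns by g2
K = np.array([[f(compose(inverse(g2), g1)) for g2 in G] for g1 in G])
print(np.linalg.eigvalsh(K).min())
```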
(ii) Even more can be said if G is locally compact, second countable and abelian. Then every unitary representation U of G is the direct sum of a direct integral (for the technical details see (Naimark, 1960)) and the trivial representation. Hence there exist a measure space $(\Omega, S, \mu)$ and a measurable map $\tau: \Omega \to \hat{G}$ into the character group of G such that $\tau(\omega)$ is a nontrivial character (a continuous homomorphism into the complex unit circle) for every $\omega$ and that
$U = \int^{\oplus}_{\Omega} \tau(\omega)\, d\mu(\omega) \oplus I$,
where I denotes the trivial representation (again, for the notation and technical details the reader is referred to (Naimark, 1960)).
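A finite-dimensional analogue of 4.1 (ii) on $G = \mathbb{R}$: if $\mu$ is replaced by a discrete measure with non-negative weights, then $f = \int \tau(\omega)\, d\mu(\omega)$ is a finite superposition of characters $x \mapsto e^{ix\omega}$, and $K(x,y) = f(x - y)$ is p.d. with rank at most the number of atoms. The atoms and weights below are hypothetical; NumPy is assumed.

```python
import numpy as np

# Hypothetical discrete measure mu: atoms omega_k with weights w_k >= 0; all omega_k != 0,
# so every tau(omega_k) is a nontrivial character
omegas = np.array([-1.3, 0.4, 2.0, 3.7])
weights = np.array([0.5, 1.0, 0.25, 2.0])


def character(omega, x):
    """tau(omega)(x) = exp(i x omega), a character of the additive group R."""
    return np.exp(1j * x * omega)


def f(x):
    """f(x) = integral of tau(omega)(x) d mu(omega) for the discrete mu above."""
    x = np.asarray(x, dtype=float)
    return np.sum(weights * character(omegas, x[..., None]), axis=-1)


xs = np.linspace(-3.0, 3.0, 10)
K = f(xs[:, None] - xs[None, :])   # K(x, y) = f(x - y), i.e. f(g2^{-1} g1) written additively
print(np.linalg.eigvalsh(K).min(), np.linalg.matrix_rank(K, tol=1e-8))
```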
(iii) As a concrete version of (ii) above, consider $\Omega = \mathbb{R}^n$ (Euclidean n-space) and let
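$\tau(\omega)(x) := e^{i\langle x, \omega \rangle}$,
$d\mu(\omega) := (2\pi)^{-n/2} \exp(-\|\omega\|^2/2)\, d\omega$,
$x_0(\omega) := 1$,
$x_0 := \int^{\oplus} x_0(\omega)\, d\mu(\omega)$ (as direct integral),
and $U(x) := \int^{\oplus} e^{i\langle x, \omega \rangle}\, d\mu(\omega)$ (as direct integral).
Then
$\langle U(x)x_0, x_0 \rangle = \exp(-\|x\|^2/2)$.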
Thus from the above “abstract nonsense” the characteristic function of the standard Gaussian distribution is obtained.
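The integral behind 4.1 (iii) can be checked numerically in one dimension. The sketch below (assuming NumPy; the truncation at $|\omega| \le 10$ and the step size are ad-hoc choices) approximates $\int e^{ix\omega}\, d\mu(\omega)$ for the standard Gaussian $\mu$ by a Riemann sum and compares it with $\exp(-x^2/2)$; for $n > 1$ the integral factorizes over coordinates.

```python
import numpy as np


def gaussian_cf_1d(x, half_width=10.0, step=1e-3):
    """Riemann-sum approximation of the integral of exp(i x w) against the N(0, 1) density."""
    w = np.arange(-half_width, half_width + step, step)
    density = np.exp(-w**2 / 2) / np.sqrt(2 * np.pi)
    return np.sum(np.exp(1j * x * w) * density) * step


def gaussian_cf(x):
    """n-dimensional case: for the standard Gaussian mu the integral factorizes over coordinates."""
    return np.prod([gaussian_cf_1d(xi) for xi in np.atleast_1d(x)])


for x in [0.0, 0.5, 1.0, 2.0]:
    print(x, gaussian_cf_1d(x).real, np.exp(-x**2 / 2))
```

For a smooth, rapidly decaying integrand such as this one, the equally spaced Riemann sum is already extremely accurate, so no higher-order quadrature rule is needed.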
4.2 Example (Euclidean Motion Group)
Here the group G of all proper rigid motions of Euclidean n-space is considered. Thus let SO(n) denote the group of all proper rotations about the origin and T the group of all translations. Then obviously G acts transitively on $\mathbb{R}^n$. Moreover, it is well known that every $g \in G$ can uniquely be written as
$g = tr$ where $r \in SO(n)$ and $t \in T$.
Thus the Euclidean motion group is a semi-direct product (see e.g. (Mackey, 1968), pp. 37-45, for further information on semi-direct products) of SO(n) and $\mathbb{R}^n$, as follows. Suppose that the rotation $r_i$ is represented by a matrix $A_i$ and the translation $t_i$ is represented by a vector $b_i$. Then, if $g_i = t_ir_i$,
$g_1g_2x = g_1(A_2x + b_2) = A_1A_2x + b_1 + A_1b_2$.
Hence, if an element of G is now denoted by (b, A), the following group operation is obtained:
$(b_1, A_1) \circ (b_2, A_2) = (b_1 + A_1b_2, A_1A_2)$.
Clearly $\mathbb{R}^n$ appears as a normal subgroup here (the first component of the Cartesian product).
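The group law $(b_1, A_1) \circ (b_2, A_2) = (b_1 + A_1b_2, A_1A_2)$ can be checked against the composition of the corresponding affine maps. The sketch below (assuming NumPy; $n = 2$ and the sample motions are arbitrary) also verifies that conjugating a translation by any motion yields a translation again, i.e. that the translations form a normal subgroup.

```python
import numpy as np


def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def compose(g1, g2):
    """(b1, A1) o (b2, A2) = (b1 + A1 b2, A1 A2)."""
    b1, A1 = g1
    b2, A2 = g2
    return (b1 + A1 @ b2, A1 @ A2)


def inverse(g):
    """(b, A)^{-1} = (-A^T b, A^T), using A^{-1} = A^T for rotations."""
    b, A = g
    return (-A.T @ b, A.T)


def apply(g, x):
    """The rigid motion x -> A x + b."""
    b, A = g
    return A @ x + b


g1 = (np.array([1.0, -2.0]), rotation(0.8))
g2 = (np.array([0.5, 3.0]), rotation(-2.1))
x = np.array([0.3, 0.7])
print(np.allclose(apply(compose(g1, g2), x), apply(g1, apply(g2, x))))

# Translations form a normal subgroup: g o (t, I) o g^{-1} = (A t, I) is again a translation
t = (np.array([4.0, -1.0]), np.eye(2))
b_conj, A_conj = compose(compose(g1, t), inverse(g1))
print(b_conj, A_conj)
```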
Thus, if the point $x_0$ is taken to be the origin, then the stability subgroup is simply SO(n) and the quotient is $\mathbb{R}^n$.
Incidentally, in (Gangolli, 1967) it is mistakenly claimed that SO(n) is a normal subgroup. This is not true: conjugating a rotation by a translation gives $(b, I) \circ (0, A) \circ (-b, I) = (b - Ab, A)$, which does not belong to SO(n) unless $Ab = b$. Thus the quotient does not carry a group structure (addition) but must be considered as a space of cosets. However, the error is merely confusing, since the results given in (Gangolli, 1967) are not affected by it.
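The failure of normality can be seen in a two-line computation. The sketch below (assuming NumPy; the particular b and A are arbitrary) computes $(b, I) \circ (0, A) \circ (-b, I)$ with the group law of the Euclidean motion group and shows that its translation part $b - Ab$ is nonzero, so the conjugate is not a rotation about the origin.

```python
import numpy as np


def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def compose(g1, g2):
    """Group law of the Euclidean motion group: (b1, A1) o (b2, A2) = (b1 + A1 b2, A1 A2)."""
    b1, A1 = g1
    b2, A2 = g2
    return (b1 + A1 @ b2, A1 @ A2)


b = np.array([1.0, 0.0])
A = rotation(np.pi / 3)
translation = (b, np.eye(2))
rotation_at_origin = (np.zeros(2), A)
translation_back = (-b, np.eye(2))

c_shift, c_rot = compose(compose(translation, rotation_at_origin), translation_back)
print(c_shift)   # equals b - A b, which is nonzero: the conjugate moves the origin
```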
Now theorem 3.2 (albeit, because of the above remark, not the corollary) can be applied to obtain the following result.
4.2.1 Theorem
Every p.d. kernel K on $\mathbb{R}^n$ that is invariant under the Euclidean motion group G is given by a p.d. radial function on $\mathbb{R}^n$, i.e. a function that depends only on the Euclidean distance from the origin; in other words, $K(x,y) = h(\|x - y\|)$ for some function h.
Proof: Since $\mathbb{R}^n$, considered as a quotient, does not carry a group structure, it has to be treated as the space of left cosets. Nevertheless, by theorem 3.2 the kernel K is still described by a bi-invariant p.d. function on G which must, a fortiori, also be p.d. on $\mathbb{R}^n$. An arbitrary p.d. function $f_2$ on $\mathbb{R}^n$ can be extended to G by defining it to be constant on left cosets. However, this will not suffice to make it bi-invariant. Indeed, denoting the extended function by $f_1$,
$f_1((b_1, A_1) \circ (0, A_2)) = f_1((b_1, A_1A_2)) = f_1((b_1, I)) = f_1((b_1, A_1)) = f_2(b_1)$,
and thus $f_1$ is right invariant. However, since
$f_1((0, A_1) \circ (b_2, A_2)) = f_1((A_1b_2, A_1A_2))$,
one must demand for left invariance that $f_1((A_1b_2, A_2)) = f_1((b_2, A_2))$ for arbitrary rotations $A_1$, and this in turn leads to the requirement that $f_2$ must be a p.d. radial function on $\mathbb{R}^n$, i.e. $f_2(b) = h(\|b\|)$, where h is some other function and $\|\cdot\|$ denotes the length of a vector in $\mathbb{R}^n$. Q.E.D.
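Theorem 4.2.1 can be illustrated numerically. The sketch below (assuming NumPy; the Gaussian profile and the anisotropic comparison kernel are illustrative choices) checks that a radial kernel is unchanged under random proper rigid motions of the plane, whereas a translation-invariant but anisotropic p.d. kernel is not invariant under a rotation by 90 degrees.

```python
import numpy as np


def radial_kernel(x, y, sigma=1.0):
    """K(x, y) = h(||x - y||) with h(r) = exp(-r^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))


def anisotropic_kernel(x, y):
    """Translation-invariant p.d. kernel that is NOT radial (different scale per axis)."""
    d = x - y
    return np.exp(-0.5 * d[0] ** 2 - 2.0 * d[1] ** 2)


def random_motion(rng):
    """A random proper rigid motion x -> A x + b of the plane."""
    theta = rng.uniform(0.0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    A = np.array([[c, -s], [s, c]])
    b = rng.normal(size=2)
    return lambda x: A @ x + b


rng = np.random.default_rng(2)
x, y = rng.normal(size=2), rng.normal(size=2)
motions = [random_motion(rng) for _ in range(5)]
print([bool(np.isclose(radial_kernel(g(x), g(y)), radial_kernel(x, y))) for g in motions])
```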
4.3 Remarks
(i) It is interesting to observe that applying the invariance condition in the case of the Euclidean motion group leads to radial basis function kernels and thus to RBF networks. For example, if one considers the Gaussian kernel
$K(x,y) := \exp(-\|x - y\|^2 / (2\sigma^2))$,
then the feature map $\varphi$ represents the elements of the feature space as functions in a Hilbert space by
$\varphi(x) = K(x, \cdot)$,
where the scalar product between functions is then given by
$\langle \sum_i \alpha_i K(x_i, \cdot), \sum_j \beta_j K(x_j, \cdot) \rangle = \sum_i \sum_j \alpha_i \beta_j K(x_i, x_j)$.
For further details see also (Shawe-Taylor and Cristianini, 2004), p. 77 and p. 297.
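The scalar product formula above reduces to matrix algebra with the Gram matrix. The following sketch (assuming NumPy; the centres and coefficients are random placeholders, and `rkhs_inner` is a hypothetical helper) computes it for the Gaussian kernel and checks the reproducing property $\langle f, K(x, \cdot) \rangle = f(x)$ for $f = \sum_i \alpha_i K(x_i, \cdot)$.

```python
import numpy as np


def gaussian_gram(X, Y, sigma=1.0):
    """Matrix [K(x_i, y_j)] for K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); rows are points."""
    sq_dist = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2 * sigma**2))


def rkhs_inner(alpha, X1, beta, X2, sigma=1.0):
    """<sum_i alpha_i K(x_i, .), sum_j beta_j K(x'_j, .)> = sum_ij alpha_i beta_j K(x_i, x'_j)."""
    return alpha @ gaussian_gram(X1, X2, sigma) @ beta


rng = np.random.default_rng(3)
X = rng.normal(size=(5, 2))          # hypothetical centres x_i in R^2
alpha = rng.normal(size=5)           # hypothetical real coefficients
f_norm_sq = rkhs_inner(alpha, X, alpha, X)   # ||f||^2 for f = sum_i alpha_i K(x_i, .)

# Reproducing property: <f, phi(x)> = <f, K(x, .)> = f(x)
x = np.array([[0.2, -0.4]])
f_at_x = gaussian_gram(x, X) @ alpha
print(f_norm_sq >= 0, np.allclose(rkhs_inner(alpha, X, np.array([1.0]), x), f_at_x))
```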
It should also be observed that in the case of finite
permutation groups as treated by Minsky, cf.
(Minsky and Papert, 1990), p. 53, the invariance
condition leads to very severe restrictions.
(ii) It can be shown that any continuous function on a compact interval can be approximated with arbitrary accuracy by a linear combination of RBFs, cf. (Bishop, 2006), p. 299, (Powell, 1987).
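As a small illustration of remark (ii), the sketch below (assuming NumPy) interpolates $\sin(2\pi x)$ by a linear combination of Gaussian RBFs and measures the maximal error on [0, 1]. The width $\sigma = 0.05$, the 41 equally spaced centres and the choice of letting them extend beyond [0, 1] (which reduces boundary effects) are ad-hoc assumptions; Gaussian interpolation matrices become ill-conditioned if $\sigma$ is chosen much larger than the spacing of the centres.

```python
import numpy as np


def gaussian_rbf_matrix(s, t, sigma):
    """Matrix [exp(-(s_i - t_j)^2 / (2 sigma^2))] for 1-D point sets s and t."""
    return np.exp(-(s[:, None] - t[None, :]) ** 2 / (2 * sigma**2))


def target(x):
    return np.sin(2 * np.pi * x)   # a continuous function to approximate on [0, 1]


# Centres spaced 0.05 apart, extended beyond [0, 1] to damp boundary effects
centres = np.linspace(-0.5, 1.5, 41)
sigma = 0.05
coef = np.linalg.solve(gaussian_rbf_matrix(centres, centres, sigma), target(centres))


def approximant(x):
    """Linear combination of Gaussian RBFs: sum_j coef_j exp(-(x - c_j)^2 / (2 sigma^2))."""
    return gaussian_rbf_matrix(np.atleast_1d(x), centres, sigma) @ coef


grid = np.linspace(0.0, 1.0, 1001)
max_err = np.max(np.abs(approximant(grid) - target(grid)))
print(max_err)
```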
(iii) From example 4.1 (ii) and theorem 4.2.1 it is now possible to obtain a complete description of all p.d. kernels on $\mathbb{R}^n$ that are invariant under the Euclidean motion group. Of course, it must be admitted that here only complex-valued kernels have been considered, for technical convenience, whilst
generally the real-valued ones will be of major interest. Moreover, there is the question of choosing a suitable measure for the direct integrals. Nevertheless, modulo these complications, the explicit description is arrived at by “radializing” the positive definite functions on $\mathbb{R}^n$ along the lines described in (Gangolli, 1967), p. 134 for Lévy-Schoenberg kernels. For further explicit examples see (Gangolli, 1967; Falkowski, 2001; Falkowski, 2003). In particular, in (Gangolli, 1967) several real-valued Mercer kernels are explicitly described.
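The following sketch illustrates one simple way of “radializing”, chosen here for illustration and not necessarily the procedure of (Gangolli, 1967), p. 134: averaging a p.d. function over all rotations. It assumes NumPy; the anisotropic Gaussian `f_aniso` and the grid of 360 rotation angles are arbitrary choices. Each rotated copy is again p.d., so the average is p.d. and radial; for this particular function the average has the closed form $\exp(-1.25 r^2)\, I_0(0.75 r^2)$ with $r = \|x\|$, where $I_0$ is the modified Bessel function available as `np.i0`.

```python
import numpy as np


def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def f_aniso(x):
    """A p.d. but non-radial function on R^2: characteristic function of N(0, diag(1, 4))."""
    return np.exp(-0.5 * x[0] ** 2 - 2.0 * x[1] ** 2)


ANGLES = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)


def f_radial(x):
    """Average of f_aniso over rotations, approximated on 360 equally spaced angles."""
    return np.mean([f_aniso(rotation(t) @ x) for t in ANGLES])


rng = np.random.default_rng(4)
pts = rng.normal(size=(12, 2))
gram = np.array([[f_radial(p - q) for q in pts] for p in pts])
print(np.linalg.eigvalsh(gram).min())
```

Averaging over equally spaced angles is spectrally accurate for smooth periodic integrands, so 360 angles already reproduce the exact rotation average to rounding precision here.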
5 CONCLUSIONS
Some results from pure mathematics have been employed to derive a detailed description of group invariant Mercer kernels, where the group action was assumed to be transitive. As an application, a classical theorem due to Gelfand and Raikov was recovered. Thereafter kernels invariant under the Euclidean motion group were considered in detail, and a complete description (modulo some technical details) was provided. Moreover, it was shown that these kernels are derived from radial functions on $\mathbb{R}^n$. En passant, a minor but confusing error in (Gangolli, 1967) was rectified. The connection to radial basis function networks was explained. It seems rather satisfying that, using only invariance conditions on the kernels (conditions which have also very successfully been employed in an entirely different context such as quantum mechanics, cf. (Mackey, 1968)), such explicit results can be derived for interesting practical applications, cf. (Schölkopf et al., 1999). The author is tempted to paraphrase part of Minsky and Papert’s remark in (Minsky and Papert, 1990), p. 241: These methods brought the feeling of “real mathematics”. ... This is still sufficiently rare in computer science to be significant. We are convinced that respect for “real mathematics” is a powerful heuristic principle, though it must be tempered with practical judgment.
REFERENCES
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, (2006)
Cover, T.M.: Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Trans. on Electronic Computers, Vol. 14, (1965), pp. 326-334
Cristianini, N.; Shawe-Taylor, J.: An Introduction to Support Vector Machines and other Kernel-Based Learning Methods. Cambridge University Press, (2000)
Falkowski, B.-J.: Levy-Schoenberg Kernels on Riemannian Symmetric Spaces of Non-Compact Type. In: Probability Measures on Groups, Ed. H. Heyer, Springer Lecture Notes in Mathematics, Vol. 1210, (1986), pp. 58-67
Falkowski, B.-J.: Mercer Kernels and 1-Cohomology. In: Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies, Proceedings of KES 2001, Eds. N. Bab; L.C. Jain; R.J. Howlett, IOS Press, (2001), pp. 461-465
Falkowski, B.-J.: Mercer Kernels and 1-Cohomology of Certain Semi-simple Lie Groups. In: Proc. of the 7th Intl. Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES 2003), Eds. V. Palade, R.J. Howlett, L.C. Jain, LNAI Vol. 2773, Springer Verlag, (2003)
Gangolli, R.: Positive Definite Kernels on Homogeneous Spaces. In: Ann. Inst. H. Poincaré B, Vol. 3, (1967), pp. 121-225
Haykin, S.: Neural Networks, a Comprehensive Foundation. Prentice-Hall, (1999)
Mackey, G.W.: Induced Representations of Groups and Quantum Mechanics. Benjamin, (1968)
Minsky, M.L.; Papert, S.: Perceptrons. MIT Press, (Expanded Edition 1990)
Naimark, M.A.: Normed Rings (English Translation). Noordhoff, Amsterdam, (1960)
Parthasarathy, K.R.; Schmidt, K.: Positive Definite Kernels, Continuous Tensor Products, and Central Limit Theorems of Probability Theory. Springer Lecture Notes in Mathematics, Vol. 272, (1972)
Powell, M.J.D.: Radial Basis Functions for Multivariable Interpolation: a Review. In: J.C. Mason and M.G. Cox (Eds.): Algorithms for Approximation. Oxford University Press, (1987), pp. 143-167
Schölkopf, B.; Smola, A.J.; Müller, K.-R.: Kernel Principal Component Analysis. In: Schölkopf, B.; Burges, C.J.C.; Smola, A.J. (Eds.): Advances in Kernel Methods, Support Vector Learning. MIT Press, (1999), pp. 327-352
Shashua, A.; Levin, A.: Taxonomy of Large Margin Principle Algorithms for Ordinal Regression Problems. Technical Report 2002-39, Leibniz Center for Research, School of Computer Science and Eng., the Hebrew University of Jerusalem, (2002)
Shashua, A.; Levin, A.: Ranking with Large Margin Principle: Two Approaches. NIPS 14, (2003)
Shawe-Taylor, J.; Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, (2004)
Vapnik, V.N.: Statistical Learning Theory. John Wiley & Sons, (1998)
Wahba, G.: Support Vector Machines, Reproducing Kernel Hilbert Spaces, and Randomized GACV. In: Schölkopf, B.; Burges, C.J.C.; Smola, A.J. (Eds.): Advances in Kernel Methods, Support Vector Learning. MIT Press, (1999), pp. 69-88