measures are applied, requires careful consideration,
as the success of these algorithms relies heavily on
the choice of the proximity function (Luxburg, 2007;
Bach and Jordan, 2003; Everitt, 1980).
Most previous studies of the spectral clustering
algorithm use the Euclidean distance measure, a
distance measure based on linear differences, to
construct the similarity matrix for numeric feature
types (Shi and Malik, 2000; Ng et al., 2001; Verma and
Meila, 2001), without explicitly stating the
consequences of selecting this distance measure.
However, several different proximity measures are
available for numeric variable types, each with its own
strengths and weaknesses. To our knowledge, no in-depth
evaluation of the performance of these proximity
measures in spectral clustering algorithms, in
particular one showing that the Euclidean distance
measure outperforms the alternatives, has been carried
out. As such, an evaluation and exploratory study that
compares and analyzes the performance of various
proximity measures may provide an important guideline
for researchers when selecting a proximity measure for
future studies in this area. This paper endeavors to
evaluate and compare the performance of these measures
and to identify the conditions under which each measure
may be expected to perform well.
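To make the distinction concrete, the short sketch below (our illustration; the vectors and the chosen measures are for exposition only, not taken from this study) computes three common numeric proximity measures with SciPy. Note how y, a scaled copy of x, is maximally similar to x under the cosine measure yet clearly separated from it under the Euclidean and Manhattan measures.

# Illustrative comparison of numeric proximity measures (assumed example).
from scipy.spatial.distance import euclidean, cityblock, cosine

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # a scaled copy of x

print(euclidean(x, y))  # linear-difference based: sqrt(1 + 4 + 9), about 3.74
print(cityblock(x, y))  # Manhattan distance: 1 + 2 + 3 = 6
print(cosine(x, y))     # cosine distance: 0.0, since y points the same way as x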
This paper is organized as follows. In Section 2,
we discuss the two spectral clustering algorithms that
we used in our experiment. Section 3 presents an
overview of several proximity measures for numeric
and mixed variable types. This is followed by Section
4, where we present our experimental approach and
evaluate and analyze the results obtained from our ex-
periments. We conclude the paper in Section 5.
2 SPECTRAL CLUSTERING
Spectral clustering algorithms originated in the area
of graph partitioning and use the eigenvalues and
eigenvectors of the similarity matrix to find the
clusters. They offer several advantages over other
cluster analysis methods (Luxburg, 2007; Ng et al.,
2001; Aiello et al., 2007; Fischer and Poland, 2004).
Firstly, the algorithms make no assumptions about the
shape of the clusters. As
such, while spectral clustering algorithms may be able
to find meaningful clusters with strongly coherent ob-
jects, algorithms such as K-means or K-medians may
fail to do so. Secondly, the algorithms do not suffer
from local minima. Therefore, it may not be neces-
sary to restart the algorithm with various initialization
options. Thirdly, the algorithms are more stable than
many alternatives with respect to the user-specified
parameters (e.g., the number of clusters); these
parameters can often be estimated accurately with the
help of the theory underlying the algorithms. Prior
studies also show that algorithms from this group often
outperform traditional clustering algorithms such as
K-means and Single Linkage (Luxburg, 2007).
Importantly, algorithms from the spectral family are
able to handle different types of data (i.e., numeric,
nominal, binary, or mixed); one only needs to convert a
given dataset into a similarity matrix to apply these
algorithms (Luxburg, 2007).
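As a concrete illustration of the first advantage, the following sketch (our example; the dataset and parameter choices such as n_neighbors are assumptions, not from the paper) contrasts K-means and spectral clustering on the classic two-moons data, where the clusters are strongly coherent but not convex.

# K-means assumes convex clusters and splits the two moons poorly;
# spectral clustering on a nearest-neighbor graph recovers both arcs.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

print("K-means ARI: ", adjusted_rand_score(y, km))  # typically well below 1
print("Spectral ARI:", adjusted_rand_score(y, sc))  # typically close to 1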
Spectral clustering algorithms are divided into
two types, namely recursive algorithms and multi-
way algorithms (Verma and Meila, 2001). In this
paper, we consider two algorithms, one from each
group. From the first group, we select the normalized
cut spectral clustering algorithm, as it has had
several practical successes in a variety of fields (Shi
and Malik, 2000). We refer to this algorithm as SM
(NCut) in the remainder of the paper. The Ng, Jordan
and Weiss algorithm is an improvement on the algorithm
proposed by Meila and Shi (Meila and Shi, 2001), and we
therefore select this algorithm (referred to as NJW
(K-means)) from the second group. In the following
subsection, we present several algorithm-specific
notations before discussing the algorithms themselves.
2.1 Notations
Similarity Matrix or Weight Matrix, W. Let W be an
N × N symmetric, non-negative matrix, where N is the
number of objects in a given dataset. Let i and j be
any two objects in a given dataset, located at row i
and row j, respectively. If the similarity between
these two objects (i.e., as calculated from a proximity
measure) is $w_{i,j}$, then it is located in the cell
at row i and column j of the weight matrix.
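As a minimal sketch of this construction, the snippet below builds W from a Gaussian (RBF) similarity, a common choice in the spectral clustering literature; the bandwidth sigma and the random toy data are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def weight_matrix(X, sigma=1.0):
    """X: (N, d) array of N objects; returns a symmetric, non-negative W."""
    sq_dists = squareform(pdist(X, metric="sqeuclidean"))  # pairwise ||x_i - x_j||^2
    return np.exp(-sq_dists / (2.0 * sigma ** 2))          # w_ij in (0, 1]

X = np.random.rand(5, 3)  # five toy objects with three numeric features
W = weight_matrix(X)
assert np.allclose(W, W.T) and (W >= 0).all()  # symmetric and non-negative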
Degree Matrix, D. Let d be an N × 1 matrix with
entries $d_i = \sum_{j=1}^{N} w_{i,j}$, which denote
the total similarity from object i to the rest of the
objects. Therefore, the degree matrix D is an N × N
diagonal matrix which contains the elements of d on
its main diagonal.
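Continuing the sketch above, the degree vector and the degree matrix follow directly:

d = W.sum(axis=1)  # d_i = sum_j w_ij: total similarity of object i to the rest
D = np.diag(d)     # N x N diagonal matrix with d on its main diagonal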
Laplacian Matrix, L. The Laplacian matrix is con-
structed from the weight matrix W and the degree ma-
trix D. The main diagonal of this matrix is always
non-negative. In graph theory, the eigenvectors and
eigenvalues of this matrix contain important
information about the structure of the underlying
graph.
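To show how these matrices are used downstream, the sketch below (continuing the snippets above) forms the unnormalized Laplacian L = D - W and a symmetric normalized Laplacian, then follows a standard NJW-style recipe: cluster the row-normalized eigenvectors with K-means. This is our paraphrase of the usual recipe, not this paper's exact pseudocode; the value of k and the assumption that all degrees are positive are ours.

from sklearn.cluster import KMeans

L = D - W                                             # unnormalized Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))                # assumes every d_i > 0
L_sym = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt  # I - D^{-1/2} W D^{-1/2}

k = 2
eigvals, eigvecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
U = eigvecs[:, :k]                                # eigenvectors of the k smallest
U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalization (NJW step)
labels = KMeans(n_clusters=k, n_init=10).fit_predict(U)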