
2 K-MEANS ALGORITHM 
The k-means algorithm is a partitioning clustering 
algorithm; it is the simplest and most popular 
clustering algorithm, and it is a squared error-based 
clustering algorithm. 
The k-means algorithm was given by MacQueen 
(MacQueen, 1967), and the aim of this clustering 
algorithm is to divide the dataset into disjoint 
clusters by optimizing the objective function 
given below: 
  Optimize 

   E = ∑_{i=1}^{k} ∑_{x ∈ C_i} d(x, m_i)              (1) 
Here m_i is the center of cluster C_i, while d(x, m_i) 
is the Euclidean distance between a point x and the 
cluster center m_i. In the k-means algorithm, the 
objective function E attempts to minimize the 
distance of each point from the center of the cluster 
to which the point belongs.  
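The objective function (1) can be computed directly. The sketch below is our own illustration (the helper name `kmeans_objective` is not from the paper), assuming the data, labels, and centers are held in NumPy arrays and d is the Euclidean distance:

```python
import numpy as np

def kmeans_objective(X, labels, centers):
    """E = sum over clusters i of sum over x in C_i of d(x, m_i)."""
    return sum(np.linalg.norm(X[labels == i] - centers[i], axis=1).sum()
               for i in range(len(centers)))

# Two well-separated clusters with centers at (0.5, 0) and (10.5, 0)
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.5, 0.0], [10.5, 0.0]])
print(kmeans_objective(X, labels, centers))  # → 2.0 (four points, each 0.5 away)
```

A clustering that moved any point to the wrong cluster would raise E, which is why k-means accepts only reassignments that reduce this sum.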
Consider a data set with n objects, i.e., 
S = {x_i : 1 ≤ i ≤ n}. 
1) Initialize the k partitions randomly or based on some 
prior knowledge, i.e. {C_1, C_2, C_3, ..., C_k}. 
2) Calculate the cluster prototype matrix M (the matrix 
of distances between the k clusters and the data 
objects), M = {m_1, m_2, m_3, ..., m_k}, where m_i is a 
1 × n column matrix. 
3) Assign each object in the data set to the nearest 
cluster C_m, i.e. x_j ∈ C_m if d(x_j, C_m) ≤ d(x_j, C_i) 
∀ 1 ≤ i ≤ k, i ≠ m, where j = 1, 2, 3, ..., n. 
4) Calculate the average of the elements of each 
cluster and replace the k cluster centers by these 
averages. 
5) Again calculate the cluster prototype matrix M. 
6) Repeat steps 3, 4 and 5 until there is no change 
in any cluster.  
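Steps 1-6 can be sketched as follows. This is a minimal illustration in NumPy, not the paper's implementation; it assumes Euclidean distance, initializes centers from randomly chosen data objects, and assumes no cluster ever becomes empty:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Sketch of steps 1-6; assumes no cluster ever becomes empty."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the k centers with randomly chosen data objects
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Steps 2/5: prototype matrix M of distances between objects and centers
        M = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Step 3: assign each object to the nearest cluster
        new_labels = M.argmin(axis=1)
        # Step 6: stop when no assignment changes
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: replace each center by the average of its cluster's elements
        centers = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return labels, centers

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
labels, centers = kmeans(X, 2)
print(sorted(centers[:, 0]))  # the two centers settle at x = 0.5 and x = 10.5
```

Each iteration can only decrease the objective E of Equation (1), so the loop terminates once the assignments stop changing.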
3 PAM ALGORITHM 
The purpose of partitioning a data set into k 
separate clusters is to find groups whose members 
show a high degree of similarity among themselves 
but dissimilarity with the members of other groups. 
The objective of PAM (Partitioning Around 
Medoids) (Kaufman, 1990) is to determine a 
representative object (medoid) for each cluster, that 
is, to find the most centrally located objects within 
the clusters. Initially, a set of k items is taken to be 
the set of medoids. Then, at each step, all objects 
from the input dataset that are not currently medoids 
are examined one by one to decide whether they should be 
medoids. That is, the algorithm determines whether 
there is an object that should replace one of the 
existing medoids. Swapping of medoids with other 
non-selected objects is based on the value of the total 
impact cost T_ih. Because PAM represents a cluster by 
a medoid, PAM is also known as the k-medoids 
algorithm. 
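As an illustration of this cost-based comparison, the sketch below (a helper of our own, not from the paper) computes the total cost of a candidate medoid set as the sum of each object's distance to its nearest medoid; a swap between a medoid i and a non-selected object h is worthwhile when it lowers this total:

```python
import numpy as np

def total_cost(X, medoid_idx):
    """Total cost of a medoid set: each object's distance to its nearest medoid."""
    D = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return D.min(axis=1).sum()

# Four points on a line; medoids at x = 0 and x = 10 give cost 1 + 1 = 2
X = np.array([[0.0], [1.0], [9.0], [10.0]])
print(total_cost(X, [0, 3]))  # → 2.0
```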
The PAM algorithm consists of two parts. The first, 
the build phase, follows this algorithm: 
Phase-1:  
Consider an object i as a candidate. Consider another 
object j that has not been selected as a prior 
candidate. Obtain its dissimilarity d_j to the most 
similar previously selected candidate. Obtain its 
dissimilarity to the new candidate i; call this d(j, i). 
Take the difference of these two dissimilarities. 
1)  If the difference is positive, then object j 
contributes to the possible selection of i. 
Calculate C_ji = max(d_j − d(j, i), 0), where d_j 
is the Euclidean distance between the j-th object and 
its most similar previously selected candidate, 
and d(j, i) is the Euclidean distance between the j-th 
and i-th objects. 
2)  Sum C_ji over all possible j. 
3)  Choose the object i that maximizes the sum 
of C_ji over all possible j. 
4)  Repeat the process until k objects have been 
found. 
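The build phase above can be sketched from a precomputed distance matrix. This is our own illustration, not the paper's code; in particular, the choice of the first medoid (the object with the smallest total distance to all others, as in Kaufman's BUILD procedure) is an assumption not spelled out in the steps above:

```python
import numpy as np

def pam_build(D, k):
    """Sketch of the build phase: greedily pick k medoids from distance matrix D."""
    n = D.shape[0]
    # First medoid (assumed, as in Kaufman's BUILD): the object with the
    # smallest total distance to all other objects.
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < k:
        # d_j: distance from j to its most similar previously selected candidate
        dj = D[:, medoids].min(axis=1)
        gains = {}
        for i in range(n):
            if i in medoids:
                continue
            # Sum C_ji = max(d_j - d(j, i), 0) over the unselected objects j
            gains[i] = sum(max(dj[j] - D[j, i], 0.0)
                           for j in range(n) if j != i and j not in medoids)
        # Step 3: choose the candidate i maximizing the summed contribution
        medoids.append(max(gains, key=gains.get))
    return medoids

# Four points on a line: the build phase picks one medoid per natural group
X = np.array([[0.0], [1.0], [9.0], [10.0]])
D = np.abs(X - X.T)
print(pam_build(D, 2))  # → [1, 2]
```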
Phase-2: 
The second phase attempts to improve the set of 
representative objects. It does so by considering 
all pairs of objects (i, h) in which i has been chosen 
but h has not been chosen as a representative. Next, it 
is determined whether the clustering results improve if 
objects i and h are exchanged. To determine the 
effect of a possible swap between i and h we use the 
following algorithm: 
Consider an object j that has not been previously 
selected. We calculate its swap contribution C_jih: 
1) If j is further from i and h than from one of the 
other representatives, set C_jih to zero.  
2) If j is not further from i than from any other 
representative (d(j, i) = d_j), consider one of two 
situations: 
a) j is closer to h than to the second closest 
representative, and d(j, h) < E_j, where E_j is the 
Euclidean distance between the j-th object and the 
second most similar representative. Then 
C_jih = d(j, h) − d(j, i).  
Note: C_jih can be either negative or positive 
depending on the positions of j, i and h. Here only if 
ICSOFT 2008 - International Conference on Software and Data Technologies
256