to be applied before clustering. We assume that X_i has been normalized. Specifically, in our case (i.e., when d = 3), µ_i = (µ_i1, µ_i2, µ_i3)^⊤ corresponds to the mean value of the ith center of the designed territory, and X_j = (x_j1, x_j2, x_j3)^⊤ is the vector consisting of the standardized loss cost x_j1, latitude x_j2 and longitude x_j3 of the jth FSA, and w_d is the weight applied to the dth dimension of the data variable. In this work, without loss of generality, we take w_2 = w_3 = 1 and allow w_1 to take different values. The idea is to use w_1 as a relativity measure between loss cost and geographical location. When w_1 = 1, the loss cost is deemed as important as the geographical information; when w_1 takes a value greater (less) than 1, the loss cost is more (less) important than the geographical information in the clustering.
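A minimal sketch of this weighted clustering step, assuming scikit-learn. The column order (loss cost, latitude, longitude), the synthetic data, and the value w1 = 2 are illustrative choices, not the authors' implementation; the key idea is that scaling each standardized column by sqrt(w_d) makes plain Euclidean K-means equivalent to K-means under the weighted distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))           # columns: loss cost, latitude, longitude

Xs = StandardScaler().fit_transform(X)  # standardize each dimension

w1 = 2.0                                # relativity weight on loss cost
w = np.array([w1, 1.0, 1.0])            # w2 = w3 = 1
# sum_d w_d (x_d - mu_d)^2 = sum_d (sqrt(w_d) x_d - sqrt(w_d) mu_d)^2,
# so scaling columns by sqrt(w_d) reproduces the weighted distance.
Xw = Xs * np.sqrt(w)

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Xw)
print(labels.shape)                     # one territory label per FSA
```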
One can also use K-medoids clustering instead of K-means. The major difference between these two approaches lies in how the center of each cluster is estimated. K-means clustering determines each cluster's center as the arithmetic mean of each data characteristic, while K-medoids clustering uses an actual data point in a given cluster as the center. For our clustering problem, the choice of clustering method makes no essential difference, as we aim for grouping only. Similarly, hierarchical clustering, which seeks to build a hierarchy of clusters, can also be considered.
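To make the contrast concrete, the sketch below computes a medoid by hand for one group of points and runs a hierarchical (Ward) clustering via SciPy. The data are synthetic and the four-cluster cut is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))

# Medoid of a cluster: the actual data point minimizing the total distance
# to all other points, as opposed to the arithmetic mean used by K-means.
D = cdist(X, X)
medoid_idx = int(D.sum(axis=1).argmin())
medoid = X[medoid_idx]                 # a real observation, not an average

# Hierarchical clustering: build the full Ward linkage tree,
# then cut it into 4 groups.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=4, criterion="maxclust")
print(medoid_idx, np.unique(labels))
```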
2.2 Spatially Constrained Clustering
K-means or K-medoids clustering does not necessarily lead to results that satisfy the cluster contiguity requirement. In this case, spatially constrained clustering is needed, as all clusters are required to be spatially contiguous. We start from an initial clustering and assume that each cluster from it contains only a few non-contiguous points, so we only need to re-allocate these points after the initial clustering. To re-allocate these non-contiguous points, we first identify them and then re-allocate each to the closest (minimal-distance) point within a contiguous cluster. To implement this allocation of non-contiguous points, we propose an approach based on Delaunay triangulation (Recchia, 2010; Renka, 1996). In mathematics, a Delaunay triangulation for a set P of points in a plane is a triangulation, denoted by DT(P), such that no point in P is inside the circumcircle of any triangle in DT(P). If a cluster P is in DT(P) and DT(P) forms a convex hull (Preparata and Hong, 1977), the clustering satisfies the contiguity constraint. To construct a DT, we propose the following procedure:
1. We first perform K-means clustering as an initial clustering.
2. Based on the obtained clustering results from the
previous step, we find all points that are entirely
surrounded by points from other clusters.
3. For each point that has no neighbors in the same cluster, we find its neighboring point at minimal distance. We call the cluster of that neighbor the new cluster.

4. The points that have no same-cluster neighbors are then re-allocated to their new clusters.
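The four steps above can be sketched as follows, assuming SciPy for the Delaunay triangulation and scikit-learn for the initial K-means. The synthetic coordinates, the synthetic loss cost, and the choice of four clusters are illustrative assumptions, not the authors' data.

```python
import numpy as np
from scipy.spatial import Delaunay
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
coords = rng.uniform(size=(100, 2))            # (latitude, longitude)
loss = rng.normal(size=(100, 1))               # synthetic standardized loss cost

# Step 1: initial K-means on (loss cost, latitude, longitude).
labels = KMeans(n_clusters=4, n_init=10,
                random_state=0).fit_predict(np.hstack([loss, coords]))

# Neighbor sets from the Delaunay triangulation of the geographic plane.
tri = Delaunay(coords)
indptr, indices = tri.vertex_neighbor_vertices
neighbors = [set(indices[indptr[i]:indptr[i + 1]]) for i in range(len(coords))]

changed = True
while changed:                                 # iterate until no isolated point
    changed = False
    for i, nbrs in enumerate(neighbors):
        # Step 2: point i is entirely surrounded by other clusters.
        if all(labels[j] != labels[i] for j in nbrs):
            # Steps 3-4: re-allocate i to the cluster of its nearest neighbor.
            nearest = min(nbrs, key=lambda j: np.linalg.norm(coords[i] - coords[j]))
            labels[i] = labels[nearest]
            changed = True
```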
It is possible that the re-allocated points may still be isolated; thus, this entire routine should be iterated until no such isolated point exists. Note that this implementation is purely based on the algorithm we develop, and the boundary created for each cluster often does not correspond to the geographical boundary of each basic rating unit. However, based on these results, one should be able to refine the clusters further to ensure that the boundary of each cluster is determined by the boundaries of FSAs.
2.3 Choice of the Number of Clusters
In data clustering, the number of clusters needs to be
determined first. In this work, the number of clusters represents the number of territories. Finding the optimal number of clusters becomes especially challenging in high-dimensional scenarios, where visualization of the data is difficult. To be statistically sound, several methods, including the average silhouette (Rousseeuw, 1987) and the gap statistic (R. Tibshirani and Hastie, 2001), have been proposed for estimating the number of clusters. The silhouette width of an observation i is defined as
s(i) = (b(i) − a(i)) / max{a(i), b(i)},    (3)
where a(i) is the average distance between i and all other observations in the same cluster, and b(i) is the minimum average distance from i to the observations in any other cluster. Observations with large s(i) (close to 1) are well clustered, observations with small s(i) (around 0) tend to lie between two clusters, and observations with negative s(i) are probably placed in the wrong cluster.
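The silhouette width in Eq. (3) is implemented directly by scikit-learn's silhouette_samples; the sketch below checks it on two well-separated synthetic groups, where every s(i) should be close to 1. The data and the two-cluster setting are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

rng = np.random.default_rng(3)
# Two well-separated synthetic groups in the plane.
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

s = silhouette_samples(X, labels)   # s(i) = (b(i) - a(i)) / max{a(i), b(i)}
avg = silhouette_score(X, labels)   # overall average silhouette width
print(round(avg, 2))
```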
Varying the total number of clusters from 1 to a maximum total number of clusters K_max, the observed data can be clustered using any algorithm, including K-means. Next, the average silhouette can be used to estimate the number of clusters. For a given number of clusters K, the overall average silhouette width for
ICPRAM 2017 - 6th International Conference on Pattern Recognition Applications and Methods