ters distributed by different sites/machines.
Furthermore, the structure of the case memory is
directly related to the case-base maintenance (CBM)
methods (Wilson and Leake, 2001). The objective of
CBM is to maintain consistency, preserve competence
and control case-base growth.
The approaches presented above do not address the
problem of missing case features. This problem is
currently being addressed in the CBR research field
(Gu and Aamodt, 2005; Gu and Aamodt, 2006).
We propose the use of clustering techniques to
improve the performance of CBR during the retrieval
phase. However, our approach differs from that of
Yang and Wu (Yang and Wu, 2000): we do not split the
case memory into distinct locations; instead, we use
clusters of links to cases. In addition, our proposal
deals with cases with missing features. In Section 2
we review some clustering concepts essential to
understanding our proposal, which is presented in
Section 3. Section 3 also presents the results of
applying our proposal to a CBR system with 915 cases.
2 THE CLUSTERING TECHNIQUE
Clustering techniques organize data into groups that
are meaningful, useful or both (Tan et al., 2006). One
group of data is called a cluster, while the entire
collection of clusters is commonly referred to as a
clustering.
Two types of clustering can be considered:
partitional and hierarchical. A clustering is
hierarchical if we permit clusters to have subclusters.
In partitional clustering the data is divided into
non-overlapping clusters. Tan et al. (2006) identified
five types of clusters: well-separated, prototype-based,
graph-based, density-based and shared-property. In our
work we consider the prototype-based type: a set of
cases is grouped into a cluster with one representative
element. As a result, the number of similarity
evaluations in the retrieval phase is reduced
considerably.
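As a rough illustration of this reduction, the estimate below assumes that a query is first compared with the k cluster representatives and then only with the members of the best-matching cluster, and that the N cases are spread evenly over the clusters. These are assumptions made here for the arithmetic; the value k = 30 is hypothetical, while N = 915 is the size of the case base used later in the paper.

```latex
\[
E_{\text{flat}} = N, \qquad
E_{\text{two-stage}} \approx k + \frac{N}{k}
\]
\[
N = 915,\; k = 30: \quad
E_{\text{flat}} = 915, \qquad
E_{\text{two-stage}} \approx 30 + \frac{915}{30} = 30 + 30.5 = 60.5
\]
```

Under these assumptions, the estimate is smallest when k is close to the square root of N, which is about 30 for 915 cases.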
There are several techniques to split the data, but
k-means and k-medoid are two of the most prominent
prototype-based techniques (Tan et al., 2006). K-means
defines a prototype in terms of a centroid, while
k-medoid defines a prototype in terms of a medoid. The
medoid is an element of the cluster, while the centroid
is the mean of the cluster. In our work we use the
k-medoid technique, so each cluster is represented by
the most representative case among all cases in the
group. There are also different proposals for measuring
similarity between data; the Euclidean and cosine
distances are the most widely used measures. The
similarity measure is used whenever a new case has to be
added to the case memory. Naturally, the cluster that
receives the new case needs to update its prototype.
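The following minimal Python sketch illustrates the idea of adding a case and updating the medoid. It is not the implementation used in this work: cases are assumed to be plain numeric feature vectors, the medoid is taken as the member with the smallest total distance to the other members, and the function names (`euclidean`, `cosine_distance`, `medoid`, `add_case`) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def medoid(cluster, dist=euclidean):
    """Member with the smallest total distance to all members of the cluster."""
    return min(cluster, key=lambda c: sum(dist(c, other) for other in cluster))


def add_case(clusters, medoids, new_case, dist=euclidean):
    """Put new_case in the cluster with the closest medoid, then refresh that medoid."""
    idx = min(range(len(medoids)), key=lambda i: dist(new_case, medoids[i]))
    clusters[idx].append(new_case)
    medoids[idx] = medoid(clusters[idx], dist)  # only the updated cluster changes
    return idx


# Hypothetical toy data: two clusters of 2-D cases and their current medoids.
clusters = [[(0, 0), (1, 0), (2, 0), (3, 0)], [(10, 10), (11, 10)]]
medoids = [(1, 0), (10, 10)]
target = add_case(clusters, medoids, (4, 0))
```

In this toy run the new case joins the first cluster, whose medoid moves from (1, 0) to (2, 0). Passing `dist=cosine_distance` would swap the measure without changing the rest of the sketch.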
3 THE APPLICATION OF CLUSTERING TECHNIQUES TO THE CBR RETRIEVAL PHASE
Our proposal, shown in Figure 1, has two levels of
information. The first level is formed by a set of links
to the case memory, and the second level is the case
memory database. The case links are paths to cases in
the case memory, and the clustering technique is applied
to the case link information. The first level of
information requires little storage space, yet it
decreases the waiting time of the retrieval process. We
did not consider dividing the case memory database
because it is useful to be able to access a case in
different ways. Figure 1 illustrates the storage scheme:
the groups of clusters are the interface between the CBR
process and the database of cases.
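The two-level organisation can be sketched as follows. This is an illustrative assumption rather than the storage format used in this work: the case memory is modelled as a dictionary keyed by a hypothetical integer case identifier, and each group of clusters holds only those identifiers as links.

```python
# Second level: the case memory, where each case is stored exactly once.
# The identifiers, features and solutions are hypothetical.
case_memory = {
    101: {"features": (1.0, 2.0), "solution": "A"},
    102: {"features": (1.5, 2.5), "solution": "B"},
    103: {"features": (9.0, 9.0), "solution": "C"},
}

# First level: groups of clusters that contain only links (case identifiers).
# The same case can be linked from several groups without being copied.
groups = {
    "group_1": [[101, 102], [103]],
    "group_2": [[101, 103], [102]],
}


def resolve(link):
    """Follow a case link to the case stored in the case memory."""
    return case_memory[link]


def groups_linking(case_id):
    """Names of the groups that contain a link to the given case."""
    return sorted(
        name
        for name, clusters in groups.items()
        if any(case_id in cluster for cluster in clusters)
    )
```

Because the groups store identifiers only, case 101 is reachable through both groups while the case memory still holds a single copy of it.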
[Figure 1 diagram. Labels: Retrieval Process (Process 1, Process 2, …, Process n), Reusing, Revision, Retention; Group of clusters / Case Links 1, 2, …, n; Case Memory.]
Figure 1: CBR system structure.
Each group has clusters of links to cases. Each
cluster, as shown in Table 1, has a reference to the
medoid of the cluster and links to the set of cases
that constitute the cluster.
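One way such a cluster record could support retrieval is sketched below. It assumes numeric feature vectors, Euclidean distance, and a search restricted to the cluster whose medoid is closest to the query; the names `Cluster`, `medoid_link`, `case_links` and `retrieve` are hypothetical and do not come from Table 1.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    medoid_link: int                                # link to the medoid case
    case_links: list = field(default_factory=list)  # links to all member cases


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, clusters, case_memory):
    """Return (link of the closest case in the best cluster, similarity evaluations)."""
    # Stage 1: compare the query with the medoid of every cluster.
    best = min(clusters, key=lambda c: euclidean(query, case_memory[c.medoid_link]))
    # Stage 2: compare the query only with the cases linked from that cluster.
    link = min(best.case_links, key=lambda l: euclidean(query, case_memory[l]))
    return link, len(clusters) + len(best.case_links)


# Hypothetical case memory: link -> feature vector.
case_memory = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (10, 10), 5: (11, 10), 6: (10, 11)}
clusters = [Cluster(1, [1, 2, 3]), Cluster(4, [4, 5, 6])]
result = retrieve((10.6, 10.1), clusters, case_memory)
```

Here the query is matched against two medoids and then three cases, so five evaluations are made instead of the six a flat scan would need. The saving grows with the size of the case memory.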
Each group of clusters is identified by a binary array
codification. The binary codification scheme follows
the proposal of Kolodner (Table 2), who defines a case
as formed by a Problem and a Solution. The Problem
consists of an Objective and a set
IMPROVING CASE RETRIEVAL PERFORMANCE THROUGH THE USE OF CLUSTERING TECHNIQUES