nous groups (called clusters). Decision trees may then
be built from each cluster.
In this work, we present a method that adapts the clusters obtained from any clustering method, according to the simplicity of the resulting decision trees.
The rest of this article is organized as follows. Firstly, related work on the simplification of decision trees is discussed. Then, the presented method is described in detail. Finally, a prototype is presented, and the results of experiments are discussed.
2 RELATED WORK
Several methods have been proposed to simplify decision trees obtained from data, with minimal impact on their accuracy (Breslow and Aha, 1997).
Firstly, pruning is a well-known solution to simplify decision trees (Quinlan, 1987; Breslow and Aha, 1997). This technique removes the parts of the trees that have low explanatory power (i.e. that explain too few elements or have a high error rate). More specific pruning techniques simplify decision trees according to visual concerns: for instance, a recent work has proposed a pruning technique constrained by the dimensions of the produced decision tree (Stiglic et al., 2012), and another work has described an algorithm to build decision trees with a fixed depth (Farhangfar et al., 2008).
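To make the fixed-depth idea concrete, here is a minimal sketch assuming scikit-learn as the tooling (the dataset and the depth limit are arbitrary choices, and this is not the algorithm of the cited works): capping the depth yields a small, displayable tree, usually at some cost in accuracy.

    # Minimal sketch: depth-constrained tree induction via scikit-learn's
    # max_depth parameter (our illustrative choice of tooling).
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # An unconstrained tree may grow deep and become hard to display ...
    full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

    # ... while a capped tree stays small enough to visualize.
    small_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    print("full tree:  depth", full_tree.get_depth(),
          "- leaves", full_tree.get_n_leaves())
    print("capped tree: depth", small_tree.get_depth(),
          "- leaves", small_tree.get_n_leaves())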
Secondly, decision tree simplification can be done by working directly on the data, using preprocessing operations such as feature selection and discretization (Breslow and Aha, 1997). As these operations tend to simplify the dataset (in terms of dimensionality, number of possible values, etc.), they can also help to reduce the complexity of the associated decision tree (at the expense of accuracy): this idea has been used in a recent work (Parisot et al., 2013a).
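As an illustrative sketch of this preprocessing route (again assuming scikit-learn; the choice of four features and three bins is arbitrary), feature selection and discretization can be chained before tree induction:

    # Minimal sketch: simplify the dataset before growing the tree.
    # Fewer features and fewer possible values mean fewer split
    # candidates, which tends to produce smaller trees.
    from sklearn.datasets import load_wine
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.preprocessing import KBinsDiscretizer
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_wine(return_X_y=True)

    X_reduced = SelectKBest(f_classif, k=4).fit_transform(X, y)  # keep 4 features
    X_simple = KBinsDiscretizer(n_bins=3, encode="ordinal",
                                strategy="quantile").fit_transform(X_reduced)

    tree = DecisionTreeClassifier(random_state=0).fit(X_simple, y)
    print("leaves after preprocessing:", tree.get_n_leaves())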
Finally, clustering is a useful technique in data mining, and it is also a promising tool in the context of the visual analysis of data (Keim et al., 2008). A priori, by splitting the data into homogeneous groups, it can be used to obtain simple decision trees. However, this is not always the case in practice. Indeed, various clustering methods exist (hierarchical, model-based, center-based, search-based, fuzzy, etc.) (Gan et al., 2007), but they often optimize a distance-based criterion, without accounting for the complexity of the decision trees obtained from the clusters. As a consequence, a recent solution has been proposed to obtain a simple decision tree from each cluster (Parisot et al., 2013b). Nevertheless, this algorithm does not take into account the similarity between elements and the dissimilarity between clusters: its results are not comparable to classic clustering results (obtained with k-means, for example), and they are hard to interpret.
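For concreteness, a plain cluster-wise setup can be sketched as follows (an illustration assuming k-means and scikit-learn, not the algorithm of Parisot et al., 2013b); nothing in the k-means objective keeps the per-cluster trees small:

    # Minimal sketch: one decision tree per cluster. Each tree predicts
    # the class label within its cluster; a plain k-means clustering
    # gives no guarantee that these trees are simple.
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_wine
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_wine(return_X_y=True)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    for c in range(3):
        mask = labels == c
        tree = DecisionTreeClassifier(random_state=0).fit(X[mask], y[mask])
        print(f"cluster {c}: {mask.sum()} elements, "
              f"{tree.get_n_leaves()} leaves, depth {tree.get_depth()}")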
In this paper, we propose a solution that preserves interpretability: existing clustering results are used as a starting point and adapted to yield simpler cluster-wise decision trees, while maintaining good cluster quality metrics.
3 CONTRIBUTION
In this section, we present a method to modify a clustering result in order to simplify the decision tree specific to each cluster. In addition, the method guarantees that the new clustering result is close to the initial one.
3.1 Adapting a Clustering Result
The input of the proposed method is an initial clustering result, which can be computed with any existing technique (k-means, EM, etc.) (Gan et al., 2007). This result is then modified by an algorithm, which is the core of our contribution.
Figure 2: Clustering adaptation method.
In this work, we consider that adapting a clustering result $C_1, \dots, C_n$ amounts to moving elements from $C_i$ to $C_j$ ($i \neq j$). In addition, we consider that finding the cluster count, which is a complex problem (Wagner and Wagner, 2007), is managed during the creation of the initial clustering result. Therefore, the method does not modify the cluster count during the clustering adaptation (in other words, no cluster is created or deleted during the process).
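Purely as an illustration of such element moves between a fixed set of clusters, the sketch below greedily moves one element at a time from a cluster $C_i$ to another cluster $C_j$ whenever this shrinks the per-cluster decision trees. The acceptance criterion and the leaf-count complexity measure are our assumptions, not the paper's actual algorithm, and the closeness guarantee discussed next is omitted here.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def tree_size(X, y):
        # Complexity measure (an assumption): leaf count of the tree
        # grown on one cluster's elements.
        if len(np.unique(y)) < 2:
            return 1
        return DecisionTreeClassifier(random_state=0).fit(X, y).get_n_leaves()

    def adapt_clustering(X, y, labels, n_clusters, max_moves=50):
        # Greedy sketch: repeatedly apply the single best element move
        # C_i -> C_j that reduces the total per-cluster tree size.
        # No cluster is created or deleted; the cluster count stays fixed.
        labels = labels.copy()
        sizes = [tree_size(X[labels == c], y[labels == c])
                 for c in range(n_clusters)]
        for _ in range(max_moves):
            best = None  # (gain, element, source, target, new sizes)
            for i in range(len(X)):
                src = labels[i]
                if (labels == src).sum() <= 2:  # keep clusters non-trivial
                    continue
                for dst in range(n_clusters):
                    if dst == src:
                        continue
                    labels[i] = dst  # tentative move
                    new_src = tree_size(X[labels == src], y[labels == src])
                    new_dst = tree_size(X[labels == dst], y[labels == dst])
                    labels[i] = src  # undo
                    gain = (sizes[src] + sizes[dst]) - (new_src + new_dst)
                    if gain > 0 and (best is None or gain > best[0]):
                        best = (gain, i, src, dst, new_src, new_dst)
            if best is None:
                break  # no simplifying move remains
            _, i, src, dst, new_src, new_dst = best
            labels[i] = dst
            sizes[src], sizes[dst] = new_src, new_dst
        return labels

Here, labels is expected to be an integer array, such as the output of fit_predict in the previous sketch.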
3.2 Comparing with the Initial Clustering Result
In order to guarantee that the modified clustering result is close to the initial clustering result, we use
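The comparison measure itself is truncated at the page break above. As one common way to quantify how close a modified clustering stays to an initial one, shown purely for illustration and not necessarily the measure used in this work, the adjusted Rand index ranges from roughly 0 (chance-level agreement) to 1 (identical partitions):

    # Illustrative only: agreement between two clustering results.
    from sklearn.metrics import adjusted_rand_score

    initial_labels = [0, 0, 0, 1, 1, 2, 2, 2]  # toy initial clustering
    adapted_labels = [0, 0, 1, 1, 1, 2, 2, 2]  # one element moved

    # High because only one element changed cluster.
    print(adjusted_rand_score(initial_labels, adapted_labels))  # approx. 0.62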