When a discretization process is to be developed, four iterative stages must be carried out (Liu et al., 2002):
1. The database values of the continuous attributes to be discretized are sorted.
2. The best split point for partitioning the attribute domain is found in the case of top-down methods, or the best pair of adjacent partitions to merge is found in the case of bottom-up methods.
3. If the method is top-down, once the best split point is found, the domain of the attribute is divided into two partitions; if the method is bottom-up, the two selected adjacent partitions are merged.
4. Finally, we check whether the stopping criterion is fulfilled, and if so the process is terminated (a generic sketch of this loop is given below).
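The following is a minimal Python sketch of this generic top-down loop for a single attribute. The scoring function (weighted class entropy) and the stopping criterion (a user-supplied maximum number of cut points) are illustrative assumptions, not part of the general scheme described in (Liu et al., 2002).

import math
from bisect import bisect_right

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def partition_score(values, labels, cuts):
    """Weighted class entropy of the partition of `values` induced by `cuts`."""
    bins = {}
    for v, y in zip(values, labels):
        bins.setdefault(bisect_right(cuts, v), []).append(y)
    n = len(labels)
    return sum(len(b) / n * entropy(b) for b in bins.values())

def top_down_discretize(values, labels, max_cuts=2):
    """Generic top-down loop: start with an empty list of cut points and, at
    each iteration, add the candidate cut that most reduces the weighted
    class entropy; stop once `max_cuts` cuts have been placed
    (illustrative stopping criterion)."""
    xs = sorted(set(values))                                  # stage 1: sort the values
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]    # midpoints between neighbours
    cuts = []
    while len(cuts) < max_cuts and candidates:                # stage 4: stopping criterion
        best = min(candidates,                                # stage 2: best split point
                   key=lambda c: partition_score(values, labels, sorted(cuts + [c])))
        cuts = sorted(cuts + [best])                          # stage 3: divide the domain
        candidates.remove(best)
    return cuts

values = [1.0, 1.2, 1.5, 3.1, 3.4, 3.8, 7.0, 7.2, 7.5]
labels = [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(top_down_discretize(values, labels, max_cuts=2))  # [2.3, 5.4]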
In this general discretization process we have differentiated between top-down and bottom-up algorithms. However, more detailed taxonomies of discretization methods exist, such as the one presented in (Liu et al., 2002), which is summarized here:
• Supervised or non-supervised. Non-supervised methods rely solely on the values of the continuous attribute to carry out the discretization, whereas supervised methods also use the class value, so that the resulting intervals are more or less uniform with regard to the class value.
• Static or dynamic. In both types of methods it is necessary to define a maximum number of intervals; they differ in that static methods partition each attribute sequentially, one at a time, whereas dynamic methods discretize the domains of all the attributes simultaneously.
• Local or Global. Local discretization methods, such as those used by algorithms like C4.5 or its successor C5.0 (Quinlan, 1993), are applied only to specific regions of the database. Global methods, on the other hand, use the whole database to carry out the discretization.
• Top-down or Bottom-up. Top-down methods begin with an empty list of split points and add new ones as the discretization process creates intervals. Bottom-up methods, on the other hand, begin with a full list of split points and eliminate points during the discretization process.
• Direct or Incremental. Direct methods divide the dataset directly into k intervals, and therefore need an external input from the user indicating the number of intervals; a simple example of this kind of method is sketched after this list. Incremental methods begin with a simple discretization and undergo an improvement process, so they need a criterion that indicates when to stop discretizing.
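As a concrete reference point for these categories, the following is a minimal Python sketch of equal-width discretization: an unsupervised, static, global and direct method in which the number of intervals k is supplied by the user. It is an illustrative example, not one of the specific methods discussed in this paper.

def equal_width_discretize(values, k):
    """Direct, unsupervised discretization: split the attribute's range into
    k intervals of equal width and return the k-1 cut points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

# The class labels are never consulted (unsupervised), the whole column is
# used (global), and k is supplied by the user (direct).
ages = [23, 25, 31, 35, 46, 50, 52, 70]
print(equal_width_discretize(ages, k=4))  # [34.75, 46.5, 58.25]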
In addition to the taxonomy presented above, from another viewpoint discretization methods can also be classified according to the type of partitions they construct: crisp or fuzzy partitions.
Thus, in the literature we find some algorithms that generate crisp partitions. Among these, (Holte, 1993) describes a method that builds crisp intervals using interval width (amplitude) or frequency as the measure, and which requires a fixed number k of intervals. (Holte, 1993) also describes another method, called 1R, which likewise requires a fixed number k of intervals but which uses the class label as the measure. Another method that constructs crisp partitions, D2, is described in (Catlett, 1991), where the measure used is entropy.
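In the spirit of the frequency-based crisp methods mentioned above, the following is a minimal sketch of an equal-frequency discretizer with a fixed number k of intervals; it illustrates the idea rather than reproducing the cited algorithms.

def equal_frequency_discretize(values, k):
    """Frequency-based crisp discretization: place cut points so that each of
    the k intervals contains (approximately) the same number of examples."""
    xs = sorted(values)
    n = len(xs)
    cuts = []
    for i in range(1, k):
        idx = round(i * n / k)
        cuts.append((xs[idx - 1] + xs[idx]) / 2.0)  # cut between neighbouring values
    return sorted(set(cuts))

ages = [23, 25, 31, 35, 46, 50, 52, 70]
print(equal_frequency_discretize(ages, k=4))  # [28.0, 40.5, 51.0]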
On the other hand, we find methods which discretize continuous values into fuzzy partitions; these methods use decision trees, clustering algorithms, genetic algorithms, etc. For instance, in (Kbir et al., 2000) a hierarchical fuzzy partition based on a 2^|A|-tree decomposition is carried out, where |A| is the number of attributes in the system. This decomposition is controlled by the degree of certainty of the rules generated for each fuzzy subspace and by the deepest hierarchical level allowed. The fuzzy partitions formed for each domain are symmetric and triangular. Furthermore, one of the most widely used algorithms for fuzzy clustering is fuzzy c-means (FCM) (Bezdek, 1981). The algorithm assigns a set of examples, characterized by their respective attributes, to a fixed number of classes or clusters. Some methods developed for fuzzy partitioning start from the FCM algorithm and add an extension or heuristic to optimize the resulting partitions; examples can be found in (Li, 2009) and (Li et al., 2009). Also, a method that constructs fuzzy partitions using a genetic algorithm is proposed in (Piero et al., 2003), where the fuzzy partitions are obtained through beta and triangular functions. The construction process of the fuzzy partitions is divided into two stages: in the first stage, fuzzy partitions with beta (Cox et al., 1998) or triangular functions are constructed; in the second stage, these partitions are adjusted with a genetic algorithm.
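As an illustration of the kind of symmetric triangular fuzzy partitions mentioned above, the sketch below builds a uniform triangular fuzzy partition of a continuous domain in which adjacent membership functions overlap so that membership degrees sum to one. The number of fuzzy sets k and the uniform placement of the centres are assumptions made for this example, not the construction used in the cited works.

def triangular_partition(lo, hi, k):
    """Build k symmetric triangular membership functions uniformly spaced over
    [lo, hi]; adjacent sets overlap so that memberships sum to 1 (a strong
    fuzzy partition)."""
    centres = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    half = (hi - lo) / (k - 1)  # half-width of each triangle

    def membership(x):
        """Membership degree of x in each of the k fuzzy sets."""
        return [max(0.0, 1.0 - abs(x - c) / half) for c in centres]

    return centres, membership

centres, mu = triangular_partition(0.0, 10.0, k=5)
print(centres)   # [0.0, 2.5, 5.0, 7.5, 10.0]
print(mu(3.0))   # [0.0, 0.8, 0.2, 0.0, 0.0] -- degrees sum to 1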