formed by the original NEFCLASS algorithm, when
the input space is partitioned into EQUAL-WIDTH
fuzzy intervals.
Fig. 2b demonstrates that during the fuzzy set tuning process, the membership function is shifted and its support is reduced or enlarged in order to better match the coverage of the data points belonging to the associated class. However, as we will see later, this process is strongly informed by the initial conditions set up by the discretization that produces the initial fuzzy membership functions.
NEFCLASS provides three different modes for rule selection, based either on the performance of a rule or on its coverage of the training data: Simple, Best and BestPerClass. Simple rule selection keeps the first rules generated until a predefined maximum number of rules is reached. Best rule selection ranks the rules by the number of training patterns associated with each rule and selects rules from the top of this list. The BestPerClass option selects an equal number of rules for each class, applying the Best ranking within each class.
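To make the contrast among these modes concrete, the sketch below implements the three selection strategies as just described. The record layout (each candidate rule carrying a "count" of covered patterns and a class "label") and the function names are our own illustrative assumptions, not the original NEFCLASS implementation.

from collections import defaultdict

def select_simple(rules, max_rules):
    """Simple: keep the first rules generated, up to the maximum."""
    return rules[:max_rules]

def select_best(rules, max_rules):
    """Best: rank all rules by covered-pattern count, keep the top ones."""
    ranked = sorted(rules, key=lambda r: r["count"], reverse=True)
    return ranked[:max_rules]

def select_best_per_class(rules, max_rules, classes):
    """BestPerClass: apply the Best ranking, but take an equal
    number of rules for each class."""
    per_class = max_rules // len(classes)
    by_class = defaultdict(list)
    for r in sorted(rules, key=lambda r: r["count"], reverse=True):
        by_class[r["label"]].append(r)
    selected = []
    for c in classes:
        selected.extend(by_class[c][:per_class])
    return selected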
After the construction of the fuzzy rules, a fuzzy set learning procedure is applied to the training data, so that the membership functions are tuned to better match the extent of the coverage of each individual class in the training data space (Nauck et al., 1996, p. 239). As a result, fuzzy membership functions grow or shrink depending on the degree of ambiguity between sets and on the coverage of the dataset.
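As a rough illustration of the kind of adjustment involved, the following sketch tunes a triangular fuzzy set (a, b, c) by shifting its peak toward a covered pattern and growing or shrinking its support. The update rule and the learning rate sigma below are illustrative assumptions only, not the exact procedure of Nauck et al. (1996).

def triangular(x, a, b, c):
    """Membership of x in a triangular set with support [a, c], peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def tune(x, a, b, c, error, sigma=0.1):
    """Illustrative update: shift the peak toward a covered pattern x
    and widen or narrow the support with the signed error."""
    mu = triangular(x, a, b, c)
    shift = sigma * error * (c - a) * (1.0 - mu)
    b = b + shift if x > b else b - shift  # move peak toward/away from x
    a = a - sigma * error * (b - a)        # enlarge or reduce left side
    c = c + sigma * error * (c - b)        # enlarge or reduce right side
    return a, b, c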
2.2 Discretization
A discretization process divides a continuous numerical range into a number of covering intervals, where all data falling into a given interval is treated as describable by the same nominal value in a reduced-complexity discrete event space. In fuzzy systems, such intervals are typically used to define the supports of fuzzy sets, and the precise placement of a value within an interval is mapped to its degree of membership in the corresponding set.
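As an illustration of this mapping, the sketch below converts a list of interval boundaries into one triangular fuzzy set per interval. Placing each peak at an interval centre, with the support reaching the neighbouring centres, is a common convention that we assume here for concreteness; it is not prescribed by the works cited.

def fuzzy_sets_from_edges(edges):
    """One triangular fuzzy set per interval: peak at the interval
    centre, support reaching the neighbouring centres (or range ends)."""
    centres = [(lo + hi) / 2.0 for lo, hi in zip(edges, edges[1:])]
    sets = []
    for i, b in enumerate(centres):
        a = centres[i - 1] if i > 0 else edges[0]
        c = centres[i + 1] if i < len(centres) - 1 else edges[-1]
        sets.append((a, b, c))  # triangle: support [a, c], peak at b
    return sets

# Three equal-width bins on [0, 9]:
print(fuzzy_sets_from_edges([0.0, 3.0, 6.0, 9.0]))
# [(0.0, 1.5, 4.5), (1.5, 4.5, 7.5), (4.5, 7.5, 9.0)]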
In the following discussion, we describe the
EQUAL-WIDTH and MME discretization methods.
For example, imagine a dataset formed of three over-
lapping distributions of 15 points each, as shown with
the three coloured arrangements of points in Fig. 3.
The points defining each class are shown in a horizon-
tal band, and the points are connected together to indi-
cate that they are part of the same class group. In parts
3a and 3b, the results of binning these points with two different discretization techniques are shown.

(a) EQUAL-WIDTH (b) MME
Figure 3: Two discretization techniques result in different intervals on the same three-class dataset. Figure extracted from (Yousefi and Hamilton-Wright, 2016).

The
subfigures within Fig. 3 each show the same data,
with the green, red and blue rows of dots (top, middle
and bottom) within each figure describing the data for
each class in the training data.
2.2.1 EQUAL-WIDTH
The EQUAL-WIDTH discretization algorithm divides the observed range of continuous values for a given feature into a number of equally sized intervals, providing a simple mapping of the input space that is created independently of both the class distribution and the density of feature values within the input space (Kerber, 1992; Chmielewski and Grzymala-Busse, 1996).
Fig. 3a demonstrates the partitioning using EQUAL-WIDTH intervals. Note that the resulting intervals contain different numbers of data points (21, 19 and 5 in this case).
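A minimal sketch of this procedure, assuming the feature values arrive as a flat numpy array (numpy is used for convenience only):

import numpy as np

def equal_width_edges(values, n_bins):
    """Split the observed range into n_bins intervals of equal width."""
    lo, hi = float(np.min(values)), float(np.max(values))
    return np.linspace(lo, hi, n_bins + 1)

# Toy data: dense near the low end, sparse at the high end.
values = np.array([0.2, 0.4, 1.1, 1.3, 2.0, 7.5, 8.8, 9.0])
edges = equal_width_edges(values, 3)
counts, _ = np.histogram(values, bins=edges)
print(edges)   # equally spaced boundaries over [0.2, 9.0]
print(counts)  # per-interval counts vary with the data density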
2.2.2 Marginal Maximum Entropy
Marginal Maximum Entropy (MME) based discretization (Chau, 2001; Gokhale, 1999) divides each feature's range into a number of intervals placed so that every interval contains the same number of points, under the assumption that each interval should then carry an equal amount of information. The widths of the intervals generated by this method are therefore inversely related to the density of the points within them. Fig. 3b shows the MME intervals for the example three-class dataset. Note that the intervals in Fig. 3b do not cover equal fractions of the range of values (i.e., their widths differ), being narrowest in the regions where the points are most dense. The same number of points (15) falls in each interval. In both of these discretization strategies, class identity is ignored, so there is likely no relationship between the class label distribution and the discretization boundaries.
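Since MME as described places boundaries so that each interval holds the same number of points, an equal-frequency (quantile-based) sketch captures the idea; the code below mirrors the behaviour described above rather than reproducing a specific MME reference implementation.

import numpy as np

def mme_edges(values, n_bins):
    """Place boundaries at quantiles so every bin holds an equal share
    of the points: narrow bins where data is dense, wide where sparse."""
    return np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))

values = np.array([0.2, 0.4, 1.1, 1.3, 2.0, 7.5, 8.8, 9.0])
edges = mme_edges(values, 4)
counts, _ = np.histogram(values, bins=edges)
print(edges)   # unequal widths, tracking the local point density
print(counts)  # (approximately) equal counts per interval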