
Table 2: The data sets for experiments
Name             #objects  #continuous attributes  #decision classes
Iris             150       4                       3
liver-disorders  345       6                       2
Abalone          4177      7                       29
We compare the local discretization algorithm with the global discretization algorithm on these data sets, measuring the number of cuts each produces when discretizing the continuous attributes. The results are shown in Tables 3, 4, and 5, respectively. In the tables, #cuts L denotes the number of cuts generated by the local discretization algorithm and #cuts G the number generated by the global discretization algorithm.
Since both algorithms are applied to consistent information systems and preserve the original indiscernibility, the algorithm that produces the smaller number of cuts is the better one. The comparisons show that for the liver-disorders and abalone data sets the global algorithm generates far fewer cuts than the local algorithm, whereas for the iris data set it generates more. Hence we cannot say that the global algorithm is always better than the local algorithm.
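To make the evaluation criterion concrete, the following is a minimal Python sketch, our own illustration rather than the implementation of Nguyen and Nguyen (1996), of a greedy global discretization in the spirit of the MD heuristic: it repeatedly picks the candidate cut that discerns the most remaining object pairs with different decision classes, so the chosen cuts preserve the original indiscernibility. The function names and the toy decision table are assumptions for illustration only.

```python
from itertools import combinations

def candidate_cuts(table, attr):
    """Midpoints between consecutive distinct values of one attribute."""
    vals = sorted({row[attr] for row, _ in table})
    return [(attr, (a + b) / 2) for a, b in zip(vals, vals[1:])]

def discerns(cut, row_a, row_b):
    """True if the cut (attribute, threshold) separates the two objects."""
    attr, t = cut
    return (row_a[attr] < t) != (row_b[attr] < t)

def global_discretize(table):
    """Greedy global discretization of a consistent decision table,
    given as a list of (attribute_vector, decision_class) pairs."""
    n_attrs = len(table[0][0])
    pool = [c for a in range(n_attrs) for c in candidate_cuts(table, a)]
    # Object pairs with different decisions that still must be discerned.
    pairs = [(i, j) for i, j in combinations(range(len(table)), 2)
             if table[i][1] != table[j][1]]
    chosen = []
    while pairs:
        # Greedy step: take the cut that discerns the most remaining pairs.
        best = max(pool, key=lambda c: sum(
            discerns(c, table[i][0], table[j][0]) for i, j in pairs))
        chosen.append(best)
        pairs = [(i, j) for i, j in pairs
                 if not discerns(best, table[i][0], table[j][0])]
    return chosen

# Toy consistent table: two continuous attributes, two decision classes.
toy = [((0.0, 1.0), 'A'), ((0.2, 0.9), 'A'),
       ((0.8, 0.1), 'B'), ((1.0, 0.3), 'B')]
print(len(global_discretize(toy)))  # number of result cuts
```

The length of the returned cut list corresponds to the #cuts figures compared in the tables; the greedy choice keeps this number small but does not guarantee the minimum.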
Table 3: Comparison of the results on liver-disorders.
Attribute  mcv  alkphos  sgpt  sgot  gammagt  drinks  Total
#cuts L    20   22       20    25    30       23      140
#cuts G    3    4        3     2     5        3       20

Table 4: Comparison of the results on iris.
Attribute  sepal_length  sepal_width  petal_length  petal_width  Total
#cuts L    3             3            6             1            13
#cuts G    34            2            4             2            42

Table 5: Comparison of the results on abalone.
Attribute  length  diameter  height  whole weight  shucked weight  viscera weight  shell weight  Total
#cuts L    421     389       419     539           564             674             555           3561
#cuts G    20      21        30      7             32              32              30            172

For the iris data set, the number of cuts on attribute sepal_length generated by the global algorithm is far larger than that generated by the local algorithm, while the numbers for the other attributes are nearly equal. For the other two data sets, in contrast, the global algorithm generates far fewer cuts than the local algorithm on every attribute. Hence the two algorithms are data-set sensitive, and we conjecture that their quality depends on the distributions of the attribute values and of the decision classes.
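The requirement that a discretization maintain the original indiscernibility can be checked directly: encode each object by the intervals its values fall into and verify that no two objects with different decision classes collapse onto the same code. The following is a small sketch under assumed names; `encode`, `preserves_discernibility`, and the toy table are ours, not from the paper.

```python
from itertools import combinations

def encode(row, cuts):
    """Replace each continuous value by the index of the interval it
    falls into, given the cuts chosen for that attribute."""
    code = []
    for a in range(len(row)):
        thresholds = sorted(t for attr, t in cuts if attr == a)
        code.append(sum(row[a] >= t for t in thresholds))
    return tuple(code)

def preserves_discernibility(table, cuts):
    """True if no two objects with different decision classes receive
    the same discrete code, i.e. the cuts keep them discernible."""
    for (xa, da), (xb, db) in combinations(table, 2):
        if da != db and encode(xa, cuts) == encode(xb, cuts):
            return False
    return True

# Toy consistent decision table: (attribute vector, decision class).
toy = [((0.0, 1.0), 'A'), ((0.2, 0.9), 'A'),
       ((0.8, 0.1), 'B'), ((1.0, 0.3), 'B')]
print(preserves_discernibility(toy, [(0, 0.5)]))  # a single cut suffices here
```

With an empty cut set every object receives the same code, so the check fails, which is why a valid discretization must retain at least enough cuts to separate all differently classified objects.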
6 CONCLUSIONS
For discretization based on rough sets, we should seek the minimum possible number of discretization intervals while not weakening the discernibility of the data. This paper examines two algorithms (Nguyen and Nguyen, 1996): local discretization and global discretization. Our experiments show that both algorithms are data-set sensitive; neither always generates the smaller number of cuts. On some data sets one algorithm generates fewer cuts, while on others the opposite holds. We conjecture that the quality of the two algorithms depends on the distributions of the values of the continuous attributes and of the decision classes. How these distributions affect the results is a subject for further study; with that understanding, methods for improving the algorithms can be devised.
REFERENCES
Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Sciences, 11, 341-356.
Nguyen, H. S., Skowron, A. (1995). Quantization of real value attributes. In Proceedings of the Second Joint Annual Conference on Information Sciences, Wrightsville Beach, North Carolina, 34-37.
Nguyen, H. S. (1997). Discretization of Real Value Attributes: Boolean Reasoning Approach. PhD dissertation, Warsaw University, Warsaw, Poland.
Nguyen, H. S., Nguyen, S. H. (1996). Some efficient algorithms for rough set methods. In Proceedings of the 6th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, 1451-1456.
Dai, J.-H., Li, Y.-X. (2002). Study on discretization based on rough set theory. In Proceedings of the First International Conference on Machine Learning and Cybernetics, 3, 1371-1373.
UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html
ICEIS 2004 - ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS