satisfactory, more useful rules could be found if the
number of parameters were reduced and the number
of classes increased. With these changes, a second
stage of the KDD process could be developed, in
which the preprocessing step is repeated to prepare
the data for the new conditions.
3.4 Second Stage
This stage applies the possible improvements
identified after the first stage. The differences
between the two stages are:
• In the first stage, the class value was computed
from the TCM12 value. However, Class 0, detected
as corresponding to correct behaviour of the
system, had been assigned by the expert to resolve
particular situations. To solve this problem, a
different way of computing the class is necessary.
The new classes are obtained from the error
between the draught set point in the BC (measured
by TCM42) and the actual draught in the BC
(measured by TCM12). This yields the classes
Pressure, High Pressure, Draught and High Draught,
in addition to OK, the class for correct behaviour.
• To reduce the number of parameters, only the
blowers and valves that carry gas into the BC and
those that extract gas from the BC are taken into
account.
Table 4 shows the range of values corresponding to
each class and the number of instances of each
class. Only rules that hold for more than 1% of the
instances of their class were selected. The fourth
column gives the value used to decide whether a
rule holds in a significant number of cases.
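A minimal sketch of the second-stage labelling, assuming the class is derived from the signed error between the measured draught (TCM12) and its set point (TCM42). The sign convention (positive error means pressure) and the tolerance values OK_BAND and HIGH_BAND are hypothetical placeholders; the actual class ranges are those listed in Table 4.

```python
from collections import Counter
from typing import Iterable, Tuple

# HYPOTHETICAL tolerance bands, in the same units as TCM12/TCM42.
# The real class ranges are the ones listed in Table 4.
OK_BAND = 0.5    # |error| up to this value counts as correct behaviour
HIGH_BAND = 2.0  # |error| above this value counts as a "High ..." class


def label_record(tcm42_setpoint: float, tcm12_actual: float) -> str:
    """Class from the error between measured draught and its set point.

    ASSUMPTION: error = actual - set point; a positive error means
    pressure (less draught than requested), a negative error means
    draught (more draught than requested).
    """
    error = tcm12_actual - tcm42_setpoint
    if abs(error) <= OK_BAND:
        return "OK"
    if error > 0:
        return "High Pressure" if error > HIGH_BAND else "Pressure"
    return "High Draught" if -error > HIGH_BAND else "Draught"


def class_counts(records: Iterable[Tuple[float, float]]) -> Counter:
    """Number of instances per class (the kind of count reported in Table 4)."""
    return Counter(label_record(sp, actual) for sp, actual in records)


if __name__ == "__main__":
    # (set point, actual) pairs, one per class under the assumed bands
    sample = [(-5.0, -5.2), (-5.0, -4.0), (-5.0, -2.0), (-5.0, -6.0), (-5.0, -8.0)]
    print(class_counts(sample))
```

Labelling from the set point error, rather than from TCM12 alone, means a change of set point does not by itself move a record into a different class.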
Table 5: Number of records covered by each rule
Rule    Records covered by the rule    Class
7       215 (2 errors)                 Pressure
8       216 (1 error)                  Pressure
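The coverage figures in Table 5 and the 1% selection criterion can be sketched as follows, assuming each rule is a conjunction of threshold conditions on process attributes that predicts one class. The Rule type, the condition format and the attribute name valve_opening are hypothetical. A rule is treated here as "true" for a record when the record satisfies all its conditions and belongs to the predicted class.

```python
import operator
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Comparison operators allowed in a rule condition (hypothetical format).
OPS = {"<=": operator.le, ">": operator.gt}


@dataclass
class Rule:
    # Each condition is (attribute, operator, threshold); all must hold.
    conditions: List[Tuple[str, str, float]]
    predicted: str


def covers(rule: Rule, record: Dict) -> bool:
    """True if the record satisfies every condition of the rule."""
    return all(OPS[op](record[attr], thr) for attr, op, thr in rule.conditions)


def coverage(rule: Rule, records: List[Dict]) -> Tuple[int, int]:
    """(records covered, errors), the two figures shown in Table 5."""
    covered = [r for r in records if covers(rule, r)]
    errors = sum(1 for r in covered if r["class"] != rule.predicted)
    return len(covered), errors


def select_rules(rules: List[Rule], records: List[Dict],
                 min_fraction: float = 0.01) -> List[Rule]:
    """Keep rules that hold for more than min_fraction of their class's instances."""
    class_sizes = Counter(r["class"] for r in records)
    selected = []
    for rule in rules:
        covered, errors = coverage(rule, records)
        # Minimum number of correctly covered records for this class.
        threshold = min_fraction * class_sizes[rule.predicted]
        if covered - errors > threshold:
            selected.append(rule)
    return selected


if __name__ == "__main__":
    data = ([{"valve_opening": 80, "class": "Pressure"}] * 3
            + [{"valve_opening": 80, "class": "OK"}]
            + [{"valve_opening": 10, "class": "OK"}] * 200)
    rule = Rule([("valve_opening", ">", 50)], "Pressure")
    print(coverage(rule, data), len(select_rules([rule], data)))
```

Because the threshold is relative to the size of each class, a rule for a rare class can be kept with far fewer covered records than a rule for a frequent one.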
3.4.1 Results and conclusions of the second stage
There are few instances of the High Pressure,
Draught and High Draught classes, so the rules for
these classes are not conclusive enough.
Table 5 shows the most representative rules of
each class obtained in the second stage. These rules
are represented graphically in Figure 4. The second
stage also yields rules that describe the optimal
behaviour (class OK), and these rules are easier for
the human expert to interpret.
4 CONCLUSIONS
Using data from a real industrial process, we have
developed a KDD process and obtained a set of
classification rules that approximate the behaviour
of the system. Our KDD process has two stages.
In the first stage, we worked with two classes and a
large number of parameters. In the second stage, we
worked with the most significant parameters of
each subsystem and only five classes. In this paper,
we describe all the stages of the process and the
most representative rules. The rules obtained from
the KDD process have helped the human expert to
identify the range of values in which the system
works properly. This analysis shows the viability of
using automatic classifiers in a production process,
with an increase in production and a decrease in
contamination. Future work will study the errors
due to system delays and develop a method for
obtaining rules that take these delays into account.
ICEIS 2004 - ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS