stop after rule filtering. Here, we employ NNE
techniques to learn the filtered rules effectively,
using several neural network algorithms: Multilayer
Perceptron, Generalized Feedforward Network,
Modular Neural Network and Radial Basis Function
Network. For this NNE, the bagging technique is
employed: the training instances are re-sampled or
re-weighted, each neural network is trained on its
own sample, and the outputs of all the networks are
aggregated to produce better predictions. Here, the
aggregation is done by averaging. This produces a
neural knowledge base (a set of weights) that can be
considered the best abstraction of the knowledge in
the rules, and that serves as a robust repository for
decision support even when users do not provide
sufficient input.
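The bagging-and-averaging scheme described above can be sketched as follows. This is a minimal illustration of the ensemble mechanics only: the base learner here is a stand-in least-squares model, whereas the actual pipeline trains four different neural network architectures.

```python
import numpy as np

def bootstrap_sample(rng, X, y):
    """Draw a bootstrap resample (rows sampled with replacement)."""
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

class LinearLearner:
    """Stand-in base learner (least squares). The paper's ensemble
    instead mixes MLP, generalized feedforward, modular and RBF
    networks; any object with fit/predict would slot in here."""
    def fit(self, X, y):
        Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
        self.w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return Xb @ self.w

def bagged_predict(members, X):
    """Aggregate the ensemble by averaging the members' outputs."""
    return np.mean([m.predict(X) for m in members], axis=0)

# Synthetic data standing in for a prepared medical dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Train each member on its own bootstrap resample, then average.
ensemble = [LinearLearner().fit(*bootstrap_sample(rng, X, y))
            for _ in range(10)]
pred = bagged_predict(ensemble, X)
```

Averaging over members trained on different resamples is what gives bagging its variance-reduction effect (Breiman, 1996).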
3 CONCLUSION
We would like to highlight that the knowledge
discovery pipeline leaves much to be explored in
terms of the algorithms and techniques that can be
applied at each stage. For our extended pipeline, we
have proposed mean/mode fill for data preparation, a
clustering ensemble with SOM for clustering (data
annotation), Boolean Reasoning for discretization,
rough set analysis for rule extraction, a Rule Quality
Function (based on support, consistency and
coverage) for rule filtering and, finally, a NNE for
rule learning. Beyond contributing an extended
knowledge discovery pipeline, we believe this work
provides an alternative inductive approach to
diagnosis and decision support, especially in the
medical domain.
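The three measures named for the rule-filtering stage can be illustrated with their common rule-induction definitions (the exact weighting the pipeline's quality function applies to them is not reproduced here, and the attribute names below are hypothetical):

```python
def rule_quality(records, antecedent, consequent):
    """Support, consistency and coverage of a rule 'IF antecedent
    THEN consequent' over a list of records (dicts). The antecedent
    and consequent are predicates over a single record."""
    n = len(records)
    fires = [r for r in records if antecedent(r)]     # rule fires
    target = [r for r in records if consequent(r)]    # target class
    correct = [r for r in fires if consequent(r)]     # correct firings
    support = len(correct) / n                        # P(A and C)
    consistency = len(correct) / len(fires) if fires else 0.0  # P(C|A)
    coverage = len(correct) / len(target) if target else 0.0   # P(A|C)
    return support, consistency, coverage

# Toy medical-style records with hypothetical attributes.
data = [
    {"age": "old", "diagnosis": "sick"},
    {"age": "old", "diagnosis": "sick"},
    {"age": "old", "diagnosis": "well"},
    {"age": "young", "diagnosis": "sick"},
]
s, con, cov = rule_quality(
    data,
    antecedent=lambda r: r["age"] == "old",
    consequent=lambda r: r["diagnosis"] == "sick",
)
# Rule 'IF age=old THEN diagnosis=sick': support 2/4,
# consistency 2/3, coverage 2/3.
```

A filtering stage would keep only the rules whose combined quality score clears a chosen threshold.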
With the incorporation of ensembling techniques
for clustering and learning, we hope to minimize the
need, or reduce the temptation, to switch to other
algorithms by making the most of the selected
algorithms, i.e. SOM and our ‘cocktail’ of neural
network algorithms. We are currently evaluating
each stage of our extended knowledge discovery
pipeline on continuous, discrete and possibly mixed
(continuous and discrete) medical datasets, such as
those on breast cancer and thyroid disease. In future
work, we will also explore other clustering ensemble
methods for integration into the second stage of the
pipeline. We believe this extended pipeline will
yield more accurate knowledge-based predictions,
in our effort to make medical diagnosis and decision
support more reliable and trustworthy.
USING ENSEMBLE AND LEARNING TECHNIQUES TOWARDS EXTENDING THE KNOWLEDGE DISCOVERY PIPELINE