such thing as a globally best feature selection method, or a globally optimal feature subset. No individual 3-tuple (generation, evaluation and validation procedure) can be identified that achieves the best performance on every dataset, with every learning algorithm. However, given that each inducer favours particular attributes, we expect the tuples that use the same inducer in both the evaluation and the validation steps to perform better than mixed tuples.
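To make the tuple structure concrete, the following minimal sketch (in Python, assuming scikit-learn-style estimators; the function names are ours and purely illustrative, not part of the system described here) shows a wrapper in which the same inducer both evaluates the candidate subsets and validates the final choice:

    from itertools import combinations
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def wrapper_select(X, y, inducer, k):
        # Generation: enumerate all k-subsets (exhaustive, for illustration only).
        # Evaluation: cross-validated accuracy of the SAME inducer that will
        # later be used for validation, i.e. a "same inducer" tuple.
        best_subset, best_score = None, -np.inf
        for subset in combinations(range(X.shape[1]), k):
            score = cross_val_score(inducer, X[:, list(subset)], y, cv=5).mean()
            if score > best_score:
                best_subset, best_score = subset, score
        return best_subset

    # Validation on held-out data reuses the same inducer:
    # subset = wrapper_select(X_train, y_train, DecisionTreeClassifier(), k=3)
    # DecisionTreeClassifier().fit(X_train[:, list(subset)], y_train)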
Moreover, the experimental results suggest that an approach similar to the one in (Moldovan et al., 2007) could be applied here. There, owing to the system's high stability (smaller variations across several datasets than those of single classifiers), it can be used to establish the baseline accuracy of a given dataset. In a similar manner, the selections produced by several generation methods can be combined in order to achieve higher stability and, possibly, improved performance. The evaluations performed so far in this direction have yielded promising results. However, further work is required to refine the method and to explore new combination approaches. So far we have only experimented with combining a number of different generation procedures, in a manner similar to ensemble learning methods (a simple sketch is given below). The evaluation functions could also be combined, but this requires a more sophisticated approach; one that seems appropriate is the Dempster-Shafer evidence combination scheme used to establish the baseline accuracy of a dataset.
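As an illustration of the combination idea, the sketch below uses a simple majority vote over the subsets produced by several generation procedures (the voting scheme and names are ours; the Dempster-Shafer combination mentioned above is not shown):

    from collections import Counter

    def combine_selections(subsets, threshold=0.5):
        # Keep every feature chosen by at least `threshold` of the
        # generation methods (a simple vote over the selected subsets).
        votes = Counter(f for subset in subsets for f in subset)
        return sorted(f for f, v in votes.items()
                      if v / len(subsets) >= threshold)

    # e.g. subsets produced by three different generation procedures:
    # combined = combine_selections([relief_set, filter_set, wrapper_set])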
The feature selection process can be employed for data imputation as well. By switching the target concept from the class to a particular incomplete feature, the missing values can be predicted efficiently, using only the optimal feature subset that characterizes that attribute. This is another current direction in our work.
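A minimal sketch of this imputation scheme follows (assuming a numeric incomplete attribute, NaN-coded missing values and some feature selection routine select_subset; all names are illustrative assumptions, not the implemented system):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def impute_feature(X, col, select_subset):
        # Switch the target concept from the class to the incomplete column.
        predictors = [j for j in range(X.shape[1]) if j != col]
        miss = np.isnan(X[:, col])
        # Select the subset characterizing this attribute, on complete rows.
        chosen = select_subset(X[~miss][:, predictors], X[~miss, col])
        cols = [predictors[j] for j in chosen]
        # Predict the missing values from the selected subset only.
        model = DecisionTreeRegressor().fit(X[~miss][:, cols], X[~miss, col])
        X[miss, col] = model.predict(X[miss][:, cols])
        return X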
Also, to support cost-sensitive learning, the feature selection mechanism could be modified to employ a cost-sensitive evaluation function instead of the prediction accuracy. We have not tackled this yet, but the idea seems promising.
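For instance, the accuracy-based evaluation function could be replaced by the expected misclassification cost, as in the sketch below (assuming a user-supplied cost matrix; this is a hypothetical illustration, not yet part of our system):

    import numpy as np
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import cross_val_predict

    def expected_cost(inducer, X, y, cost_matrix):
        # cost_matrix[i, j] = cost of predicting class j when the truth is i.
        pred = cross_val_predict(inducer, X, y, cv=5)
        cm = confusion_matrix(y, pred)
        return (cm * cost_matrix).sum() / len(y)

    # A wrapper would then minimise this score for each candidate subset,
    # instead of maximising the cross-validated accuracy.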
ACKNOWLEDGEMENTS
The authors wish to thank Dan Bratucu, Cristian Botau and Adrian Cosinschi for their contributions to the implementation and for running part of the tests.
REFERENCES
Almuallim, H., Dietterich, T. G., 1991. Learning with many irrelevant features. In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), pp. 547-552.
Cheeseman, P., Stutz, J., 1995. Bayesian classification (AutoClass): theory and results. In Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.), Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, CA, pp. 153-180.
Dash, M., Liu, H., 1997. Feature Selection for Classification. Intelligent Data Analysis, 1, pp. 131-156.
Freund, Y., Schapire, R., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), pp. 119-139.
Hall, M. A., Holmes, G., 2003. Benchmarking Attribute Selection Techniques for Discrete Class Data Mining. IEEE Transactions on Knowledge and Data Engineering, 15(6), pp. 1437-1447.
Kira, K., Rendell, L. A., 1992. The feature selection problem: traditional methods and a new algorithm. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), pp. 129-134.
John, G. H., 1997. Enhancements to the Data Mining Process. PhD thesis, Computer Science Department, School of Engineering, Stanford University.
John, G. H., Kohavi, R., Pfleger, K., 1994. Irrelevant features and the subset selection problem. In Proceedings of the Eleventh International Conference on Machine Learning, pp. 121-129.
Kohavi, R., John, G. H., 1997. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), pp. 273-324.
Liu, H., Setiono, R., 1996. A probabilistic approach to feature selection: a filter solution. In Proceedings of the Thirteenth International Conference on Machine Learning, pp. 319-327.
Moldovan, T., Vidrighin, C., Giurgiu, I., Potolea, R., 2007. Evidence Combination for Baseline Accuracy Determination. In Proceedings of the 3rd ICCP, 6-8 September, Cluj-Napoca, Romania, pp. 41-48.
Molina, L. C., Belanche, L., Nebot, A., 2002. Feature Selection Algorithms: A Survey and Experimental Evaluation. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02).
Nilsson, R., 2007. Statistical Feature Selection, with Applications in Life Science. PhD thesis, Linköping University.
Onaci, A., Vidrighin, C., Cuibus, M., Potolea, R., 2007. Enhancing Classifiers through Neural Network Ensembles. In Proceedings of the 3rd ICCP, 6-8 September, Cluj-Napoca, Romania, pp. 57-64.
Witten, I., Frank, E., 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition. Morgan Kaufmann.