[Figure 3: F1 measure of the Feature Selection Experiments. Plot of F1-measure (y-axis, 0 to 0.9) against Feature Set Size (x-axis, 0 to 1000) for Odds Ratio, Chi-Squared, Information Gain, Mutual Information, Term Frequency, Fisher Criterion, BSS/WSS, GU Metric, NGL, and GSS.]
[Figure 4: F2 measure of the Feature Selection Experiments. Plot of F2-measure (y-axis, 0 to 0.9) against Feature Set Size (x-axis, 0 to 1000) for the same ten feature selection methods as Figure 3.]
study is promising. The GU metric performs as well as some of the more common feature selection methods, such as χ2, and outperforms other well-known feature selection methods, such as Odds Ratio and Information Gain. Our experimental evaluations are still ongoing; in particular, we are continuing to evaluate the metric on different domains and with different classifiers.
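As a concrete illustration of one of the baselines compared above (a sketch, not the paper's implementation), the χ2 score of a term for a category can be computed from a 2x2 document contingency table, and terms are then ranked so that only the top k are kept — k corresponding to the feature-set sizes on the x-axis of Figures 3 and 4. The term names and counts below are invented for illustration:

```python
def chi_squared(a, b, c, d):
    """Chi-squared term-category score from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    numerator = n * (a * d - c * b) ** 2
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    # Guard against degenerate tables (term or category absent entirely).
    return numerator / denominator if denominator else 0.0


# Rank candidate terms by their chi-squared score and keep the top k.
# These counts are hypothetical, purely to show the selection step.
scores = {
    "goal":  chi_squared(40, 5, 10, 445),
    "the":   chi_squared(48, 430, 2, 20),
    "match": chi_squared(30, 8, 20, 442),
}
k = 2
selected = sorted(scores, key=scores.get, reverse=True)[:k]
```

A term distributed independently of the category (e.g. equal counts in all four cells) scores zero, while a term occurring only inside the category scores highest, which is why χ2 favours class-indicative terms.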
ICEIS 2007 - International Conference on Enterprise Information Systems