features are selected by the ACO-based FSS method; that is, the SVMs $F_1^M$ (macro-averaged F1) results were achieved by optimizing only one text category (the smallest category). We observed that optimizing any single category enhances the classifier’s effectiveness.
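Because $F_1^M$ weights every category equally, a single small category can pull the overall score down as much as a large one. This is why optimizing only the smallest category can still move the overall result. The following minimal Python sketch assumes the common definition of $F_1^M$ as the unweighted mean of per-category F1 scores, each computed one-vs-rest. The function names and toy labels are illustrative and are not taken from this work.

```python
def per_category_f1(y_true, y_pred, category):
    """F1 for one category, treating it as the positive class (one-vs-rest)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == category and p == category)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != category and p == category)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == category and p != category)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: every category counts equally, regardless of its size."""
    categories = sorted(set(y_true))
    return sum(per_category_f1(y_true, y_pred, c) for c in categories) / len(categories)
```

In the toy call `macro_f1(["econ", "econ", "econ", "comp"], ["econ"] * 4)`, every document of the large category is classified correctly. The result is still 3/7 (about 0.43), because the small category contributes an F1 of zero with the same weight.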
Figure 1: SVMs $F_1^M$ values with the seven FSS methods at different sizes of feature subsets.
Figure 1 shows the $F_1^M$ results for the SVMs text classifier with the seven FSS methods at different sizes of feature subsets. Our ACO-based FSS method outperformed both the original classifier (where all 78699 features are used for training the SVMs text classifier) and the other six FSS methods. The best Chi-square $F_1^M$ result was 88.11; after optimizing the feature selection of the smallest category, the $F_1^M$ result rose to 88.743.
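The Chi-square baseline ranks terms by their association with a category and keeps the top-ranked ones, so each point on a Chi-square curve corresponds to a different cut-off size. Below is a minimal sketch under several assumptions: binary term presence, the standard 2x2 contingency-table form of the statistic, and ranking against a single category. How scores are combined across categories is not stated in this section. `chi_square` and `select_top_k` are hypothetical helper names.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for one category from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - c * b) ** 2 / denom


def select_top_k(docs, labels, category, k):
    """Keep the k terms with the highest chi-square score for one category.

    docs: list of sets of terms (binary presence); labels: one label per doc.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    vocab = set().union(*docs)
    in_cat = [label == category for label in labels]
    n_cat = sum(in_cat)
    scores = {}
    for term in vocab:
        a = sum(1 for doc, ic in zip(docs, in_cat) if ic and term in doc)
        b = sum(1 for doc, ic in zip(docs, in_cat) if not ic and term in doc)
        c = n_cat - a
        d = len(docs) - n_cat - b
        scores[term] = chi_square(a, b, c, d)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

On a toy corpus where "cpu" appears only in the two "comp" documents and "price" only in the two "econ" documents, both terms score 4.0 for "comp". The statistic rewards negative association as strongly as positive association.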
5 CONCLUSIONS
Our proposed ACO-based FSS method adopted the Chi-square statistic as heuristic information and the effectiveness of SVMs as guidance for selecting better features in Arabic TC tasks. In this work, the proposed FSS method was selectively applied to a single text category (the Computer category, which is the smallest). Compared with six classical FSS methods, it achieved better TC effectiveness. Optimizing features for all categories, tuning the ACO-based FSS parameters and studying their effects, and comparing our proposed method with other ACO algorithm variants are left as future work.
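The method described above combines ACO with Chi-square heuristic information and SVM effectiveness. The sketch below shows one way such a loop can be organized. It assumes the classic Ant System selection rule, in which an ant picks feature $i$ with probability proportional to $\tau_i^\alpha \eta_i^\beta$, with $\eta_i$ set to the Chi-square score. Several elements are illustrative assumptions rather than this paper's exact settings: the pheromone update, the parameter values (`n_ants`, `n_iters`, `alpha`, `beta`, `rho`), and the `evaluate` callback, which stands in for training the SVMs and measuring $F_1^M$ on held-out data.

```python
import random


def aco_feature_selection(heuristic, evaluate, k, n_ants=10, n_iters=20,
                          alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Illustrative Ant System style feature subset search (assumed parameters).

    heuristic: non-negative score per feature (e.g. chi-square values)
    evaluate:  callable mapping a frozenset of feature indices to a fitness in
               [0, 1]; a hypothetical stand-in for training and scoring an SVM
    k:         number of features each ant selects
    """
    n = len(heuristic)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of features")
    rng = random.Random(seed)
    tau = [1.0] * n                        # initial pheromone on every feature
    eta = [h + 1e-12 for h in heuristic]   # keep every selection weight positive
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        iteration = []
        for _ in range(n_ants):
            remaining = list(range(n))
            subset = []
            while len(subset) < k:
                # Roulette-wheel choice weighted by pheromone and heuristic.
                weights = [tau[i] ** alpha * eta[i] ** beta for i in remaining]
                choice = rng.choices(remaining, weights=weights, k=1)[0]
                subset.append(choice)
                remaining.remove(choice)
            subset = frozenset(subset)
            iteration.append((evaluate(subset), subset))
        score, subset = max(iteration, key=lambda pair: pair[0])
        if score > best_score:
            best_score, best_subset = score, subset
        # Evaporate pheromone everywhere, then reinforce the features of the
        # iteration-best subset in proportion to its fitness.
        tau = [(1 - rho) * t for t in tau]
        for i in subset:
            tau[i] += score
    return best_subset, best_score
```

Take `heuristic = [5.0, 4.0, 0.1, 0.1, 0.1, 0.1]` and `k = 2`, with a toy `evaluate` that returns the fraction of features {0, 1} present in the subset. The search returns `frozenset({0, 1})` with a score of 1.0. In a real run, `evaluate` is the expensive step, because each call retrains the classifier.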
ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection