ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection

Abdelwadood Moh’d A. Mesleh, Ghassan Kanaan

2008

Abstract

Feature subset selection (FSS) is an important step in building effective text classification (TC) systems. This paper describes a novel FSS method based on Ant Colony Optimization (ACO) and the Chi-square statistic. The proposed method uses the Chi-square statistic as heuristic information and the effectiveness of a Support Vector Machine (SVM) text classifier as guidance for selecting better features for selected categories. Compared with six classical FSS methods, the proposed ACO-based FSS algorithm achieved better TC effectiveness. The evaluation used an in-house Arabic TC corpus, and the experimental results are presented in terms of the macro-averaged F1 measure.
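The abstract names three building blocks without giving formulas: a Chi-square score used as heuristic information, an ACO-style probabilistic choice of features, and macro-averaged F1 as the evaluation measure. The following is a minimal sketch of these pieces, not the authors' implementation. It assumes the usual 2x2 contingency form of the Chi-square statistic for a term and a category, and the standard Ant System rule in which the probability of picking a feature is proportional to pheromone^alpha times heuristic^beta. All function names, parameter names, and the default values of `alpha` and `beta` are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category (assumed 2x2 contingency form).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treat the term as uninformative.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def selection_probabilities(pheromone, heuristic, alpha=1.0, beta=1.0):
    """Standard Ant System rule (an assumption, not taken from the paper):
    p_i is proportional to pheromone_i**alpha * heuristic_i**beta."""
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    total = sum(weights)
    if total == 0:
        # No information at all: fall back to a uniform choice.
        return [1.0 / len(weights)] * len(weights) if weights else []
    return [w / total for w in weights]


def macro_f1(per_category_counts):
    """Macro-averaged F1: the unweighted mean of per-category F1 scores.

    per_category_counts: iterable of (true_pos, false_pos, false_neg) tuples,
    one tuple per category.
    """
    scores = []
    for tp, fp, fn in per_category_counts:
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the category is empty.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    eta = [chi_square(10, 0, 0, 10), chi_square(6, 4, 4, 6)]
    print(selection_probabilities([1.0, 1.0], eta))
    print(macro_f1([(8, 2, 2), (5, 5, 5)]))
```

In a wrapper-style setup such as the one the abstract describes, the Chi-square scores would supply the `heuristic` values, while the pheromone values would be reinforced according to how well an SVM trained on the chosen subset performs. The update rule itself is not given in the abstract and is left out of this sketch.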



Paper Citation


in Harvard Style

Moh’d A. Mesleh A. and Kanaan G. (2008). ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection. In Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8111-51-7, pages 384-387. DOI: 10.5220/0001892803840387


in Bibtex Style

@conference{icsoft08,
author={Abdelwadood Moh’d A. Mesleh and Ghassan Kanaan},
title={ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection},
booktitle={Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT},
year={2008},
pages={384--387},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001892803840387},
isbn={978-989-8111-51-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT
TI - ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection
SN - 978-989-8111-51-7
AU - Moh’d A. Mesleh A.
AU - Kanaan G.
PY - 2008
SP - 384
EP - 387
DO - 10.5220/0001892803840387