ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection
Abdelwadood Moh’d A. Mesleh, Ghassan Kanaan
2008
Abstract
Feature subset selection (FSS) is an important step in building effective text classification (TC) systems. This paper describes a novel FSS method based on Ant Colony Optimization (ACO) and the Chi-square statistic. The proposed method adapts the Chi-square statistic as heuristic information and uses the effectiveness of a Support Vector Machines (SVMs) text classifier as guidance for selecting better features for the selected categories. Compared to six classical FSS methods, our proposed ACO-based FSS algorithm achieved better TC effectiveness. The evaluation used an in-house Arabic TC corpus. The experimental results are presented in terms of the macro-averaged F1 measure.
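The abstract names three computational pieces: a Chi-square score per term and category, an ACO search that uses that score as heuristic information, and the macro-averaged F1 measure used for evaluation. The Python sketch below illustrates them under stated assumptions; it is not the authors' implementation. The Chi-square function uses the standard 2×2 contingency-table form common in the text-categorization literature. The ACO loop follows a generic ant-system pattern: each ant draws features with probability proportional to τ^α·η^β, pheromone evaporates at rate ρ, and the best subset found so far is reinforced. The parameter defaults, the deposit rule, the name `aco_select` and the `fitness` callback are assumptions. In the paper the guidance comes from SVM effectiveness; here a hypothetical toy fitness function stands in for it so the example runs without an SVM library.

```python
import math
import random
from typing import Callable, Dict, Sequence, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of term t for category k from a 2x2 contingency table.

    a: docs in k containing t     b: docs outside k containing t
    c: docs in k lacking t        d: docs outside k lacking t
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table, e.g. a term present in every document or in none
    return n * (a * d - c * b) ** 2 / denom


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-category F1 scores (macro-averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def aco_select(
    heuristic: Dict[str, float],          # eta: e.g. Chi-square score per feature
    fitness: Callable[[Set[str]], float],  # assumed: returns a score in [0, 1]
    subset_size: int,
    n_ants: int = 10,
    n_iters: int = 20,
    alpha: float = 1.0,
    beta: float = 1.0,
    rho: float = 0.2,
    seed: int = 0,
) -> Tuple[Set[str], float]:
    """Generic ant-system feature subset search (hypothetical sketch)."""
    rng = random.Random(seed)
    features = list(heuristic)
    tau = {f: 1.0 for f in features}  # uniform initial pheromone
    best_subset: Set[str] = set()
    best_score = -math.inf
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant picks subset_size distinct features; each draw is
            # proportional to tau^alpha * eta^beta over the remaining ones.
            remaining = list(features)
            subset: Set[str] = set()
            while len(subset) < subset_size and remaining:
                weights = [tau[f] ** alpha * (heuristic[f] + 1e-9) ** beta
                           for f in remaining]
                chosen = rng.choices(remaining, weights=weights, k=1)[0]
                subset.add(chosen)
                remaining.remove(chosen)
            score = fitness(subset)
            if score > best_score:
                best_subset, best_score = subset, score
        # Evaporate pheromone everywhere, then reinforce the best-so-far subset.
        for f in features:
            tau[f] *= 1.0 - rho
        for f in best_subset:
            tau[f] += rho * max(best_score, 0.0)
    return best_subset, best_score


if __name__ == "__main__":
    # Hypothetical Chi-square heuristic values for four terms.
    eta = {"t1": 9.0, "t2": 8.0, "t3": 0.1, "t4": 0.1}

    def toy_fitness(subset: Set[str]) -> float:
        # Hypothetical stand-in for SVM macro-F1: rewards the two "relevant" terms.
        return len(subset & {"t1", "t2"}) / 2

    print(aco_select(eta, toy_fitness, subset_size=2))
```

In a real pipeline, `fitness` would train a classifier on the documents restricted to the candidate subset and return its validation macro-F1, which is why `macro_f1` is included alongside the search.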
Paper Citation
in Harvard Style
Moh’d A. Mesleh A. and Kanaan G. (2008). ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection. In Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8111-51-7, pages 384-387. DOI: 10.5220/0001892803840387
in Bibtex Style
@conference{icsoft08,
author={Abdelwadood Moh’d A. Mesleh and Ghassan Kanaan},
title={ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection},
booktitle={Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT},
year={2008},
pages={384-387},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001892803840387},
isbn={978-989-8111-51-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT
TI - ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection
SN - 978-989-8111-51-7
AU - Moh’d A. Mesleh A.
AU - Kanaan G.
PY - 2008
SP - 384
EP - 387
DO - 10.5220/0001892803840387