like to deeply investigate the relation between , , C and B values when dealing with small
categories such as Computer and Education. For this
particular category, we have experimented with the 2 and
the classifier parameters, but we could not improve
the Recall or the Precision values. The investigation
of other feature selection algorithms remains for
future work. Building a larger Arabic-language
TC corpus will also be considered in
our future research.
ACKNOWLEDGEMENTS
Many thanks to Dr. Ghassan Kannaan for providing
the TC Arabic dataset, to Dr. Nevin
Darwish for emailing me her paper on TC, and
to Dr. Asim El Shiekh for financial support.
ICSOFT 2007 - International Conference on Software and Data Technologies