A Weighted Maximum Entropy Language Model for Text Classification
Kostas Fragos, Yannis Maistros, Christos Skourlas
2005
Abstract
The maximum entropy (ME) approach has been used extensively in natural language processing tasks such as language modeling, part-of-speech tagging, text classification, and text segmentation. Previous work on text classification applied maximum entropy modeling with binary-valued features or counts of feature words. In this work, we present a different way of applying maximum entropy modeling to text classification: weights are used both to select the features of the model and to estimate the contribution of each extracted feature to the classification task. We assess the importance of each candidate feature with the chi-square (χ²) test and rank the candidates, and the most highly ranked features are retained as the features of the model. Then, instead of applying maximum entropy modeling in the classical way, we use the χ² values to assign weights to the features of the model. Our method was evaluated on the Reuters-21578 dataset for text classification tasks, giving promising results and performing comparably to some of the "state of the art" classification schemes.
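To make the feature-selection step concrete, the sketch below ranks candidate word features by the standard 2x2 contingency-table form of the χ² statistic and returns the top-ranked terms together with their χ² scores, which the method described above would then reuse as feature weights in the maximum entropy model. This is a minimal illustration in Python; the function names, the toy corpus, and the selection threshold are assumptions for exposition, not the authors' implementation.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/category contingency table.

    n11: docs in the category that contain the term
    n10: docs outside the category that contain the term
    n01: docs in the category that lack the term
    n00: docs outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0


def rank_features(docs, labels, category, top_k=10):
    """Rank every term by its chi-square association with `category`.

    Returns the top_k (term, score) pairs; in the weighted ME scheme
    sketched in the abstract, the scores double as feature weights.
    """
    vocab = {t for d in docs for t in d}
    scores = {}
    for term in vocab:
        in_cat = [term in d for d, y in zip(docs, labels) if y == category]
        out_cat = [term in d for d, y in zip(docs, labels) if y != category]
        n11, n01 = sum(in_cat), len(in_cat) - sum(in_cat)
        n10, n00 = sum(out_cat), len(out_cat) - sum(out_cat)
        scores[term] = chi_square(n11, n10, n01, n00)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Tiny hypothetical corpus: each document is a set of word features.
    docs = [{"oil", "price", "barrel"}, {"oil", "opec", "output"},
            {"wheat", "grain", "export"}, {"grain", "harvest"}]
    labels = ["crude", "crude", "grain", "grain"]
    print(rank_features(docs, labels, "crude", top_k=3))
```

With the toy corpus above, a term that occurs in every "crude" document and nowhere else (e.g. "oil") receives the maximum score of N = 4, while a term spread evenly across categories scores near zero; this separation is what lets the ranked χ² scores act both as a feature filter and as weights.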
Paper Citation
in Harvard Style
Fragos K., Maistros Y. and Skourlas C. (2005). A Weighted Maximum Entropy Language Model for Text Classification. In Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005) ISBN 972-8865-23-6X, pages 55-67. DOI: 10.5220/0002571800550067
in Bibtex Style
@conference{nlucs05,
author={Kostas Fragos and Yannis Maistros and Christos Skourlas},
title={A Weighted Maximum Entropy Language Model for Text Classification},
booktitle={Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)},
year={2005},
pages={55-67},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002571800550067},
isbn={972-8865-23-6X},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)
TI - A Weighted Maximum Entropy Language Model for Text Classification
SN - 972-8865-23-6X
AU - Fragos K.
AU - Maistros Y.
AU - Skourlas C.
PY - 2005
SP - 55
EP - 67
DO - 10.5220/0002571800550067
ER -