
results is that the length independence assumptions we made in Section 2 are too
unrealistic, and an explicit length model may have to be included in our general formulation.
We feel that better results could be achieved by improving the feature selection
techniques and perhaps by weighting the different terms, much as is done in
prototype selection for nearest-neighbor classifiers.