USAGE BASED INDEXING OF WEB RESOURCES WITH NATURAL LANGUAGE PROCESSING

Armelle Brun, Anne Boyer

Abstract

Because of the huge amount of information available on the Internet, identifying reliable and interesting items is becoming increasingly difficult and time-consuming. This position paper describes our intended work on multimedia information retrieval by browsing techniques within web navigation. The work relies on usage-based indexing of resources: we ignore the nature, the content and the structure of the resources. We describe a new approach that takes advantage of the similarity between statistical language modeling and document retrieval systems. A syntax of usage is computed that defines a Statistical Grammar of Usage (SGU). An SGU enables the classification of resources, which in turn supports a personalized navigation assistant tool. It relies both on collaborative filtering, to compute virtual communities of users, and on a new distance-dependent trigger model. The resulting SGU is community-dependent.


Paper Citation


in Harvard Style

Brun A. and Boyer A. (2007). USAGE BASED INDEXING OF WEB RESOURCES WITH NATURAL LANGUAGE PROCESSING. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-972-8865-78-8, pages 220-225. DOI: 10.5220/0001278902200225


in Bibtex Style

@conference{webist07,
author={Armelle Brun and Anne Boyer},
title={USAGE BASED INDEXING OF WEB RESOURCES WITH NATURAL LANGUAGE PROCESSING},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2007},
pages={220-225},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001278902200225},
isbn={978-972-8865-78-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - USAGE BASED INDEXING OF WEB RESOURCES WITH NATURAL LANGUAGE PROCESSING
SN - 978-972-8865-78-8
AU - Brun A.
AU - Boyer A.
PY - 2007
SP - 220
EP - 225
DO - 10.5220/0001278902200225