SUPPORTING INFORMATION RETRIEVAL IN RSS FEEDS

Georges Dubus, Mathieu Bruyen, Nacéra Bennacer

Abstract

Really Simple Syndication (RSS) feeds present new challenges to information retrieval technologies. In this paper we propose an RSS feed retrieval approach that gives the user a personalized view of feed items and makes their content easier to access. We define several filters to construct the vocabulary used in the text describing feed items; this filtering takes into account both the lexical category and the frequency of terms. The set of feed items is then represented in an m-dimensional vector space. The k-means clustering algorithm, with an adapted centroid computation and distance measure, is applied to find clusters automatically. The clusters, indexed by relevant terms, can then be refined, labeled and browsed by the user. We evaluate the approach on a collection of feed items gathered from news sites. The resulting clusters show good cohesion and separation, providing meaningful classes for organizing the information and classifying new feed items.
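The pipeline outlined in the abstract (vocabulary filtering, vector-space representation, k-means clustering, term-based cluster labels) can be illustrated with a minimal Python sketch. The abstract does not specify the actual filters, the adapted centroid computation or the distance measure, so everything below is an assumption made for illustration: a small stopword list stands in for lexical-category filtering, document-frequency thresholds (`min_df`, `max_df_ratio`) stand in for the frequency filter, and cosine distance with length-normalized centroids stands in for the adapted k-means. All function names and the sample items are hypothetical.

```python
import math
import random
import re
from collections import Counter

# Assumption: a crude stopword list replaces real lexical-category (part-of-speech) filtering.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is", "as"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def build_vocabulary(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms whose document frequency lies within the given bounds."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return sorted(t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio)

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def vectorize(doc, vocab):
    """Represent an item as a unit-length term-count vector over the m vocabulary terms."""
    counts = Counter(tokenize(doc))
    return normalize([float(counts[t]) for t in vocab])

def cosine_distance(u, v):
    """Cosine distance between two unit-length vectors."""
    return 1.0 - sum(a * b for a, b in zip(u, v))

def kmeans(vectors, k, iterations=20, seed=0):
    """k-means with cosine distance and re-normalized mean centroids (an assumed variant)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: cosine_distance(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize([sum(col) / len(members) for col in zip(*members)])
    return labels, centroids

def top_terms(centroid, vocab, n=3):
    """Return the n highest-weighted vocabulary terms of a centroid, usable as a cluster label."""
    ranked = sorted(zip(centroid, vocab), reverse=True)
    return [term for weight, term in ranked[:n] if weight > 0]

# Hypothetical feed item titles, invented for this example.
items = [
    "Stock market falls as banks report losses",
    "Banks and stock market rally after losses",
    "Football team wins the cup final",
    "Cup final: football team celebrates",
]
vocab = build_vocabulary(items)
vectors = [vectorize(item, vocab) for item in items]
labels, centroids = kmeans(vectors, k=2)
for j, centroid in enumerate(centroids):
    print(j, top_terms(centroid, vocab))
```

In this toy run the two finance items and the two football items share no vocabulary terms, so they end up in separate clusters, and the highest-weighted centroid terms serve as rough cluster labels. A new feed item could be classified by vectorizing it with the same vocabulary and assigning it to the nearest centroid.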


Paper Citation


in Harvard Style

Dubus G., Bruyen M. and Bennacer N. (2010). SUPPORTING INFORMATION RETRIEVAL IN RSS FEEDS. In Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST, ISBN 978-989-674-025-2, pages 307-312. DOI: 10.5220/0002809103070312


in Bibtex Style

@conference{webist10,
author={Georges Dubus and Mathieu Bruyen and Nacéra Bennacer},
title={SUPPORTING INFORMATION RETRIEVAL IN RSS FEEDS},
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST},
year={2010},
pages={307-312},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002809103070312},
isbn={978-989-674-025-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST
TI - SUPPORTING INFORMATION RETRIEVAL IN RSS FEEDS
SN - 978-989-674-025-2
AU - Dubus G.
AU - Bruyen M.
AU - Bennacer N.
PY - 2010
SP - 307
EP - 312
DO - 10.5220/0002809103070312