Evaluating the Unification of Multiple Information Retrieval Techniques into a News Indexing Service

Christos Bouras, Vassilis Tsogkas

Abstract

As online information sources grow rapidly in number, so does the volume of news content published online each day. Several approaches have been proposed for organizing this immense amount of data. In this work we explore the integration of multiple information retrieval techniques, such as text preprocessing, n-gram expansion, summarization, categorization, and item/user clustering, into a single mechanism designed to consolidate and index news articles from major news portals around the web. Our goal is to let users quickly and seamlessly get the news of the day that appeals to them. We show how applying each of the proposed techniques gradually improves the precision of the news articles suggested to a number of registered system users, and how, in aggregate, these techniques provide a unified solution to the recommendation problem.


Paper Citation


in Harvard Style

Bouras C. and Tsogkas V. (2014). Evaluating the Unification of Multiple Information Retrieval Techniques into a News Indexing Service. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-035-2, pages 33-40. DOI: 10.5220/0004998000330040


in Bibtex Style

@conference{data14,
author={Christos Bouras and Vassilis Tsogkas},
title={Evaluating the Unification of Multiple Information Retrieval Techniques into a News Indexing Service},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2014},
pages={33-40},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004998000330040},
isbn={978-989-758-035-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Evaluating the Unification of Multiple Information Retrieval Techniques into a News Indexing Service
SN - 978-989-758-035-2
AU - Bouras C.
AU - Tsogkas V.
PY - 2014
SP - 33
EP - 40
DO - 10.5220/0004998000330040