CONTENT-BASED RECOMMENDATION ALGORITHMS ON THE HADOOP MAPREDUCE FRAMEWORK

Toon De Pessemier, Kris Vanhecke, Simon Dooms, Luc Martens

Abstract

Content-based recommender systems are widely used to generate personal suggestions for content items based on their metadata descriptions. However, because these metadata require (text) processing, the computational complexity of the recommendation algorithms is high, which hampers their application at large scale. This computational load reinforces the need for a reliable, scalable, and distributed processing platform for calculating recommendations. Hadoop is such a platform: it supports data-intensive distributed applications based on map and reduce tasks. We therefore investigated how Hadoop can be used as a cloud computing platform to solve the scalability problem of content-based recommendation algorithms. This paper describes the MapReduce operations needed for keyword extraction and for generating content-based suggestions for the end-user. Experimental results on Wikipedia articles demonstrate that Hadoop is an efficient and scalable platform for computing content-based recommendations.
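
To illustrate the kind of map and reduce operations the abstract refers to, the following is a minimal in-memory sketch of keyword weighting expressed as two chained MapReduce-style jobs. It is not the authors' implementation and does not use the Hadoop API: the helper names (`run_mapreduce`, `map_term_counts`, `reduce_sum`, `tf_idf`), the whitespace tokenizer, and the choice of TF-IDF as the keyword weight are all assumptions made for illustration.

```python
from collections import defaultdict
import math


def map_term_counts(doc_id, text):
    """Map step (hypothetical): emit ((doc_id, term), 1) for every token."""
    for token in text.lower().split():
        yield (doc_id, token), 1


def reduce_sum(key, values):
    """Reduce step: sum all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for a MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(*record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


def tf_idf(docs):
    """Weight each (document, term) pair using two chained jobs."""
    # Job 1: term frequency per (document, term) pair.
    tf = run_mapreduce(docs.items(), map_term_counts, reduce_sum)
    # Job 2: document frequency per term, computed from the output of job 1.
    df = run_mapreduce(tf.items(), lambda key, count: [(key[1], 1)], reduce_sum)
    n_docs = len(docs)
    return {
        (doc_id, term): count * math.log(n_docs / df[term])
        for (doc_id, term), count in tf.items()
    }


docs = {
    "a": "hadoop map reduce hadoop",
    "b": "map reduce cluster",
}
scores = tf_idf(docs)
```

In this toy corpus, terms that occur in every document (such as "map") receive a weight of zero, while terms specific to one document (such as "hadoop") receive a positive weight and would be kept as keywords. On a real cluster, each job would run as distributed map and reduce tasks rather than as the single-process loop shown here.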

References

  1. Brown, R. A. (2009). Hadoop at home: large-scale computing at a small college. In SIGCSE '09: Proceedings of the 40th ACM technical symposium on Computer science education, pages 106-110, New York, NY, USA. ACM.
  2. Dean, J. and Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113.
  3. Elsayed, T., Lin, J., and Oard, D. W. (2008). Pairwise document similarity in large collections with MapReduce. In HLT '08: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies, pages 265-268, Morristown, NJ, USA. Association for Computational Linguistics.
  4. Ghemawat, S., Gobioff, H., and Leung, S.-T. (2003). The Google file system. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29-43, New York, NY, USA. ACM.
  5. Lämmel, R. (2007). Google's MapReduce programming model - revisited. Sci. Comput. Program., 68(3):208-237.
  6. Mladenic, D. (1999). Text-learning and related intelligent agents: A survey. IEEE Intelligent Systems, 14(4):44-54.
  7. Salton, G. and McGill, M. J. (1983). Introduction to modern information retrieval. McGraw-Hill computer science series. McGraw-Hill, New York, NY.


Paper Citation


in Harvard Style

De Pessemier T., Vanhecke K., Dooms S. and Martens L. (2011). CONTENT-BASED RECOMMENDATION ALGORITHMS ON THE HADOOP MAPREDUCE FRAMEWORK. In Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8425-51-5, pages 237-240. DOI: 10.5220/0003193802370240


in Bibtex Style

@conference{webist11,
author={Toon De Pessemier and Kris Vanhecke and Simon Dooms and Luc Martens},
title={CONTENT-BASED RECOMMENDATION ALGORITHMS ON THE HADOOP MAPREDUCE FRAMEWORK},
booktitle={Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2011},
pages={237-240},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003193802370240},
isbn={978-989-8425-51-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - CONTENT-BASED RECOMMENDATION ALGORITHMS ON THE HADOOP MAPREDUCE FRAMEWORK
SN - 978-989-8425-51-5
AU - De Pessemier T.
AU - Vanhecke K.
AU - Dooms S.
AU - Martens L.
PY - 2011
SP - 237
EP - 240
DO - 10.5220/0003193802370240