A MapReduce Architecture for Web Site User Behaviour Monitoring in Real Time

Bill Karakostas, Babis Theodoulidis


Monitoring the behaviour of large numbers of web site users in real time poses significant performance challenges, due to the decentralised location and volume of the generated data. This paper proposes a MapReduce-style architecture in which event series from web users are processed by a number of cascading mappers, reducers and rereducers, local to the event origin. Using static analysis and a prototype implementation, we show that this architecture is capable of carrying out time series analysis in real time for very large web data sets, based on the actual events, instead of resorting to sampling or other extrapolation techniques.
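
The cascade of mappers, reducers and rereducers described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Event` record, the 60-second bucket size, and the function names `map_event`, `reduce_local` and `rereduce` are all assumptions made for illustration. Each event origin reduces its own events into per-interval counts, and a rereducer merges those partial counts into one time series without revisiting the raw events.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical user event; the field names are assumptions."""
    timestamp: float  # seconds since some epoch
    user_id: str
    action: str


Key = Tuple[int, str]  # (start of time bucket, action)


def map_event(event: Event, bucket_seconds: int = 60) -> Tuple[Key, int]:
    """Map one event to a ((bucket, action), 1) pair."""
    bucket = int(event.timestamp // bucket_seconds) * bucket_seconds
    return (bucket, event.action), 1


def reduce_local(events: Iterable[Event], bucket_seconds: int = 60) -> Counter:
    """Reduce the events of one origin into partial counts per key."""
    counts: Counter = Counter()
    for event in events:
        key, value = map_event(event, bucket_seconds)
        counts[key] += value
    return counts


def rereduce(partials: Iterable[Counter]) -> Counter:
    """Merge partial counts from several reducers into one time series."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total


# Two hypothetical event origins, each reduced locally, then merged.
origin_a = [Event(0.0, "u1", "click"), Event(30.0, "u2", "click"),
            Event(65.0, "u1", "view")]
origin_b = [Event(10.0, "u3", "click"), Event(70.0, "u4", "view")]

series = rereduce([reduce_local(origin_a), reduce_local(origin_b)])
print(sorted(series.items()))
```

Because the partial results are plain counts, the rereduce step is associative and can itself be cascaded across several levels, which is what lets the aggregation stay close to where the events originate.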



Paper Citation

in Harvard Style

Karakostas B. and Theodoulidis B. (2013). A MapReduce Architecture for Web Site User Behaviour Monitoring in Real Time. In Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA, ISBN 978-989-8565-67-9, pages 45-52. DOI: 10.5220/0004332600450052

in Bibtex Style

author={Bill Karakostas and Babis Theodoulidis},
title={A MapReduce Architecture for Web Site User Behaviour Monitoring in Real Time},
booktitle={Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA},

in EndNote Style

JO - Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA,
TI - A MapReduce Architecture for Web Site User Behaviour Monitoring in Real Time
SN - 978-989-8565-67-9
AU - Karakostas B.
AU - Theodoulidis B.
PY - 2013
SP - 45
EP - 52
DO - 10.5220/0004332600450052