5 CONCLUSIONS AND FURTHER RESEARCH
This paper has argued that real-time analysis of web
user behaviour requires a suitably distributed,
multistage processing architecture: one that collects
and analyses user behaviour data online rather than in
batch. In the proposed approach, therefore, web logs
are not stored in files or databases; processing
instead takes place in memory, close to the source of
the events, as they occur. The approach uses multiple
processing stages to improve performance, scalability
and resilience. Mappers, reducers and rereducers can
be added to or withdrawn from the monitoring system,
either in response to failures or to meet other
availability and performance requirements.
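The multistage idea can be illustrated with a minimal in-memory sketch. It assumes, for illustration only, that each event is a hypothetical (session_id, page) pair, that each mapper sees complete sessions (for example because the stream is partitioned by session id), and that the stage names map loosely onto the paper's mappers, reducers and rereducers. Mappers emit page-to-page transitions, each reducer sums the transitions it receives, and a rereducer merges the partial counts; the most frequent transitions are the high-traffic paths discussed below.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical event layout: (session_id, page). Not the paper's actual format.
Event = tuple[str, str]


def mapper(events: Iterable[Event]) -> Iterator[tuple[tuple[str, str], int]]:
    """Emit ((from_page, to_page), 1) for each consecutive page pair in a session."""
    last_page: dict[str, str] = {}
    for session, page in events:
        prev = last_page.get(session)
        if prev is not None:
            yield (prev, page), 1
        last_page[session] = page


def reducer(pairs: Iterable[tuple[tuple[str, str], int]]) -> Counter:
    """Sum, in memory, the transition counts received by one reducer."""
    counts: Counter = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def rereducer(partials: Iterable[Counter]) -> Counter:
    """Merge the partial counts produced by several reducers."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Two event sources, each feeding its own mapper and reducer.
site_a = [("s1", "/home"), ("s1", "/products"), ("s2", "/home"), ("s2", "/products")]
site_b = [("s3", "/home"), ("s3", "/cart")]
partials = [reducer(mapper(stream)) for stream in (site_a, site_b)]
top = rereducer(partials).most_common(1)
print(top)  # [(('/home', '/products'), 2)]
```

Because every stage only combines counts, a reducer or rereducer can be added or removed by changing how partial results are routed, without reprocessing stored logs.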
The proposed system can be integrated with
various types of decision support and other
command-and-control-type systems. Visualisation
tools can be used, for example, to show in real time
the activity status of the whole web site,
highlighting areas and paths of high or low activity.
A high-traffic path can, for example, indicate the
pages with the most visitors as well as the order in
which they were visited. Aberrant behaviour can also
be detected with this system, for example when the
rate of some type of behaviour exceeds a specified
threshold within a specified time window. An example
of such aberrant behaviour would be an increase of
more than 100% in the number of ‘bouncers’ within
an hour.
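A threshold-within-a-window rule of this kind can be sketched as follows. The sketch assumes, as an illustration rather than the paper's definition, that "a more than 100% increase within an hour" means the number of bounce events in the last hour exceeds twice the number in the hour before it; the class and parameter names are hypothetical.

```python
from collections import deque


class BounceSurgeDetector:
    """Flag a >100% rise in bounce events in the latest window vs the preceding one.

    Using the immediately preceding window as the baseline is an assumption;
    a baseline from historical traffic would work the same way.
    """

    def __init__(self, window_s: float = 3600.0, ratio: float = 2.0):
        self.window_s = window_s
        self.ratio = ratio  # a >100% increase means current > 2 x previous
        self.times: deque[float] = deque()

    def record(self, t: float) -> bool:
        """Record a bounce at time t (seconds, non-decreasing); return True on alarm."""
        self.times.append(t)
        # Events older than two windows affect neither count, so discard them.
        while self.times and self.times[0] <= t - 2 * self.window_s:
            self.times.popleft()
        current = sum(1 for x in self.times if x > t - self.window_s)
        previous = len(self.times) - current
        return previous > 0 and current > self.ratio * previous


detector = BounceSurgeDetector()
alarms = [detector.record(t) for t in (0, 10, 3700, 3710, 3720, 3730, 3740)]
print(alarms)  # [False, False, False, False, False, False, True]
```

The recount on every event keeps the sketch short; a production detector would more likely keep per-bucket counters so that each update costs constant time.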
Finally, automated tools can be devised to
calculate the optimal numbers of mappers, reducers
and rereducers from the number and type of
monitoring events and from historical data about the
web site's traffic, such as audit trail data on web
application performance.
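As a starting point for such a tool, the number of workers in one stage could be estimated from historical load. The following is a hypothetical sizing heuristic, not the paper's method: it takes a high percentile of observed event rates for the stage, adds headroom, and divides by a measured per-worker throughput. The function name, percentile and headroom values are all illustrative assumptions.

```python
import math


def workers_needed(observed_rates: list[float], per_worker_rate: float,
                   headroom: float = 1.5, percentile: float = 0.95) -> int:
    """Estimate how many workers one stage (mappers, reducers or rereducers) needs.

    observed_rates: historical events/second for the stage, e.g. from audit trails.
    per_worker_rate: events/second a single worker of that stage can sustain.
    """
    if not observed_rates or per_worker_rate <= 0:
        raise ValueError("need historical rates and a positive per-worker rate")
    ranked = sorted(observed_rates)
    # Nearest-rank percentile of the historical load.
    idx = max(0, math.ceil(percentile * len(ranked)) - 1)
    design_load = ranked[idx] * headroom
    return max(1, math.ceil(design_load / per_worker_rate))


hourly_peaks = [120.0, 340.0, 410.0, 980.0, 560.0]
print(workers_needed(hourly_peaks, per_worker_rate=250.0))  # 6
```

Running the function separately per stage and per event type would reflect the point that different kinds of monitoring events place different loads on each stage.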
A MapReduce Architecture for Web Site User Behaviour Monitoring in Real Time