COLLECTIVE BEHAVIOUR IN INTERNET - Tendency Analysis of the Frequency of User Web Queries

Joan Codina-Filba, David F. Nettleton

2010

Abstract

In this paper we propose a classification of the trends that can be observed over time in user web queries. The focus is on identifying general collective trends of the Internet user community, based on search query keywords, and on how those trends behave over a given time period. We give some representative examples of real search queries and their tendencies. From these examples we define a set of descriptive features which can be used as inputs for data modelling. We then use a selection of unsupervised (clustering) and supervised modelling techniques to classify the trends. The results show that it is relatively easy to classify the basic hypothetical trends we have defined, and we identify which of the chosen learning techniques are best able to model the data. However, the presence of more complex, noisy or mixed trends makes the classification more difficult.
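The pipeline the abstract describes (query-frequency series, then descriptive features, then a classifier) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not list the paper's features or name its learning techniques, so the three features (`slope`, `peak_ratio`, `seasonality`), the trend labels, the 12-step period, the synthetic series, and the nearest-centroid classifier standing in for a supervised technique are all hypothetical.

```python
import math
from statistics import mean, pstdev


def trend_features(series, period=12):
    """Return (slope, peak_ratio, seasonality) for a query-frequency series.

    Illustrative features only; they are not the paper's feature set.
    """
    n = len(series)
    mu = mean(series)
    sd = pstdev(series)
    # Least-squares slope against time, divided by the mean so that
    # series with different query volumes are comparable.
    t_mean = (n - 1) / 2
    cov = sum((t - t_mean) * (x - mu) for t, x in enumerate(series))
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    slope = (cov / var_t) / mu if mu else 0.0
    # Largest value relative to the mean: high for short bursts.
    peak_ratio = max(series) / mu if mu else 0.0
    # Autocorrelation at the assumed period: close to 1 for periodic series.
    if sd == 0 or n <= period:
        seasonality = 0.0
    else:
        acov = sum(
            (series[t] - mu) * (series[t + period] - mu)
            for t in range(n - period)
        ) / (n - period)
        seasonality = acov / (sd * sd)
    return (slope, peak_ratio, seasonality)


def fit_centroids(labelled):
    """Average the feature vectors of each label: {label: centroid}."""
    groups = {}
    for feats, label in labelled:
        groups.setdefault(label, []).append(feats)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in groups.items()
    }


def classify(feats, centroids):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(feats, centroids[label]))


# Synthetic training series (hypothetical, 36 time steps each).
rising = [10 + 2 * t for t in range(36)]
burst = [100 if t == 18 else 10 for t in range(36)]
seasonal = [20 + 10 * math.sin(2 * math.pi * t / 12) for t in range(36)]

centroids = fit_centroids([
    (trend_features(rising), "rising"),
    (trend_features(burst), "burst"),
    (trend_features(seasonal), "seasonal"),
])

# Classify an unseen series with a different scale and offset.
unseen = [30 + 3 * t for t in range(36)]
print(classify(trend_features(unseen), centroids))
```

The features are left unscaled, so `peak_ratio` dominates the distance. With real, noisy query data, normalising the features and using more than one training series per class would be the natural next steps.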



Paper Citation


in Harvard Style

Codina-Filba J. and Nettleton D. (2010). COLLECTIVE BEHAVIOUR IN INTERNET - Tendency Analysis of the Frequency of User Web Queries. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 168-175. DOI: 10.5220/0003069501680175


in Bibtex Style

@conference{kdir10,
author={Joan Codina-Filba and David F. Nettleton},
title={COLLECTIVE BEHAVIOUR IN INTERNET - Tendency Analysis of the Frequency of User Web Queries},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={168-175},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003069501680175},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - COLLECTIVE BEHAVIOUR IN INTERNET - Tendency Analysis of the Frequency of User Web Queries
SN - 978-989-8425-28-7
AU - Codina-Filba J.
AU - Nettleton D.
PY - 2010
SP - 168
EP - 175
DO - 10.5220/0003069501680175