ON THE EVOLUTION OF SEARCH ENGINE RANKINGS

Panagiotis Takis Metaxas

2009

Abstract

Search engines have greatly influenced the way we experience the web. Since the early days of the web, users have relied on them to get informed and make decisions. When the web was relatively small, web directories were built and maintained by human experts who screened and categorized pages according to their characteristics. By the mid-1990s, however, it was apparent that the human-expert model of categorizing web pages does not scale. The first search engines appeared, and they have been evolving ever since, taking over the role that web directories used to play. But what need makes a search engine evolve? Beyond the financial objectives, there is a need for quality in search results. Users interact with search engines through the results of their search queries. Search engines know that the quality of their ranking will determine how successful they are. If users perceive the results as valuable and reliable, they will use the search engine again. Otherwise, it is easy for them to switch to another one. Search results, however, are not based simply on well-designed scientific principles; they are also influenced by web spammers. Web spamming, the practice of introducing artificial text and links into web pages to affect the results of web searches, has been recognized as a major problem for search engines. It is also a serious problem for users, because they are not aware of it and they tend to confuse trusting the search engine with trusting the results of a search. In this paper, we analyze the influence that web spam has on the evolution of search engines, and we identify the strong relationship between spamming methods on the web and propagandistic techniques in society. Our analysis provides a foundation for understanding why spamming works and offers new insight into how to address it. In particular, it suggests that one could use social anti-propagandistic techniques to recognize web spam.


Paper Citation


in Harvard Style

Metaxas P. (2009). ON THE EVOLUTION OF SEARCH ENGINE RANKINGS. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 200-207. DOI: 10.5220/0001843102000207


in Bibtex Style

@conference{webist09,
author={Panagiotis Takis Metaxas},
title={ON THE EVOLUTION OF SEARCH ENGINE RANKINGS},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={200-207},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001843102000207},
isbn={978-989-8111-81-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - ON THE EVOLUTION OF SEARCH ENGINE RANKINGS
SN - 978-989-8111-81-4
AU - Metaxas P.
PY - 2009
SP - 200
EP - 207
DO - 10.5220/0001843102000207