THE SPANISH WEB IN NUMBERS - Main Features of the Spanish Hidden Web

Manuel Álvarez, Fidel Cacheda, Rafael López-García, Víctor M. Prieto

Abstract

This article presents a study of the web sites in the “.es” domain, focusing on how widely they use technologies that hinder crawling systems from traversing the Web. The study centres on scripts and forms in HTML pages, since both are well-known entry points to the “Hidden Web”. For scripts, it pays special attention to redirection and the dynamic construction of URLs. The article concludes that a crawler should process these technologies in order to obtain most of the documents on the Web.
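As a rough illustration of the two entry points named in the abstract, the sketch below counts forms and script-based navigation statements in a single HTML page. It is not the method used in the study: the class name `HiddenWebScanner`, the function `scan`, and the regular expression `REDIRECT_PATTERN` are hypothetical, and the pattern only catches a few common JavaScript idioms for redirection or URL assignment.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristic, not taken from the paper: matches assignments to
# `location` / `location.href` and calls to location.replace()/assign().
REDIRECT_PATTERN = re.compile(
    r"location(?:\.href)?\s*=(?!=)|location\.(?:replace|assign)\s*\("
)


class HiddenWebScanner(HTMLParser):
    """Counts <form> elements and script statements that navigate elsewhere."""

    def __init__(self):
        super().__init__()
        self.forms = 0
        self.script_redirects = 0
        self._in_script = False
        self._script_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms += 1
        elif tag == "script":
            self._in_script = True
            self._script_parts = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_script:
            self._script_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            source = "".join(self._script_parts)
            self.script_redirects += len(REDIRECT_PATTERN.findall(source))
            self._in_script = False


def scan(html):
    """Return counts of forms and script-driven redirects found in `html`."""
    scanner = HiddenWebScanner()
    scanner.feed(html)
    scanner.close()
    return {"forms": scanner.forms, "script_redirects": scanner.script_redirects}
```

A crawler that only follows static `href` attributes would miss every target counted under `script_redirects`, and it would need to fill in and submit each form to reach the pages behind it.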



Paper Citation


in Harvard Style

Álvarez M., Cacheda F., López-García R. and Prieto V. M. (2011). THE SPANISH WEB IN NUMBERS - Main Features of the Spanish Hidden Web. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 363-366. DOI: 10.5220/0003626603710374


in Bibtex Style

@conference{kdir11,
author={Manuel Álvarez and Fidel Cacheda and Rafael López-García and Víctor M. Prieto},
title={THE SPANISH WEB IN NUMBERS - Main Features of the Spanish Hidden Web},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={363-366},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003626603710374},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - THE SPANISH WEB IN NUMBERS - Main Features of the Spanish Hidden Web
SN - 978-989-8425-79-9
AU - Álvarez M.
AU - Cacheda F.
AU - López-García R.
AU - Prieto V. M.
PY - 2011
SP - 363
EP - 366
DO - 10.5220/0003626603710374