FOCUSING WEB CRAWLS ON LOCATION-SPECIFIC CONTENT

Lefteris Kozanidis, Sofia Stamou, George Spiros

Abstract

Retrieving relevant data for location-sensitive keyword queries is a challenging task that has so far been addressed as a problem of automatically determining the geographical orientation of web searches. Unfortunately, identifying localizable queries is not sufficient per se for performing successful location-sensitive searches unless there exists a geo-referenced index of data sources against which localizable queries are searched. In this paper, we propose a novel approach towards the automatic construction of a geo-referenced search engine index. Our approach relies on a geo-focused crawler that incorporates a structural parser and uses GeoWordNet as a knowledge base in order to automatically deduce the geo-spatial information that is latent in the pages’ contents. Based on location-descriptive elements in the page URLs and anchor text, the crawler directs the pages to a location-sensitive downloader. This downloading module resolves the geographical references of the URL location elements and organizes them into indexable hierarchical structures. The location-aware URL hierarchies are linked to their respective pages, resulting in a geo-referenced index against which location-sensitive queries can be answered.
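
As a rough illustration of the idea in the abstract, the sketch below looks for place names in a URL and its anchor text, resolves each name to a region hierarchy, and files the URL under that hierarchy. It is not the authors' implementation: `TOY_GAZETTEER` is a hypothetical three-entry stand-in for GeoWordNet, and the function names, the `_urls` key, and the example URLs are assumptions made for this example only.

```python
import re
from urllib.parse import urlparse

# Hypothetical toy gazetteer standing in for GeoWordNet: each place name maps
# to a path from the broadest to the narrowest region. Entries are illustrative.
TOY_GAZETTEER = {
    "greece": ("greece",),
    "patras": ("greece", "achaea", "patras"),
    "athens": ("greece", "attica", "athens"),
}


def location_tokens(url, anchor_text):
    """Return gazetteer names found in the URL host/path or the anchor text."""
    parsed = urlparse(url)
    text = f"{parsed.netloc} {parsed.path} {anchor_text}".lower()
    found = []
    for token in re.split(r"[^a-z]+", text):
        if token in TOY_GAZETTEER and token not in found:
            found.append(token)
    return found


def add_to_index(index, url, anchor_text):
    """File the URL under the hierarchy path of every place name it mentions."""
    for name in location_tokens(url, anchor_text):
        node = index
        for level in TOY_GAZETTEER[name]:
            node = node.setdefault(level, {})
        # "_urls" is an assumed reserved key holding the pages at this level.
        node.setdefault("_urls", []).append(url)
    return index


index = {}
add_to_index(index, "http://example.org/hotels/patras/", "Hotels in Patras")
add_to_index(index, "http://example.org/athens-museums", "museum guide")
```

A location-sensitive query for "patras" could then be answered by walking `index["greece"]["achaea"]["patras"]`. A real system would also need to disambiguate place names that refer to several locations, which this sketch ignores.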


Paper Citation


in Harvard Style

Kozanidis L., Stamou S. and Spiros G. (2009). FOCUSING WEB CRAWLS ON LOCATION-SPECIFIC CONTENT. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 244-249. DOI: 10.5220/0001823002440249


in Bibtex Style

@conference{webist09,
author={Lefteris Kozanidis and Sofia Stamou and George Spiros},
title={FOCUSING WEB CRAWLS ON LOCATION-SPECIFIC CONTENT},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={244-249},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001823002440249},
isbn={978-989-8111-81-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - FOCUSING WEB CRAWLS ON LOCATION-SPECIFIC CONTENT
SN - 978-989-8111-81-4
AU - Kozanidis L.
AU - Stamou S.
AU - Spiros G.
PY - 2009
SP - 244
EP - 249
DO - 10.5220/0001823002440249