AD-HOC GEOREFERENCING OF WEB-PAGES USING STREET-NAME PREFIX TREES

Andrei Tabarcea, Ville Hautamäki, Pasi Fränti

Abstract

A bottleneck in building location-based web search is that most web-pages do not contain explicit geocoding such as geotags. An alternative is ad-hoc georeferencing based on street addresses, but the problem is how to extract and validate address strings from free-form text. We propose a rule-based solution that detects address-based locations using a gazetteer and street-name prefix trees created from the gazetteer. We compare this approach against two alternatives: a heuristic method that requires no gazetteer and instead assumes that a street name has a certain structure, and a method that also uses data structures created from the gazetteer, but in the form of street-name arrays. Experiments using our location-based search engine prototype (MOPSI) for Finland and Singapore show that the proposed prefix-tree solution is twice as fast and 10% more accurate than its rule-based alternative, and 10 times faster than when an array structure is used to access the gazetteer.
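To make the idea of a street-name prefix tree concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny in-memory gazetteer, a character-level trie, and addresses written as a street name followed by a house number. The names StreetTrie, longest_match and find_addresses are hypothetical and chosen only for illustration.

```python
import re

END = object()  # sentinel key marking the end of a street name (assumption of this sketch)


class StreetTrie:
    """Character-level prefix tree over gazetteer street names (illustrative only)."""

    def __init__(self, names):
        self.root = {}
        for name in names:
            node = self.root
            for ch in name.lower():
                node = node.setdefault(ch, {})
            node[END] = name  # store the canonical spelling at the terminal node

    def longest_match(self, text, start):
        """Return (street, end_index) for the longest name starting at `start`, or None."""
        node, best = self.root, None
        for i in range(start, len(text)):
            node = node.get(text[i].lower())
            if node is None:
                break
            if END in node:
                best = (node[END], i + 1)
        return best


def find_addresses(text, trie):
    """Return (street, house_number) pairs for known streets followed by a number."""
    results = []
    for m in re.finditer(r"\b\w", text):  # try a match at every word start
        hit = trie.longest_match(text, m.start())
        if hit is None:
            continue
        street, end = hit
        number = re.match(r"\s+(\d+)\b", text[end:])
        if number:
            results.append((street, number.group(1)))
    return results


trie = StreetTrie(["Kauppakatu", "Torikatu"])
print(find_addresses("Cafe at kauppakatu 12, Joensuu; shop near Torikatu.", trie))
# prints [('Kauppakatu', '12')]
```

Each lookup walks the text one character at a time and stops as soon as no street name shares the current prefix, so the cost of a failed lookup depends on the length of the shared prefix rather than on the number of streets in the gazetteer. A flat array of names would instead need a comparison or search per candidate position.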


Paper Citation


in Harvard Style

Tabarcea A., Hautamäki V. and Fränti P. (2010). AD-HOC GEOREFERENCING OF WEB-PAGES USING STREET-NAME PREFIX TREES. In Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST, ISBN 978-989-674-025-2, pages 237-244. DOI: 10.5220/0002804002370244


in Bibtex Style

@conference{webist10,
author={Andrei Tabarcea and Ville Hautamäki and Pasi Fränti},
title={AD-HOC GEOREFERENCING OF WEB-PAGES USING STREET-NAME PREFIX TREES},
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST},
year={2010},
pages={237-244},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002804002370244},
isbn={978-989-674-025-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST
TI - AD-HOC GEOREFERENCING OF WEB-PAGES USING STREET-NAME PREFIX TREES
SN - 978-989-674-025-2
AU - Tabarcea A.
AU - Hautamäki V.
AU - Fränti P.
PY - 2010
SP - 237
EP - 244
DO - 10.5220/0002804002370244