prefix tree solution is 57% faster and 10% more
accurate, on average, than the heuristic solution. In
comparison to Brute Force, it is 10 times faster.
The resulting solution improves the speed and
quality of web-page georeferencing and removes
one bottleneck in building an efficient location-based
search engine such as the MOPSI prototype.
Table 4: Average search times for the address detection.

Method          Time (s)   Standard deviation   Number of validated addresses
Rural municipalities
  Brute-Force     3.01           2.43                     3.7
  Heuristic       1.54           1.15                     2.5
  Prefix Tree     0.51           0.35                     3.7
Urban municipalities
  Brute-Force    10.18           7.11                    19.8
  Heuristic       1.70           1.24                    18.6
  Prefix Tree     0.87           0.85                    19.8
Total
  Brute-Force     6.59           6.40                    11.8
  Heuristic       1.62           1.20                    10.5
  Prefix Tree     0.69           0.68                    11.8
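
The headline figures quoted in the conclusion can be cross-checked against the Total rows of Table 4. The short Python sketch below is an illustrative recomputation only, not part of the original evaluation; the dictionary layout and the helper names percent_faster, speedup_factor and percent_more are assumptions introduced here.

```python
# Total rows of Table 4: (mean search time in seconds, validated addresses).
TOTAL = {
    "Brute-Force": (6.59, 11.8),
    "Heuristic": (1.62, 10.5),
    "Prefix Tree": (0.69, 11.8),
}


def percent_faster(baseline_s, method_s):
    """Relative reduction in running time against a baseline, in percent."""
    return 100.0 * (baseline_s - method_s) / baseline_s


def speedup_factor(baseline_s, method_s):
    """How many times shorter the running time is than the baseline."""
    return baseline_s / method_s


def percent_more(baseline, value):
    """Relative increase of a count against a baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


pt_time, pt_found = TOTAL["Prefix Tree"]
h_time, h_found = TOTAL["Heuristic"]
bf_time, _ = TOTAL["Brute-Force"]

print(f"time saved vs. heuristic:   {percent_faster(h_time, pt_time):.0f}%")
print(f"speed-up vs. brute force:   {speedup_factor(bf_time, pt_time):.1f}x")
print(f"more addresses vs. heuristic: {percent_more(h_found, pt_found):.0f}%")
```

With the Total rows this gives about 57% less time than the heuristic and a speed-up of roughly 9.6 over brute force, in line with the rounded figures in the text. The address count comes out about 12% higher than the heuristic, slightly above the 10% quoted; this may reflect rounding or a different definition of accuracy, which the table alone does not settle.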
ACKNOWLEDGEMENTS
This research was supported by EU/EAKR; the
work of Ville Hautamäki was supported by the
Academy of Finland under project 131298.
WEBIST 2010 - 6th International Conference on Web Information Systems and Technologies