Author:
Konstantin Clemens
Affiliation:
Technische Universität Berlin, Service-centric Networking, Germany
Keyword(s):
Geocoding, Postal Address Search, Spelling Variant, Spelling Error, Document Search.
Related Ontology Subjects/Areas/Topics:
Applications; Data Engineering; Databases and Data Security; Pattern Recognition; Query Processing and Optimization; Web Applications
Abstract:
In previous research, a statistical model was used to mimic user queries containing typos and abbreviations. The model was trained to generate the spelling variants of address terms that a human would use. A geocoding system enhanced with these spelling variants yielded results with higher precision and recall. Thus far, however, training the statistical model required user queries to be linked with their expected results, and such training data is very costly to obtain. This paper proposes a novel approach that derives such spelling variants from user queries alone, so that a linkage between collected user queries and result addresses is no longer required. The experiment conducted shows that this approach is also a reasonable way to observe, derive, and index spelling variants, and that it measurably improves the precision and recall of a geocoder.