erage, correctness, granularity, balance and richness
of annotation influence the outcome of any evaluation
experiment (Leidner, 2006). Therefore, even stan-
dardized evaluation corpora such as the one designed
by Leidner require geo-taggers to use a fixed gazetteer
to provide comparable results.
Studies show (Hersh et al., 2000; Allan et al., 2005) that information retrieval performance measures such as recall do not always correspond to gains in actual user satisfaction (Turpin and Scholer, 2006). Work by Turpin and Hersh (2001) suggests that improvements in information retrieval metrics do not necessarily translate into better user performance on specific search tasks.
Martins et al. (2005) recommend closing the gap between performance metrics and user experience by conducting user studies. Although such studies require additional implementation effort, work by Nielsen and Landauer (1993) suggests that approximately 80% of the usability problems can be detected with only five users (Martins et al., 2005).
This work addresses the need for comparative evaluations and user participation by applying the concept of utility to geo-tagger evaluation metrics. Intra-personal settings translate tagging results into utility values and allow performance to be measured according to the user’s specific needs.
The remainder of this paper is organized as follows. Section 2 elaborates on the challenges faced in geo-tagging. Section 3 presents a blueprint for applying the concept of utility to geo-tagging and describes the process of deploying a geo-evaluation ontology. Section 4 demonstrates the usefulness of utility-centered evaluations by comparing the utility-based technique to conventional approaches. Section 5 concludes the paper and provides an outlook.
2 EVALUATING GEO-TAGS
Web pages often contain multiple references to geographic locations. State-of-the-art geo-taggers use these references to identify a site’s geographic context and resolve ambiguities against that context. A focus algorithm then decides on the site’s geography based on the identified geographic entities (Amitay et al., 2004). Tuning parameters determine the focus algorithm’s behavior, for example whether it is biased toward higher-level geographic units (such as countries and continents) or prefers low-level entities such as cities or towns.
Biases make judging the tagger’s performance dif-
ficult. An article about Wolfgang Amadeus Mozart,
for example, contains one reference to Salzburg and
two to Vienna, both cities in Austria. Depending on
the focus algorithm’s configuration, the page’s geog-
raphy might be set to (i) Salzburg (bias toward low-
level geographic units), (ii) Austria (bias toward high-
level geographic units), or (iii) Vienna (bias toward
low-level geographic units with a large population).
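The following minimal sketch illustrates how such a tuning parameter can shift the focus between the three answers above. It is written in Python with a toy gazetteer and ad hoc scoring weights that are not taken from the paper; it is an illustration of the idea, not the actual algorithm of Amitay et al.:

import math
from collections import Counter

# Toy gazetteer: name -> (administrative level, population, containing country).
# Population figures are rough and for illustration only.
GAZETTEER = {
    "Salzburg": ("city",    150_000,   "Austria"),
    "Vienna":   ("city",    1_900_000, "Austria"),
    "Austria":  ("country", 8_900_000, None),
}

def focus(mentions, strategy):
    """Pick the page's geographic focus from resolved place mentions.

    City mentions also count as implicit evidence for their containing
    country; `strategy` turns (level, population, mention count) into a
    score, and the highest-scoring place wins.
    """
    counts = Counter(mentions)
    for name in list(counts):
        level, _, country = GAZETTEER[name]
        if level == "city" and country is not None:
            counts[country] += counts[name]

    def score(name):
        level, population, _ = GAZETTEER[name]
        return strategy(level, population, counts[name])

    return max(counts, key=score)

mentions = ["Salzburg", "Vienna", "Vienna"]   # the Mozart example

# (i) low-level bias favoring specific (small) places over famous ones
print(focus(mentions, lambda lvl, pop, n: (lvl == "city") * 10 + n - 2 * math.log10(pop)))
# (ii) high-level bias: direct or implied country evidence dominates
print(focus(mentions, lambda lvl, pop, n: (lvl == "country") * 10 + n))
# (iii) low-level bias rewarding places with a large population
print(focus(mentions, lambda lvl, pop, n: (lvl == "city") * 10 + n + math.log10(pop)))

Running the sketch prints Salzburg, Austria, and Vienna for the three configurations, mirroring cases (i), (ii), and (iii) above.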
Judging the value of a particular answer is far from trivial, because each possible solution is correct to a certain degree. Evaluations that compare results to a gold standard often fail to capture these nuances.
This paper therefore proposes applying the concept of utility, as found in economic theory, to the evaluation of geo-taggers. The geographies returned by the tagger are assessed against preferences specified by the user along different ontological dimensions and scored accordingly.
Maximizing utility instead of the number of correctly tagged documents provides advantages with regard to: (i) granularity - the architecture accounts even for slight variations in the degree of “correctness” of the proposed geo-tags; (ii) adaptability - users can specify individual utility profiles, giving the architect the means to assess the tagger’s performance in accordance with a particular user’s preferences; and (iii) holistic observability - the geo-tagger’s designer is no longer restricted to observing gains, but can also consider costs in terms of computing power, storage, network traffic, and response times.
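As an illustration of points (ii) and (iii), a user’s utility profile could be captured in a small data structure such as the following sketch; all field names, weights, and cost terms are hypothetical and merely indicate the kind of preferences such a profile might contain:

from dataclasses import dataclass, field

@dataclass
class UtilityProfile:
    """Hypothetical per-user utility profile for geo-tagger evaluation.

    Gain weights express how much partial credit a user grants when the
    proposed tag deviates from the gold standard along an ontological
    dimension; cost weights allow resource consumption to be traded off
    against tagging gains (holistic observability).
    """
    gain_weights: dict = field(default_factory=lambda: {
        "exact": 1.0,        # proposed tag equals the gold tag
        "ancestor": 0.6,     # proposed tag contains the gold tag, e.g. Austria for Vienna
        "descendant": 0.4,   # proposed tag lies inside the gold tag
        "related": 0.3,      # tags share a containing unit, e.g. Salzburg for Vienna
        "unrelated": 0.0,
    })
    # Costs subtracted per unit of resource consumption.
    cost_per_cpu_second: float = 0.01
    cost_per_mb_storage: float = 0.001
    cost_per_mb_traffic: float = 0.002
    cost_per_second_response: float = 0.05

    def utility(self, relation: str, cpu_s: float = 0.0, storage_mb: float = 0.0,
                traffic_mb: float = 0.0, response_s: float = 0.0) -> float:
        """Net utility of one tagging result: gain minus weighted resource costs."""
        return (self.gain_weights[relation]
                - self.cost_per_cpu_second * cpu_s
                - self.cost_per_mb_storage * storage_mb
                - self.cost_per_mb_traffic * traffic_mb
                - self.cost_per_second_response * response_s)

For example, UtilityProfile().utility("ancestor", response_s=2.0) would credit a too-coarse but related answer with 0.6 and subtract 0.1 for a two-second response time, yielding a net utility of 0.5.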
3 METHOD
Figure 1 outlines how the utility-based approach uses ontologies to evaluate geo-tagging performance. The framework compares the geo-tagger’s annotations with tags retrieved from a gold standard. Correct results yield the full score; incorrect results are evaluated using ontology-based scoring, which verifies whether, and to what extent, the result is related to the correct answer along the dimensions specified in the evaluation ontology. Queries against the data source identify these ontological relationships between the computed and the correct tag, which are then scored based on the answer’s deviation from the correct result and the user’s preference settings.
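A minimal sketch of this scoring step, assuming a toy part-of hierarchy in place of the evaluation ontology and hypothetical preference weights in place of the user’s settings, could look as follows:

# Hypothetical part-of hierarchy standing in for the evaluation ontology.
PART_OF = {
    "Salzburg": "Austria",
    "Vienna": "Austria",
    "Austria": "Europe",
}

# Hypothetical user preference settings: partial credit per relation type.
PREFERENCES = {
    "exact": 1.0,        # computed tag equals the gold tag
    "ancestor": 0.6,     # computed tag contains the gold tag (too coarse)
    "descendant": 0.4,   # computed tag lies inside the gold tag (too specific)
    "related": 0.3,      # both tags share a containing unit
    "unrelated": 0.0,
}

def ancestors(place):
    """Return the chain of units containing a place, closest first."""
    chain = []
    while place in PART_OF:
        place = PART_OF[place]
        chain.append(place)
    return chain

def utility_score(computed, gold, preferences=PREFERENCES):
    """Score a computed geo-tag against the gold-standard tag.

    Exact matches yield the full score; otherwise the ontological relation
    between the two tags determines the partial credit, attenuated by the
    deviation (number of hierarchy steps between them).
    """
    if computed == gold:
        return preferences["exact"]
    up_from_computed = ancestors(computed)
    up_from_gold = ancestors(gold)
    if computed in up_from_gold:                      # computed is too coarse
        return preferences["ancestor"] / (up_from_gold.index(computed) + 1)
    if gold in up_from_computed:                      # computed is too specific
        return preferences["descendant"] / (up_from_computed.index(gold) + 1)
    if set(up_from_computed) & set(up_from_gold):     # e.g. two Austrian cities
        return preferences["related"]
    return preferences["unrelated"]

print(utility_score("Vienna", "Vienna"))    # 1.0 - full score
print(utility_score("Austria", "Vienna"))   # 0.6 - correct country, too coarse
print(utility_score("Salzburg", "Vienna"))  # 0.3 - related city in the same country

In this sketch an exact match yields the full score, while related answers receive partial credit that decreases with the number of hierarchy steps separating them from the correct tag, mirroring the ontology-based scoring described above.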