1. Austria/Carinthia/Spittal/Heiligenblut/Grossglockner (mountain)
2. Austria/Carinthia/Spittal/Heiligenblut (village)
3. Austria/Carinthia/Spittal (district)
4. Austria/National Park Hohe Tauern (national park)
5. Austria/Carinthia (state)
6. Austria/Salzburg (Neighbor) (state)
7. Austria/Tyrol (Neighbor) (state)
8. Austria (country)
Figure 1: Ranking of possible “correct” results for geo-tagging an article covering the “Grossglockner”.
2. the probability function of $r$, $f(r|x_0)$ and $F(r|x_0)$,
3. the probability of $x_0$, $f(x_0)$ and $F(x_0)$.
Provided that the conditions described in Section 3.1 are fulfilled, only five optimal policies are possible: (i) always test, (ii) never test, (iii) test if $u > u_t$, (iv) test if $u < u_t$, or (v) test if $u_t < u < u'_t$.
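The five policy shapes can be written down as a small decision function. The following is only an illustrative sketch: the names `TestingPolicy`, `should_test` and `u_t_upper` (standing for the second threshold in policy (v)) are hypothetical and do not come from the STS literature, and the threshold values are assumed to be given.

```python
from enum import Enum


class TestingPolicy(Enum):
    """The five candidate policy shapes (names are illustrative only)."""
    ALWAYS = "always test"
    NEVER = "never test"
    ABOVE = "test if u > u_t"
    BELOW = "test if u < u_t"
    BETWEEN = "test if u_t < u < u_t_upper"


def should_test(policy, u, u_t=None, u_t_upper=None):
    """Return True if an answer with estimated utility u should be tested."""
    if policy is TestingPolicy.ALWAYS:
        return True
    if policy is TestingPolicy.NEVER:
        return False
    if policy is TestingPolicy.ABOVE:
        return u > u_t
    if policy is TestingPolicy.BELOW:
        return u < u_t
    if policy is TestingPolicy.BETWEEN:
        # u_t_upper is the assumed name for the second threshold in policy (v)
        return u_t < u < u_t_upper
    raise ValueError(f"unknown policy: {policy!r}")
```

Which of the five shapes is optimal, and where the thresholds lie, depends on the cost functions and probability distributions of the concrete application.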
The expected utility equals
1. $E(u|x_0)$ for accepting without testing,
2. $T(r,v)$ with testing, and
3. $v_0$ if the action is dropped and a new set ($S_A$) is selected according to the optimal policy.
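Assuming the three expected utilities have already been computed, choosing between accepting, testing and dropping reduces to picking the largest value. The sketch below illustrates this under that assumption; the function name `best_action` and the numbers in the usage line are made up for illustration.

```python
def best_action(accept_utility, test_utility, drop_utility):
    """Pick the action with the highest expected utility.

    accept_utility -- E(u|x_0), accepting the answer without testing
    test_utility   -- T(r, v), testing the answer first
    drop_utility   -- v_0, dropping the answer and selecting a new set
    """
    candidates = {
        "accept": accept_utility,
        "test": test_utility,
        "drop": drop_utility,
    }
    # On ties, max() returns the first maximal entry in insertion order.
    return max(candidates, key=candidates.get)


# Illustrative values only: here testing has the highest expected utility.
print(best_action(0.4, 0.6, 0.5))
```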
4 METHOD
This section focuses on the application of the STS model to Web services. First, we describe heuristics for estimating the cost functions ($c_s$, $c_t$) and the common probability mass function $h(x_0, x_1, u)$. Afterwards, the process of applying search-test-stop to tagging applications is elaborated.
4.1 Cost Functions
In the conventional STS model, costs refer to the investment in terms of time and money for gathering information. Applied to software, costs comprise all expenses in terms of CPU time, bandwidth and storage necessary to search for or test certain answers.
Large-scale Semantic Web projects, like the IDIOM media watch on climate change (Scharl et al., 2007), process hundreds of thousands of pages a week. Querying geonames for geo-tagging such numbers of documents would add days of processing time to the IDIOM architecture.
This research focuses solely on costs in terms of response time, because they are the limiting factor in our current research projects. Other applications might require extending this approach to additional factors such as CPU time or bandwidth.
4.2 Utility Distributions
Applying the STS model to economic problems yields cash deposits and payments. Transferring this idea to information science is more subtle, because the utility depends heavily on the application and its users' preferences. Even within one domain, the notion of an answer set's ($S_A$) value might not be clear. For instance, in a geo context the “correct” answer for a certain problem may be a particular mountain in Austria, but the geo-tagger might identify not the mountain but the surrounding region, or only the state in which it is located (compare Figure 1). Assigning concrete utility values to these alternatives is not possible without detailed information regarding the application and user preferences. Approaches for evaluating the set's value might therefore range from binary methods (full score for correct answers; no points for incomplete or incorrect answers) to complex ontology-based approaches that evaluate the degree of correctness and the severity of deviations.
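The two ends of this spectrum can be contrasted with a small sketch. It assumes that locations are written as '/'-separated paths as in Figure 1; the functions `binary_utility` and `graded_utility` and the prefix-based scoring rule are hypothetical illustrations, not the evaluation method used in this work.

```python
def binary_utility(correct, answer):
    """Full score for an exact match, nothing otherwise."""
    return 1.0 if answer == correct else 0.0


def graded_utility(correct, answer):
    """Share of the correct location's path that the answer gets right.

    Locations are '/'-separated paths. An answer that leaves the correct
    path (e.g. a neighboring state) only earns credit for the common prefix.
    """
    correct_parts = correct.split("/")
    answer_parts = answer.split("/")
    shared = 0
    for c, a in zip(correct_parts, answer_parts):
        if c != a:
            break
        shared += 1
    return shared / len(correct_parts)


correct = "Austria/Carinthia/Spittal/Heiligenblut/Grossglockner"
print(binary_utility(correct, "Austria/Carinthia"))   # exact match required
print(graded_utility(correct, "Austria/Carinthia"))   # partial credit
```

This simple prefix measure does not reproduce the full ranking of Figure 1: it scores the national park and the neighboring states no higher than the country itself. Capturing such relations would require the richer, ontology-based end of the spectrum.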
4.3 Application
This work has been motivated by performance issues in a geo-tagging application that uses resources from geonames.org and WordNet to improve tagging accuracy. Based on the experience gained during the evaluation of STS models, this section presents a heuristic for determining the cost functions ($c_s$, $c_t$) and the common probability mass function $h(x_0, x_1, u)$.
4.3.1 Cost Functions
Searching leads to external queries and therefore to costs. Measuring a service's performance over a certain time period allows estimating its average response time and variance.
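Such a measurement could look like the following sketch. It assumes that a single query to the service can be wrapped in a callable; `measure_response_times` and `summarize` are illustrative helper names, and the `time.sleep` call only stands in for a real request.

```python
import statistics
import time


def measure_response_times(query, samples):
    """Call query() `samples` times and return the elapsed seconds per call."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        query()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (mean, sample variance); needs at least two measurements."""
    return statistics.mean(timings), statistics.variance(timings)


# Stand-in for a real service call (assumption for illustration only).
timings = measure_response_times(lambda: time.sleep(0.01), samples=5)
mean_time, variance = summarize(timings)
print(f"mean={mean_time:.4f}s variance={variance:.6f}")
```

The mean gives a first estimate of the search cost per query, while the variance indicates how reliable that estimate is for a given service.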
STS fits best in situations where the query cost $c_s$ is of the same order as the average utility retrieved ($O(c_s) = O(u)$). In settings with $O(c_s) \ll O(u)$ the search costs have no significant impact on the utility, and if $O(c_s) \gg O(u)$ no searching will take place at
ICSOFT 2008 - International Conference on Software and Data Technologies