is a metric used to compare two or more search
algorithms (Kowalski, 1997). It measures the
number of unique relevant results produced by one
algorithm and not produced by other algorithms.
Because we are exploring the effectiveness of
stemming, we compare the results produced by a
search without stemming with those produced by a
search with stemming. The equations for precision
and unique relevance recall are:
Image relevance was determined by three trained
image raters. If at least two of the three raters
deemed an image relevant, the image was
determined to be relevant. The goal of this analysis
is simply to test our hypothesis that stemming is a
useful addition to a text-based image retrieval
system for use on the Web.
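The two-of-three relevance rule described above can be illustrated with a minimal sketch. The image names and rater judgments below are hypothetical and are not taken from the study; only the threshold of two agreeing raters comes from the text.

```python
def is_relevant(votes, threshold=2):
    """Return True when at least `threshold` raters judged the image relevant."""
    return sum(1 for vote in votes if vote) >= threshold


# Hypothetical judgments from three raters per image (illustrative only).
judgments = {
    "img_a.jpg": (True, True, False),   # two raters agree: relevant
    "img_b.jpg": (False, True, False),  # only one rater: not relevant
}

relevant_images = {name for name, votes in judgments.items() if is_relevant(votes)}
print(relevant_images)
```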
4.2 Local Image Search Results
Each of the 24 queries used to generate the local
dataset was rerun on the local dataset both with and
without stemming. In both cases the relevant images
were noted and search precision calculated. The
images returned using both algorithms were
compared, and uniquely relevant images were
identified. In 21 of the 24 queries, the results were
identical between the two searches. In the remaining
3 queries, stemming produced all of the images
returned by the search without stemming as well as a
few additional images. Unlike the other 21 queries,
these 3 queries contain terms that lend themselves
well to stemming. In 2 of these 3 queries, the
additional images returned by stemming were
relevant to the search query. The results from the
local data search can be seen in Table 4.1.
4.3 Web Image Search Results
To examine the effectiveness of stemming on text-
based image retrieval on the Web, 10 queries were
used. These queries were selected for their perceived
appropriateness for stemming. They are not meant to
be a representative set of queries for the application
of Web image retrieval. These 10 queries were
submitted both with and without stemming. The
resulting relevant images were identified and the
search precisions recorded. In 8 of the 10 queries,
the result sets where stemming was implemented
contained all of the images produced without
stemming along with additional relevant images. In
the remaining 2 queries, stemming allowed a match
in a feature determined to lower the relevance
according to the image relevance equation. The
average precision among the 10 queries was 82.5%
without stemming and 84.5% with stemming. The
average unique relevance recall was 0.5% without
stemming and 27.7% with stemming. The results from
the Web image search can be seen in Table 4.2.
4.4 Discussion
This research shows that stemming is useful to a
certain extent in text-based image retrieval for
obtaining additional relevant results. It also shows
that stemming a given HTML feature only when a
match is not found without stemming allows
stemming to be implemented in a manner that is
unlikely to exclude results that would have been
returned had stemming not been implemented.
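The fallback strategy described above (stem a feature only when the unstemmed match fails) might look roughly like the following sketch. The function names are hypothetical, and `toy_stem` is a deliberately crude suffix stripper standing in for a real stemmer; it is not the stemmer used in this work.

```python
def toy_stem(word):
    """Crude suffix stripper; an assumed stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s", "ed"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def feature_matches(query_term, feature_text):
    """Return (matched, used_stemming) for one query term and one text feature."""
    words = feature_text.lower().split()
    term = query_term.lower()
    # First attempt: exact word match, so unstemmed results are never lost.
    if term in words:
        return True, False
    # Fallback: compare stems only because the exact match failed.
    stem = toy_stem(term)
    if any(toy_stem(word) == stem for word in words):
        return True, True
    return False, False


print(feature_matches("raining", "raining in the city"))
print(feature_matches("raining", "heavy rain today"))
```

Because the exact match is tried first, every feature that matches without stemming still matches in the same way; the stemmed comparison can only add matches.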
On the queries run against the local data set, a
small improvement was seen with the addition of
stemming. Of the 24 queries, only 3 produced
additional images with stemming. Of these 3
queries, 2 returned additional relevant images. This
is likely due to the nature of the queries themselves.
A total of 15 of the queries were proper names of
people, places, or monuments. Because stemming is
applied in an effort to create matches among
multiple word forms, it follows that queries of
proper names would not benefit from stemming. For
the same reason, the 3 queries of holidays, "new
year", "thanks giving", and "halloween" are unlikely
to benefit from stemming. Of the remaining 6
queries, 2 produced result sets that were improved
by stemming. These queries were "burning house"
and "raining". Both of these queries contain a
present participle verb ending in "ing". Stemming in
these situations not only returned additional relevant
images, thus raising the URR, but also improved the
precision for the search. The one query that
produced additional results, none of which were
relevant, was the query of "thanks giving". Due to
the word "giving" in this query, it is not surprising
that stemming produced additional results. However,
"thanks giving" is typically written as a single word
"thanksgiving", which may have some effect on the
results for this query. These results illustrate the fact
that stemming is not necessarily beneficial for all
types of search queries.
For the result set produced by image search on
the Web, stemming proved to be useful. Stemming yielded an
average URR of 27.7% while increasing the
precision by 2.0 percentage points, from 82.5% to
\[
\text{Precision} = \frac{\text{Number\_Relevant\_Retrieved}}{\text{Total\_Number\_Retrieved}} \tag{4.1}
\]
\[
\text{URR} = \frac{\text{Number\_Unique\_Relevant}}{\text{Number\_Relevant}} \tag{4.2}
\]
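Equations 4.1 and 4.2 can be sketched in code as follows. This is only an illustration: the function names and the example image identifiers are hypothetical, and the URR denominator is assumed here to be the set of relevant images returned by either search, which is one reading of "Number_Relevant".

```python
def precision(retrieved, relevant):
    """Equation 4.1: relevant images retrieved divided by all images retrieved."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def unique_relevance_recall(own, other, relevant):
    """Equation 4.2, assuming the denominator pools relevant images from both searches."""
    relevant = set(relevant)
    own_relevant = set(own) & relevant
    other_relevant = set(other) & relevant
    pooled = own_relevant | other_relevant
    if not pooled:
        return 0.0
    # Unique relevant images: found by this search but not by the other one.
    return len(own_relevant - other_relevant) / len(pooled)


# Hypothetical result sets for a single query (illustrative only).
without_stemming = {"a", "b", "c"}
with_stemming = {"a", "b", "c", "d", "e"}
judged_relevant = {"a", "b", "d"}

print(precision(without_stemming, judged_relevant))
print(precision(with_stemming, judged_relevant))
print(unique_relevance_recall(with_stemming, without_stemming, judged_relevant))
print(unique_relevance_recall(without_stemming, with_stemming, judged_relevant))
```

In this made-up case the stemmed search returns a superset of the unstemmed results, so the unstemmed search has no uniquely relevant images and its URR is zero.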
WEBIST 2008 - International Conference on Web Information Systems and Technologies