decreases with the number of terms in the query.
The bottom row in the table shows that the fraction
of queries with above-average precision improves
with the number of terms until it reaches the high
mark, beyond which humans start to be overwhelmed
by the size of the Boolean expression.
We suggest that a query with fewer than 3 terms
alerts us to the possibility that the searcher is finding
it difficult to identify appropriate domain terms (or
jargon) for the query. More than 4 terms make it
difficult to organise the terms into an effective
Boolean query. The best-performing query sizes,
however, do not coincide with the most common
query size. The most common query size, as
reported in (Jones et al, 1998), is 2 terms, and it
accounts for about a third of all queries.
Other researchers have reported that domain-savvy
searchers use a small number of domain-specific
terms in their search queries. Our observation
is consistent with those findings. To further
test our inference, we grouped the volunteer queries
into three nearly equal-sized groups based on their
performance relative to the synthesised queries.
Queries with precision up to 2 units below the
corresponding synthesised query were marked good.
Those with precision 5 or more units below the
synthesised query were marked poor. The group in
the middle had 14 cases. Table 4 shows the
distribution of terms in the good and poor groups.
The proportion of queries with 4 terms in the good
group is about three times as high as in the poor
group.
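The grouping rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name classify and its arguments are hypothetical, precision is assumed to be an integer score on the 0 to 20 scale used in this paper, and a volunteer query that outperforms its synthesised counterpart is assumed to count as good. The final lines check the "about three times" claim against the 4-term percentages printed in Table 4.

```python
def classify(user_precision: int, synthesised_precision: int) -> str:
    """Label a volunteer query by its precision gap to the synthesised query."""
    # Gap in precision units; a negative gap means the volunteer query did better
    # (assumed here to fall in the good group).
    gap = synthesised_precision - user_precision
    if gap <= 2:
        return "good"
    if gap >= 5:
        return "poor"
    return "middle"


# Shares of 4-term queries as printed in Table 4.
good_share_4_terms = 0.54
poor_share_4_terms = 0.17
ratio = good_share_4_terms / poor_share_4_terms
print(round(ratio, 1))  # prints 3.2
```

For example, classify(18, 20) returns "good", classify(17, 20) returns "middle", and classify(15, 20) returns "poor".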
Table 3: Fraction of queries with stated precision
characteristics as a function of the number of terms in the query.
Terms in query                                  2      3      4      5
Precision of user query < average precision,
among queries with a single improvement
attempt                                       100%    44%    20%     0%
Query precision > average precision             0%    50%    93%    29%
5 CONCLUSIONS
Four terms and above-average coverage emerge as
a good predictor of a successful Boolean web search
query. Indeed, all samples with this property in our
data have above-average precision of 15 or more.
One-half of these queries achieve a perfect precision
score of 20.
To further improve the odds of success, choose 4
terms and above-average coverage, and make several
attempts to improve the query. The minimum precision
delivered by these queries in our survey is 19.
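The predictor stated above amounts to a simple test on two properties of a query. The sketch below is illustrative only: likely_successful is a hypothetical name, coverage is assumed to be available as a number computed as defined earlier in the paper, and "above average" is assumed to mean strictly greater than the arithmetic mean over the surveyed queries.

```python
from statistics import mean


def likely_successful(num_terms: int, coverage: float, all_coverages: list[float]) -> bool:
    """Return True when a query has exactly 4 terms and above-average coverage."""
    # all_coverages holds the coverage values of every query in the sample;
    # the threshold is their arithmetic mean (an assumption of this sketch).
    return num_terms == 4 and coverage > mean(all_coverages)
```

With hypothetical coverage values [0.2, 0.5, 0.8], a 4-term query with coverage 0.8 passes the test, while a 3-term query with the same coverage does not.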
Table 4: Distribution of terms in two groups of volunteer
queries marked good and poor.
Number of terms    Good queries    Poor queries
Count                   13              12
2                       0%              8%
3                      31%             42%
4                      54%             17%
5                      15%             33%
Total                 100%            100%
REFERENCES
Baeza-Yates, R. and Ribeiro-Neto, B. (1999). Modern
Information Retrieval, Addison-Wesley, Reading, MA.
Brin, S. and Page, L. (1998). The Anatomy of a Large-Scale
Hypertextual Web Search Engine. Computer Networks
30(1-7) pp. 107-117
Broder, A.Z. (2002). A taxonomy of web search. SIGIR
Forum 36(2) pp. 3-10
Chakrabarti, S. (2003). Mining the Web: Discovering
knowledge from hypertext data, Morgan Kaufmann
Publishers, Amsterdam
Hölscher, C. and Strube, G. (2000). Web search behavior
of Internet experts and newbies. Computer Networks
33(1-6) pp. 337-346
Jansen, B.J. (2000). The effect of query complexity on
web searching results, Information Research, 6(1)
Jones, S., Cunningham, S.J. and McNab, R. (1998). Usage
Analysis of a Digital Library. In: Third ACM Conf. on
Digital Libraries, Pittsburgh, PA, USA. June 23-26.
Jones, S., Cunningham, S.J., McNab, R.J., Boddie, S.J.
(2000). A transaction log analysis of a digital library.
Int. J. on Digital Libraries 3(2) pp. 152-169
Malhotra, V., Patro, S. and Johnson, D. (2005). Synthesise
web queries: Search the Web by examples, Int. conf. on
enterprise information systems, Miami, Florida.
Manning, C.D. and Schütze, H. (1999). Foundations of
statistical natural language processing, MIT Press,
Cambridge, MA
Powers, D.M.W. (2003). Recall and precision versus the
bookmakers, Joint International conference on
cognitive science. pp. 529-534
Silverstein, C., Henzinger, M., Marais, H. and Moricz, M.
(1999). Analysis of a Very Large Web Search Engine
Query Log, ACM SIGIR Forum, 33(1) pp. 6-12.
Spink, A. (2002). A user-centered approach to evaluating
human interaction with Web search engines: an
exploratory study. Inf. Process. Manage. 38(3).
Witten, I.H. and Frank, E. (2000). Data mining: practical
machine learning tools and techniques with Java
implementations. Morgan Kaufmann Publishers, San
Francisco.
WEBIST 2005 - WEB INTERFACES AND APPLICATIONS