relative responsiveness (5-point scale, 1 = worst, 5 =
best) of the summaries was also assessed by
measuring the amount of information that helps the
user successfully retrieve information. This
measure seems to coincide with our usefulness measure,
which is assessed in our experiment through
questions 1, 2, and 5. The average responsiveness in
DUC 2005 was 48% for automatic summaries,
against 93% for the reference ones (Hachey et al.,
2005).
Looking at our data accordingly, usefulness
averaged 72% across those three questions. Admittedly,
this is quite a simplistic comparison, due to the
profound differences between the two assessments. However,
the overall design of the experiment is quite
meaningful when compared to the DUC ones. To
make it statistically significant, we must invest in its
robustness (e.g., by increasing the number of Web
users and search engine answers).
6 FINAL REMARKS
The reported results show that ExtraWeb performs
significantly close to Google. This means that ExtraWeb
may also be useful for users to make decisions
about retrieving documents, in spite of the relatively low
score (68%) for full user satisfaction with the results of the
emulated search task. Although the experiment was
not designed to control either the homogeneity of the
judging population or its subjectivity in
accomplishing the demanded task, the analysis of
the judges' scores shows that the overall judgment was
quite consistent. However, the same extrinsic, task-
oriented evaluation may yield different results when
larger numbers of both judges and retrieved
documents are taken into account. Users
assessing the same task and the same set of results of a search
engine will not necessarily respond in the same
way, and some of them might read the extracts more
carefully than others; as a consequence, their
judgments could be more accurate. This is very
likely to become evident when scaling up the type of
assessment reported in this paper.
Another important point is that
ExtraWeb is domain-independent. However, it
depends on keywords previously marked up in the
HTML, which are usually supplied by the document
authors. An alternative would be to generate
a keyword list through statistical methods such as
Luhn's. However, such methods would not yield
keywords as expressive as the authored ones.
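This fallback strategy could be sketched as follows. This is a minimal, hypothetical illustration (the function names and the tiny stop-word list are ours, not ExtraWeb's): it prefers author-supplied `<meta>` keywords and only falls back to a Luhn-style frequency ranking of content words when none are present.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "that", "for"}

def authored_keywords(html: str) -> list[str]:
    """Read author-supplied keywords from the HTML <meta> tag, if present."""
    m = re.search(r'<meta\s+name="keywords"\s+content="([^"]*)"', html, re.I)
    if not m:
        return []
    return [k.strip().lower() for k in m.group(1).split(",") if k.strip()]

def luhn_keywords(text: str, top_n: int = 5) -> list[str]:
    """Luhn-style fallback: rank content words by raw frequency."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(top_n)]

def extract_keywords(html: str, text: str) -> list[str]:
    """Prefer authored keywords; fall back to the statistical list."""
    return authored_keywords(html) or luhn_keywords(text)
```

As the text notes, the statistical list tends to surface frequent terms rather than the more expressive descriptors an author would choose.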
Future work shall address both improving the
enrichment of the ontology and assessing the system more
broadly, in a distributed, real-time environment.
It will most probably also be relevant to
reproduce quality questions similar to those used
in the most recent DUCs.
REFERENCES
Amitay, E., 2001. What lays in the layout: Using anchor-
paragraph arrangements to extract descriptions of
Web documents. PhD Thesis. Department of
Computing, Macquarie University.
Barros, F. A., Gonçalves, P. F., Santos, T. L. V. L., 1998.
Providing Context to Web Searches: The Use of
Ontologies to Enhance Search Engine's Accuracy.
Journal of the Brazilian Computer Society, 5(2):45-55.
Chirita, P. A., Nejdl, W., Paiu, R., Kohlschütter, C., 2005.
Using ODP meta-data to personalize search. In the
Proc. of the 28th Annual International ACM SIGIR
Conference on Research and Development in
Information Retrieval, pp. 178-185.
Conklin, J., 1987. Hypertext: An Introduction and Survey.
IEEE Computer, 20(9), pp.17-41.
Dorr, B., Monz, C., President, S., Schwartz, R., Zajic, D.,
2005. A Methodology for Extrinsic Evaluation of Text
Summarization: Does ROUGE Correlate? In the Proc.
of the ACL Workshop on Intrinsic and Extrinsic
Evaluation Measures for Machine Translation and/or
Summarization, pp. 1-8.
Edmundson, H. P., 1969. New Methods in Automatic
Extracting. Journal of the ACM, 16(2):264-285.
Greghi, J. G., Martins, R. T., Nunes, M. G. V., 2002.
Diadorim: a lexical database for Brazilian Portuguese.
In the Proc. of the Third International Conference on
Language Resources and Evaluation, 4:1346-1350.
Griesbaum, J., 2004. Evaluation of three German search
engines: Altavista.de, Google.de and Lycos.de.
Information Research, 9(4), paper 189.
Hachey, B., Murray, G., Reitter, D., 2005. Embra System
at DUC 2005: Query-oriented multi-document
summarization with a very large latent semantic space.
Document Understanding Conference 2005,
Vancouver, British Columbia, Canada.
Haveliwala, T. H., 2002. Topic-sensitive PageRank. In the
Proc. of the Eleventh International World Wide Web
Conference, Honolulu, Hawaii.
Inktomi-Corp., 2003. Web search relevance test. Veritest.
Available at http://www.veritest.com/clients/reports/
inktomi/inktomi_Web_search_test.pdf [March 2006].
Jansen, B. J., Spink, A., Saracevic, T., 2000. Real life, real
users, and real needs: a study and analysis of user
queries on the web. Information Processing and
Management, 36(2):207-227.
Lewis, J. R., 1995. Computer Usability Satisfaction
Questionnaires: Psychometric Evaluation and
Instructions for Use. International Journal of Human-
Computer Interaction, 7(1):57-78.
Liang, S. F., Devlin, S., Tait, J., 2004. Feature Selection
for Summarising: The Sunderland DUC 2004