(http://www.intute.ac.uk), a popular search engine
among students for finding high quality educational
websites, a searcher may select from a list of subject
areas and/or resource types for his/her search, and
he/she is then taken to the result page. We think the
rigidity of this approach may confine the user to
searching within the predefined classification of the resources.
Additionally, there are many forms of metadata
which have not been fully exploited during the
search process.
To overcome the above limitations, we propose
to experiment with the dynamic query approach
based on Shneiderman’s philosophy (Shneiderman
1994) of letting users experiment in real time to tune
search results. Dynamic queries help users search
and explore large amounts of information by
presenting them with an overview of the datasets,
and then allowing them to quickly filter out unwanted
information. “Users fly through information spaces
by incrementally adjusting a query (with sliders,
buttons, and other filters) while continuously
viewing the changing results.” A popular example of
this approach is that of Kayak.co.uk, a meta-search
engine which searches over 100 travel sites.
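The dynamic query idea can be illustrated with a minimal sketch, assuming a small in-memory result set. The `Result` fields (`year`, `resource_type`) and the control names are hypothetical and are not taken from DYNIQX; the point is only that each adjustment of a control immediately re-filters the full result set and returns the new view.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Result:
    title: str
    year: int            # hypothetical metadata field
    resource_type: str   # hypothetical metadata field


@dataclass
class DynamicQuery:
    """Holds the current control settings and re-filters on every change."""
    results: List[Result]
    year_range: Tuple[int, int] = (0, 9999)                 # slider (min, max)
    resource_types: Set[str] = field(default_factory=set)   # tick boxes; empty means "all"

    def set_year_range(self, low: int, high: int) -> List[Result]:
        """Simulate moving a range slider; return the updated view."""
        self.year_range = (low, high)
        return self.view()

    def toggle_type(self, resource_type: str) -> List[Result]:
        """Simulate ticking or unticking a box; return the updated view."""
        self.resource_types ^= {resource_type}
        return self.view()

    def view(self) -> List[Result]:
        """Results that satisfy every control in its current position."""
        low, high = self.year_range
        return [
            r for r in self.results
            if low <= r.year <= high
            and (not self.resource_types or r.resource_type in self.resource_types)
        ]
```

Calling `set_year_range` or `toggle_type` plays the role of one incremental adjustment; in an interactive interface the returned list would be redrawn after each call.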
In DYNIQX, search results from a number of
search engines are fused into a single list using both the
relevance of each result to the search query, computed
from our index of the top results returned by these
search engines, and the rankings of the result
provided by one or more search engines, as follows:
$$p_{\mathrm{fuse}}(q \mid d) \propto (1-\lambda)\, p(q \mid d) + \frac{\lambda}{\log\left(\mathrm{Rank}_{\mathrm{average}}(d) + 1\right)}$$
where q is the query, $p_{\mathrm{fuse}}(q \mid d)$ is the fused
conditional probability of document d used to rank it
in the final list, $p(q \mid d)$ is the conditional probability
of d based on our index, λ is a parameter adjusting
the effect of the two components in the final
probability, and $\mathrm{Rank}_{\mathrm{average}}(d)$ is the average ranking
of document d given by search engines. In the
equation we take the log of the average ranking in
order to transform the linear distribution of the
rankings of d for integrating with the document
conditional probability.
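The fusion score can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the DYNIQX implementation: the natural logarithm is assumed (the text does not give the base), the default λ = 0.5 is arbitrary, and the function names and input layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def fused_score(p_qd: float, avg_rank: float, lam: float = 0.5) -> float:
    """(1 - lam) * p(q|d) + lam / log(avg_rank + 1), natural log assumed."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if avg_rank < 1:
        raise ValueError("ranks are assumed to start at 1")
    return (1.0 - lam) * p_qd + lam / math.log(avg_rank + 1.0)


def fuse(candidates: Dict[str, Tuple[float, List[int]]],
         lam: float = 0.5) -> List[Tuple[str, float]]:
    """Rank documents by fused score, best first.

    candidates maps a document id to (p(q|d) from the local index,
    list of ranks assigned by the engines that returned the document).
    """
    scored = []
    for doc_id, (p_qd, ranks) in candidates.items():
        avg_rank = sum(ranks) / len(ranks)   # Rank_average(d)
        scored.append((doc_id, fused_score(p_qd, avg_rank, lam)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    example = {"a": (0.3, [1, 2]), "b": (0.3, [10, 12]), "c": (0.1, [1])}
    for doc_id, score in fuse(example):
        print(doc_id, round(score, 4))
```

With λ = 0 the ordering depends only on the index-based probability, and with λ = 1 only on the average engine ranking, which is the trade-off the parameter is described as controlling.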
DYNIQX provides a novel way of meta-searching
a number of search engines in that: high quality
search results from a number of search engines are
integrated; metadata from heterogeneous sources are
unified for filtering and searching these results;
high quality results from a number of queries
covering a topic are all integrated; and features
such as metadata-driven controls and term clouds
are used to facilitate search.
The architecture of our DYNIQX system is
shown in Figure 1. First, a user sends a
query to the DYNIQX system. The query is
processed and translated into the appropriate form
for each search service, e.g., PubMed. For each
query, each search engine, e.g., Intute, PubMed, or
Google Scholar, returns a ranked list of search
results. Results from all these ranked lists are
indexed and searched by Lucene (Hatcher and
Gospodnetic 2004).
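The first stages of this pipeline can be sketched as follows. The service names and query syntaxes below are placeholders, not the actual syntaxes of PubMed, Intute, or Google Scholar, and the Lucene indexing step is left out; the sketch only shows one query being translated per service and the returned ranked lists being pooled per document.

```python
from typing import Callable, Dict, List


def quote_phrase(query: str) -> str:
    """Placeholder syntax: wrap multi-word queries in double quotes."""
    return f'"{query}"' if " " in query else query


def and_terms(query: str) -> str:
    """Placeholder syntax: join the query terms with AND."""
    return " AND ".join(query.split())


# Hypothetical services; each maps the user query to its own syntax.
TRANSLATORS: Dict[str, Callable[[str], str]] = {
    "service_a": quote_phrase,
    "service_b": and_terms,
}


def translate(query: str,
              translators: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Produce one service-specific query string per search service."""
    return {name: fn(query) for name, fn in translators.items()}


def pool_results(ranked_lists: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Merge per-engine ranked lists into doc_id -> {engine: rank}, ranks from 1."""
    pooled: Dict[str, Dict[str, int]] = {}
    for engine, doc_ids in ranked_lists.items():
        for rank, doc_id in enumerate(doc_ids, start=1):
            pooled.setdefault(doc_id, {})[engine] = rank
    return pooled
```

The pooled structure keeps every engine's rank for a document, which is the information an average ranking would later be computed from.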
Unlike typical search engines where the user can
only specify one query at a time, in DYNIQX, the
user can specify a number of queries describing
different aspects of a search topic, e.g., “bird flu”,
“avian influenza”, and “H5N1”, to find
documents on “bird flu”. Each query is translated
into the appropriate form for each search service,
e.g., PubMed. For each query, each search engine
returns a ranked list of results. Results are ranked by
their overall relevance scores to the topic in a single
ranked list. For each result, its overall relevance
score integrates the relevance of the result to the
queries based on the Lucene index, rankings and
relevance scores of the result by each search engine,
and metadata associated with the result. Metadata
from these heterogeneous sources are unified for
filtering results. This is illustrated in the DYNIQX
search interface shown in Figure 2.
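One way to picture the unification of heterogeneous metadata is a per-source field mapping onto a shared schema. The sketch below is an assumption made for illustration: the source names, field names, and unified schema are all hypothetical, and the paper does not state how DYNIQX performs this step.

```python
from typing import Any, Dict, List

# Hypothetical mappings from each source's field names to a shared schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "engine_a": {"pub_year": "year", "kind": "resource_type"},
    "engine_b": {"date": "year", "type": "resource_type"},
}


def unify(engine: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename the fields known for this engine; drop fields with no mapping."""
    mapping = FIELD_MAPS[engine]
    return {unified: record[source]
            for source, unified in mapping.items() if source in record}


def filter_unified(records: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Keep records whose unified fields equal every given criterion."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]
```

Once records from different engines share field names, a single filter control can act on all of them at once.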
In Figure 2, in Section A, a user adds a number
of search queries shown in Section B. Statistics of
search results from different search engines are
shown in a table in Section B. The user can
select/deselect search engines in Section E for meta-
search. Once search results are retrieved from search
engines, the user can view a single ranked list in
Section G. When more results arrive, the user clicks
a refresh button in Section A to refresh the single
ranked list. Based on the significance of terms
measured by their document frequencies, a term
cloud is displayed in Section F for filtering results. In
Section D, the user can exclude/include queries in
the meta-search. Metadata associated with search
results are used for re-ranking search results in
Section C.
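A term cloud weighted by document frequency can be sketched as below. The tokenisation rule, the tie-breaking order, and the function names are assumptions made for the example; the paper does not describe how DYNIQX selects or weights the terms beyond using document frequencies.

```python
import re
from collections import Counter
from typing import FrozenSet, List, Tuple

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> set:
    """Distinct lower-cased alphanumeric tokens of one document."""
    return set(TOKEN.findall(text.lower()))


def document_frequencies(docs: List[str],
                         stopwords: FrozenSet[str] = frozenset()) -> Counter:
    """Count, for each term, the number of documents that contain it."""
    df: Counter = Counter()
    for text in docs:
        df.update(tokens(text) - stopwords)
    return df


def term_cloud(docs: List[str], top_n: int = 5,
               stopwords: FrozenSet[str] = frozenset()) -> List[Tuple[str, int]]:
    """Top terms by document frequency; ties are broken alphabetically."""
    df = document_frequencies(docs, stopwords)
    return sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def filter_by_term(docs: List[str], term: str) -> List[str]:
    """Keep the documents containing the term clicked in the cloud."""
    return [d for d in docs if term.lower() in tokens(d)]
```

In an interface, the frequency returned for each term would typically set its display size, and clicking a term would call something like `filter_by_term` on the current result list.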
3 CONCLUSIONS
In this paper, we propose a novel metadata based
search engine called DYNIQX which fuses
information from data collections of heterogeneous
nature. Metadata from multiple sources are
integrated to generate dynamic controls in the
form of sliders, tick boxes, etc., for users to
further filter and rank search results. Since the effect
of metadata in IR has not been sufficiently studied