of Web search) there are six approaches to web
search personalization, namely relevance feedback
and query modification, personalization by content
analysis, recommender systems, personalization by
link analysis, social search engines, and finally
mobile and context-aware searching. Finally, some
works tackle the same problem by characterizing and
modeling the dynamic nature of the Internet itself as
the medium that carries the disseminated information
(Anagnostopoulos and Stavropoulos, 2006). Moreover,
in (Predictive Modeling of First-Click Behavior in
Web-Search) the authors predict user surfing
patterns, thereby improving web search.
2.1 Our Approach
In this paper, the proposed personalization algorithm
is a client-side agent that provides meta-results
based on the user's interactions with five different
web search services. The term meta-result denotes a
re-ranked result that was acquired from one or more
third-party search services and is presented to the
user without labeling its source(s).
In our approach we assume, as a basis for user
modeling, that past search behavior is an indicator
of the user's future behavior. The personalized
preferences are constructed in a completely
transparent way, without interfering with the user's
browsing behavior, while the merged meta-results are
presented without labeling their sources, ensuring
that the user remains entirely unbiased with respect
to his preferences. The only feedback the user
receives is a text paragraph describing the URL, as
most web search engines provide. The personalized
preferences are recorded client-side and updated
continuously according to the meta-results the user
visits, the time spent exploring them, and the search
depth in terms of hyperlinks. Thus, the user's
profile also adjusts to any changes in his
information needs.
The data stored for the personalization algorithm
define a set of search preference features describing
the information explored by the user through the
visited meta-results (query, involved search service,
ranking position, timestamp, link depth). Every time
the user browses a URL from the merged meta-results,
these features are recorded in an XML file and, in
parallel, update the weights that express the user's
confidence in (or the priority of) each of the
employed web search services. The timestamp field
records the time the user spends exploring the
specific result. An instance of the XML file that
stores this information is depicted in Figure 1. In
summary, the personalization data are gathered
implicitly through the user's interaction with the
system, so the user is neither biased nor burdened
with submitting information to the meta-search
engine. Personalized data are stored client-side on
the user's machine, providing privacy and security.
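By way of illustration, a single visited meta-result could be logged as an XML entry along the following lines. The element names and the record layout here are hypothetical placeholders; the actual schema is the one shown in Figure 1.

```python
import xml.etree.ElementTree as ET

def log_meta_result_visit(query, service, rank, seconds_spent, link_depth):
    """Build one XML record for a visited meta-result.

    The five features mirror those listed in the text (query, involved
    search service, ranking position, timestamp, link depth). All element
    names are hypothetical stand-ins for the schema of Figure 1.
    """
    entry = ET.Element("visit")
    ET.SubElement(entry, "query").text = query
    ET.SubElement(entry, "service").text = service
    ET.SubElement(entry, "rank").text = str(rank)
    # The timestamp field stores the time spent exploring the result.
    ET.SubElement(entry, "timestamp").text = str(seconds_spent)
    ET.SubElement(entry, "depth").text = str(link_depth)
    return ET.tostring(entry, encoding="unicode")

record = log_meta_result_visit("web personalization", "ServiceA", 3, 42.5, 2)
```

Appending such records to a client-side file keeps all personalization data on the user's machine, as the text describes.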
As far as adaptation is concerned, the algorithm is
dynamically adjusted to reflect the user's current
interests and preferences, while the profile is
updated on-line (during the search over the returned
meta-results), so it instantaneously adapts to the
user's behavior throughout his search. Using these
adaptation mechanisms, the proposed meta-search
engine processes and re-ranks third-party search
results.
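As a rough sketch of how per-service confidence weights could drive such re-ranking: the scoring formula below (weight divided by original rank) and all names are illustrative assumptions, not the paper's actual combination scheme.

```python
def rerank(results, service_weight):
    """Re-rank merged meta-results using per-service confidence weights.

    `results` is a list of (url, service, original_rank) tuples;
    `service_weight` maps a service name to the user's learned confidence
    in that service. The score weight/rank is a hypothetical illustration:
    a result ranked higher by a more trusted service floats to the top.
    """
    def score(item):
        url, service, rank = item
        return service_weight.get(service, 1.0) / rank
    return sorted(results, key=score, reverse=True)
```

In this sketch, browsing behavior would raise or lower each entry of `service_weight`, so the same third-party results get merged differently for different users.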
In order to personalize the similarity of a meta-
result with respect to the user's preferences, our
approach uses two probabilistic functions. These
functions assign a probability value according to the
time the user spends on information exploration and
to the depth of the investigated link (web page). We
define depth as the number of hyperlinks followed
from the initiation of the search, with the meta-
results as the starting point, until the URL finally
reached by the user.
We consider time an important factor in
personalization, since the more time the user spends
exploring a specific result, the more likely that
result is to be relevant, and vice versa. The period
of time consumed during a search session was modeled
with a standard lognormal distribution, since this
distribution fits the reported findings on web search
behavior (Mondosoft Development Team - White paper,
2004).
The methodology and solutions presented in
(Mondosoft Development Team - White paper, 2004)
were based on an extensive set of real-world data
gathered from 400 widely varying web sites that use
a hosted search solution. The study concluded that
web users want to obtain search results as fast as
possible and with the minimum possible effort. The
typical behavior of a web user follows this pattern,
as stated in (Mondosoft Development Team - White
paper, 2004). Consequently, based on this pattern,
we likewise modeled the user's search-depth (link-
following) behavior with a lognormal survival
function.
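For the standard lognormal (zero log-mean, unit log-variance) mentioned above, the survival function has the closed form S(x) = 1 - Phi(ln x) and can be evaluated with the complementary error function. The sketch below is only an illustration: the paper does not specify its parameterization, so `mu` and `sigma` are assumed defaults.

```python
import math

def lognormal_survival(x, mu=0.0, sigma=1.0):
    """Survival function S(x) = P(X > x) of a lognormal distribution.

    S(x) = 1 - Phi((ln x - mu) / sigma); with mu=0 and sigma=1 this is
    the standard lognormal referred to in the text. Computed via the
    complementary error function: S(x) = 0.5 * erfc((ln x - mu) / (sigma * sqrt(2))).
    """
    if x <= 0:
        return 1.0  # a lognormal variable is positive with probability 1
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)
```

Under this model the probability of following yet another hyperlink decreases monotonically with depth, matching the observation that users want results with minimum effort; an analogous lognormal model scores the time spent on a result.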
As far as the time spent by the user on information
exploration is concerned, we assumed that if the
investigation of a proposed result (or a result
derived from a further link search) during a
WEBIST 2007 - International Conference on Web Information Systems and Technologies