TOWARD SOCIAL SEARCH
From Explicit to Implicit Collaboration to Predict Users’ Interests
Luca Longo, Stephen Barrett and Pierpaolo Dondio
Department of Computer Science and Statistics, Trinity College Dublin
Keywords:
Social search, User behavior, Computational trust, Web site classification.
Abstract:
The concept of social search has been acquiring importance in the WWW as large-scale collaborative
computing environments have become feasible. This field focuses on the reader’s perspective in order to assign
relevance and trustworthiness to web pages. Although current web searching technologies tend to rely on
explicit human recommendations, these techniques are hard to scale as feedback is hard to obtain. Implicit
feedback techniques, on the other hand, can collect data indirectly. The challenge is in producing implicit
web-rankings by reasoning over users’ activity during a web-search without recourse to explicit human
interventions. This paper presents a comparison between explicit and implicit user feedback on web pages.
In an experiment, 25 volunteers explicitly evaluated the usefulness of 12 thematic web-sites while their
web browsing activity was gathered implicitly. The results obtained prove the existence of a strong
correlation between explicit judgments and the generated implicit feedback.
1 INTRODUCTION
The main advantage of systems supporting social search is that
Web pages are considered relevant and trustworthy from the
reader’s perspective rather than from that of web site owners.
Such solutions contrast with the majority of current search
engines, above all Google, whose PageRank algorithm assigns
importance to web-pages based on an analysis of the link
structure of the Web. A key open challenge in designing social
search systems is to improve the overall information seeking and
consuming activities on the Web. Reading time, scrolling and
cut-and-paste are all considered relevant implicit sources of
user preferences (Kelly and Belkin, 2001). In current
web-searching technologies, a substantial inhibitor in gathering
this content from users is that they tend to be resistant to
invasive techniques and lack the motivation to actively generate
recommendations. Implicit feedback techniques gather data
indirectly, and the key issue is to produce implicit web-rankings
automatically, deduced by reasoning on the activity performed by
users over web-pages. In this work we propose a novel approach to
collaborative social search that captures and analyses users’
actions during Internet sessions. Such activity embodies implicit
‘human judgement’, where each web-page has been viewed and
endorsed by one or more people. There are three key benefits of
such a solution. First, as each result has been selected by
users, by reasoning on their behaviour it is possible to obtain a
relevant degree of trustworthiness. Second, the social search
engine operates concurrently over continuously updated user
activity, and so it is well positioned to display results that
are more current or in context with changing information. Third,
it is possible to reduce the impact of link spam by relying less
on the link structure of web-pages. The paper is structured as
follows: Section 2 presents related work; Section 3 states the
hypothesis, describes the experiment and the formal model, and
comments on the results obtained. We conclude in Section 4 with
future work and open issues.
2 RELATED WORK
The concept of social search has been acquiring im-
portance as the World Wide Web grows in size and
web-searching technology has become an essential
need in web-browsing. Several methods have been
conceived from the simplest, based on sharing book-
marks, to more sophisticated approaches that com-
bine human intelligence with computer paradigms
(Agichtein et al., 2006), supporting collaborative gathering and
collaborative directories. In (Atterer et al., 2006) the authors
focused on tasks such as classifying the user with regard to
computer usage proficiency or making a detailed assessment of how
long it took users to fill in the fields of a form. They developed
an HTTP proxy that collects data about mouse movements, keyboard
input and more. Similarly, in the work of Velayathan and Yamada
(Velayathan and Yamada, 2007), an unobtrusive framework logs and
analyses users’ behaviour to extract effective rules for
evaluating web-pages using machine-learning techniques. In the
work reported in (Kelly and Belkin, 2001) the authors focused on
the hypothesis that users will spend more time on, scroll more
often in and interact more with those documents they find
relevant. Similarly, in the work of Weinreich et al. (Weinreich et
al., 2006), the authors found that users spend less than 12
seconds on nearly 50% of the web-pages shown to them,
demonstrating that users make nearly 50% of their decisions to
navigate to the next page before reading a substantial part of the
contents.
Longo L., Barrett S. and Dondio P.
TOWARD SOCIAL SEARCH - From Explicit to Implicit Collaboration to Predict Users’ Interests.
DOI: 10.5220/0001841406840687
In Proceedings of the Fifth International Conference on Web Information Systems and Technologies (WEBIST 2009), page
ISBN: 978-989-8111-81-4
Copyright © 2009 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
Collaboration is a process in which people interact with each
other toward a common goal by sharing their knowledge, learning
and building consensus. There are two main ways to provide
judgement: explicitly and implicitly. In the former, users
provide feedback using a specific metric, as in the eBay and
Amazon communities. In the latter, implicit judgements are
inferred from user behaviour while performing a specific action.
Collaboration applied to the Web 2.0 supports a new kind of shared
intelligence, named Collective Intelligence, where users are able
to generate their own content, building up an infrastructure where
contributions are not merely quantitative but also qualitative.
The relevance of Trust and Reputation in human societies is
indisputably recognised. A trust-based decision is a multi-stage
process on a specific domain. This process starts by identifying
and selecting pieces of trust evidence, generally domain-specific,
through an analysis of the application involved. Subsequently,
trust values are produced by performing a Trust computation over
the pieces of evidence, estimating the trustworthiness of entities
in the domain considered. Both of the previous steps are informed
by a notion of trust in the Trust model, and the final Trust
decision is taken by considering the computed values along with
exogenous factors such as disposition or risk assessments. The
proliferating collaborative environments are good examples of
settings in which Computational Trust paradigms are applied in
order to evaluate the trustworthiness of virtual identities. Longo
et al. (Longo et al., 2007) conceived a set of rare trust
evidences based on time and applied them to Wikipedia,
demonstrating how plausible Trust decisions can be reached using
exclusively temporal factors. Team-work and co-operation (Montaner
et al., 2002) represent other areas, where game theory is the
predominant paradigm considered to design Computational Trust
models.
3 IMPLICIT/EXPLICIT COLLABORATION: EXPERIMENT AND TRUST MODEL
The hypothesis behind this work is that, taking into account an
entity and applying Computational Trust paradigms by using
reasoning techniques, explicit human judgements are correlated
with the corresponding implicitly derived feedback. We explore
this question in the context of web-page media. If the answer is
positive, i.e., there exists a correlation between them, it is
possible to build up a collaborative environment achieving good
predictions in a non-invasive way. In particular, we can conclude
that, by examining users’ behaviour while surfing the Internet, we
can generate a set of ranked results where the top ones represent
the content considered most valuable by users and thus, by
implication, valuable to other similar users. We refer to this
kind of collaboration as implicit collaboration, to distinguish it
from the classic, explicit collaboration, where users expressly
provide feedback, evaluations and judgements. Our solution was to
log all the activity in the browser, gathering the main events
(E_i) that may occur during an Internet session. The logger does
not perform any kind of computation: it does not apply any
Computational Trust paradigm, nor does it filter out events.
We conducted an experiment in order to investigate the ability of
our approach to gather logs of user behaviour. 25 unpaid
volunteers with different backgrounds were recruited to
participate in this study. We asked each of them to organise a
2-week trip to Morocco by surfing a pre-defined list of web-sites
from which it is possible to collect information about popular
cities, transport and hotels. We proposed a list of 12 selected
URLs that users could use within 60 minutes, during which they had
to interact naturally with the browser: collecting useful data,
cutting and pasting relevant information, bookmarking interesting
pages, submitting data, and saving pictures or documents in order
to recover this information in the future. Finally, we asked each
of them to explicitly provide a judgement of the usefulness of
each web-site on a common scale from 1 to 10 (1 means not useful
and 10 means very useful). In this experiment we assume that
volunteers do not act maliciously, hence the data contained in the
log files is a proper representation of the real actions performed
by them while surfing the Internet. We also assume that volunteers
do not change their behaviour in order to alter the generated log
data. At the end of the experiment, a set of noisy information is
obtained for each user, containing his or her activity while
surfing the given web-sites. Since volunteers can jump from one
given web-site to another, the gathered logs need to be filtered
and aggregated to produce a well-defined set of data that we refer
to as a ‘user-behaviour pattern’. For these reasons, we developed
a function named ‘filter/aggregator’ that analyses the data,
filters out all the URLs not in the given set and aggregates the
remaining data, grouping it per web-page. This module produces 12
‘user-behaviour patterns’ containing the occurrences of events,
one for each given web-site. At this stage it is possible, by
applying Computational Trust paradigms, to generate a unique value
from a given pattern indicating the usefulness of a given web-site
for a specific user. More than one Computational Trust factor may
be adopted, and their outputs may be aggregated to obtain a more
precise usefulness/trustworthiness degree for a given web-site.
From the output of the ‘filter/aggregator’ function, 12 unique
values, one for each given web-site, are generated. We defined a
basic Computational Trust model that extracts the occurrences of
events in the ‘user-behaviour pattern’ and computes a real value
in the range [0..1]. Since ‘scrolling’ events are more frequent
than ‘save as’ events, and a ‘cut & paste’ should be less
important than a ‘bookmark’ event, a hierarchy is needed to
discriminate their importance. The goal of this paper is not to
study a hierarchy of such events, hence we rely on the work
presented by (Velayathan and Yamada, 2007), in which the authors
provide the frequency of the most and the least frequent events
performed in their experiment while surfing the Internet, to
propose our ‘event-hierarchy’ with weights, as described in
Table 1.
Table 1: Event hierarchy with associated weights.
Type               Description                Weight
1 very rare        E_1: save as (page)        W_1 = 22%
2 rare             E_2: bookmark              W_2 = 18%
3 rare             E_3: printing              W_3 = 18%
4 not frequent     E_4: save as (picture)     W_4 = 10%
5 not frequent     E_5: download              W_5 = 10%
6 frequent         E_6: cut & paste text      W_6 = 8%
7 frequent         E_7: search text           W_7 = 8%
8 very frequent    E_8: form input            W_8 = 4%
9 very frequent    E_9: scrolling             W_9 = 2%
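The ‘filter/aggregator’ step described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the log format (pairs of URL and event name), the event identifiers, the grouping by host name and the `.example` hosts are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Event names loosely follow Table 1; the identifiers are illustrative assumptions.
EVENTS = ["save_page", "bookmark", "print", "save_picture", "download",
          "cut_paste", "search_text", "form_input", "scroll"]


def filter_aggregate(log, allowed_sites):
    """Return one behaviour pattern (event name -> occurrences) per allowed site.

    `log` is an iterable of (url, event_name) pairs as recorded by a logger;
    `allowed_sites` is the set of host names of the pre-defined web-sites.
    Entries for other hosts, or with unknown event names, are filtered out.
    """
    patterns = {site: dict.fromkeys(EVENTS, 0) for site in allowed_sites}
    for url, event in log:
        host = urlparse(url).netloc
        if host in patterns and event in EVENTS:
            patterns[host][event] += 1
    return patterns


# Hypothetical log entries (not data from the experiment).
log = [
    ("http://site-a.example/cities", "scroll"),
    ("http://site-a.example/cities", "bookmark"),
    ("http://unlisted.example/ad", "scroll"),      # not in the given set: dropped
    ("http://site-b.example/hotels", "cut_paste"),
    ("http://site-a.example/hotels", "scroll"),
]
patterns = filter_aggregate(log, {"site-a.example", "site-b.example"})
```

Keeping the logger free of any computation, as stated earlier, means that a function of this kind can be re-run over the raw logs whenever the set of web-sites or events changes.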
The final real value of the model is computed by aggregating the
occurrences of each event contained in the ‘user-behaviour
pattern’ and by using the ‘event-hierarchy’, as shown in the
following formal model:
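\[
\mathit{Trust}_{value} : BP[\,] \rightarrow [0..1]
\]
\[
\mathit{Trust}_{value}(BP[\,]) = \sum_{i=1}^{n} S(BP[E_i])
\]
\[
S(BP[E_i]) =
\begin{cases}
0 & \text{if } BP[E_i] = 0 \text{ (for any } E_i\text{)} \\
\frac{1}{4} W_i & \text{if } BP[E_i] = 1 \text{ and } i \in \{6, 7, 8, 9\} \\
\frac{1}{2} W_i & \text{if } BP[E_i] = 2 \text{ and } i \in \{6, 7, 8, 9\} \\
W_i & \text{if } BP[E_i] > 2 \text{ and } i \in \{6, 7, 8, 9\} \\
\frac{1}{2} W_i & \text{if } BP[E_i] = 1 \text{ and } i \in \{4, 5\} \\
W_i & \text{if } BP[E_i] \geq 2 \text{ and } i \in \{4, 5\} \\
W_i & \text{if } BP[E_i] \geq 1 \text{ and } i \in \{1, 2, 3\}
\end{cases}
\]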
where BP is the ‘user-behaviour pattern’ vector, E_i is a specific
event and n is the number of possible events. For frequent and
very frequent events, the model assigns the full corresponding
weight (W_i) if and only if the event occurred more than twice;
otherwise 1/2 of the weight is returned for 2 occurrences, 1/4 for
just one and 0 for none. For not frequent events, two occurrences
are enough to assign the full weight, and 1/2 of it is assigned
for just one occurrence. Finally, for rare events such as
‘bookmark’, ‘save as (page)’ and ‘printing’, just one occurrence
is relevant, hence the full corresponding weight is assigned.
Taking into account both the 12 values produced by our schema and
the 12 judgements provided by each volunteer, we can use
statistical correlation indexes to test the hypothesis. In
particular, in this work we adopt Pearson’s correlation
coefficient, which measures the strength of the linear dependence
between the implicit and the explicit values. If the correlation
value obtained from the implicit values and the explicit
judgements of a given volunteer over the given web-sites tends to
1, a linear equation describes the relationship well, with the
implicit value increasing with the explicit value. A score of -1
shows an inverted relationship, and a value tending to zero shows
that there is no linear relationship between the variables
considered. In order to test the hypothesis, we compute one
correlation value for each user: if the majority of these values
tend to 1, our hypothesis is confirmed and we can maintain that
there exists a strong relationship between implicit feedback and
explicit human judgements as captured by our model.
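As a rough illustration of the model and of the correlation test, the sketch below computes the trust value of one behaviour pattern using the weights of Table 1, and Pearson’s coefficient between two sequences. The function names, the representation of a pattern as a mapping from event index to occurrences, and the example pattern are assumptions made for this sketch, not part of the original work.

```python
import math

# Weights W_1..W_9 from Table 1, expressed as fractions (they sum to 1).
WEIGHTS = {1: 0.22, 2: 0.18, 3: 0.18, 4: 0.10, 5: 0.10,
           6: 0.08, 7: 0.08, 8: 0.04, 9: 0.02}


def score(i, count):
    """S for event type i (1..9) occurring `count` times."""
    w = WEIGHTS[i]
    if count <= 0:
        return 0.0
    if i in (1, 2, 3):      # rare events: one occurrence gives the full weight
        return w
    if i in (4, 5):         # not frequent: half weight for one, full for two or more
        return w if count >= 2 else w / 2
    if count == 1:          # frequent and very frequent events (6..9)
        return w / 4
    if count == 2:
        return w / 2
    return w                # more than two occurrences


def trust_value(pattern):
    """Sum S over all event types; `pattern` maps event index -> occurrences."""
    return sum(score(i, pattern.get(i, 0)) for i in WEIGHTS)


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Illustrative pattern (not experimental data):
# one bookmark, two cut & paste events, five scrolling events.
example = trust_value({2: 1, 6: 2, 9: 5})   # 0.18 + 0.08 / 2 + 0.02
```

For one volunteer, `pearson` would be applied to the 12 values returned by `trust_value` and the 12 explicit judgements for the same web-sites; the 1 to 10 scale does not need rescaling, since the coefficient is invariant to positive linear transformations.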
The set of Pearson’s correlation values obtained from the
experiment is encouraging. The mean over the users is 0.6, and
more than 50% of them are above the value of 0.7. 24% of them have
a correlation value less than the threshold of 0.5. 12% of users
show a low relationship between the explicit judgement and the
implicitly derived value: our method did not succeed for these 3
people. In 2 cases the experiment returns negative correlation
coefficients, hence implicit and explicit values have an inverted
relationship. 68% of users exhibit a correlation value above the
mean. The proposed strategy is not robust in the short term: a
web-site visited by a couple of users may have a higher average
trustworthiness than a web-site visited by thousands of users. One
approach to resolving this problem may be the adoption of a
threshold, explicitly defined or learned with unsupervised
techniques, indicating the minimum number of users who must have
implicitly viewed the same site.
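The threshold idea mentioned above could look like the following sketch, which ranks web-sites by their mean implicit value and ignores those visited by fewer than a minimum number of users. The function name, the data layout and the example values are hypothetical; only the explicitly defined variant of the threshold is shown, not the learned one.

```python
def rank_sites(values_by_site, min_users):
    """Rank sites by mean implicit value, skipping sites with too few visitors.

    `values_by_site` maps a site to the list of per-user trust values;
    `min_users` is the explicitly defined minimum number of users.
    """
    means = {
        site: sum(values) / len(values)
        for site, values in values_by_site.items()
        if values and len(values) >= min_users
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values (not data from the experiment): site-a has only two visitors.
values = {
    "site-a.example": [0.90, 0.95],
    "site-b.example": [0.60, 0.70, 0.65, 0.70, 0.60],
    "site-c.example": [0.30, 0.40, 0.20, 0.30, 0.30],
}
unfiltered = rank_sites(values, min_users=1)
filtered = rank_sites(values, min_users=5)
```

Without the threshold the sparsely visited site leads the ranking; with `min_users=5` it is excluded until enough users have viewed it.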
4 CONCLUSIONS AND FUTURE WORK
In this study we performed a context-dependent comparison between
explicit human judgements, provided by volunteers, and implicit
judgements derived by using Computational Trust techniques.
Through an experiment we demonstrated how, taking into account a
digital entity such as a web-site, explicit human judgement can be
strongly connected to the implicitly derived value for the same
entity. The evaluation was conducted by considering 12 URLs
evaluated by users, who explicitly provided a degree of
usefulness. During browsing sessions we logged users’ activity,
and a behaviour pattern containing the occurrences of generated
events was extracted for each user and each web-site.
Computational Trust paradigms helped us to automatically evaluate
these patterns and to generate trustworthiness values. Pearson’s
coefficient was used to study the correlation between users’
explicit judgements and the derived usefulness values. Even with a
small number of users and a basic Trust model, encouraging results
were obtained, proving our hypothesis and underlining how it is
possible to automatically evaluate entities such as web-pages by
reasoning on users’ activity over them. This is a new approach and
thus there is further work to do. The authors believe it
represents a starting point for predicting users’ interests and
building a third-generation social search engine based on implicit
collaboration. Future work will focus on experiments in malicious
environments, taking into account privacy/anonymity issues. New
reasoning techniques should be investigated to better evaluate
web-pages, and new algorithms are needed to semantically connect
search queries to relevant sets of URLs generated by our schema.
REFERENCES
Kelly D. and Belkin N. J. (2001). Reading Time, Scrolling
and Interaction: Exploring Implicit Sources of User
Preferences for Relevance Feedback During Interactive
Information Retrieval. SIGIR 2001, New Orleans,
LA, USA.
Agichtein E., Brill E. and Dumais S. (2006). Improving
Web Search Ranking by Incorporating User Behavior
Information. SIGIR 2006, USA.
Atterer R., Wnuk M. and Schmidt A. (2006). Knowing
the User’s Every Move - User Activity Tracking for
Website Usability Evaluation and Implicit Interaction.
WWW 2006, Edinburgh.
Velayathan G. and Yamada S. (2007). Behavior-based Web
Page Evaluation. WWW 2007, Canada.
Longo L., Dondio P. and Barrett S. (2007). Temporal
Factors to Evaluate Trustworthiness of Virtual Identities.
IEEE SECOVAL 2007, France.
Weinreich H., Obendorf H., Herder E. and Mayer M. (2006).
Off the Beaten Tracks: Exploring Three Aspects of
Web Navigation. WWW 2006.
Montaner M., Lopez B. and De La Rosa J. (2002).
Developing Trust in Recommender Agents. AAMAS-02,
Bologna, Italy.