this limitation in coverage inherited by social
algorithms, benefiting at the same time from the
accuracy of social-based recommendations not
sufficiently supported by collaborative filtering
methods. Similarly, the work introduced in (He and
Chu, 2010) shows that a collaborative
recommendation system benefits from the social
annotations and friendships established among users,
items and tags. Only the approaches presented in (Kang
et al., 2013; Sohn et al., 2013) use degree centrality
as an SNA measure alongside content-based
filtering with the FOAF (Friend of a Friend) ontology
to compute the centrality of each tag and the degree of
importance of a particular user, respectively, and
recommend content on that basis.
In (Shokouhi, 2013), a personalized auto-
completion ranker is presented which takes into
consideration demographic-based features, i.e.,
age, gender and location extracted from Microsoft
Live profiles of users when searching via Bing.
Results on the effectiveness of the ranker before and
after personalization (re-ranking) show that
demographic features significantly improve ranking
when compared to the (no-reranking) baseline.
Utilizing user-specific data to improve query
suggestion by re-ranking the results obtained
from traditional ranking approaches is not new and has
been addressed by several studies. The authors
of (Wu et al., 2015) employ user-generated ratings and
comments on books on Amazon as helpful metadata
when suggesting social books during search.
Further, in (Cheng and Cantú-Paz, 2010), a framework
for the personalization of click models in sponsored
search is presented, which is based on user-specific and
demographic-based features that reflect the click
behavior of individuals and groups.
To the best of our knowledge, none of these
existing systems models users as nodes in a
unimodal graph and analyzes them with SNA
techniques in a collaborative filtering (CF) approach
to recommend queries to a given user.
3 OUR APPROACH
Our SNA-based approach to query recommendation
takes into account some personal attributes of users,
such as home city and gender, as well as their query
topics or categories (e.g., politics or sports). Social
network analysis (SNA) metrics are applied over the
generated unimodal user-user network in order to
generate the similarity matrix.
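As an illustration of the idea above, the following minimal sketch builds a user-user similarity matrix from profile attributes and query categories. The similarity measure (a count of shared attributes and shared categories), the field names and the sample users are assumptions made for illustration only; the text does not specify how the matrix entries are computed.

```python
from itertools import combinations

# Hypothetical user records; field names and values are illustrative only.
users = {
    "u1": {"home_city": "Oslo", "gender": "f", "categories": {"sports", "politics"}},
    "u2": {"home_city": "Oslo", "gender": "m", "categories": {"sports"}},
    "u3": {"home_city": "Bergen", "gender": "f", "categories": {"travel"}},
}


def attribute_similarity(a, b):
    """Assumed measure: shared profile attributes plus shared query categories."""
    score = 0
    score += a["home_city"] == b["home_city"]  # True counts as 1
    score += a["gender"] == b["gender"]
    score += len(a["categories"] & b["categories"])
    return score


def similarity_matrix(users):
    """Return a symmetric dict-of-dicts similarity matrix over all user pairs."""
    matrix = {u: {} for u in users}
    for u, v in combinations(users, 2):
        s = attribute_similarity(users[u], users[v])
        matrix[u][v] = s
        matrix[v][u] = s
    return matrix


def most_similar(matrix, user):
    """Return all users tied for the highest similarity to `user`."""
    row = matrix[user]
    if not row:
        return []
    best = max(row.values())
    return sorted(v for v, s in row.items() if s == best)


matrix = similarity_matrix(users)
print(most_similar(matrix, "u1"))
```

Returning every user tied for the best score, rather than a single user, leaves room for a subsequent ranking step when several candidates are equally similar.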
3.1 System Architecture
Figure 1: System architecture.
Figure 1 depicts the architecture of our proposed
SNA-based query recommendation system. As input,
the system is supplied with the following data: the
user’s social profile data (e.g., gender and home city)
and the query posted by the user. Based on the input
data, a similarity matrix is generated, which serves to
find the user most similar to the current user. After
this step, if there is more than one concurrent user,
the users are ranked using an SNA metric, either
degree or authority centrality. The final step is
searching the query log for queries whose keywords
are most similar to those submitted by the concurrent
users. With respect to the query of the current user,
the queries are filtered using the Jaccard similarity
coefficient (Phillips, 2013). Two datasets
have been used in our proposed system. The first dataset
contains data from the AOL search engine collected
over three months of 2006. It consists of the user ID
in anonymized form, AnnonID (which is expected to
be replaced by a real user ID in the future), the posted
query itself, the query time field and the rank field.
The second dataset comprises data gathered from the
Text Retrieval Conference (TREC), published during
2001-2014. Web queries retrieved from the TREC
dataset contain the topic of the query along with the
co-clicked query, the actual query, and the clicked URL.
Data from the two datasets have been merged into a
single collection using a keyword-matching criterion.
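The ranking and filtering steps of the pipeline described earlier in this section can be sketched as follows. This is a simplified illustration and not the exact implementation: it uses normalized degree centrality only (authority centrality is omitted), tokenizes queries by whitespace, and applies a hypothetical `threshold` of 0.2 to the Jaccard score. The function names, the threshold and the sample data are all assumptions.

```python
def degree_centrality(edges, nodes):
    """Normalized degree centrality, degree / (n - 1), for an undirected graph."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    denom = max(len(nodes) - 1, 1)
    return {n: d / denom for n, d in degree.items()}


def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(current_query, candidates, edges, nodes, query_log, threshold=0.2):
    """Rank candidate users by centrality, then filter their logged queries."""
    centrality = degree_centrality(edges, nodes)
    ranked = sorted(candidates, key=lambda u: centrality[u], reverse=True)
    current_kw = current_query.lower().split()
    results = []
    for user in ranked:
        for query in query_log.get(user, []):
            score = jaccard(current_kw, query.lower().split())
            if score >= threshold:  # hypothetical cut-off, not from the paper
                results.append((query, user, round(score, 2)))
    return results


# Illustrative data only.
nodes = ["u1", "u2", "u3", "u4"]
edges = [("u1", "u2"), ("u2", "u3"), ("u2", "u4"), ("u3", "u4")]
query_log = {
    "u2": ["cheap flights paris", "football scores"],
    "u3": ["paris flights"],
}
print(recommend("flights to paris", ["u3", "u2"], edges, nodes, query_log))
```

In this example the better-connected user u2 is visited first, and the query "football scores" is dropped because it shares no keyword with the current query.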
From the AOL dataset, one of the six available
collections of user queries has been used in our scenario;
it contains 3,013,956 queries, while the TREC dataset
contains 5,980 queries belonging to 350 distinct
topics. Topics from the TREC dataset have been further
grouped into 8 categories, according to Google
Trend Search, for better grouping and because of
the inappropriate grouping of topics in the AOL
dataset. For instance, some of the topics in the AOL
dataset were hunger, Chevrolet Trucks and deer, so
it was necessary to merge these topics (queries) into one
of eight categories (Lifestyle, Travel & Leisure and