layer is the DHT connection layer, which provides DHT key
search, key redistribution, etc. This layer is formed by
links to adjacent nodes in the structured P2P network, the
so-called “fingers”. The third layer is formed by
connections between similar nodes, where similarity
is interpreted as equality of locality-sensitive
hashes.
It is important to note that the links to neighbour
nodes in the third layer are not the identifiers of those
nodes in the P2P network; they are entry points to
anonymized paths leading to these nodes.
2) The Master node: The distributed nature of the
proposed system creates one obstacle. LSH-based
nearest neighbour search implies that, when searching
for the neighbours of an object x, all the
locality-sensitive hash functions that were used to hash
the other objects and fill the hash tables are applied to x.
In the proposed architecture, the object being hashed is a
vector of normalized ratings assigned by the user to
different items of interest, and the family of hash
functions is represented by random hyperplane
projections. To define a hyperplane, the
dimensionality of the space has to be known. In some
cases, for instance when the rating storage is
centralized, when ratings are immutable, or when all
possible items are known in advance, knowing the
dimensionality is not a problem. However, with
distributed rating storage, where each node holds the
ratings of only one user, the overall dimensionality of
the item space can be determined only through
communication between nodes. Here dimensionality means
both the number of dimensions and their order. To see
why the order matters, suppose one user encounters items
in the order (Item1:1, Item2:-0.5, Item3:1) and another
user encounters and rates the same items in a different
order: (Item1:1, Item3:1, Item2:-0.5). Their hashes with
hyperplane (0, 1, 0) would then differ, because the
projections are -0.5 and 1 respectively, although the
ratings match perfectly.
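
The ordering example above can be reproduced with a minimal sketch. The function name hyperplane_hash, the sign convention (bit 1 for a non-negative projection) and the shared_order list are assumptions made for illustration; the text only states that the hash family consists of random hyperplane projections.

```python
def hyperplane_hash(vector, hyperplanes):
    """One bit per hyperplane: 1 if the projection is non-negative, else 0.

    The sign convention is an assumption made for this sketch.
    """
    bits = []
    for plane in hyperplanes:
        projection = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if projection >= 0 else 0)
    return tuple(bits)


hyperplanes = [(0.0, 1.0, 0.0)]

# Identical ratings, stored in each user's local encounter order.
user_a = (1.0, -0.5, 1.0)   # (Item1, Item2, Item3)
user_b = (1.0, 1.0, -0.5)   # (Item1, Item3, Item2)

hash_a = hyperplane_hash(user_a, hyperplanes)
hash_b = hyperplane_hash(user_b, hyperplanes)

# With a globally agreed ordering (hypothetical shared_order), the
# second user's vector is rebuilt before hashing and the hashes agree.
shared_order = ["Item1", "Item2", "Item3"]
ratings_b = {"Item1": 1.0, "Item3": 1.0, "Item2": -0.5}
user_b_aligned = tuple(ratings_b[item] for item in shared_order)
hash_b_aligned = hyperplane_hash(user_b_aligned, hyperplanes)

print(hash_a, hash_b, hash_b_aligned)
```

Under these assumptions the two locally ordered vectors fall on opposite sides of the hyperplane, while the aligned vector hashes identically to the first user's.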
Hence, the characteristics of the item space and the
random projection hyperplanes must be synchronized
across all nodes. Maintaining a global shared state in a
P2P network is a difficult problem, and numerous papers
are dedicated to it, e.g. (Hu, Bhuyan and Feng, 2012;
Oster et al., 2006; Chen et al., 2005). In the proposed
system this problem is addressed in a way similar to the
one presented in (Mastroianni, Pirro and Talia, 2008), at
the cost of sacrificing the P2P purity of the system. A
dedicated Master node collects all new items
discovered and rated by peers, maintains their
ordering, and generates new locality-sensitive hash
functions. Each peer therefore has to connect to the
Master node in two situations: first, to report a
previously unknown item (which should become a
new dimension), and second, to obtain a new set of
locality-sensitive hash functions. Note that there is no
need to generate new hash functions after each newly
rated item. Using outdated hash functions with fewer
dimensions is still possible, although it gradually
decreases the quality of recommendations. Therefore,
each user node collects newly rated items (those that
have not yet been assigned identifiers) and then sends
them to the Master node as a batch. The Master node, in
turn, accumulates new items and, when their number is
large enough, assigns them an ordering and issues a new
set of locality-sensitive hash functions. Importantly,
the new set does not entirely replace the previous one,
but contains only several new hash functions.
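
The batching behaviour of the Master node described above might be sketched as follows. The class name MasterNode, the parameters batch_threshold and new_functions_per_update, and the Gaussian sampling of hyperplane coefficients are all assumptions of this sketch. Keeping every old hyperplane and appending a few new ones is one possible reading of "not an entire replacement"; the text does not say whether old functions are ever retired.

```python
import random


class MasterNode:
    """Hypothetical sketch of the Master node; not the authors' implementation."""

    def __init__(self, batch_threshold=3, new_functions_per_update=2, seed=0):
        self.item_order = []      # global ordering: list index = dimension
        self.pending = []         # reported items without a dimension yet
        self.hyperplanes = []     # issued hyperplanes, oldest first
        self.batch_threshold = batch_threshold
        self.new_functions_per_update = new_functions_per_update
        self._rng = random.Random(seed)

    def report_items(self, items):
        """Accept a batch of item identifiers sent by a peer."""
        known = set(self.item_order) | set(self.pending)
        for item in items:
            if item not in known:
                self.pending.append(item)
                known.add(item)
        if len(self.pending) >= self.batch_threshold:
            self._issue_update()

    def _issue_update(self):
        """Fix the order of pending items and add a few new hyperplanes."""
        self.item_order.extend(self.pending)
        self.pending.clear()
        dim = len(self.item_order)
        for _ in range(self.new_functions_per_update):
            plane = [self._rng.gauss(0.0, 1.0) for _ in range(dim)]
            self.hyperplanes.append(plane)

    def get_hash_functions(self):
        """Return copies of the item ordering and all issued hyperplanes."""
        return list(self.item_order), [list(h) for h in self.hyperplanes]


master = MasterNode(batch_threshold=3, new_functions_per_update=2)
master.report_items(["Item1", "Item2"])            # below the threshold
planes_before = master.get_hash_functions()[1]
master.report_items(["Item2", "Item3"])            # Item2 is a duplicate
order, planes = master.get_hash_functions()
master.report_items(["Item4", "Item5", "Item6"])   # triggers a second update
order2, planes2 = master.get_hash_functions()
```

In this sketch the hyperplanes issued earlier keep their lower dimensionality, which corresponds to the outdated hash functions that peers may continue to use.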
4.3 Scenarios
This subsection describes how the five main scenarios of
the recommendation system are implemented by
means of the proposed architecture. These scenarios
are: attractiveness estimation for a given item,
recommendations query, rating an item, refreshing
hash functions, and the search for similar peers.
1) Attractiveness estimation of a given item:
Attractiveness estimation on a node is possible only
after the node has been integrated into the P2P network
and has located the nodes of users with similar
ratings (hereinafter referred to as neighbour nodes).
Let the neighbour nodes of the given node be stored in
the Neighbours list. Attractiveness estimation for an
item is then performed by sending a request containing
the item identifier to each node in the Neighbours
list. Each neighbour node answers with a binary value
indicating whether or not it can recommend this item to
others. Attractiveness estimation for a set of items is
done in much the same way, except that the requesting
node passes a list of item identifiers instead of a
single identifier, and the answer contains a list of
pairs (itemId, recommend_flag) for all items that the
neighbour node is able to recommend.
Informally, the attractiveness estimation scenario can
be interpreted as asking like-minded people for advice.
In centralized systems this is performed in a
conceptual way; in the proposed hybrid P2P system it
is performed literally, by sending requests to the
respective nodes. When answering an attractiveness
estimation request, a node can base its response on
the rating stored for the given item, or infer the
rating from other information. This is an
extension point of the proposed system architecture.
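
A minimal sketch of the attractiveness estimation scenario is given below. Several details are assumptions that the text does not specify: the class name PeerNode, the recommend_threshold used to turn a stored rating into a binary flag, answering only for items the neighbour has rated, and aggregating the answers as the fraction of positive flags. Direct method calls stand in for requests sent over the anonymized paths.

```python
class PeerNode:
    """Hypothetical peer; direct calls replace network requests."""

    def __init__(self, ratings, recommend_threshold=0.0):
        self.ratings = dict(ratings)    # itemId -> normalized rating
        self.recommend_threshold = recommend_threshold
        self.neighbours = []            # the Neighbours list

    def answer(self, item_ids):
        """Return {itemId: recommend_flag} for items this node has rated.

        This is the extension point: a node could instead infer the
        flag from other information.
        """
        return {
            item_id: self.ratings[item_id] > self.recommend_threshold
            for item_id in item_ids
            if item_id in self.ratings
        }

    def estimate(self, item_ids):
        """Ask every neighbour and aggregate the flags per item.

        The result is the fraction of positive answers, or None when
        no neighbour answered (the aggregation rule is an assumption).
        """
        votes = {item_id: [] for item_id in item_ids}
        for neighbour in self.neighbours:
            for item_id, flag in neighbour.answer(item_ids).items():
                votes[item_id].append(flag)
        return {
            item_id: (sum(flags) / len(flags) if flags else None)
            for item_id, flags in votes.items()
        }


requester = PeerNode({})
peer_b = PeerNode({"Item1": 1.0, "Item2": -0.5})
peer_c = PeerNode({"Item1": 0.5})
requester.neighbours = [peer_b, peer_c]

estimates = requester.estimate(["Item1", "Item2", "Item3"])
print(estimates)
```

Replacing the body of answer is how a different response policy, such as inferring a rating from related items, could be plugged in without changing the requesting side.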
ICEIS 2015 - 17th International Conference on Enterprise Information Systems