Friendship prediction can be seen as a subtask of link
prediction. However, friendship prediction poses three major
difficulties that existing link prediction methods do not
solve:
One-class Setting. Since it is virtually impossible to
monitor all activities of a pair of people, it is pos-
sible that they are friends even if no interaction
is observed between the pair. Thus, in practice,
we cannot distinguish known absent links from
unknown links. Therefore, available observations
consist of known present links only, i.e. positive
samples only, and no negative samples are avail-
able for learning.
Sparse Friend Network. In general and practical
settings, the number of friends is extremely small
compared to the total number of participants. In
conjunction with the one-class setting, this yields
a small number of known links, i.e. most links re-
main unknown.
Affinity Differs from Similarity. General node similarity
approaches such as cosine similarity do not always match
the characteristics of friendship, especially when node
information is high-dimensional and sparse.
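The third difficulty can be illustrated with a minimal Python sketch. It assumes each user is described by a sparse dictionary of item usage counts; the user names, item names, and counts are hypothetical and are not taken from the paper's data. When two users share no items, cosine similarity is exactly zero, whether or not they are friends.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty history carries no similarity evidence
    return dot / (norm_u * norm_v)

# Hypothetical usage counts (illustrative only).
alice = {"news_app": 3, "chess_site": 1}
bob = {"sports_app": 2, "recipe_site": 4}
carol = {"news_app": 1, "chess_site": 2}

sim_alice_bob = cosine_similarity(alice, bob)      # no shared items
sim_alice_carol = cosine_similarity(alice, carol)  # two shared items
```

In high-dimensional, sparse histories most user pairs resemble the first case, so raw cosine similarity gives little signal for most candidate links.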
Existing missing link prediction methods can be
classified into two types: (1) topological-information-
based and (2) node-information-based. The first type
uses only known network structure such as common
adjacent nodes or paths between nodes. Well-known
methods are Jaccard Similarity (Lü and Zhou, 2011)
and Adamic/Adar (Adamic and Adar, 2003). These
methods are problematic if the known network is
sparse, because few node pairs then share neighbors or
paths. The second type uses information about nodes
as well as the known network structure; such methods try to
predict links even if a node is isolated from known links.
Since the number of known links is small for friend-
ship prediction as mentioned above, the proposed
method takes the node-information-based approach.
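To make the topological scores concrete, the following Python sketch computes the standard Jaccard and Adamic/Adar scores on a toy undirected graph. The graph, node labels, and function names are illustrative assumptions and are not drawn from the cited works. The isolated node "f" shows why such scores are uninformative when the known network is sparse.

```python
import math

def build_adjacency(edges, nodes):
    """Undirected adjacency sets; every listed node gets an entry, even if isolated."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def jaccard(adj, x, y):
    """|N(x) & N(y)| / |N(x) | N(y)|, defined as 0 when both neighborhoods are empty."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

def adamic_adar(adj, x, y):
    """Sum of 1 / log(degree(z)) over the common neighbors z of x and y."""
    common = adj[x] & adj[y]
    # A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in common)

# Toy graph (hypothetical): "f" has no known links.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
adj = build_adjacency(edges, nodes)

jac_ab = jaccard(adj, "a", "b")      # a and b share both of their neighbors
aa_ab = adamic_adar(adj, "a", "b")   # 1/log(2) + 1/log(3)
jac_af = jaccard(adj, "a", "f")      # isolated node: score is 0
aa_af = adamic_adar(adj, "a", "f")   # isolated node: score is 0
```

Both scores are zero for every pair involving "f", so a purely topological method cannot rank candidate friends for a user with no known links.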
Two state-of-the-art node-information-based link
prediction methods were proposed recently.
Latent Feature Model: Menon et al. proposed a
supervised learning method that predicts links by
applying a latent feature model (LFL) (Menon and
Elkan, 2010; Menon and Elkan, 2011). This
method tries to minimize the loss between pre-
dicted results and known present/absent links by
adjusting latent features. Yang et al. proposed
Joint Friendship and Interest Propagation (FIP)
which combines latent feature models for user–
user and user–item (Yang et al., 2011).
Link Propagation: Kashima et al. proposed the
Link Propagation method, which tries to propagate
known links using observed node features with
a pre-specified kernel (Kashima et al., 2009). If observed
node features are high-dimensional and
sparse, it is not trivial to construct a proper kernel.
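As a reading aid, one generic way to write a latent feature objective of the kind described above is given below. It is an illustrative form only and not the exact formulation of the cited papers. All symbols are assumptions introduced here: O is the set of node pairs with a known label, y_ij in {0, 1} marks a known absent or present link, u_i is the latent feature vector of node i, sigma is the logistic function, l is a loss such as log loss, and lambda is a regularization weight.

```latex
\min_{\{u_i\}} \; \sum_{(i,j) \in O} \ell\bigl(y_{ij},\, \sigma(u_i^{\top} u_j)\bigr)
  \; + \; \lambda \sum_{i} \lVert u_i \rVert_2^2
```

Under the one-class setting, O contains no pairs with y_ij = 0, which is why such supervised objectives cannot be applied directly to friendship prediction.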
Besides the works related to link prediction, sev-
eral data-oriented approaches have been reported for
friendship prediction. The data used are:
1. Location data such as GPS coordinates: (Wang
et al., 2011), (Eagle et al., 2009), (Scellato et al.,
2011),
2. Bluetooth encounter data: (Quercia et al., 2010),
(Eagle et al., 2009),
3. Call records: (Wang et al., 2011), (Mirisaee et al.,
2010).
Although location trajectory and encounter data
show a strong correlation to friendship, they cap-
ture only the relationships that yield frequent physi-
cal contacts. Call records are hamstrung by privacy
issues and so are impractical for this purpose.
The authors’ basic idea for overcoming the one-class
setting and the sparsity of friend networks is to
incorporate rich user information into friendship
prediction. Here, rich user information means application
execution and web browsing histories. Application
execution and web browsing are universal activities
for any smartphone user, and so it is reasonable to
expect those histories to be available for most users,
unlike friendship links. Since standard operating sys-
tems can create the logs needed, it is also practical in
terms of deployment. Moreover, application execu-
tion and web browsing histories are potentially infor-
mative enough, since recent research (Fujimoto et al.,
2011) showed that a user’s interests can be extracted
from the web browsing history. However, note that
we need to extract interests from the histories, since
interests are expressed only through items and are not
directly observable.
Yang et al. proposed a similar strategy in the FIP
model (Yang et al., 2011); it enables the
incorporation of user–item interaction into user–user
modeling. However, they assume that the item is the
key point of interest and focus on obtaining latent
features from the observed node (i.e. user and item)
information, not from user–item interaction, because
their problem setting is different.
Here, the authors propose to employ matrix factor-
ization to extract latent user features from each user’s
application execution and Internet access records.
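A minimal sketch of this idea follows. It assumes a small binary user–item usage matrix and a plain squared-error factorization trained by stochastic gradient descent. The data, the rank, the learning rate, and the function names are all hypothetical, and the source does not state which factorization variant or loss the authors use.

```python
import random

def factorize(usage, n_users, n_items, rank=2, steps=300, lr=0.05, reg=0.01, seed=0):
    """Factorize usage[(user, item)] ~ P[user] . Q[item] with SGD on squared error."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for (u, i), r in usage.items():
            err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

def squared_error(usage, P, Q):
    """Total squared reconstruction error over all entries of the usage matrix."""
    return sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
               for (u, i), r in usage.items())

# Hypothetical usage matrix: users 0-1 use items 0-1, users 2-3 use items 2-3.
rows = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
usage = {(u, i): float(v) for u, row in enumerate(rows) for i, v in enumerate(row)}

P0, Q0 = factorize(usage, 4, 4, steps=0)   # untrained baseline
P, Q = factorize(usage, 4, 4)              # trained latent features
err_before = squared_error(usage, P0, Q0)
err_after = squared_error(usage, P, Q)
```

Each row of P is then a dense latent user feature vector that could serve as node information for a downstream link predictor, in place of the raw sparse history.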
The authors believe matrix factorization on user–item
interaction is promising as a method to identify latent
features, since it has been successful in collaborative filtering
by extracting users’ and items’ latent features from
KDIR 2012 - International Conference on Knowledge Discovery and Information Retrieval