Filtering Relevant Facebook Status Updates for Users of Mobile Devices

Stephan Baumann

, Rafael Schirru

1,2

and Joachim Folz

German Research Center for Artiﬁcial Intelligence, Trippstadter Straße 122, 67663 Kaiserslautern, Germany

University of Kaiserslautern, Gottlieb-Daimler-Straße 47, 67663 Kaiserslautern, Germany

Keywords:

Content-based Information Filtering, Social Networks, Mobile Devices, Classiﬁer-based Approach.

Abstract:

In recent years, social networking sites such as Twitter, Facebook, and Google+ have become popular. Many

people are already used to accessing their individual news feeds ubiquitously also on mobile devices. However

the number of status updates in these feeds is usually high thus making the identiﬁcation of relevant updates a

tedious task. In this paper we present an approach to identify the relevant status updates in a user’s Facebook

news feed. The algorithm combines simple features based on the interactions with status updates together with

more sophisticated metrics from the ﬁeld of Social Network Analysis as input for a Support Vector Machine.

Optionally the feature space can be extended by a topic model in order to improve the classiﬁcation accuracy.

A ﬁrst evaluation conducted as live user experiment suggests that the approach can lead to satisfying results

for a large number of users.

1 INTRODUCTION

Online social networks such as Facebook, Twitter,

and Google+ have experienced an enormous growth

in their user bases in the last years. They are ubiqui-

tous and people have become used to accessing them

on a variety of different devices such as PCs, tablets,

and smartphones. With an average of 130 friends

per user

and the introduction of frictionless sharing

many users of the Facebook platform experience an

information overload, making the manual identiﬁca-

tion of relevant status updates in their news stream a

tedious task. Accessing the platform from mobile de-

vices with limited screen size aggravates the problem.

Facebook does advanced ranking for personalized

placement of objects in the news feed of a user using a

proprietary and trademarked algorithm. The EdgeR-

ank TM is based on the afﬁnity of a user to objects

and subjects in the Facebook system as well as the

weight of objects and relevance over time, i.e. decay.

Several U.S. patents describe rather vaguely how the

algorithm combines afﬁnity, weight and decay. The

critics of this approach claim that Facebook is using

EdgeRank TM in order to control the ﬂow of informa-

tion to optimize their own business model. Facebook

does not offer API access for third-party developers

or researchers.

Therefore in this paper we present our approach

http://www.facebook.com/press/info.php?statistics

to identify relevant status updates in a user’s Face-

book news feed including algorithmic details. The

method combines simple features based on the inter-

actions with status updates together with more sophis-

ticated metrics from the ﬁeld of Social Network Anal-

ysis as input for a Support Vector Machine. Option-

ally a topic model can be added to the feature space.

2 RELATED WORK

In order to determine the status updates that are rel-

evant for a particular user, we consider the extrac-

tion of opinion leaders (friends with a high authority

in the user’s network) as well as the identiﬁcation of

the ﬂow of information (content that is shared in the

user’s social network) as important aspects. In order

to measure the importance or authoritative power of

linked objects in a network the well-known PageR-

ank (Page et al., 1998) and Kleinberg’s HITS algo-

rithm (Kleinberg, 1998) are de-facto standards. While

PageRank is using the Random-Surfer model to com-

pensate real-world effects in large networks, HITS is

well-suited for computations of its hub and authority

values in smaller networks. For this reason we chose

HITS over PageRank for the implementation of rele-

vance in our approach. Further we ﬁnd the following

work particularly relevant for our method:

(Roch, 2005) identiﬁed two different sets of at-

353

Baumann S., Schirru R. and Folz J..

Filtering Relevant Facebook Status Updates for Users of Mobile Devices.

DOI: 10.5220/0004188003530356

In Proceedings of the 5th International Conference on Agents and Artiﬁcial Intelligence (ICAART-2013), pages 353-356

ISBN: 978-989-8565-38-9

 2013 SCITEPRESS (Science and Technology Publications, Lda.)

tributes that characterize opinion leaders. The at-

tributes in the ﬁrst set are speciﬁc for the person

(e.g., the sources of information he/she relies on). At-

tributes from the second set characterize the milieu

of a person. Roch’s research suggests that opinion

leaders have an information advantage relative to the

others in their environment.

(Boccara, 2008) investigates the formation of

opinions in populations under the inﬂuence of opin-

ion leaders. An agent-based model is used to simu-

late the adoption of opinions under several adoption

rules over time. An agent’s opinion is inﬂuenced by

its neighbors and its awareness, where a low aware-

ness means an individual is very likely to adopt new

opinions and vice versa. Opinion leaders are then se-

lected from those individuals with a maximum num-

ber of individuals they inﬂuence and with equal prob-

ability given one of two distinct opinions. They also

possess maximum awareness and thus never change

their opinion.

(Weng et al., 2010) present TwitterRank, an algo-

rithm identifying inﬂuential Twitter users. The algo-

rithm works as follows: First it extracts the topics

of tweets based on their content. In the second step

topic-speciﬁc relationship networks are constructed

among the twitterers. Finally, the TwitterRank algo-

rithm is applied which is an adaptation of the PageR-

ank algorithm taking the topical similarity between

twitterers and the link structure into account.

3 APPROACH

The approach presented in this section aims at identi-

fying the relevant status updates in a user’s Facebook

news feed. We set the terminology as follows: The

user is the person whose news feed is currently be-

ing analyzed. The people in the user’s network are

referred to as friends. Expressing a “like” for a status

update or commenting on a status are referred to as

interactions.

3.1 Crawling the User’s News Feed

In the ﬁrst step we crawl the status updates and their

associated interactions (comments and likes) from the

user’s Facebook news feed and store it in a relational

database. We use the Graph API

in order to obtain

the required data. Currently our system supports sta-

tus updates of the following types: message, photo,

video, link, new friendship, and check-in.

http://developers.facebook.com/docs/reference/api/

3.2 Feature Extraction

From the user’s news feed we extract features that are

associated with a friend, e.g, her hub and authority

values. We call these features friend-speciﬁc features

as they are the same for all status updates of the re-

spective friend. Further features are extracted that are

speciﬁc for each status update, e.g., the number of in-

teractions a particular status update has. These fea-

tures are called status-speciﬁc features. The features

will be described in more detail subsequently.

3.2.1 Interaction Percentage

The interaction percentage is a friend-speciﬁc feature.

It determines the relative amount of interactions a user

had with the status updates of a particular friend. Let

be the number of status updates friend f has posted

in the user’s crawled news feed and let I

be the num-

ber of status updates of friend f the user has interacted

with. The interaction percentage F

is calculated as

follows:

(1)

3.2.2 Hub and Authority Values

For each friend we calculate the hub and authority

scores following (Kleinberg, 1998). To determine

these values we interpret the user and his/her friends

as nodes in a directed network. Each interaction on a

status update is interpreted as an edge from the orig-

inator of the interaction to the publisher of the status

update. Figure 1 shows an example of actions and

interactions in a user’s personal network. Following

Kleinberg an authority in such a network is a node

with a high amount of inﬂuence, i.e., a node having

many incoming edges. In our scenario this corre-

sponds to a large amount of interactions on the per-

son’s status updates (cf. friends on the right hand side

of Figure 1). A hub on the other hand is considered a

node that links to authoritative nodes. In our scenario

this means that a hub often interacts with inﬂuential

persons in the user’s network (cf. friends on the left

hand side of Figure 1).

3.2.3 First Publisher

The ﬁrst publisher feature (F

f p

) is a Boolean ﬂag that

indicates whether a friend has contributed one or more

status updates as the ﬁrst person in a user’s network

that have been shared by another person afterwards. It

is intended to measure the information advantage of a

Facebook friend (Roch, 2005). F

f p

= 1 if the friend is

a ﬁrst publisher and F

f p

= 0 otherwise. Obviously, the

ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence

354

Friend A

Friend B

Friend C

Friend D

post

comment or like

status update

Friends(

News(Feed(

Friends(

A(User‘s(Personal(Network(

Figure 1: Example of actions and interactions in a user’s

personal network.

ﬁrst publisher feature belongs to the friend-speciﬁc

features.

3.2.4 Interaction Count

This feature measure the number of interactions a sta-

tus update has received from the people in the user’s

network. The interaction count is a status-speciﬁc fea-

ture.

Please note that interactions outside the user’s net-

work are not taken into account. For that reason sta-

tus updates of celebrities or companies the user fol-

lows do not necessarily have a high interaction count,

when not many of the user’s friends follow the respec-

tive pages and interact with their published content.

3.2.5 Topic Model

From the status updates in the user’s Facebook news

feed we calculate a topic model using Latent Dirichlet

Allocation (LDA) (Blei et al., 2003). Before apply-

ing the algorithm each status update is preprocessed

comprising stop word removal and stemming for En-

glish and German updates. Further string preprocess-

ing steps are conducted (e.g., terms are converted to

lower case characters, punctuation characters are re-

moved). Then the LDA algorithm is run. Each identi-

ﬁed topic is represented as a dimension in the feature

space with a boolean value indicating whether the sta-

tus update that is currently being analyzed is associ-

ated with the topic or not.

3.3 Training the Classiﬁers

We use Support Vector Machines (SVMs) (Cortes and

Vapnik, 1995) with an RBF kernel in order to classify

Facebook status updates as relevant or irrelevant. The

software library used for this purpose was the Java

implementation of LIBSVM.

We trained two differ-

ent conﬁgurations of the SVM using the features de-

scribed above. The features associated with the topic

model were however only used with the second con-

ﬁguration. All features were normalized to values be-

tween 0 and 1 before fed to the classiﬁer.

4 EVALUATION

4.1 Data Set

We conducted a live user evaluation experiment with

twelve participants that were between 19 and 48 years

old. For this purpose users with more than 100 friends

in their Facebook network were selected thus making

sure that enough status updates could be obtained for

each user.

We crawled the last 60 updates in each user’s news

feed and deleted those that did not have a text mes-

sage attached. For the experiment we aimed at having

50 status updates for each user. The data was split

into 30 instances for the training set and 20 instances

for the test set. If less than 50 status updates could

be obtained the training set was reduced accordingly.

For two users less than 40 updates with a message at-

tached were available. We excluded these users from

the experiment as the data was not sufﬁcient for a re-

liable analysis.

4.2 Procedure and Measures

First, the users had to rate the status updates from

their training set. The rating was on a binary scale

indicating whether a status update was relevant for

a participant or not. For those status updates with

additional information (e.g., photos, or videos) we

provided links where the users could access the ac-

tual content. After the training set was rated, a topic

model was created and the classiﬁers were trained as

described in Section 3.3.

When building the topic model we set the num-

ber of topics that had to be extracted to four. Each

topic was represented by one dimension in the input

matrix for the support vector machine. We selected

a relatively small topic number in order to avoid an

overstated inﬂuence of the topic model over the graph

based features. We set the document topic prior α to

0.1 and the topic word prior β to 0.01.

http://www.csie.ntu.edu.tw/∼cjlin/libsvm/

FilteringRelevantFacebookStatusUpdatesforUsersofMobileDevices

355

Then the predictions of the classiﬁers were calcu-

lated for the test set and stored in the data base. In

the next step the users rated the status updates in their

test set, again on a binary scale. For each classiﬁer we

determined the precision, recall and f-measure.

In the next section we compare the results of our

method using a feature space without the topic model

against the results using a feature space that includes

the topic model. It would have been worthwhile to

compare these results with Facebooks EdgeRank TM

algorithm, however to the best of our knowledge a

user’s news feed ﬁltered according to the EdgeRank

TM algorithm is not available via the Graph API.

4.3 Results

The SVM trained with the feature space including the

topic model made better results in terms of our se-

lected measures. Here our approach achieved an av-

erage precision of 0.51 at an average recall of 0.47

and an average f-measure of 0.48. It should be noted

that for two participants the precision and recall val-

ues were 0. When talking to those users after the ex-

periment, it turned out that they did not have many rel-

evant status updates in their training set thus making

the learning of a model with a high predictive accu-

racy difﬁcult. We assume that with more training data

the ﬁltering for these participants could be improved.

The results for the SVM trained with the feature

space without topic model were slightly worse. In

that setting we achieved an average precision of 0.46

at an average recall of 0.38 and an average f-measure

of 0.39. Using the reduced feature space three partic-

ipants had precision and recall values of 0.

For both settings six of the ten participants could

achieve an f-measure of 0.5 or better. The results for

each participant and the two feature spaces applied

are depicted in Table 1.

5 CONCLUSIONS

In this paper we presented an approach to identify

the relevant status updates in a user’s Facebook news

feed. The method combines features based on the

interactions with status updates together with met-

rics from the ﬁeld of Social Network Analysis, and

an LDA topic model as input for a Support Vector

Machine. A ﬁrst evaluation conducted as laboratory

study suggests that the approach can lead to satisfying

results for a large number of users. However a larger

ﬁeld study is needed to further prove the usefulness of

the ﬁltering algorithm.

Table 1: Results of the evaluation experiment (PRC: Preci-

sion, RCL: Recall, F: F-Measure). Each row represents the

results of one participant. The last row shows the column

averages.

W/o Topic Model Topic Model

PRC RCL F PRC RCL F

0.667 0.4 0.5 0.545 0.6 0.571

1 0.571 0.727 0.75 0.857 0.8

0 0 0 0.833 0.5 0.625

0 0 0 0 0 0

0.714 0.714 0.714 0.722 0.929 0.812

0.529 0.818 0.643 0.5 0.545 0.522

0 0 0 0 0 0

0.455 0.556 0.500 0.429 0.333 0.375

0.5 0.083 0.143 0.5 0.25 0.333

0.733 0.688 0.71 0.786 0.688 0.733

0.46 0.383 0.394 0.507 0.47 0.477

ACKNOWLEDGEMENTS

This research has been funded by the Investitionsbank

Berlin in the project “Voice2Social”, and co-ﬁnanced

by the European Regional Development Fund.

REFERENCES

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent

dirichlet allocation. Journal of Machine Learning Re-

search, 3:993–1022.

Boccara, N. (2008). Models of opinion formation: inﬂuence

of opinion leaders. Int. J. Mod. Phys. C, 19(1):93–109.

Cortes, C. and Vapnik, V. (1995). Support-vector networks.

Mach. Learn., 20(3):273–297.

Kleinberg, J. M. (1998). Authoritative Sources in a Hyper-

linked Environment. In Proceedings of the 9th An-

nual ACM-SIAM Symposium on Discrete Algorithms,

pages 668–677. AAAI Press.

Page, L., Brin, S., Motwani, R., and Winograd, T. (1998).

The pagerank citation ranking: Bringing order to the

web. Technical report, Stanford University.

Roch, C. H. (2005). The dual roots of opinion leadership.

Journal of Politics, 67(1):110–131.

Weng, J., Lim, E.-P., Jiang, J., and He, Q. (2010). Twit-

terrank: ﬁnding topic-sensitive inﬂuential twitterers.

In Proceedings of the third ACM international con-

ference on Web search and data mining, WSDM ’10,

pages 261–270, New York, NY, USA. ACM.

ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence

356