5 CONCLUSIONS
We offer a method of measuring similarity between
Twitter users based on what they have written,
representing their aggregated posts as tf-idf weighted
vectors and comparing them in vector space.
We use this data to measure similarity
between users as a function of the connectedness
between them. We find that users who are
connected in either the formal graph or the graph
derived from conversations are more similar than
unlinked users. We furthermore find that users who
conversed with each other are more similar than
users who are linked in the formal
“follower/following” social graph. We consider
connections in the sparser conversation graph to be
more meaningful and to represent stronger social
ties than formal links. Taken together, these results
indicate a positive correlation between connectedness
in a social graph and similarity in terms of what
users post.
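As an illustration only, the comparison summarised above can be sketched in a few lines of Python. The sketch assumes whitespace tokenisation, a raw term frequency multiplied by log(N/df) as the tf-idf weight, and cosine similarity as the vector-space measure; these choices, the function names and the three toy users are assumptions made for the example and are not taken from our experimental setup.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each user to a sparse tf-idf vector built from their aggregated posts.

    docs: dict of user name -> all of that user's posts joined into one string.
    Weighting (an assumption of this sketch): raw term count * log(N / df).
    """
    tokenised = {user: text.lower().split() for user, text in docs.items()}
    n_docs = len(tokenised)
    # Document frequency: number of users whose posts contain each term.
    df = Counter(term for tokens in tokenised.values() for term in set(tokens))
    vectors = {}
    for user, tokens in tokenised.items():
        tf = Counter(tokens)
        vectors[user] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Invented toy data: one aggregated "document" per user.
docs = {
    "alice": "football match tonight football",
    "bob": "great football match",
    "carol": "baking bread recipe",
}
vectors = tfidf_vectors(docs)
sim_alice_bob = cosine(vectors["alice"], vectors["bob"])
sim_alice_carol = cosine(vectors["alice"], vectors["carol"])
```

Averaging such pairwise scores separately over unlinked pairs, pairs linked in the formal graph and pairs linked in the conversation graph gives the kind of comparison reported in this section.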
Taking this analysis further, we use the
natural weighting of the conversation graph to
analyse user similarity as a function of how
strongly connected they are. The conversation
graph is weighted, in that users in it converse with
each other one or many times. We find that the
similarity of Twitter users correlates positively with the
number of conversation actions between them, up
to a tipping point of around 15 conversations,
after which the similarity between users begins to
decline (though the sparseness of data for users who
conversed more than 15 times may account for this
aberration).
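A minimal sketch of how similarity might be summarised as a function of tie strength is given below. It assumes each connected pair has already been reduced to a (conversation count, similarity) tuple; the function name and the toy values are hypothetical. Reporting the number of pairs alongside each mean makes sparsely populated conversation counts easy to spot.

```python
from collections import defaultdict
from statistics import mean


def mean_similarity_by_count(pairs):
    """Group user pairs by conversation count.

    pairs: iterable of (conversation_count, similarity) tuples.
    Returns a dict: count -> (mean similarity, number of pairs),
    ordered by increasing conversation count.
    """
    buckets = defaultdict(list)
    for count, similarity in pairs:
        buckets[count].append(similarity)
    return {
        count: (mean(sims), len(sims))
        for count, sims in sorted(buckets.items())
    }


# Invented toy values, for illustration only.
toy_pairs = [(1, 0.25), (1, 0.75), (2, 0.5), (16, 0.125)]
summary = mean_similarity_by_count(toy_pairs)
```

A count whose mean rests on only a handful of pairs should be read with caution, which is the reservation raised above for pairs with more than 15 conversations.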
The document-analysis approach we used to
investigate user similarity, borrowed from the field
of information retrieval, holds promise as a method
of comparing the relative efficacy of graph-based
algorithms in common social network analysis
fields, such as community detection and
recommendation systems.
MEASURING TWITTER USER SIMILARITY AS A FUNCTION OF STRENGTH OF TIES