MEASURING TWITTER USER SIMILARITY AS A FUNCTION OF STRENGTH OF TIES

John Conroy, Josephine Griffith, Colm O’Riordan

Abstract

Users of online social networks reside in social graphs, in which any given pair of users may be connected or unconnected. These connections may be formal or inferred social links, and may be binary or weighted. We might expect users connected by a social tie to be more similar in what they write than unconnected users, and strongly connected pairs to be more similar still than weakly connected pairs, but this has never been formally tested. This work describes a method for calculating the similarity between Twitter social entities based on what they have written, and then examines the similarity between pairs of Twitter users as a function of how tightly they are connected. We show that the similarity between pairs of Twitter users is indeed positively correlated with the strength of the tie between them.
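
The abstract does not say which similarity measure or tie-strength definition the paper uses, so the following is only a minimal sketch under stated assumptions: each user's tweets are pooled into one document, documents are weighted with a basic TF-IDF scheme, pairs are compared with cosine similarity, and tie strength is a hypothetical integer weight (for example, a count of messages exchanged). All function names, the toy users, and the toy tie weights are illustrative and not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector per user.

    docs: dict mapping user -> pooled tweet text (assumed input shape).
    """
    term_counts = {user: Counter(tokenize(text)) for user, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each user counts once per term
    return {
        user: {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in counts.items()}
        for user, counts in term_counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data: invented users and invented tie weights, for illustration only.
docs = {
    "alice": "football match tonight great goal",
    "bob": "great goal in the football match",
    "carol": "baking bread sourdough recipe tonight",
    "dave": "sourdough bread recipe tips",
}
ties = {("alice", "bob"): 3, ("alice", "carol"): 1, ("alice", "dave"): 0}

vectors = tfidf_vectors(docs)
strengths = list(ties.values())
similarities = [cosine(vectors[a], vectors[b]) for a, b in ties]
r = pearson(strengths, similarities)
print(f"correlation between tie strength and similarity: {r:.3f}")
```

On this toy input the more strongly tied pair shares more vocabulary, so the correlation comes out positive. That only illustrates the shape of the analysis and says nothing about the paper's actual data or results.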



Paper Citation


in Harvard Style

Conroy J., Griffith J. and O’Riordan C. (2011). MEASURING TWITTER USER SIMILARITY AS A FUNCTION OF STRENGTH OF TIES. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 254-262. DOI: 10.5220/0003661902620270


in Bibtex Style

@conference{kdir11,
author={John Conroy and Josephine Griffith and Colm O’Riordan},
title={MEASURING TWITTER USER SIMILARITY AS A FUNCTION OF STRENGTH OF TIES},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={254--262},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003661902620270},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI  - MEASURING TWITTER USER SIMILARITY AS A FUNCTION OF STRENGTH OF TIES
SN  - 978-989-8425-79-9
AU  - Conroy J.
AU  - Griffith J.
AU  - O’Riordan C.
PY  - 2011
SP  - 254
EP  - 262
DO  - 10.5220/0003661902620270
ER  - 