KANGAROO: A DISTRIBUTED SYSTEM FOR SNA - Social Network Analysis in Huge-Scale Networks

Wu Bin, Dong Yuxiao, Qin Lei, Ke Qing, Wang Bai

Abstract

Social network analysis is the mapping and measuring of relationships and flows between people, groups, computers and other information or knowledge entities. The continued exponential growth of social networks poses a new challenge for social network analysis: in some cases these graphs reach millions of nodes and billions of edges. In this paper, we present KANGAROO, a distributed system for huge-scale social network analysis built on two main computing models, one for finding common neighbours and one for finding maximal cliques. KANGAROO is implemented on top of Hadoop, an open-source implementation of MapReduce. The system implements most social network analysis algorithms, including basic statistics, community detection, link prediction and network evolution, on the MapReduce computing framework. Most importantly, KANGAROO has been applied to a real-world huge-scale social network. The application scenarios, including degree distribution, a linear projection algorithm for community detection and community visualization in the presentation layer, demonstrate that KANGAROO is efficient, scalable and effective.
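The common-neighbour computing model mentioned in the abstract can be illustrated with a small in-memory sketch. This is not KANGAROO's Hadoop implementation, which the abstract does not describe in detail; it only mimics a map step and a reduce step in plain Python, and the function names (`map_phase`, `reduce_phase`, `common_neighbours`) are hypothetical names chosen for this illustration. The assumed idea is that every node emits each pair of its neighbours, keyed by the pair, so that grouping by key collects the common neighbours of that pair.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(adjacency):
    """Map step (illustrative): for each node w, emit ((u, v), w)
    for every pair u < v of w's neighbours, since w is a common
    neighbour of u and v."""
    for w, neighbours in adjacency.items():
        for u, v in combinations(sorted(neighbours), 2):
            yield (u, v), w


def reduce_phase(pairs):
    """Reduce step (illustrative): group the emitted values by key,
    giving the sorted list of common neighbours for each node pair."""
    grouped = defaultdict(list)
    for key, w in pairs:
        grouped[key].append(w)
    return {key: sorted(ws) for key, ws in grouped.items()}


def common_neighbours(edges):
    """Build an undirected adjacency map from an edge list and run
    the two steps above. Self-loops are ignored."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return reduce_phase(map_phase(adjacency))


if __name__ == "__main__":
    # Toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
    toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
    for pair, shared in sorted(common_neighbours(toy_edges).items()):
        print(pair, shared)
```

Pairs with no common neighbour never appear in the result, because no node emits them. On a real cluster the grouping done here by `reduce_phase` would be handled by the framework's shuffle stage, and high-degree nodes would emit a quadratic number of pairs, which is the main cost to watch in this formulation.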



Paper Citation


in Harvard Style

Bin W., Yuxiao D., Lei Q., Qing K. and Bai W. (2011). KANGAROO: A DISTRIBUTED SYSTEM FOR SNA - Social Network Analysis in Huge-Scale Networks. In Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-8425-52-2, pages 404-409. DOI: 10.5220/0003387304040409


in Bibtex Style

@conference{closer11,
author={Wu Bin and Dong Yuxiao and Qin Lei and Ke Qing and Wang Bai},
title={KANGAROO: A DISTRIBUTED SYSTEM FOR SNA - Social Network Analysis in Huge-Scale Networks},
booktitle={Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2011},
pages={404-409},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003387304040409},
isbn={978-989-8425-52-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - KANGAROO: A DISTRIBUTED SYSTEM FOR SNA - Social Network Analysis in Huge-Scale Networks
SN - 978-989-8425-52-2
AU - Bin W.
AU - Yuxiao D.
AU - Lei Q.
AU - Qing K.
AU - Bai W.
PY - 2011
SP - 404
EP - 409
DO - 10.5220/0003387304040409