Streaming Networks Sampling using top-K Networks

Rui Sarmento, Mário Cordeiro, João Gama

Abstract

Combining a top-K network representation of the data stream with community detection is a novel approach to sampling streaming networks. The method keeps an always up-to-date sample of the full network, and its advantage over previous methods is that it preserves larger communities and the original network distribution. It is also shown empirically that these techniques, in conjunction with community detection, provide effective ways to sample and analyze large-scale streaming networks with power-law distributions.
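As an illustration only, a top-K network over an edge stream can be sketched as follows. The abstract does not specify the algorithm, so everything here is an assumption: the class name `TopKNetworkSampler`, the choice to rank nodes by the number of edges they originate, and the use of a Space-Saving-style replacement rule to bound memory to K monitored nodes. The sketch is not the authors' implementation, and the community detection step is omitted.

```python
from collections import defaultdict


class TopKNetworkSampler:
    """Hypothetical sketch: keep the edges of the K most active source nodes."""

    def __init__(self, k):
        self.k = k
        self.counts = {}               # monitored source node -> estimated edge count
        self.edges = defaultdict(set)  # monitored source node -> observed targets

    def add_edge(self, src, dst):
        """Process one (src, dst) edge from the stream."""
        if src in self.counts:
            self.counts[src] += 1
        elif len(self.counts) < self.k:
            self.counts[src] = 1
        else:
            # Space-Saving-style replacement (assumed): evict the monitored
            # node with the smallest count; the newcomer inherits that count
            # plus one, so its count is an overestimate.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.edges.pop(victim, None)
            self.counts[src] = floor + 1
        self.edges[src].add(dst)

    def sample(self):
        """Return the current sample as a sorted list of (src, dst) edges."""
        return sorted((s, d) for s, targets in self.edges.items() for d in targets)


stream = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("c", "a"), ("a", "b")]
sampler = TopKNetworkSampler(k=2)
for src, dst in stream:
    sampler.add_edge(src, dst)
print(sampler.sample())
```

The sample returned by `sample()` is available at any point in the stream, which is the "always up-to-date" property the abstract describes. A community detection algorithm could then be run on that edge list.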



Paper Citation


in Harvard Style

Sarmento R., Cordeiro M. and Gama J. (2015). Streaming Networks Sampling using top-K Networks. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-096-3, pages 228-234. DOI: 10.5220/0005341402280234


in Bibtex Style

@conference{iceis15,
author={Rui Sarmento and Mário Cordeiro and João Gama},
title={Streaming Networks Sampling using top-K Networks},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2015},
pages={228-234},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005341402280234},
isbn={978-989-758-096-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Streaming Networks Sampling using top-K Networks
SN - 978-989-758-096-3
AU - Sarmento R.
AU - Cordeiro M.
AU - Gama J.
PY - 2015
SP - 228
EP - 234
DO - 10.5220/0005341402280234