over the 135 days of the available data. We also observed
fewer matching elements when consecutive days spanned a
transition from workdays to weekend days or vice versa.
This is expected, since the major participants in the
network are more active on working days.
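As an illustration of the day-to-day comparison discussed above, the following Python sketch computes the share of elements that two top-k lists have in common. The function name top_k_overlap and the choice of normalising by the shorter list are assumptions made for this example, not necessarily the measure used in the experiments.

```python
def top_k_overlap(day_a, day_b):
    """Return the fraction of elements shared by two top-k lists.

    Hypothetical helper: the overlap is normalised by the size of the
    shorter list, so identical lists give 1.0 and disjoint lists 0.0.
    """
    set_a, set_b = set(day_a), set(day_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))
```

Applied to each pair of consecutive days, a drop in this value at a workday/weekend boundary would correspond to the decrease in matching elements described above.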
5 CONCLUSIONS
The top-k approach suits our data, which follows a power
law distribution: it focuses the analysis on the influential
individuals and discards isolated connections. Sampling the
top-k elements of a network with the Space-Saving algorithm
preserves the power law features of the original network.
The Louvain method then generates representative communities
from the most active elements in the network. This sampling
method for evolving networks makes massive network analysis
feasible on a commodity computer. In future work we will
compare our method with that of Ahmed et al. for community
detection. We also intend to test the method with real-time
data streaming systems.
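The Space-Saving procedure referred to above (Metwally et al., 2005) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function name space_saving_top_k, the treatment of stream items as arbitrary hashable values (for example caller-callee pairs), and the linear scan for the smallest counter are assumptions made for brevity.

```python
def space_saving_top_k(stream, k):
    """Approximate the most frequent items of a stream with k counters.

    Returns a list of (item, count, error) tuples sorted by count in
    descending order. `count` may overestimate the true frequency by
    at most `error`.
    """
    counts = {}  # monitored item -> estimated count
    errors = {}  # monitored item -> maximum overestimation
    for item in stream:
        if item in counts:
            # Item is already monitored: increment its counter.
            counts[item] += 1
        elif len(counts) < k:
            # A free counter is available: start monitoring the item.
            counts[item] = 1
            errors[item] = 0
        else:
            # All counters are in use: replace the item with the
            # smallest count and inherit that count as possible error.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[item] = floor + 1
            errors[item] = floor
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(item, count, errors[item]) for item, count in ranked]
```

Memory is bounded by k counters regardless of stream length, which is what allows the sampling step to run on a commodity computer. A production version would typically replace the linear minimum search with the stream-summary structure described by Metwally et al.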
ACKNOWLEDGMENTS
This work was supported by the Sibila and Smartgrids
research projects (NORTE-07-0124-FEDER-000056/59), financed
by the North Portugal Regional Operational Programme (ON.2 O
Novo Norte), under the National Strategic Reference Framework
(NSRF), through the European Regional Development Fund
(ERDF), and by national funds, through Fundação para a
Ciência e a Tecnologia (FCT), and by the European Commission
through the project MAESTRA (Grant number ICT-2013-612944).
We also acknowledge the financial support given by project
number 18450 through the "SI I&DT Individual" program by
QREN, delivered to WeDo Business Assurance.
REFERENCES
Ahmed, N. K., Duffield, N., Neville, J., and Kompella,
R. (2014). Graph sample and hold: A framework
for big-graph analytics. In Proceedings of the 20th
ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining, KDD ’14, pages
1446–1455, New York, NY, USA. ACM.
Ahmed, N. K., Neville, J., and Kompella, R. R. (2012).
Space-efficient sampling from social activity streams.
In Fan, W., Bifet, A., Yang, Q., and Yu, P. S., editors,
BigMine, pages 53–60. ACM.
Barabási, A.-L. (2005). The origin of bursts and heavy tails
in human dynamics. Nature, 435:207–211.
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefeb-
vre, E. (2008). Fast unfolding of communities in large
networks. arxiv.org. Paper which discusses the theory
behind the BPLL/Louvain community detection algo-
rithm.
Charikar, M., Chen, K., and Farach-Colton, M. (2002).
Finding frequent items in data streams. In Proceedings
of the 29th International Colloquium on Automata,
Languages and Programming, ICALP ’02, pages 693–703,
London, UK. Springer-Verlag.
Cormode, G. and Muthukrishnan, S. (2005). What’s hot
and what’s not: Tracking most frequent items dynam-
ically. ACM Trans. Database Syst., 30(1):249–278.
Demaine, E. D., López-Ortiz, A., and Munro, J. I. (2002).
Frequency estimation of internet packet streams with
limited space. In Algorithms-ESA 2002, pages 348–360.
Springer.
Gama, J. (2010). Knowledge Discovery from Data Streams.
Chapman & Hall/CRC, 1st edition.
Gillespie, C. S. (2014). Fitting heavy tailed distributions:
the poweRlaw package. R package version 0.20.5.
Goodman, L. A. (1961). Snowball Sampling. The Annals
of Mathematical Statistics, 32(1).
Granovetter, M. (1976). Network sampling: Some first
steps. American Journal of Sociology, 81:1267–1303.
Hübler, C., Kriegel, H.-P., Borgwardt, K. M., and
Ghahramani, Z. (2008). Metropolis algorithms for
representative subgraph sampling. In ICDM, pages 283–292.
IEEE Computer Society.
Hu, P. and Lau, W. C. (2013). A survey and taxonomy of
graph sampling. CoRR, abs/1308.5865.
Leskovec, J. and Faloutsos, C. (2006). Sampling from large
graphs. In Proceedings of the 12th ACM SIGKDD In-
ternational Conference on Knowledge Discovery and
Data Mining, KDD ’06, pages 631–636, New York,
NY, USA. ACM.
Manku, G. S. and Motwani, R. (2002). Approximate fre-
quency counts over data streams. In Proceedings of
the 28th International Conference on Very Large Data
Bases.
Metwally, A., Agrawal, D., and El Abbadi, A. (2005). Ef-
ficient computation of frequent and top-k elements
in data streams. In Proceedings of the 10th Inter-
national Conference on Database Theory, ICDT’05,
pages 398–412, Berlin, Heidelberg. Springer-Verlag.
Papagelis, M., Das, G., and Koudas, N. (2013). Sampling
online social networks. IEEE Transactions on Knowl-
edge and Data Engineering, 25(3):662–676.
ICEIS 2015 - 17th International Conference on Enterprise Information Systems