# Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric

### Andreas Kanavos, Georgios Drakopoulos, Athanasios Tsakalidis

#### Abstract

Community discovery is central to social network analysis because it provides a natural way to decompose a social graph into smaller ones based on the interactions among individuals. Communities need not be disjoint and often exhibit recursive structure. The latter has been established as a distinctive characteristic of large social graphs and indicates modularity in the way humans build societies. This paper presents the implementation of four established community discovery algorithms as Neo4j higher order analytics using the Twitter4j Java API, together with their application to two real Twitter graphs with diverse structural properties. To evaluate the results obtained from each algorithm, a regularization-like metric is proposed that balances global and local graph self-similarity, in a manner akin to regularization in signal processing.

#### Paper Citation

#### in Harvard Style

Kanavos A., Drakopoulos G. and Tsakalidis A. (2017). **Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric**. In *Proceedings of the 13th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST*, ISBN 978-989-758-246-2, pages 403-410. DOI: 10.5220/0006382104030410

#### in Bibtex Style

@conference{webist17,

author={Andreas Kanavos and Georgios Drakopoulos and Athanasios Tsakalidis},

title={Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric},

booktitle={Proceedings of the 13th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},

year={2017},

pages={403-410},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0006382104030410},

isbn={978-989-758-246-2},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 13th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST

TI - Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric

SN - 978-989-758-246-2

AU - Kanavos A.

AU - Drakopoulos G.

AU - Tsakalidis A.

PY - 2017

SP - 403

EP - 410

DO - 10.5220/0006382104030410