SPACED: A Novel Deep Learning Method for Community Detection in Social Networks

Mohammed Tirichine^{1,a}, Nassim Ameur^{1,b}, Younes Boukacem^{1,c}, Hatem M. Abdelmoumen^{1,d}, Hodhaifa Benouaklil^{1,e}, Samy Ghebache, Boualem Hamroune, Malika Bessedik^{1,2,f}, Fatima Benbouzid-Si Tayeb^{1,2,g} and Riyadh Baghdadi^{3,h}

1 Ecole Nationale Supérieure d’Informatique (ESI), BP 68M - 16270 Oued Smar, Algiers, Algeria

2 Laboratoire des Méthodes de Conception de Systèmes (LMCS), Algeria

3 New York University Abu Dhabi, Abu Dhabi, U.A.E.

{km tirichine, kn ameur, ky boukacem, kh abdelmoumen, kh benouaklil, ks ghebache, kb hamroune, m bessedik,

Keywords:

Social Network, Community Detection, Node Embedding, Community Embedding, Deep Learning.

Abstract:

Community detection is a landmark problem in social network analysis. To address this challenge, we propose SPACED: Spaced Positional Autoencoder for Community Embedding Detection, a deep learning-based approach designed to effectively tackle the complexities of community detection in social networks. SPACED generates neighborhood-aware embeddings of network nodes using an autoencoder architecture. These embeddings are then refined through a mixed learning strategy with generated community centers, making them more community-aware. This approach helps unravel network communities through an appropriate clustering strategy. Experimental evaluations across synthetic and real-world networks, as well as comparisons with state-of-the-art methods, demonstrate the high competitiveness and often superiority of SPACED for community detection while maintaining reasonable time complexities.

1 INTRODUCTION

Social networks, connecting vast numbers of individuals and entities, have become pivotal in shaping communication, information dissemination, and social interactions, making the study of their structures and dynamics increasingly crucial. One of the most important social structures that characterize these networks is their community structure. Although a formal definition of a community in network analysis does not exist, a widely accepted definition describes a community as a subset of social agents, typically represented as graph nodes, between which interactions, typically represented as graph edges, occur more "densely" than with the rest of the network. This dense intra-community connectivity indicates that members within the community are more closely related or interact more frequently with each other than with nodes outside the community. As a result, such clusters naturally form identifiable groups, known as communities, within the network structure (Figure 1). It follows from this definition that discovering communities inside a network, which is commonly referred to as the community detection problem, has important applications such as targeted advertising, functional group identification, or even terrorist threat prevention (Karataş and Şahin, 2018).

a https://orcid.org/0009-0003-9205-6158
b https://orcid.org/0009-0009-1120-6286
c https://orcid.org/0009-0001-5896-3227
d https://orcid.org/0009-0006-1459-2723
e https://orcid.org/0009-0002-1239-2606
f https://orcid.org/0000-0002-1007-9096
g https://orcid.org/0000-0001-7032-8544
h https://orcid.org/0000-0002-9350-3998

The last decades have witnessed an increase in the use of machine learning (ML) and deep learning (DL) based methods to tackle this problem, due to their ability to efficiently handle high-dimensional data spaces such as graphs and to uncover intricate patterns in them, such as community structures. One of the mainstream approaches in the class of DL methods is to embed the graph nodes into a vector space in such a way as to reflect their community proximities, i.e. nodes more likely to be in the same community according to the network data are associated with nearby embeddings in the vector space, thus transforming the community detection problem on the network into a clustering problem on the vector space representation.

Figure 1: An undirected graph representation of a social network used for community detection.

Tirichine, M., Ameur, N., Boukacem, Y., Abdelmoumen, H. M., Benouaklil, H., Ghebache, S., Hamroune, B., Bessedik, M., Tayeb, F. B.-S. and Baghdadi, R.

SPACED: A Novel Deep Learning Method for Community Detection in Social Networks.

DOI: 10.5220/0013070100003825

In Proceedings of the 20th International Conference on Web Information Systems and Technologies (WEBIST 2024), pages 141-152

ISBN: 978-989-758-718-4; ISSN: 2184-3252

Aligning with this class of methods, in this pa-

per, we propose SPACED, a model that aims at gen-

erating meaningful node embeddings and community

centers that uncover the community structure of so-

cial networks based solely on their topologies, i.e. the

undirected connections between the network agents.

It performs this by generating initial encodings of

the nodes based on their neighbourhoods, embedding

them using an auto-encoder architecture and generat-

ing initial community centre embeddings which will

help improve the node embeddings.

The rest of this paper is organized as follows. Section 2 presents the related works, outlining the different sources that SPACED draws inspiration from. Then, Section 3 details the proposed solution. Section 4 presents the results obtained on the selected benchmarks and their significance. Lastly, Section 5 concludes the paper, summarizing our findings and suggesting directions for future research.

2 RELATED WORKS

Various sequence-based node embedding techniques such as LINE (Tang et al., 2015), Node2Vec (Grover and Leskovec, 2016) or DeepWalk (Perozzi et al., 2014) build node embeddings from the "proximity" of nodes, inferred from their co-occurrence under various sampling schemes on the network such as truncated random walks or second-order random walks. This reflects the idea that nodes of the same community are more likely to appear together in small sub-regions of the network. The embeddings generated by this class of techniques have been applied to community detection with competitive results, as shown in (Tandon et al., 2021). The SkipGram model (Mikolov et al., 2013), which inspired some of the methods in this class, has likewise motivated parts of the SPACED architecture.

Using auto-encoders in community detection has gained significant attention recently. Xie et al. (2019) introduced CDDTA, which employs a deep auto-encoder for nonlinear feature extraction in networks and incorporates unsupervised transfer learning to refine representations. Tian et al. (2014) proposed the GraphEncoder method, which uses graph neural networks alongside auto-encoders to encode structural information into low-dimensional embeddings, preserving topological properties and enhancing clustering accuracy in large-scale networks. Bhatia and Rani (2019) proposed DeCom, which integrates modularity-based community detection with ensemble clustering, combining multiple clustering results into a consensus solution to enhance robustness and stability in community detection.

Among the works that perform node embedding in conjunction with community embedding, with the goal of creating better community-aware node embeddings, Rozemberczki et al. (2019) proposed GEMSEC, which aims to mutually enhance both node and community embedding processes. GEMSEC employs sequence-based node embedding techniques, where nodes occurring closely in random walks are embedded close to each other. The optimization objective combines the negative log-likelihood of observed neighbourhood samples with a clustering cost and is solved using a variant of mini-batch gradient descent. Another line of work in this category is ComE (Cavallari et al., 2017) and its improved version ComE+ (Cavallari et al., 2019), which automatically detects the number of communities. These methods consider that community embeddings should reflect the distribution of the member nodes in the feature space and thus propose embedding the communities as probability distributions by fitting them to a Gaussian mixture model. The results obtained by ComE/ComE+ showed the benefit of considering community-aware proximities in the embedding construction process.

WEBIST 2024 - 20th International Conference on Web Information Systems and Technologies


Figure 2: SPACED general architecture.

3 PROPOSED SOLVING APPROACH

A social network can be modelled by a graph $G = (V, E)$ where $V$ is the set of nodes representing individuals and $E$ is the set of edges indicating interactions between individuals. The goal is to find the community structure within this graph, which is a partition $C = \{C_1, \ldots, C_K\}$ of the node set $V$ into $K$ communities.

As stated in the introduction, our method attempts to solve this problem by finding a node embedding that better represents the community structure. A node embedding is a mapping $f : V \to \mathbb{R}^e$ of nodes to an embedding space of dimension $e$. A successful embedding can then turn the problem into a straightforward point cloud clustering. The SPACED approach follows a four-phase pipeline to achieve this, as shown in Figure 2:

• Phase 1. Graph Processing to generate initial

high-dimensional encodings for graph nodes.

• Phase 2. Node embedding to obtain an initial

low-dimensional node embedding using an au-

toencoder architecture.

• Phase 3. Community centers/leaders initializa-

tion aiming to create embeddings for community

centers/leaders, in the same space as the nodes,

positioning them at the best possible position in-

side the point cloud formed by the node embed-

ding.

• Phase 4. Mixed learning where the model si-

multaneously improves node embedding while

bringing them and the community centers/leaders

closer to each other, thus, producing a more

community-aware node embedding with more

condensed clusters eventually (Rozemberczki

et al., 2019).

The following sections will thoroughly explore these

phases, along with their different variants.

3.1 Graph Processing

The first problem encountered in all community detection methods is how to read the graph and explore its topological structure as effectively as possible. Here we use a data structure that can better represent the neighbourhood levels of a node while emphasizing the differences between these levels. To this end, we propose two data structures called the Views Matrix and the Walks Matrix.

3.1.1 Views Matrix

To produce this structure, the graph is seen as a Markov chain, where each node is a state and the probability of moving from a node $i$ to one of its neighbours is uniform over all neighbours $N(i)$ (since the graph is unweighted). So, $P$ is an $N \times N$ matrix, with $N = |V|$, whose entry $P_{ij}$ (Eq. 1) is the probability of moving from node $i$ to node $j$.

$$P_{ij} = \begin{cases} \dfrac{1}{|N(i)|} & \text{if } j \in N(i) \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

So the matrix $P$ represents first-order proximity in a probabilistic format. To obtain other levels of neighbourhood we can use the property of Markov chains whereby $P^2$ gives the probability of going from node $i$ to node $j$ in two steps, and so on. So, for a max depth $d$ and for all $k \in [1, d]$ we obtain:

$$P^{(k)} = P^k \tag{2}$$

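Note, however, that the return probability is always taken into account here. To avoid this, we propose another variant that uses the following formula instead:

$$P^{(k)}_{ij} = \begin{cases} 0 & \text{if } k = -1 \text{ or } (k = 0 \text{ and } i \neq j) \text{ or } P^{(k-2)}_{ij} \neq 0 \\ 1 & \text{if } k = 0 \text{ and } i = j \\ \left(P^{(k-1)} \times P\right)_{ij} & \text{otherwise} \end{cases} \tag{3}$$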

Now, we need to differentiate between the different levels of neighborhood, so we apply a weight to each level matrix $P^{(k)}$. We use two approaches: (1) an arithmetic approach, where the weight decreases by one at each level, $\{d - 0, d - 1, \ldots, d - (d - 1)\}$; (2) a harmonic approach, where we divide by a number that increments at each level, $\{\frac{1}{1}, \frac{1}{2}, \ldots, \frac{1}{d}\}$.

Finally, the resulting matrices $P^{(k)}$ are concatenated along their columns, resulting in a $|V| \times (d \cdot |V|)$ matrix as follows:

$$M = \begin{bmatrix} P^{(1)} & P^{(2)} & \cdots & P^{(d)} \end{bmatrix} \tag{4}$$
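The construction of the views matrix can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it follows the first variant (Eq. 2), assumes the level weights are multiplied into each $P^{(k)}$ before concatenation, assumes the harmonic weights are $1, 1/2, \ldots, 1/d$, and the function name `views_matrix` and its parameters are hypothetical.

```python
import numpy as np

def views_matrix(adj, depth, weighting="harmonic"):
    """Build a |V| x (depth*|V|) views matrix from a 0/1 adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1, keepdims=True)
    # Eq. 1: uniform transition probabilities; isolated nodes keep a zero row.
    P = np.divide(adj, degrees, out=np.zeros_like(adj), where=degrees > 0)
    if weighting == "arithmetic":
        weights = [depth - k for k in range(depth)]       # d, d-1, ..., 1
    else:
        weights = [1.0 / (k + 1) for k in range(depth)]   # 1, 1/2, ..., 1/d (assumed)
    levels, P_k = [], np.eye(len(adj))
    for w in weights:
        P_k = P_k @ P              # Eq. 2: P^(k) = P^k
        levels.append(w * P_k)     # weight applied per level (assumption)
    return np.hstack(levels)       # Eq. 4: column-wise concatenation

# Toy path graph 0 - 1 - 2
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
M = views_matrix(adj, depth=2)
```

For the three-node path graph, row 0 of `M` holds the one-step distribution of node 0 followed by its down-weighted two-step distribution.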

3.1.2 The Walks Matrix

In an attempt to imitate random walks without performing them, we exploit the already constructed level matrices $P^{(k)}$ of the views matrix by combining them with a weighted mean, using the same weights as earlier. This creates for each node $i$ a distribution resembling the share of appearances of every node in random walks of varying lengths started from node $i$, which provides some sort of context around each node.

$$W = \frac{\sum_{k=1}^{d} w_k P^{(k)}}{\sum_{k=1}^{d} w_k} \tag{5}$$

where $w_k$ denotes the weight of level $k$.
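As a rough illustration of Eq. 5, the weighted mean of the level matrices can be computed as below. The sketch assumes the unweighted matrices $P^{(k)}$ are averaged with the level weights; `walks_matrix` is a hypothetical helper name, and the toy graph is not taken from the paper.

```python
import numpy as np

def walks_matrix(level_matrices, weights):
    """Weighted mean of the level matrices P^(1), ..., P^(d)."""
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(level_matrices)                 # shape (d, |V|, |V|)
    # Contract the level axis: sum_k w_k * P^(k), then normalise.
    return np.tensordot(weights, stacked, axes=1) / weights.sum()

# Toy path graph 0 - 1 - 2 with harmonic weights 1 and 1/2
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
W = walks_matrix([P, P @ P], weights=[1.0, 0.5])
```

Because every $P^{(k)}$ is row-stochastic when no node is isolated, each row of `W` is again a probability distribution, which matches the "context" interpretation given in the text.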

3.2 Node Embedding

We propose an autoencoder architecture that lowers each level of neighbourhood to the embedding space of size $e$ individually and then learns the relation between them. To achieve this, we created a neural network with an input layer, three hidden layers and an output layer. The input is the views matrix, where each row is a sample (a node of the graph); a node is thus represented by a vector of size $d \cdot |V|$. The first hidden layer, called the Level embedding layer, lowers each level independently to an embedding of size $e$, so the output of this layer is a vector of size $d \cdot e$. The second hidden layer combines the embeddings of the levels into a final embedding: we link each dimension of the different level embeddings to the corresponding dimension of the final embedding, thus learning for each dimension a weights vector of size $d$, for a total of $d \cdot e$ weights over all dimensions. The last hidden layer is symmetric to the Level embedding layer and is followed by an output layer symmetric to the input layer. This architecture is shown in Figure 3.
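To make the layer shapes concrete, here is a small NumPy sketch of the encoder half only. It is one possible reading of the description, not the authors' TensorFlow model: weights are random, no activation function is applied (the text does not specify one), and the combination layer is read as connecting dimension $j$ of every level embedding to dimension $j$ of the final embedding, which gives $d$ weights per dimension under this reading. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, depth, emb = 6, 3, 2            # |V|, d, e (toy values)

# Stand-in for views-matrix rows: one row of size d*|V| per node.
x = rng.random((n_nodes, depth * n_nodes))

# Level embedding layer: one |V| x e weight block per neighbourhood level.
W_level = rng.normal(size=(depth, n_nodes, emb))
x_levels = x.reshape(n_nodes, depth, n_nodes)            # split each row into d levels
h_levels = np.einsum("bdv,dve->bde", x_levels, W_level)  # (nodes, d, e)

# Combination layer: final dimension j only sees dimension j of each level.
W_mix = rng.normal(size=(depth, emb))
embedding = (h_levels * W_mix).sum(axis=1)               # (nodes, e)
```

The decoder would mirror these two steps in reverse order to reconstruct the input row.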

3.3 Community Centers / Leaders Initialization

This phase aims to position community cen-

ters/leaders as optimally as possible inside the point

cloud of nodes. Community centers initialization

means putting independent points that sit at the center

of the communities. In this category, we propose two

methods: A neural network architecture, called Com-

munity Embedder, which uses a custom loss func-

tion (11) for the above-mentioned purpose, and the

KMeans method.

Community leaders initialization on the other

hand focuses on choosing a node as the leader of

its community. Here we crafted two other methods:

Leaders SA which is a simulated annealing for choos-

ing the optimal combination of nodes that minimizes

the same loss function used earlier (11), and a method

named Walk Leaders that exploits the Walks matrix

and uses a heuristic we deﬁned to determine the lead-

ers.

3.3.1 Community Embedder

It is a simple neural network containing only densely connected input and output layers. The input layer receives a one-hot encoded vector denoting the community to embed (so the size of this layer is $K$) and lowers it to the embedding space represented by the output layer (of size $e$). To define the loss function for this model, we needed hypotheses that describe an optimal position of a community center. We came up with four, each giving rise to a loss term, and then combined them into one loss function as follows:

Figure 3: The autoencoder architecture.

First, we naively suppose that a good partitioning of a point cloud gives each community an equal share of nodes, so each community in this case would get $\frac{|V|}{K}$ nodes. So, if we call $I_c$ the number of nodes that chose community center $c$, we can have a loss function that increases as $I_c$ gets higher (Dominance) or lower (Subdominance) and approaches 0 if the number is approximately $\frac{|V|}{K}$, which we can achieve with (6), where $N = |V|$.

$$\left( \frac{I_c \cdot K - N}{N (K - 1)} \right) \tag{6}$$

For calculating $I_c$ we need to ensure that the function that computes it is differentiable. To do this we use the Boltzmann operator as a SmoothMin function (7) ($\alpha$ should be negative and large in absolute value), which approximates the minimum distance from each node to all the centers and produces an almost-one-hot matrix specifying for each node the community it chose. This matrix can then easily be used to get an approximate $I_c$.

$$\text{SmoothMin}(\{x_i\}) = \frac{\sum_{i=1}^{n} x_i e^{\alpha x_i}}{\sum_{i=1}^{n} e^{\alpha x_i}} \tag{7}$$
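The behaviour of the Boltzmann operator and of the resulting almost-one-hot assignment can be checked numerically. The following is a minimal sketch under our own assumptions: `smooth_min` and `soft_assignment` are hypothetical helper names, $\alpha = -50$ is an arbitrary choice, and the distance matrix is made up for illustration.

```python
import numpy as np

def smooth_min(x, alpha=-50.0):
    """Boltzmann operator; tends to min(x) as alpha goes to -infinity."""
    x = np.asarray(x, dtype=float)
    z = alpha * x
    w = np.exp(z - z.max())            # shift exponents for numerical stability
    return float((x * w).sum() / w.sum())

def soft_assignment(dist, alpha=-50.0):
    """Row-wise Boltzmann weights: nearly one-hot on the closest center."""
    z = alpha * np.asarray(dist, dtype=float)
    w = np.exp(z - z.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

# Distances from 3 nodes (rows) to 2 community centers (columns)
dist = np.array([[0.1, 2.0],
                 [1.5, 0.2],
                 [0.3, 1.0]])
A = soft_assignment(dist)
I = A.sum(axis=0)                      # differentiable estimate of I_c per center
```

Summing the columns of the soft assignment gives a smooth estimate of how many nodes chose each center, which is what the balance term needs.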

Second, inspired by the Modularity metric, we want to ensure that the distance between internal nodes is much lower than the distance between internal nodes and external nodes:

$$L_2 = \frac{\sum_{i, j \in c} \| f_i - f_j \|}{\sum_{i \in c,\, j \notin c} \| f_i - f_j \|} \tag{8}$$

Third, to easily bring the nodes closer to their community center we first need to bring the community center closer to them, so we add another term that penalizes the distance between a community center and the nodes of its community:

$$L_3 = 1 - e^{-\sum_{i \in c} \left( \| \mu_c - f_i \| \right)} \tag{9}$$

where $\mu_c$ is the embedding of community center $c$.

Fourth, we assume that community centers are generally positioned far from each other, so we penalize their closeness:

$$L_4 = e^{-\sum_{c' \in K,\, c' \neq c} \left( \| \mu_{c'} - \mu_c \| \right)} \tag{10}$$

The loss function is then a weighted sum of the previous losses:

$$\text{Loss} = \alpha L_1 + \beta L_2 + \gamma L_3 + \delta L_4 \tag{11}$$

where $\alpha$, $\beta$, $\gamma$ and $\delta$ are weights that sum up to 1 and denote the importance of each term.
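As an illustration of how the distance-based terms (8) to (10) and the weighted sum (11) behave, the sketch below evaluates them for a single community on a toy point cloud. Several things are assumptions on our part: the norm is taken to be Euclidean, the terms are computed for one community only (how they are aggregated over communities is not shown here), the balance term $L_1$ is simply passed in as a number, and all function names are hypothetical.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def l2_term(f, members):
    """Eq. 8: internal distances over internal-to-external distances."""
    inside, outside = f[members], f[~members]
    return pairwise_dist(inside, inside).sum() / pairwise_dist(inside, outside).sum()

def l3_term(f, members, mu_c):
    """Eq. 9: penalises distance between a center and its member nodes."""
    return 1.0 - np.exp(-np.linalg.norm(f[members] - mu_c, axis=1).sum())

def l4_term(centers, c):
    """Eq. 10: penalises closeness of center c to the other centers."""
    others = np.delete(centers, c, axis=0)
    return np.exp(-np.linalg.norm(others - centers[c], axis=1).sum())

def combined_loss(l1, l2, l3, l4, weights=(0.25, 0.25, 0.25, 0.25)):
    """Eq. 11: weighted sum; the weights are expected to sum to 1."""
    a, b, g, d = weights
    return a * l1 + b * l2 + g * l3 + d * l4

# Toy embeddings: two well-separated communities of two nodes each
f = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
members = np.array([True, True, False, False])     # community c = 0
centers = np.array([[0.0, 0.5], [5.0, 5.5]])

l2 = l2_term(f, members)
l3 = l3_term(f, members, centers[0])
l4 = l4_term(centers, 0)
loss = combined_loss(0.0, l2, l3, l4)              # L1 set to 0 for the toy case
```

On this well-separated example the separation terms `l2` and `l4` are small, while `l3` is driven only by the spread of the two member nodes around their center.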

3.3.2 Leaders SA

This method uses the Simulated Annealing metaheuristic (Kirkpatrick et al., 1983) to minimize the loss function (11) as far as possible. It is important to note that here we do not need the derivative of the loss function, so we can calculate it exactly by using argmin directly instead of SmoothMin. We start by creating an initial random set of leader nodes $S$. We get a neighbor of a solution $S$ by choosing a random node from it, which is replaced by a randomly chosen node from $V - S$. This way we can execute Simulated Annealing to progressively improve the initial solution.

Figure 4: Mixed learning architecture.
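The neighbor move and the annealing loop described above can be sketched as follows. This is a generic simulated annealing skeleton rather than the authors' code: the geometric cooling schedule, the step count, the function names `neighbor` and `anneal`, and the toy loss (a stand-in for the loss (11)) are all assumptions made for illustration.

```python
import math
import random

def neighbor(leaders, nodes, rng):
    """Swap one randomly chosen leader for a random node outside the set."""
    leaders = list(leaders)
    candidates = [v for v in nodes if v not in leaders]
    leaders[rng.randrange(len(leaders))] = rng.choice(candidates)
    return leaders

def anneal(nodes, k, loss, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimise loss over sets of k leader nodes with simulated annealing."""
    rng = random.Random(seed)
    current = rng.sample(list(nodes), k)       # initial random leader set S
    current_loss = loss(current)
    best, best_loss = list(current), current_loss
    t = t0
    for _ in range(steps):
        cand = neighbor(current, nodes, rng)
        cand_loss = loss(cand)
        delta = cand_loss - current_loss
        # Always accept improvements; accept worse moves with prob. exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_loss = cand, cand_loss
            if current_loss < best_loss:
                best, best_loss = list(current), current_loss
        t *= cooling                           # geometric cooling (assumed schedule)
    return best, best_loss

# Toy stand-in for the loss: nodes on a line, distance to the nearest leader.
positions = list(range(10))

def toy_loss(leaders):
    return sum(min(abs(v - s) for s in leaders) for v in positions)

best, best_loss = anneal(positions, 2, toy_loss)
```

Because the loss is only evaluated and never differentiated, any exact assignment rule such as argmin can be used inside it.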

3.3.3 Walk Leaders

We propose a heuristic (Algorithm 1) to extract com-

munity leaders and automatically detect the number

of communities K.

Since the walks matrix closely represents the dis-

tribution of nodes in hypothetical varying random

walks starting from each node (we call such a distri-

bution the context of the node), we can assume that the

leader of a node i is more likely to be among the nodes

with the most appearances in the context of i, so we

start by picking for each node the l leading nodes in

its context (we call them potential leaders, see line 1).

The goal here is to choose the minimal set of nodes that contains at least one potential leader of each node. So, to select the leader of each node, we first consider its potential leaders having the maximum number of appearances (see line 8) and take the first one that was already chosen by a previous node (denoting the popularity of the leader among previous nodes, see line 11). If no max-count leader was already chosen by a previous node, we choose the one with the most appearances in the context of the node (see line 13). After iterating through all the nodes we end up with a current leaders vector, but this vector can still contain leader nodes that do not choose themselves as leaders, which is not coherent. So, we use the Refinement of the idx vector method proposed by (Taheri and Bouyer, 2020a) to fix this conflict (see line 15).

Algorithm 1: Walk Leaders.
Input: walks matrix W, leaders window l
Output: number of leaders K, walk leaders WL
 1  PL ← PotentialLeaders(W, l);
 2  if l == 1 then
 3      current_leaders ← PL;
 4  else
 5      current_leaders ← ∅;
 6      for each node do
 7          max_count ← max({PLCount(p) | p ∈ PL[node]});
 8          max_count_leaders ← {p ∈ PL[node] | PLCount(p) = max_count};
 9          popular_leaders ← max_count_leaders ∩ current_leaders;
10          if popular_leaders ≠ ∅ then
11              current_leader ← first element of popular_leaders;
12          else
13              current_leader ← first element of max_count_leaders;
14          current_leaders[node] ← current_leader;
15  Refine current_leaders vector;
16  return UniqueCount(current_leaders), current_leaders;

The result of this method is a vector current_leaders assigning to each node i a leader node, so to get the number of communities we count the number of unique values in current_leaders.
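A compact Python rendering of Algorithm 1 may help to follow the selection rule. It is a sketch under stated assumptions: `PLCount(p)` is interpreted as the number of nodes that list `p` among their potential leaders, potential leaders are ordered by decreasing weight in the node's context, and the refinement step of line 15 (Taheri and Bouyer, 2020a) is not reproduced. The function name `walk_leaders` and the toy walks matrix are illustrative only.

```python
import numpy as np
from collections import Counter

def walk_leaders(W, l):
    """Pick one leader per node from the walks matrix W (lines 1-14, 16)."""
    W = np.asarray(W, dtype=float)
    # Line 1: the l heaviest nodes in each node's context, best first.
    PL = [[int(j) for j in np.argsort(-row, kind="stable")[:l]] for row in W]
    if l == 1:
        leaders = [cand[0] for cand in PL]
    else:
        counts = Counter(p for cand in PL for p in cand)   # assumed PLCount
        leaders, chosen = [], set()
        for cand in PL:
            max_count = max(counts[p] for p in cand)
            top = [p for p in cand if counts[p] == max_count]
            popular = [p for p in top if p in chosen]      # already picked earlier
            leader = popular[0] if popular else top[0]
            leaders.append(leader)
            chosen.add(leader)
    # Line 15 (refinement of the leaders vector) is intentionally omitted.
    return len(set(leaders)), leaders

# Toy walks matrix: nodes 0-2 gravitate to node 0, nodes 3-5 to node 3.
W = [[0.5, 0.3, 0.2, 0.0, 0.0, 0.0],
     [0.5, 0.3, 0.2, 0.0, 0.0, 0.0],
     [0.5, 0.2, 0.3, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.5, 0.3, 0.2],
     [0.0, 0.0, 0.0, 0.5, 0.3, 0.2],
     [0.0, 0.0, 0.0, 0.5, 0.2, 0.3]]
K, leaders = walk_leaders(W, l=2)
```

On this toy input both leaders already choose themselves, so the omitted refinement step would have nothing to correct.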

3.4 Mixed Learning

The aim of this phase is to enhance node embeddings to represent community-aware high-order proximities between nodes, i.e. nodes that are in the same community will have embeddings that are close to each other. In this work, such an aim is achieved by improving both node embeddings and community centers/leaders in one learning process by bringing them closer to each other, which will eventually form accurate clusters. In both approaches presented in the next sub-sections, the Community Embedder either starts from the weights it learned, if it was used to initialize the centers, or receives the centers/leaders embedding coordinates as initial weights.

Figure 5: Node embedder architecture.

3.4.1 Autoencoder-Based

One ﬁrst approach to implement the learning process

is to use the same autoencoder that was used to learn

initial node embeddings. The autoencoder weights at

the end of the initial node embedding phase are used

as initial weights in this phase. The autoencoder and

the Community Embedder are now trained at the same

time. For a given node, the input to the autoencoder is its row in the views matrix, while the input to the Community Embedder is the chosen community for that node as a one-hot vector. The training loss combines the clustering loss (11) (using the output from the Community Embedder and the embeddings from the autoencoder) and the reconstruction error of the autoencoder, as shown in Figure 4.

3.4.2 Node Embedder-Based

Another approach proposed in this work is to use a

new architecture to enhance node embeddings which

we named ”Node Embedder”. This architecture in-

spired by Skip-gram (Mikolov et al., 2013) has four

layers: an input layer, embedding layer, context layer,

and output layer. A node encoded as a one-hot vec-

tor of size |V | is passed to the input layer which is

connected densely to the embedding layer of size e.

The embedding layer is densely connected to the con-

text layer. The context layer contains c vectors of size

|V| (so its size is c ∗ |V|). c represents the context

size. Each vector is activated with a Softmax function. This layer eventually represents the nodes that are in the context of the current node as one-hot vectors. Then the context layer is connected to the output layer via a one-dimensional convolutional kernel such that the $i$-th node of each vector in the context layer is connected to the $i$-th node in the output layer. Figure 5 summarizes this architecture.

The Node Embedder and Community Embedder

are trained at the same time. The input for the Node

Embedder is a one-hot vector representing the node,

and in the same way as the autoencoder-based ap-

proach, the input for the Community Embedder is the

chosen community for that node as a one-hot vector.

The training loss is a combination of the clustering loss (11) (using the output of the Community Embedder and the embeddings from the Node Embedder) and a mean squared error between the output of the Node Embedder and the corresponding row of the Walks matrix. Figure 4 details the architecture explained in this paragraph.


3.5 Clustering

For community detection, one can use any clustering

algorithm on the ﬁnal node embeddings. Extracted

clusters represent the detected communities. In this

work, community centers/leaders can be used to detect communities by assigning each node to the closest community center. The results were also tested with KMeans and Affinity Propagation.
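Assigning each node to its closest community center amounts to an argmin over a distance matrix. A minimal sketch, assuming Euclidean distance in the embedding space and using made-up embeddings; `assign_to_centers` is a hypothetical helper name:

```python
import numpy as np

def assign_to_centers(embeddings, centers):
    """Return, for each node, the index of its nearest community center."""
    # Distance matrix of shape (nodes, centers) via broadcasting.
    d = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=-1)
    return d.argmin(axis=1)

embeddings = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_to_centers(embeddings, centers)
```

The resulting label vector can be compared directly with ground-truth communities, or replaced by the output of any other clustering algorithm run on the same embeddings.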

4 COMPUTATIONAL RESULTS AND DISCUSSION

This section presents the results of computational ex-

periments assessing the performance and effective-

ness of SPACED for community detection in social

networks. All algorithms and tests were developed

in Python using TensorFlow and executed on a com-

puter running Windows 11 Pro, equipped with 16 GB

of RAM, an Intel(R) Core(TM) i5-10310U CPU at

1.70GHz, and an integrated Intel(R) UHD Graphics.

Our experiments cover both synthetic and real-world networks. The real-world datasets are listed in Table 1.

Table 1: Characteristics of the tested real-world benchmarks.

Dataset                               Communities   Nodes   Edges
Karate (Zachary, 1977)                2             34      78
Dolphins (Lusseau et al., 2003)       2             62      159
Polbooks [V. Krebs, unpublished]      3             105     441
Football (Girvan and Newman, 2002)    12            115     613
Email (Leskovec et al., 2007)         42            1005    25571

For the synthetic networks we used the Lancichinetti-Fortunato-Radicchi (LFR) benchmark. The LFR datasets are characterized by the mixing parameter µ, which controls the level of community overlap and noise (the higher the µ value, the more the noise). The LFR datasets comprised 128 nodes and 1024 edges; for the mixing parameter we tested the values 0.00, 0.25, 0.40, 0.45, and 0.50.

Figure 6: NMI performance on real-world datasets. (a) Dolphins. (b) Polbooks. (c) Email. (d) Football. (e) Karate.

To evaluate the effectiveness of our solution, we

employed the Normalized Mutual Information (NMI)

metric which is a widely used measure in literature.

NMI measures the similarity between the true com-

munity structure and the detected community struc-

ture, providing a score between 0 and 1, where 1 in-

dicates perfect matching.
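For reference, NMI can be computed from label vectors with a few lines of standard-library Python. Several normalizations of mutual information exist; the sketch below uses the arithmetic mean of the two entropies, which is an assumption on our part, and the function name `nmi` is illustrative.

```python
import math
from collections import Counter

def nmi(true_labels, pred_labels):
    """Normalized mutual information with arithmetic-mean normalization."""
    n = len(true_labels)
    ct, cp = Counter(true_labels), Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    # Entropies of the two partitions.
    h_t = -sum(c / n * math.log(c / n) for c in ct.values())
    h_p = -sum(c / n * math.log(c / n) for c in cp.values())
    # Mutual information from the joint label counts.
    mi = sum(c / n * math.log(c * n / (ct[a] * cp[b]))
             for (a, b), c in joint.items())
    denom = (h_t + h_p) / 2
    # Two single-cluster partitions are treated as a perfect match.
    return 1.0 if denom == 0 else mi / denom

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # identical up to relabeling
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # carries no shared information
```

The score does not depend on the label names, so a detected partition that matches the ground truth up to a renaming of communities still scores 1.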

4.1 SPACED Performance Analysis on Real-World Datasets

The performance of our method on real-world datasets was assessed under both approaches, AD k (Automatic Detection of the number of communities "k") and NAD k (No-Automatic Detection, "k" is predefined). As shown in Table 2, the detected number of communities matches the ground truth exactly on Karate and differs from it by two or three communities on the other datasets. Estimating this number automatically is a significant advantage over many traditional methods that require this parameter to be known a priori. Also, as summarized in Figure 7, the community assignments produced by SPACED showed good NMI results both for the AD k and the NAD k modes. For instance, on the Karate network, our method achieved an NMI of 1.0, perfectly matching the ground truth.

Figure 7: AD k vs. NAD k.

Figures 6a to 6e illustrate the performance in terms of NMI for the Dolphins, Polbooks, Email, Football, and Karate networks respectively, compared against several well-known community detection algorithms, including Louvain (Blondel et al., 2008b), Infomap (Rosvall and Bergstrom, 2012), GraphTrees (Dalleau et al., 2020a), and others (see Appendix). Our method, particularly the NAD k variant, consistently performed well across different datasets.

Table 2: Real-world datasets: detected vs. ground truth number of communities.

Dataset     Ground Truth k   Detected k
Karate      2                2
Dolphins    2                4
Polbooks    3                5
Football    12               15
Email       42               39

Also, SPACED has demonstrated reasonable exe-

cution times, with a direct correlation to network size.

Figure 8 presents a comparative analysis of execution

times (in seconds) across various real-world datasets,

further illustrating that the rate of increase slows as

network size grows. This demonstrates SPACED’s

strong potential for efﬁcient scalability to larger net-

works.

Figure 8: Execution times in seconds for real-world

datasets.

However, the method faces certain limitations in

memory usage, primarily due to its reliance on the ad-

jacency matrix, which becomes impractical for large-

scale networks and may lead to excessive memory

consumption. This issue can be mitigated through

technical optimizations, such as storing only non-zero

values, as adjacency matrices in large-scale networks

are typically sparse. Additionally, techniques like

graph partitioning and distributed processing can be

explored to further optimize the performance on large networks without compromising accuracy.

Figure 9: SPACED results on synthetic datasets. (a) NMI performance. (b) Execution times in seconds.

4.2 SPACED Performance Analysis on Synthetic Datasets

The synthetic datasets were generated using the LFR

benchmark with varying levels of mixing parameters.

Figure 9a illustrates the comparative performance of

our method between the AD k and NAD k variants on

the synthetic datasets for increasing values of the mix-

ing parameter, and Table 3 shows the detected number

of communities within the LFR networks. The rela-

tively high NMI values indicate the good performance

and robustness of our model, with an expected NMI

value decrease as the mixing parameter increases due

to the induced community noise. Our method also identified the correct number of communities on three of the five LFR networks and was off by one on the other two.

Table 3: Synthetic datasets: detected vs. ground truth number of communities.

Dataset     Ground Truth k   Detected k
LFR-0.00    4                5
LFR-0.25    4                4
LFR-0.40    4                4
LFR-0.45    4                3
LFR-0.50    4                4

Our method showed stable execution times regardless of graph complexity, making it usable for graphs with a complex structure as well. The line chart in Figure 9b compares the execution times in seconds for every synthetic dataset, in increasing order of complexity.

5 CONCLUSION

In this paper, we introduced SPACED, a deep learning

model designed to uncover the community structure

of networks solely from their topology by generating

community-aware node embeddings. We evaluated

its performance on several widely recognized bench-

marks and compared it to other established methods,

showcasing its competitiveness in the ﬁeld.

Future enhancements for SPACED include explor-

ing and reﬁning the various pipeline variants to iden-

tify the optimal conﬁguration. Additionally, while

SPACED has shown strong stability and effectiveness

on small to medium-sized datasets, it has not yet been

tested on extremely large datasets. Thus, future work

will need to address challenges in memory usage and

execution time to ensure scalability.

REFERENCES

Bhatia, V. and Rani, R. (2019). A distributed overlapping

community detection model for large graphs using

autoencoder. Future Generation Computer Systems,

94:16–26.

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008a). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008.

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and

Lefebvre, E. (2008b). Louvain algorithm. Jour-

nal of Statistical Mechanics: Theory and Experiment,

2008(10):P10008.

Cai, B., Wang, Y., Zeng, L., Hu, Y., and Li, H. (2020).

Edge classiﬁcation based on convolutional neural net-

works for community detection in complex network.

Physica A: statistical mechanics and its applications,

556:124826.

Cavallari, S., Cambria, E., Cai, H., Chang, K. C.-C., and

Zheng, V. W. (2019). Embedding both ﬁnite and inﬁ-

nite communities on graphs [application notes]. IEEE

computational intelligence magazine, 14(3):39–50.

Cavallari, S., Zheng, V. W., Cai, H., Chang, K. C.-C., and

Cambria, E. (2017). Learning community embed-

ding with community detection and node embedding

on graphs. In Proceedings of the 2017 ACM on Con-

ference on Information and Knowledge Management,

pages 377–386.

Dalleau, K., Couceiro, M., and Smail-Tabbone, M. (2020a).

Computing vertex-vertex dissimilarities using random

trees: Application to clustering in graphs. In Ad-

vances in Intelligent Data Analysis XVIII, pages 132–

144, Berlin, Heidelberg. Springer-Verlag.

Dalleau, K., Couceiro, M., and Smail-Tabbone, M. (2020b).

Computing vertex-vertex dissimilarities using random

trees: Application to clustering in graphs. In Advances

in Intelligent Data Analysis XVIII, pages 132–144.

Girvan, M. and Newman, M. E. (2002). Community struc-

ture in social and biological networks. Proceedings of

the National Academy of Sciences, 99(12):7821–7826.

Gong, M., Liu, J., Ma, L., Cai, Q., and Jiao, L. (2014).

Novel heuristic density-based method for community

detection in networks. Physica A: Statistical Mechan-

ics and its Applications, 403:71–84.

Grover, A. and Leskovec, J. (2016). node2vec: Scal-

able feature learning for networks. In Proceedings

of the 22nd ACM SIGKDD International Conference

on Knowledge Discovery and Data Mining, pages 855–

864.

Guo, W.-F. and Zhang, S.-W. (2016). A general method of

community detection by identifying community centers

with affinity propagation. Physica A: Statistical

Mechanics and its Applications, 447:508–519.

He, D., You, X., Feng, Z., Jin, D., Yang, X., and Zhang,

W. (2018). A network-specific Markov random field

approach to community detection. In Proceedings of

the AAAI Conference on Artificial Intelligence,

volume 32.

Jin, D., Liu, Z., Li, W., He, D., and Zhang, W. (2019a).

Graph convolutional networks meet Markov random

fields: Semi-supervised community detection in

attribute networks. In Proceedings of the AAAI

Conference on Artificial Intelligence, volume 33,

pages 152–159.

Jin, D., You, X., Li, W., He, D., Cui, P., Fogelman-Soulié,

F., and Chakraborty, T. (2019b). Incorporating

network embedding into Markov random field

for better community detection. In Proceedings of

the AAAI Conference on Artificial Intelligence,

volume 33, pages 160–167.

Jin, D., Zhang, B., Song, Y., He, D., Feng, Z., Chen, S., Li,

W., and Musial, K. (2020). ModMRF: A modularity-based

Markov random field method for community

detection. Neurocomputing, 405:218–228.

Karataş, A. and Şahin, S. (2018). Application areas of

community detection: A review. In 2018 International

Congress on Big Data, Deep Learning and Fighting

Cyber Terrorism (IBIGDELFT), pages 65–70. IEEE.

Karrer, B. and Newman, M. E. (2011). Stochastic blockmodels

and community structure in networks.

Physical Review E, 83(1):016107.

Kirkpatrick, S., Gelatt Jr, C. D., and Vecchi, M. P.

(1983). Optimization by simulated annealing. Science,

220(4598):671–680.

Leskovec, J., Kleinberg, J., and Faloutsos, C. (2007).

Graph evolution: Densification and shrinking diameters.

ACM Transactions on Knowledge Discovery from

Data (TKDD), 1(1):2–es.

Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P.,

Slooten, E., and Dawson, S. M. (2003). The bottlenose

dolphin community of Doubtful Sound features

a large proportion of long-lasting associations. Behav-

ioral Ecology and Sociobiology, 54(4):396–405.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).

Efficient estimation of word representations in vector

space. arXiv preprint arXiv:1301.3781.

Paim, E. C., Bazzan, A. L. C., and Chira, C. (2020). De-

tecting communities in networks: a decentralized ap-

proach based on multiagent reinforcement learning. In

2020 IEEE Symposium Series on Computational Intel-

ligence (SSCI), pages 2225–2232.

Perozzi, B., Al-Rfou, R., and Skiena, S. (2014). DeepWalk:

Online learning of social representations. In Proceedings

of the 20th ACM SIGKDD International Conference

on Knowledge Discovery and Data Mining, pages

701–710.

Rosvall, M. and Bergstrom, C. T. (2012). Infomap algo-

rithm. Physical Review E, 86(1):016108.

Rozemberczki, B., Davies, R., Sarkar, R., and Sutton, C.

(2019). GEMSEC: Graph embedding with self clustering.

In Proceedings of the 2019 IEEE/ACM International

Conference on Advances in Social Networks

Analysis and Mining, pages 65–72.

Shi, X., Lu, H., He, Y., and He, S. (2015). Community de-

tection in social network with pairwisely constrained

symmetric non-negative matrix factorization. In Pro-

ceedings of the 2015 IEEE/ACM International Con-

ference on Advances in Social Networks Analysis and

Mining 2015, pages 541–546.

Sui, S.-K., Li, J.-P., Zhang, J.-G., and Sui, S.-J. (2016).

The community detection based on SVM algorithm.

In 2016 13th International Computer Conference on

Wavelet Active Media Technology and Information

Processing (ICCWAMTIP).

Taheri, S. and Bouyer, A. (2020a). Community detection in

social networks using affinity propagation with adaptive

similarity matrix. Big Data, 8(3):189–202.

Taheri, S. and Bouyer, A. (2020b). Community detection

in social networks using affinity propagation with

adaptive similarity matrix. Big Data, 8(3):189–202.

PMID: 32397731.

Tandon, A., Albeshri, A., Thayananthan, V., Alhalabi, W.,

Radicchi, F., and Fortunato, S. (2021). Community

detection in networks using graph embeddings. Phys-

ical Review E, 103(2):022316.

Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei,

Q. (2015). LINE: Large-scale information network embedding.

In Proceedings of the 24th International Conference

on World Wide Web, pages 1067–1077.

Tian, F., Gao, B., Cui, Q., Chen, E., and Liu, T.-Y. (2014).

Learning deep representations for graph clustering. In

Proceedings of the AAAI Conference on Artificial

Intelligence, volume 28.

Xie, Y., Gong, M., Wang, S., and Yu, B. (2018). Community

discovery in networks with deep sparse filtering.

Pattern Recognition, 81:50–59.

Xie, Y., Wang, X., Jiang, D., and Xu, R. (2019). High-

performance community detection in social networks

using a deep transitive autoencoder. Elsevier.

Zachary, W. W. (1977). An information flow model for

conflict and fission in small groups. Journal of

Anthropological Research, pages 452–473.

APPENDIX

References for the Compared Methods

DC-SBM (Karrer and Newman, 2011)

NetMRF (He et al., 2018)

ModMRF (Jin et al., 2020)

GMRF (Jin et al., 2019b)

SVM (Sui et al., 2016)

APAS (Taheri and Bouyer, 2020b)

ComNet-R (Cai et al., 2020)

MRFasGCN (Jin et al., 2019a)

DSFCD (Xie et al., 2018)

RL (Paim et al., 2020)

Louvain (Blondel et al., 2008a)

Infomap (Rosvall and Bergstrom, 2012)

GraphTrees (Dalleau et al., 2020b)

CDMIC (Guo and Zhang, 2016)

CMDR (Gong et al., 2014)

PCSNMF (Shi et al., 2015)
