is not feasible to analyse the whole set of communi-
ties and given that the majority of communities is not
interesting from the business point of view, it is im-
portant to incorporate a community selection step in
the process. In the proposed methodology, we per-
form community selection based on two criteria: (i)
the size of communities, and (ii) an RFM (Recency,
Frequency, Monetary) model.
In real-world large-scale social networks, such as
telecom graphs, the size distribution of the detected
communities usually follows a power-law (Nanavati
et al., 2008). A power-law is a relation between two
variables that occurs when one is the power of the
other, drawn from a probability distribution $p(x) = \beta x^{-\alpha}$,
where $\alpha$ is the constant scaling parameter of the
distribution. In the particular case of communities,
this phenomenon is associated with the sharp positive
asymmetric distribution of the number of elements in
each community. There are usually a large number
of communities containing only a few elements and
only a small number of communities with consider-
able size.
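To make the role of the scaling parameter concrete, the following minimal sketch estimates α from a list of community sizes with the standard continuous maximum-likelihood estimator, α = 1 + n / Σ ln(x_i / x_min). It is an illustration only: the function name `estimate_alpha`, the toy sizes and the choice x_min = 1 are assumptions, and the continuous estimator is only an approximation for integer-valued sizes.

```python
import math

def estimate_alpha(sizes, x_min=1):
    """Continuous MLE of the power-law exponent, using values >= x_min."""
    tail = [s for s in sizes if s >= x_min]
    log_sum = sum(math.log(s / x_min) for s in tail)
    if log_sum == 0:
        raise ValueError("all sizes equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum

# Hypothetical community sizes: many small communities, very few large ones.
sizes = [1] * 50 + [2] * 20 + [4] * 8 + [8] * 3 + [64]
alpha = estimate_alpha(sizes)
print(f"estimated alpha: {alpha:.2f}")
```

A larger estimated α corresponds to a faster decay, that is, to an even stronger dominance of small communities.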
In order to test the hypothesis "the distribution of
the size of communities follows a power-law", we run
several tests using the poweRlaw package, available in
the R software (Team, 2008). Afterwards, to estimate
where to cut the distribution, we first sort the detected
communities by decreasing size. Then, we analyse
the decay in the number of elements covered in the
network, for each percentage of considered commu-
nities. The cut is made where a natural break of the
distribution occurs, according to the so-called elbow
method. Although this is an empirical procedure, it
is efficient in selecting the communities representing
the largest portion of the network.
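The cut described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: communities are assumed to be non-overlapping (so coverage is the sum of sizes), and the "natural break" is approximated by a common heuristic, namely the point of the cumulative coverage curve that lies farthest from the straight line joining its endpoints. The function names and the toy sizes are hypothetical.

```python
def coverage_curve(sizes):
    """Sort sizes in decreasing order and return them with the cumulative
    fraction of network elements covered by the k largest communities."""
    ordered = sorted(sizes, reverse=True)
    total = sum(ordered)
    covered, curve = 0, []
    for s in ordered:
        covered += s
        curve.append(covered / total)
    return ordered, curve

def elbow_index(curve):
    """Index of the curve point farthest above the chord joining its ends."""
    n = len(curve)
    if n < 3:
        return n - 1
    first, last = curve[0], curve[-1]
    best_k, best_gap = 0, -1.0
    for k, y in enumerate(curve):
        chord = first + (last - first) * k / (n - 1)
        if y - chord > best_gap:
            best_k, best_gap = k, y - chord
    return best_k

# Hypothetical sizes of ten detected communities (unsorted on purpose).
sizes = [30, 500, 8, 120, 3, 300, 20, 5, 10, 4]
ordered, curve = coverage_curve(sizes)
k = elbow_index(curve)
selected = ordered[: k + 1]  # communities kept for further analysis
print(selected, f"cover {curve[k]:.0%} of the elements")
```

In this toy example the three largest communities are retained. In practice the break would normally be confirmed by visual inspection of the curve, since the heuristic is sensitive to the shape of the tail.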
The RFM (Recency, Frequency, Monetary) model
(Birant, 2011) is a marketing strategy used to quan-
titatively determine the best customers of a company
according to the following three components: (i) re-
cency, which in the context of our framework refers
to the average time elapsed since the last time an el-
ement of a community used the service (e.g., average
time elapsed between calls); (ii) frequency, which is
given by the average number of times the elements of
each community used the service during a given time
span (e.g., average number of calls made in one week);
and (iii) monetary, which is given by the total amount
spent in the service by the elements of the community
(e.g., phone bill).
To implement the model, the values of each one of
these components are computed for each community
and then converted into a score from 1 to 5. For instance,
the 20% of communities with the highest frequency
are given a score of 5 in the frequency component, the
next 20% of communities are given a score of 4, and so
on. An overall score is then assigned to each commu-
nity, by summing up the scores obtained in each com-
ponent (i.e., recency, frequency and monetary). In
a weighted version of the model (Miglautsch, 2000),
each component is multiplied by a weight, which re-
flects its relative importance.
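The scoring scheme just described can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names, the toy values and the tie-breaking rule (ties keep their input order) are assumptions. The sketch also assumes, as is usual in RFM analysis, that a smaller recency value (more recent use) deserves a higher score.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 to 5 according to the quintile of its rank."""
    n = len(values)
    # Indices ordered from best to worst; ties keep their input order.
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 5 - (rank * 5) // n  # best 20% -> 5, next 20% -> 4, ...
    return scores

def rfm_scores(recency, frequency, monetary, weights=(1, 1, 1)):
    """Overall (optionally weighted) RFM score of each community."""
    r = quintile_scores(recency, higher_is_better=False)  # assumption: recent is better
    f = quintile_scores(frequency)
    m = quintile_scores(monetary)
    w_r, w_f, w_m = weights
    return [w_r * a + w_f * b + w_m * c for a, b, c in zip(r, f, m)]

# Hypothetical per-community aggregates for five communities.
recency = [2.0, 10.0, 5.0, 30.0, 1.0]         # average days since last use
frequency = [40, 5, 22, 3, 60]                # average calls per week
monetary = [300.0, 50.0, 120.0, 20.0, 500.0]  # total amount spent

plain = rfm_scores(recency, frequency, monetary)
weighted = rfm_scores(recency, frequency, monetary, weights=(1, 1, 2))
print(plain)
print(weighted)
```

With equal weights the overall score ranges from 3 to 15. The weights in the second call are arbitrary and only show how the monetary component could be emphasised.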
In the particular case of communities in large-
scale networks, the RFM analysis supports the de-
cision about which communities may be discarded
from the study, and which communities are worth
analysing from the business point of view. In other
words, this model provides a simple way to extract the
most active and profitable communities of customers,
which are the ones more likely to spread positive in-
fluence in the network.
The communities selected based on the previous
criteria are monitored over time, by applying the
MECnet framework. This temporal tracking relies on
a prior mapping step, as previously explained in Sec-
tion 2. The identification of temporal instances of the
same community, based on the matching criterion, al-
lows the further identification of transitions and the
characterisation of a community’s life-cycle.
After analysing how communities evolve over
time, it is important to characterise this evolution
by creating a profile for each dynamic community.
The task of community profiling is important to un-
derstand how one community differs from another,
as well as to understand the underlying logic of the
partitions identified by the community detection al-
gorithm. This profiling step is important to compa-
nies because it enables a better understanding of the
dynamic community analysis, thus allowing them to
more easily act upon these customer communities.
Communities are described in operational and rela-
tional terms. While the operational description is
related to business variables, the relational descrip-
tion is based on classical measures from social net-
work analysis (see Oliveira and Gama (2012) for an
overview). In order to find the general profile of
communities, several attributes from the elements that
compose them are studied, such as frequency and du-
ration of calls, as well as node centrality measures.
The profiling task is then performed using decision
trees, which can be linearized into interpretable deci-
sion rules.
Communities are classified into one of three
classes: growth, stagnation or decay, according to
their evolution over the time span. The type of evolu-
tion exhibited by each community relies on the anal-
ysis of their life-cycle (see Sections 2 and 3) and is
based on changes in the size of communities (i.e.,
number of elements) over the time span: Growth, if
ICEIS 2015 - 17th International Conference on Enterprise Information Systems