t has one chance to make its inactive neighbors active
in step t + 1 with a probability of p. The probability of one node activating another can also be assigned individually to each pair; for example, node i activates node j with probability $p_{ij}$. Like Kempe et al., for simplicity we use a single probability that applies to every linked pair.
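A single run of the diffusion model described above, with one shared probability p, can be sketched as follows. This is a minimal sketch that assumes the graph is stored as a hypothetical adjacency dictionary mapping each node to the nodes its outgoing links reach; the function name `independent_cascade` is ours.

```python
import random

def independent_cascade(adj, seeds, p, rng=random):
    """One run of the cascade model with a single activation probability p.

    adj maps each node to a list of nodes its outgoing links reach (assumed
    representation). Seeds are active at step 0. Returns every node that is
    active when no further activations happen.
    """
    active = set(seeds)
    frontier = list(seeds)  # nodes that became active in the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            # u gets exactly one chance to activate each inactive neighbor.
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier  # nodes activated in step t act in step t + 1
    return active
```

Passing a seeded `random.Random` as `rng` makes a run reproducible.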
The greedy approach used by Kempe et al. starts by finding the best single node to activate using a brute-force method. Each node is activated in turn and the diffusion model is simulated many times (1000 iterations in our tests). After all of the nodes have been tested, the one that activated the most nodes on average is chosen. Then each remaining node is tried together with the first node, and the simulations are run again to find the best node to add to it. This process continues until k nodes are chosen.
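The greedy procedure can be sketched as below. This is a minimal illustration, not Kempe et al.'s code: it assumes the hypothetical adjacency-dictionary representation, scores each candidate by the average number of nodes activated over `runs` simulations (1000 by default, matching the number of iterations in our tests), and attempts no speedups. All function names are ours.

```python
import random

def independent_cascade(adj, seeds, p, rng):
    """One cascade run; adj maps a node to the nodes its outgoing links reach."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def estimate_spread(adj, seeds, p, runs, rng):
    """Average number of nodes activated over `runs` simulations."""
    total = sum(len(independent_cascade(adj, seeds, p, rng)) for _ in range(runs))
    return total / runs

def greedy_seeds(adj, k, p, runs=1000, seed=0):
    """Choose k seeds one at a time, each time adding the node whose addition
    to the already-chosen seeds gives the largest estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for v in adj:
            if v in chosen:
                continue
            spread = estimate_spread(adj, chosen + [v], p, runs, rng)
            if spread > best_spread:  # ties keep the earlier node
                best, best_spread = v, spread
        chosen.append(best)
    return chosen
```

The nested loops run roughly k × n × `runs` simulations, which is why the running time of this approach is the main target of the improvements below.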
Various enhancements and improvements have been made to the greedy approach. Bharathi et al. (Bharathi et al., 2007) extended the approach to account for multiple, competing innovations. The degree of a node is the number of its outgoing links, i.e., the number of friends to which it is connected. Although selecting the k nodes with the largest degree is very fast, this has been shown to be inferior to the greedy approach. However, Chen et al. (Chen et al., 2009) used degree heuristics to improve the running time of the greedy algorithm. Narayanam et al. (Narayanam and Narahari, 2011) use the Shapley value from game theory as a heuristic to improve the running time of the greedy approach.
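For comparison, selecting the k nodes with the largest degree amounts to a single sort. A minimal sketch, again assuming a hypothetical adjacency dictionary in which a node's degree is the length of its outgoing-link list:

```python
def top_k_by_degree(adj, k):
    """Return the k nodes with the most outgoing links.

    sorted() is stable, so nodes with equal degree keep their original order.
    """
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
```

No diffusion simulations are run at all, which is why this is fast, and also why it can pick several well-connected nodes whose neighborhoods overlap.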
Work in influence maximization is primarily concerned with maximizing only the raw number of nodes activated. We suggest extending it to also consider the number of communities covered, where a community is covered if at least one of its nodes is activated. Our approach is to choose the initial set of nodes based on the communities found by community finding algorithms.
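The coverage measure defined above is simple to compute once every node has a community label. A minimal sketch, assuming a hypothetical dictionary `community_of` from node to community label:

```python
def communities_covered(community_of, active):
    """Number of communities containing at least one active node.

    community_of maps each node to its community label (assumed input);
    active is any iterable of activated nodes.
    """
    return len({community_of[v] for v in active})
```

Two activated sets of the same raw size can differ in coverage: with communities A = {1, 2}, B = {3}, C = {4}, activating {1, 2, 3} covers two communities while {1, 3, 4} covers all three.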
2.3 Community Finding
The process of community finding in a network is
similar to clustering in data mining. In clustering,
the goal is to group the instances together in such a
way as to minimize the distances within groups and
maximize the distances between groups. Clustering normally uses a distance function defined between every pair of instances, whereas community finding algorithms use the link structure, in which any two nodes are either linked or not. The goal differs depending on the algorithm, but generally it is to maximize the number of links within communities and to minimize the number of links between communities.
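The general goal can be made concrete by counting, for a given partition, how many links fall inside communities and how many cross between them. This is a minimal sketch, assuming an undirected edge list of node pairs and a hypothetical `community_of` labeling; it is a diagnostic, not any particular community finding algorithm.

```python
def link_split(edges, community_of):
    """Return (links within communities, links between communities)."""
    within = sum(1 for u, v in edges if community_of[u] == community_of[v])
    return within, len(edges) - within
```

For two triangles {1, 2, 3} and {4, 5, 6} joined by the link 3-4, grouping each triangle as a community gives 6 links within and 1 between, while moving node 3 into the second group gives 5 within and 2 between, so the first partition better matches the goal.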
Many community finding algorithms come from graph theory, which studies the mathematical properties of graphs. Two examples from graph theory illustrate both the power and the limitations of this approach.
The first is the minimum spanning tree (MST) approach. Any connected graph can be reduced to a spanning tree (a subgraph with no cycles that still reaches every node) using a breadth-first or depth-first search; in an unweighted graph, every spanning tree is also a minimum spanning tree. Links of this tree can then be removed to separate the graph into groups of nodes. The second method is called MinCut. In this method, a graph is analyzed to find the minimum number of links that must be removed, or cut, to separate the graph into two groups. Repeating this procedure separates the graph into as many groups as desired. Although MST and MinCut can be used to find communities, in practice they are rarely used. The problem with these methods is that they tend to form small satellite communities around a large connected component or, in the case of MST, form groups arbitrarily.
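Both graph-theory methods can be illustrated on toy graphs. The sketch below is a minimal, assumption-laden illustration: it builds a spanning tree by breadth-first search and shows the groups left after cutting chosen tree links, and it finds a MinCut by trying every bipartition, which is exponential and only usable on tiny graphs (practical MinCut algorithms are far more efficient). Graphs are assumed to be undirected, given as adjacency dictionaries or edge lists; all function names are ours.

```python
from collections import deque
from itertools import combinations

def bfs_spanning_tree(adj, root):
    """Links of a spanning tree of a connected graph, found by BFS from root."""
    seen, tree, queue = {root}, [], deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.append((u, v))  # first link that reaches v
                queue.append(v)
    return tree

def groups_after_removing(nodes, tree, removed):
    """Connected groups left after cutting the tree links listed in `removed`
    (given in the same (u, v) orientation as in `tree`)."""
    adj = {v: set() for v in nodes}
    for u, v in tree:
        if (u, v) not in removed:
            adj[u].add(v)
            adj[v].add(u)
    groups, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        group, stack = set(), [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            group.add(u)
            for w in adj[u] - seen:
                seen.add(w)
                stack.append(w)
        groups.append(group)
    return groups

def brute_force_min_cut(nodes, edges):
    """(fewest links to cut, one side) over all splits into two non-empty groups."""
    nodes = list(nodes)
    best = (len(edges) + 1, None)
    for r in range(1, len(nodes)):
        for side in combinations(nodes, r):
            a = set(side)
            cut = sum(1 for u, v in edges if (u in a) != (v in a))
            if cut < best[0]:
                best = (cut, a)
    return best
```

On two triangles {1, 2, 3} and {4, 5, 6} joined by link 3-4, cutting tree link (3, 4) recovers the triangles, but cutting (1, 2) instead just splits off node 2, which shows how arbitrary the choice of tree link is. Likewise, adding a pendant node 7 linked only to node 6 makes the brute-force MinCut isolate {7}, a one-node satellite community.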
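Others have successfully used modifications of graph theory metrics to find communities. Newman and Girvan (Newman and Girvan, 2004) proposed an algorithm based on betweenness, a measure of traffic through a network. Between every two nodes in a connected graph, one can find a shortest path. The betweenness of a link is the number of shortest paths, over all pairs of nodes in the graph, that pass through it. The algorithm repeatedly removes the link with the highest betweenness and recomputes the betweenness of the remaining links, so that links bridging communities are removed first and the graph splits into communities. Although this method has shown excellent results, its shortcoming is that it is extremely slow, because betweenness must be recomputed after every removal.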
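Link betweenness and the removal loop can be sketched as follows. This is a minimal illustration for undirected, unweighted graphs stored as adjacency dictionaries (an assumption); betweenness is accumulated with a breadth-first search from every node (Brandes-style counting), and `girvan_newman_split` is our hypothetical name for a loop that stops at the first split rather than building the full hierarchy of communities.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """{frozenset({u, v}): shortest paths over all node pairs through link u-v},
    splitting credit equally when a pair has several shortest paths."""
    bc = defaultdict(float)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing path credit toward s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                bc[frozenset((v, w))] += credit
                delta[v] += credit
    # Each unordered pair {s, t} was counted once from s and once from t.
    return {e: b / 2 for e, b in bc.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Remove highest-betweenness links, recomputing each time, until the
    graph falls into more components than it started with."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bc = edge_betweenness(adj)
        if not bc:
            break
        u, v = tuple(max(bc, key=bc.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

The full betweenness recomputation inside the loop is exactly the cost that makes the method slow on large networks.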
Spectral clustering (Shi and Malik, 2000) converts a graph to a set of features by taking the eigenvectors of the Laplacian matrix and then uses k-means (a well-known data mining clustering technique) to form communities. This popular method has been shown to solve a relaxed version of normalized cut, a more sophisticated version of MinCut that produces more balanced communities.
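The two-community case of the spectral idea can be sketched without any linear algebra library. This is not Shi and Malik's algorithm: it uses the unnormalized Laplacian L = D - A (an assumption), approximates the Fiedler vector (the eigenvector of the second-smallest eigenvalue) by power iteration, and splits nodes by sign, a common stand-in for running k-means with k = 2 on that single feature. It assumes a connected, undirected graph with at least two nodes, stored as a hypothetical adjacency dictionary, and a starting vector that is not orthogonal to the Fiedler vector.

```python
import math

def fiedler_bisection(adj, iters=500):
    """Split a connected graph in two by the sign of an approximate Fiedler vector."""
    nodes = list(adj)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    # Laplacian eigenvalues lie in [0, 2 * max degree], so c*I - L has only
    # non-negative eigenvalues and power iteration finds its largest one.
    c = 2 * max(len(adj[v]) for v in nodes)
    x = [float(i) for i in range(n)]  # arbitrary starting vector
    for _ in range(iters):
        # Remove the constant eigenvector (eigenvalue 0 of L) so the iteration
        # converges to the eigenvector of the second-smallest eigenvalue.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (c*I - L) x, using (L x)_v = deg(v) * x_v - sum of neighbor values
        y = []
        for v in nodes:
            i = idx[v]
            lx = len(adj[v]) * x[i] - sum(x[idx[w]] for w in adj[v])
            y.append(c * x[i] - lx)
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    left = {v for v in nodes if x[idx[v]] < 0}
    return left, set(nodes) - left
```

For more than two communities, the usual approach keeps several eigenvectors as features per node and clusters those rows with k-means, as described above.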
In data mining, the agglomerative approach to clustering (Jain and Dubes, 1988) begins with every instance in a cluster by itself. Clusters are then merged based on a particular distance formula. This approach (Porter et al., 2009) has also been applied to networks: each node starts in its own community (a singleton), and communities are joined step by step so as to reduce the number of between-community links. Another method, recently proposed by Tang et al. (Tang et al., 2010), starts not with singletons but by forming neighborhood communities around each node, and then joins communities to minimize overlap. This approach yields ego-centric communities.
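The agglomerative idea for networks can be sketched as repeated merging, starting from singletons. The merge score here is our own assumption, since the text does not fix one: merge the pair of communities joined by the largest fraction of possible links (links between them divided by the product of their sizes, i.e. average linkage). Each merge removes those links from the between-community count, and the normalization keeps an already large community from absorbing loosely attached nodes on ties. The input is an undirected edge list; the function name is hypothetical.

```python
def agglomerate(edges, k):
    """Merge singleton communities until k remain (naive O(n^2) scan per merge)."""
    nodes = sorted({v for e in edges for v in e})
    comms = [{v} for v in nodes]  # every node starts as a singleton

    def links(a, b):
        """Number of links with one end in a and the other in b."""
        return sum(1 for u, v in edges
                   if (u in a and v in b) or (u in b and v in a))

    while len(comms) > k:
        best, pair = -1.0, None
        for i in range(len(comms)):
            for j in range(i + 1, len(comms)):
                # Assumed score: fraction of possible links that exist.
                score = links(comms[i], comms[j]) / (len(comms[i]) * len(comms[j]))
                if score > best:
                    best, pair = score, (i, j)
        i, j = pair
        comms[i] |= comms[j]  # join the two communities
        del comms[j]
    return comms
```

On two triangles joined by a single link, stopping at k = 2 returns the two triangles.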
Discovering Influential Nodes in Social Networks through Community Finding