are replaced by Nodes 4 and 8, respectively. Again,
all such nodes have neighbors with high affinity val-
ues, but the latter nodes have larger affinity values
than the former ones. Analogous considerations apply
when moving from α = 0.5 to α = 0.6, where the only effect
is that Nodes 4 and 8 swap positions in the ranking.
4 RESULTS
The main goal of our experimental validation has
been to verify, on large OSN datasets, whether choosing
the k best targets according to the measures introduced
here is effective. To this aim, a first important problem
to be solved has been the construction of the input OSN.
Indeed, while a number of social network graphs are
publicly available, the same is not true for the profiles
of network users. In the remainder of this section, we
first discuss these aspects related to OSN construction,
and then present some results we have obtained
by applying our approach to real-world datasets.
The proposed approach has been implemented in
Java 1.8 under Apache Spark 1.6. In this respect,
the use of Big Data technologies allows the software
tool to be applied also to very large OSNs.
4.1 Network Construction
OSN graphs are available, for example, from the Stanford
website (https://snap.stanford.edu/data/).
We have considered the twitter-2010 OSN from
that repository, having 90,908 vertices and 443,399
edges. Unfortunately, the available OSNs consist
only of the graph topology; no information about
user interests and profiles is publicly available. Web
scraping has been used here in order to collect and
extract useful contents for user profile characterization.
In particular, we have avoided randomly associating
the information obtained by web scraping
with nodes in the considered OSN graph, because
a random association would have altered
the natural mechanism according to which users in
the same neighborhood have similar interests. In order
to mimic such a mechanism, which is important for
our approach (indeed the introduced measures aim at
detecting neighbor nodes with similar interests), we
have proceeded as follows.
We have first randomly selected 20 seed nodes
from the twitter-2010 OSN and 20 web-pages
focused on different topics (cooking, fashion, cars, etc.).
Indeed, with a certain margin of simplification, we
have assumed that a user profile may be obtained
by scraping the contents of a web-page on a specific
topic. Then, a depth-first visit of the OSN has been
performed starting from each of the seeds and stopping
when the entire network was visited. For each new
node visited, a new web-page has been visited
as well, following the cross-page links on the
considered web-pages.
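The construction just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the Java/Spark implementation used in the experiments. The function name `assign_pages` and its data layout are hypothetical. The text does not specify how a newly visited node is paired with a linked page, so the sketch assumes that the node receives the first not-yet-used page linked from its parent's page, and reuses the parent's page when no such link exists. Processing the seeds one after the other is also an assumption.

```python
def assign_pages(graph, links, seeds):
    """Pair every reachable node with a web-page.

    graph: dict mapping a node to the list of its neighbours
    links: dict mapping a page to the list of pages it links to
    seeds: list of (seed_node, seed_page) pairs
    """
    page_of = {}        # node -> page used to build its profile
    used_pages = set()  # pages already assigned to some node
    for seed_node, seed_page in seeds:
        if seed_node in page_of:
            continue  # already reached from an earlier seed
        page_of[seed_node] = seed_page
        used_pages.add(seed_page)
        stack = [seed_node]  # explicit stack for the depth-first visit
        while stack:
            node = stack[-1]
            # Next neighbour that has not been visited yet, if any.
            nxt = next((n for n in graph.get(node, []) if n not in page_of), None)
            if nxt is None:
                stack.pop()  # nothing left below this node: backtrack
                continue
            parent_page = page_of[node]
            # ASSUMPTION: follow the first unused link of the parent's page;
            # fall back to the parent's page itself if none is available.
            page = next(
                (p for p in links.get(parent_page, []) if p not in used_pages),
                parent_page,
            )
            page_of[nxt] = page
            used_pages.add(page)
            stack.append(nxt)
    return page_of
```

Because pages are reached by following links from the page of the parent node, nodes that are close in the graph tend to be associated with pages on related topics, which is the property the construction aims to preserve.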
4.2 Experimental Validation
Our experimental analysis has been devoted to
understanding to what extent our approach is effective
in identifying the k most convenient nodes in the
input OSN to which to distribute the advertisement. As
already explained, the main aim here is to optimize
two different aspects when identifying the best targets,
that is, the fact that the interests of the considered users
are related to the campaign contents, and the fact that
they have “friends” on the OSN potentially interested
in the distributed advertisements. We have considered
the web-pages associated with four brands, listed in
Table 4.
Table 4: The considered brands and their associated web-pages.
Brand           Web-page
AlphaRomeo      www.alfaromeo.it
Amarelli        www.amarelli.it
Carpisa         www.carpisa.it
KikoCosmetic    www.kikocosmetics.com
We have considered the OSN constructed as
described in the previous section and we have computed,
for each of the four brands, the different values of
affinity and utility (with α = 0.25, 0.5, 0.75) for all
nodes in the network. Then, we have ranked the nodes
in descending order according to each of these
measures. We have set the number of target
nodes to k = 100 and we have fixed to 0.6 the
minimum value of affinity between user and brand profiles
required for a user to be considered a possible target.
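The ranking and selection step can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the implementation used in the experiments. The name `top_k_targets` is hypothetical, and the affinity and utility values are taken as precomputed inputs, since their definitions are given earlier in the paper. Treating the threshold as inclusive (affinity at least 0.6) is also an assumption.

```python
def top_k_targets(scores, affinity, k=100, min_affinity=0.6):
    """Rank nodes by a measure and count the eligible ones among the first k.

    scores:   dict node -> value of the ranking measure (affinity or utility)
    affinity: dict node -> affinity between the user and the brand profile
    """
    # Descending order of the chosen measure; ties keep their input order.
    ranking = sorted(scores, key=lambda node: scores[node], reverse=True)
    top = ranking[:k]
    # ASSUMPTION: the 0.6 threshold is inclusive.
    eligible = sum(1 for node in top if affinity[node] >= min_affinity)
    return top, eligible


# Toy values, purely illustrative (not experimental data).
scores = {"n1": 0.9, "n2": 0.4, "n3": 0.7, "n4": 0.8}
affinity = {"n1": 0.5, "n2": 0.95, "n3": 0.65, "n4": 0.6}
top, eligible = top_k_targets(scores, affinity, k=3)
```

Calling the function once per measure (affinity, and utility for each value of α) yields the rankings that are compared in the rest of this section.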
The obtained results have been compared with a
random choice of the k nodes to which to distribute the
advertisement. The random extraction has been repeated
100 times: each time, 100 nodes have been extracted
from the set of vertices V, and the affinity between
their profiles and the brand profile has been
computed. The results obtained for
the different brands do not present significant
differences, therefore we illustrate only those regarding
AlphaRomeo in Table 5. In particular, the considered
method is specified in the first column of the table,
and for the Random generation we report
the average of the results obtained. For each method, the
third column shows the number of nodes presenting an
affinity value larger than the chosen threshold when the
first k nodes in the corresponding ranking are chosen.
It is interesting to observe that, with
Identifying the k Best Targets for an Advertisement Campaign via Online Social Networks