4.1 Overview of Experiments
The vector representations generated with RaDE were
evaluated on information retrieval and visualization
tasks on 7 datasets from multiple domains (e.g. images,
texts and social networks). These datasets have
different characteristics (e.g. dense or sparse
graphs, weighted or unweighted graphs, large- or
small-scale graphs).
The image and text datasets are not networks
themselves. To apply Network Representation Learning
methods to these datasets, it is necessary to generate
a graph from the original dataset. This requires two
main steps: i) extracting the feature vectors of the
samples in the dataset; ii) calculating distances between
the extracted features to generate a graph weighted by
these distances. In this work, the graphs generated for
this kind of dataset were complete¹ and the Euclidean
distance was used to calculate the weights.
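As an illustration of step ii), the sketch below builds the weight matrix of a complete graph from a handful of feature vectors using the Euclidean distance. The function name `complete_distance_graph` and the toy vectors are hypothetical and chosen only for this example; they are not taken from the RaDE implementation.

```python
import math


def complete_distance_graph(features):
    """Return the symmetric weight matrix of a complete graph.

    features: list of equal-length feature vectors, one per sample.
    Entry [i][j] is the Euclidean distance between samples i and j.
    """
    n = len(features)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            # The graph is undirected, so the weight is stored both ways.
            weights[i][j] = d
            weights[j][i] = d
    return weights


# Toy feature vectors (hypothetical values, not from any dataset used here).
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
W = complete_distance_graph(X)
```

Every pair of samples receives an edge, so the number of weights grows quadratically with the number of samples; for larger datasets a vectorized pairwise-distance routine would likely be preferable to this double loop.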
The other category of datasets used to evaluate the
proposed approach consists of datasets that are networks
themselves. The selected network datasets are not
originally weighted, and RaDE requires a weighted graph.
We therefore assigned weights using an approach based on
the neighborhood shared between nodes.
The vector representations generated by RaDE
were compared with those generated by 4 other Network
Representation Learning methods, which have
characteristics different from each other. The
implementations provided by the OpenNE library² were
used to run the experiments, which were executed on a
machine with an Intel Xeon E5-2660 @ 2.0 GHz processor,
64 GB of RAM and Arch Linux x86_64, kernel version 5.0.7.
4.2 Datasets
We evaluated the effectiveness of RaDE on 7 datasets
from multiple domains. MPEG-7 (Latecki et al., 2000),
Oxford17Flowers (Nilsback and Zisserman, 2006)
and Corel5k (Liu et al., 2010) are image datasets
in which each sample is described by its
extracted features. The features for Oxford17Flowers
(Nilsback and Zisserman, 2006) and Corel5k (Liu
et al., 2010) were extracted using the descriptors that
achieved the highest MAP in the experimental
results reported in (Valem and Pedronette, 2019).
For MPEG-7 (Latecki et al., 2000), the features were
extracted using a contour descriptor (Pedronette
and da Silva Torres, 2010).
¹ Note, however, that the proposed method does not require the graph to be complete.
² https://github.com/thunlp/OpenNE
The weighted graphs for MPEG-7 (Latecki et al.,
2000), Oxford17Flowers (Nilsback and Zisserman,
2006), Corel5k (Liu et al., 2010), 20NewsGroup
(Lang, 1995) and Iris (Dua and Graff, 2017) were
generated using the Euclidean distances between the
feature vectors of the samples. For datasets that are
originally networks, such as BlogCatalog (Zafarani
and Liu, 2009) and Wiki³, a strategy for assigning
the weights was necessary, since they are not
originally weighted. We assumed that the more neighbors
a pair of nodes shares, the more similar the nodes
are to each other. The approach used for assigning
the weights is described in Equation 8,
\[
w_{i,j} = \frac{1}{1 + |N(i, m) \cap N(j, m)|}, \tag{8}
\]
where $N(i, m)$ is the set of the $m$ nearest neighbors of a node $N_i$.
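Equation 8 can be sketched in a few lines. This section does not specify how the m nearest neighbors are obtained for an unweighted network, so the sketch assumes that each node comes with a neighbor list already ordered from nearest to farthest and simply takes its first m entries. The function name `shared_neighbor_weights` and the toy graph are hypothetical.

```python
def shared_neighbor_weights(neighbors, edges, m):
    """Assign w_ij = 1 / (1 + |N(i, m) ∩ N(j, m)|) to each edge (i, j).

    neighbors: dict mapping a node to its neighbor list, assumed to be
               ordered nearest first (an assumption of this sketch).
    edges:     iterable of (i, j) node pairs to be weighted.
    m:         neighborhood size.
    """
    # N(i, m): the first m neighbors of each node, stored as a set.
    top_m = {node: set(ranked[:m]) for node, ranked in neighbors.items()}
    weights = {}
    for i, j in edges:
        shared = len(top_m[i] & top_m[j])
        weights[(i, j)] = 1.0 / (1.0 + shared)
    return weights


# Toy unweighted graph (hypothetical, not one of the evaluated datasets).
neighbors = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["a", "b"],
}
edges = [("a", "b"), ("a", "c"), ("b", "d")]
weights = shared_neighbor_weights(neighbors, edges, m=3)
```

In this toy graph, nodes a and b share two neighbors (c and d), so their weight is 1/3, while a and c share only b, giving 1/2. The weight decreases as the shared neighborhood grows, so, like the Euclidean distances used for the other datasets, a smaller weight corresponds to a more similar pair.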
The details of each dataset evaluated in this
work are given below:
• MPEG-7 (Latecki et al., 2000): 1,400 images, divided
into 70 balanced classes, each one containing
20 samples. The features of the images were
extracted by CFD (Pedronette and da Silva Torres,
2010), which is a contour-based descriptor.
• Oxford17Flowers (Nilsback and Zisserman,
2006): 1,360 images of 17 different species of
flowers, each one containing 80 different images.
Each image is described by 2,048
features, which were extracted using ResNet152⁴,
a residual neural network pre-trained
on the ImageNet dataset (Deng et al., 2009).
• Corel5k (Liu et al., 2010): 5,000 miscellaneous
images (e.g. fireworks, trees, boats, tiles).
This dataset is divided into 50 categories, with
100 images each. Each image is described by
1,000 features, which were extracted
using a DualPathNetwork92⁵.
• Iris (Dua and Graff, 2017): A dataset widely used
in pattern recognition tasks. It contains 150 samples
of flowers, divided into 3 balanced classes.
Each sample is described by its petal
and sepal width and petal and sepal length.
• BlogCatalog3 (Zafarani and Liu, 2009): A social
network that contains 10,312 nodes and 333,983
edges. Each node represents a blogger and each
³ https://github.com/thunlp/OpenNE
⁴ https://github.com/Cadene/pretrained-models.pytorch
⁵ https://github.com/Cadene/pretrained-models.pytorch
VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications