Table 3: Distribution of class instantiation in DBPedia and in our extracted bases.

Base                 Person    Place     Organisation   Work      Other
DBPedia              23.12%    17.06%     5.48%          9.57%    44.76%
Academic Journals     2.76%     1.78%     7.59%         41.29%    46.58%
Berlin               48.66%     8.09%    12.10%          5.63%    25.52%
Cat                  34.21%     2.63%     2.63%          2.63%    57.89%
Elvis                 9.54%     0.87%     1.95%         75.27%    12.37%
Legal Case            0.95%     0.00%     0.05%          0.08%    98.92%
Moon                  0.40%    97.11%     0.13%          0.07%     2.28%
Paris                53.37%     2.18%    11.46%          5.08%    27.91%
Potato                3.23%     0.00%     9.68%          0.00%    87.10%
Tarantino's Movies   41.74%     0.41%     4.55%          7.02%    46.28%
Zola's Books          1.47%     1.47%     0.00%         19.12%    77.94%
For instance, if a statement containing a blank node b1 as object appears at rank n, then all statements containing b1 as subject must appear at rank n, instead of at rank n + 1 as in a naive approach. These groups of statements are called Concise Bounded Descriptions, proposed in (Stickler, 2005), and are considered an optimal form of description of a resource. Figure 4 represents an extraction at rank 1 from the seed composed of the singleton {URI1}, considering the blank node between the seed and both URI4 and URI5 as a 3-ary relation. So, the rank 1 extraction contains URI2 to URI5.
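To make this ranking rule concrete, the following Python sketch (not our actual tool) implements it on triples represented as plain tuples. It assumes that blank nodes are strings prefixed with "_:"; the predicate names in the toy data are invented for illustration.

from collections import defaultdict

def extract(triples, seed, max_rank):
    """Rank-bounded extraction from a seed set of URIs.

    A statement whose object is a blank node pulls the statements having
    that blank node as subject (its Concise Bounded Description) into the
    same rank, instead of deferring them to rank n + 1.
    """
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((s, p, o))

    def is_blank(term):
        return term.startswith("_:")

    extracted = set()
    frontier = set(seed)
    reached = set(seed)
    for _ in range(max_rank):
        next_frontier = set()
        todo = list(frontier)
        processed = set()
        while todo:
            node = todo.pop()
            if node in processed:
                continue
            processed.add(node)
            for s, p, o in by_subject[node]:
                extracted.add((s, p, o))
                if is_blank(o):
                    todo.append(o)        # CBD: expand at the same rank
                elif o not in reached:
                    next_frontier.add(o)  # plain URI: reached at rank n + 1
                    reached.add(o)
        frontier = next_frontier
    return extracted

# Toy data mirroring Figure 4: URI1 is the seed, and a blank node links it to
# URI4 and URI5, so all five statements belong to the rank 1 extraction.
triples = [
    ("URI1", "p1", "URI2"),
    ("URI1", "p2", "URI3"),
    ("URI1", "p3", "_:b1"),
    ("_:b1", "q1", "URI4"),
    ("_:b1", "q2", "URI5"),
]
print(extract(triples, {"URI1"}, max_rank=1))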
4 USE-CASE: DATA GENERATION FOR BENCHMARKING
For some uses, especially for benchmarking, it is interesting to handle data from the Linked Data Cloud locally, and in particular only the sub-part of this big cloud needed for a given use-case. There is a limited offer of benchmarks to evaluate query methods or engines on the Linked Data. Most of these benchmarks use dumps from the best-known bases of the Linked Data Cloud (Bail et al., 2012; Schmidt et al., 2011) or use a random generator (Schmidt et al., 2008). So the choice is either using huge bases with real data or manageable bases with fake data. Using real data at a scale acceptable for a “standard” computer [4] is therefore difficult, as base dumps often contain at least several million triples.
[4] Less than 10 cores, less than 10 GB of RAM.
In a previous work, to test a new approach for querying a set of distant RDF bases, as presented in (Raimbault and Maillot, 2013), we needed a mini Linked Data Cloud where each base is composed of real data on a specific domain, is small enough to be processed in our experimental environment, and has a SPARQL endpoint.
We present here the extraction use-case we carried out in this previous work, according to the method presented in Section 3. To evaluate our approach we needed to compare different methods of querying the Linked Data Cloud. For practical reasons we chose to only use DBPedia, the biggest multi-domain base in the Linked Data Cloud, so as to have the same ontology for every base. Even with the same ontology, we aimed to keep the “structural” differences between the bases, like those between the bases of the Linked Data Cloud. These differences reside in the distribution of class instantiation in each base, i.e. in each specialized domain there are more individuals of the classes representative of the domain subject (e.g. Animal in Life Sciences, Document in Publications, etc.) than of other classes.
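As a minimal sketch, a per-class distribution like that of Table 3 could be measured against a SPARQL endpoint as follows. The public DBPedia endpoint URL, the JSON result format, and the normalisation (typed resources only, with “Other” as the remainder, ignoring multiply-typed resources) are assumptions for illustration, not details of our tool.

import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"                 # assumed public endpoint
CLASSES = ["Person", "Place", "Organisation", "Work"]  # columns of Table 3

def run_count(where_clause):
    """Run a COUNT query on the endpoint and return the integer result."""
    query = ("PREFIX dbo: <http://dbpedia.org/ontology/> "
             "SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { %s }" % where_clause)
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return int(data["results"]["bindings"][0]["n"]["value"])

total = run_count("?s a ?c .")                          # all typed resources
counts = {c: run_count("?s a dbo:%s ." % c) for c in CLASSES}
counts["Other"] = total - sum(counts.values())          # remainder, as an approximation
for cls, n in counts.items():
    print("%-13s %6.2f%%" % (cls, 100.0 * n / total))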
We extracted, with 2 as the maximum rank, 10 different bases for our tests [5]. The seeds were chosen to represent some specific domains of the Linked Data Cloud (a retrieval sketch for one seed is given after the list):
• The Cat class (in the “Life Sciences” domain)
• The Scientific Journal class (in the “Publications”
domain)
• The Legal Case class (in the “Government” do-
main)
• The Potato individual (in the “Life Sciences” do-
main)
• The Moon individual
• The singer Elvis Presley (in the “Media” domain)
• The city of Paris (in the “Geographic” domain)
• The city of Berlin (in the “Geographic” domain)
[5] By using our tool 10 times.
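As an illustration of how a single individual seed yields its rank 1 statements, the sketch below fetches the statements whose subject is one of the seeds above from the public DBPedia SPARQL endpoint. The endpoint URL, the requested Turtle format, and the choice of Paris as example are assumptions; blank-node expansion and higher ranks, handled by the method of Section 3, are not covered by this sketch.

import urllib.parse
import urllib.request

# Fetch the statements having the Paris seed as subject, i.e. the material of
# a rank 1 extraction for that seed (blank-node expansion left aside).
seed = "http://dbpedia.org/resource/Paris"
query = "CONSTRUCT { <%s> ?p ?o } WHERE { <%s> ?p ?o }" % (seed, seed)
url = "http://dbpedia.org/sparql?" + urllib.parse.urlencode(
    {"query": query, "format": "text/turtle"})  # format value is an assumption
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8")[:1000])   # first statements, in Turtle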