7 CONCLUSIONS AND FUTURE WORK
This paper presented a metadata-focused crawler for
Linked Data. The crawler starts with a small set T of
generic RDF terms and enriches T by identifying
subclasses, equivalences (sameAs property) and
related terms (seeAlso property).
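
The enrichment of T summarized above can be illustrated with a minimal sketch. It is not the crawler's actual implementation: it assumes the triples are already available in memory as (subject, predicate, object) tuples, that the three relations are expressed as rdfs:subClassOf, owl:sameAs and rdfs:seeAlso, and that a hypothetical max_depth parameter bounds the expansion. The function name, sample data and prefixes are illustrative only.

```python
from collections import deque

# Assumed property names (prefixed form); the paper only names the
# subclass relation, the sameAs property and the seeAlso property.
RDFS_SUBCLASS = "rdfs:subClassOf"
OWL_SAMEAS = "owl:sameAs"
RDFS_SEEALSO = "rdfs:seeAlso"


def enrich_terms(seed_terms, triples, max_depth=2):
    """Breadth-first expansion of a seed set T of terms.

    A term is added when it is a subclass of a known term, is linked
    to a known term by sameAs (in either direction), or is the target
    of a seeAlso link from a known term.
    """
    found = set(seed_terms)
    queue = deque((term, 0) for term in seed_terms)
    while queue:
        term, depth = queue.popleft()
        if depth >= max_depth:
            continue  # hypothetical bound on how far the expansion goes
        for s, p, o in triples:
            new = None
            if p == RDFS_SUBCLASS and o == term:
                new = s  # s is declared a subclass of the known term
            elif p == OWL_SAMEAS and term in (s, o):
                new = o if s == term else s  # equivalence is symmetric
            elif p == RDFS_SEEALSO and s == term:
                new = o  # related term pointed to by the known term
            if new is not None and new not in found:
                found.add(new)
                queue.append((new, depth + 1))
    return found


# Illustrative data only; these URIs do not come from the paper.
SAMPLE_TRIPLES = [
    ("ex:Song", RDFS_SUBCLASS, "ex:MusicalWork"),
    ("ex:MusicalWork", OWL_SAMEAS, "other:Composition"),
    ("other:Composition", RDFS_SEEALSO, "other:Score"),
    ("ex:Unrelated", RDFS_SUBCLASS, "ex:Thing"),
]

enriched = enrich_terms({"ex:MusicalWork"}, SAMPLE_TRIPLES, max_depth=2)
```

In this sketch the depth bound keeps the expansion from drifting far from the initial generic terms; a real crawler would obtain the triples by querying remote triplesets rather than scanning a list.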
In general, the metadata-focused crawler introduced
in this paper helps simplify the triplification
and linkage processes, thereby contributing to the
dissemination of Linked Data. Indeed, the results of
the crawler may be used: to recommend ontologies
to be adopted in the triplification process; to
recommend triplesets to be used in the linkage process;
and to increase the quality of VoID descriptions.
Finally, the overall crawling process is open to
several improvements. For example, we may use
summarization techniques to automatically select the
initial set of terms. We may also optimize the
crawling process by combining the crawling queries into a
single query and by using caching to avoid
bandwidth issues.
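
As a rough illustration of the last two optimizations, the sketch below combines three lookups into one SPARQL query using UNION and caches results per (endpoint, term) pair so that a repeated request does not touch the network. It rests on several assumptions: the individual crawling queries are not reproduced in this section, so the three graph patterns are guesses at their general shape; build_combined_query and CachingClient are hypothetical names; and the execute callable stands in for whatever SPARQL client is actually used.

```python
from typing import Callable, Dict, List, Tuple

PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
)


def build_combined_query(term_uri: str) -> str:
    """Return one query covering subclasses, sameAs and seeAlso links.

    Assumed shape only: one UNION branch per relation, each binding
    its own variable so the caller can tell the results apart.
    """
    return (
        PREFIXES
        + "SELECT DISTINCT ?sub ?same ?see WHERE {\n"
        + f"  {{ ?sub rdfs:subClassOf <{term_uri}> }}\n"
        + "  UNION\n"
        + f"  {{ <{term_uri}> owl:sameAs ?same }}\n"
        + "  UNION\n"
        + f"  {{ <{term_uri}> rdfs:seeAlso ?see }}\n"
        + "}"
    )


class CachingClient:
    """Wraps a query executor with an in-memory cache (hypothetical)."""

    def __init__(self, execute: Callable[[str, str], List[dict]]):
        # execute(endpoint_url, query_text) -> list of result bindings
        self._execute = execute
        self._cache: Dict[Tuple[str, str], List[dict]] = {}
        self.remote_calls = 0

    def related_terms(self, endpoint: str, term_uri: str) -> List[dict]:
        key = (endpoint, term_uri)
        if key not in self._cache:
            # Only a cache miss results in a remote request.
            self.remote_calls += 1
            query = build_combined_query(term_uri)
            self._cache[key] = self._execute(endpoint, query)
        return self._cache[key]
```

With this arrangement each term costs at most one request per endpoint instead of three, and terms revisited during the crawl cost none. A persistent cache with an expiry policy would be a natural refinement, but that is beyond what the text above states.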
ACKNOWLEDGEMENTS
This work was partly funded by CNPq, under grants
160326/2012-5, 303332/2013-1 and 57128/2009-9,
and by FAPERJ, under grants E-26/170028/2008
and E-26/103.070/2011.
ICEIS 2014 - 16th International Conference on Enterprise Information Systems