A Metadata Focused Crawler for Linked Data
Raphael do Vale Amaral Gomes, Marco A. Casanova, Giseli Rabello Lopes, Luiz André P. Paes Leme
2014
Abstract
The Linked Data best practices recommend that publishers of triplesets use well-known ontologies in the triplification process and link their triplesets with other triplesets. However, although extensive lists of open ontologies and triplesets are available, most publishers do not adopt those ontologies and link their triplesets only with popular ones, such as DBpedia and Geonames. This paper presents a metadata crawler for Linked Data to assist publishers in the triplification and linkage processes. The crawler provides publishers with a list of the most suitable ontologies and vocabulary terms for triplification, as well as a list of triplesets with which the new tripleset can most likely be linked. The crawler focuses on specific metadata properties, including subclass of, and returns only metadata, hence the classification “metadata focused crawler”.
Term table (rows 47-61)
- 47 umbel:MusicalComposition
- 48 schema:MusicRecording
- 49 freebase:en.Album
- 50 opencyc:Music
- 51 opencyc:Album
- 52 nerdeurocom:Album
- 53 schema:MusicAlbum
- 54 dbpedia:Sophomore_Album
- 55 dbpedia:Musician
- 56 umbel:MusicalPerformer
- 57 umbel:Rapper
- 58 dbpedia:Instrumentalist
- 59 dbpedia:BackScene
- 60 dbpedia:MusicGenre
- 61 freebase:en.Album (36 items from lastfm, 2 items from twitter)
Precision and recall of swget and the crawler
- Terms retrieved by swget or the crawler:
  - Retrieved terms: 99
  - Relevant terms that were retrieved (identified by “Y” in column “MV”): 66
- Terms retrieved by swget:
  - Retrieved terms: 46
  - Relevant terms that were retrieved (identified by rows with the pattern (Y,Y,-)): 16
  - Precision = 16 / 46 = 0.35
  - Recall = 16 / 66 = 0.24
- Terms retrieved by the crawler:
  - Retrieved terms: 63
  - Relevant terms that were retrieved (identified by rows with the pattern (Y,-,Y)): 60
  - Precision = 60 / 63 = 0.95
  - Recall = 60 / 66 = 0.91
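The precision and recall figures above follow directly from the reported counts. As a minimal sketch, they can be recomputed as below; the function name `precision_recall` and the `runs` dictionary are illustrative names, not part of the paper, and recall is taken against the 66 relevant terms found by either tool, as in the figures above.

```python
def precision_recall(retrieved, relevant_retrieved, relevant_total):
    """Return (precision, recall) from raw term counts."""
    if retrieved <= 0 or relevant_total <= 0:
        raise ValueError("counts must be positive")
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / relevant_total
    return precision, recall


# Counts as reported above; 66 is the number of relevant terms
# retrieved by swget or the crawler (the recall denominator).
RELEVANT_TOTAL = 66
runs = {"swget": (46, 16), "crawler": (63, 60)}  # (retrieved, relevant retrieved)

for name, (retrieved, hits) in runs.items():
    p, r = precision_recall(retrieved, hits, RELEVANT_TOTAL)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Running the sketch prints `swget: precision=0.35 recall=0.24` and `crawler: precision=0.95 recall=0.91`, matching the reported values.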
Paper Citation
in Harvard Style
do Vale Amaral Gomes R., A. Casanova M., Rabello Lopes G. and André P. Paes Leme L. (2014). A Metadata Focused Crawler for Linked Data. In Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-028-4, pages 489-500. DOI: 10.5220/0004867904890500
in Bibtex Style
@conference{iceis14,
author={Raphael do Vale Amaral Gomes and Marco A. Casanova and Giseli Rabello Lopes and Luiz André P. Paes Leme},
title={A Metadata Focused Crawler for Linked Data},
booktitle={Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2014},
pages={489-500},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004867904890500},
isbn={978-989-758-028-4},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI  - A Metadata Focused Crawler for Linked Data
SN  - 978-989-758-028-4
AU  - do Vale Amaral Gomes R.
AU  - A. Casanova M.
AU  - Rabello Lopes G.
AU  - André P. Paes Leme L.
PY  - 2014
SP  - 489
EP  - 500
DO  - 10.5220/0004867904890500
ER  - 