DSP generation, such as Abele (2016) and Assaf et
al. (2015), an approach that provides more
detailed information about datasets, including
descriptive, structural, and quality metadata, was
not found. In addition, some of them do not
associate vocabulary terms with the metadata
provided by the profile. Using such terms assigns
more meaning to the metadata and gives it a
representation that facilitates its consumption.
6 CONCLUSIONS
In this work, we have presented an approach for the
generation of semantically enriched Dataset Profiles.
To this end, a DSP composed of descriptive,
structural, and quality metadata is proposed. During
the DSP generation process, some metadata are
extracted from the datasets and, additionally, the
dataset domain is identified and domain
vocabularies are suggested. Furthermore, the process
includes the generation of structural metadata and
quality metadata; for the latter, two IQ criteria
are measured as relevant additional information.
The main idea of providing enriched DSPs is to
facilitate the communication between dataset
publishers and consumers (humans and machines).
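As a rough illustration of the DSP composition summarized above, the
following Python sketch groups metadata into descriptive, structural,
and quality parts. It is not the implementation of the prototype: the
names DatasetProfile, infer_type, and build_profile, the tabular input
format, and the type-guessing rule are all assumptions made only for
this example, and the quality part is left empty because the IQ
criteria are not detailed in this section.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetProfile:
    """Hypothetical container for the three metadata groups of a DSP."""
    descriptive: dict = field(default_factory=dict)
    structural: dict = field(default_factory=dict)
    quality: dict = field(default_factory=dict)


def infer_type(values):
    """Guess a column type from its string values (illustrative rule only)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    try:
        for v in non_empty:
            float(v)  # raises ValueError for non-numeric text
        return "number"
    except ValueError:
        return "string"


def build_profile(title, header, rows):
    """Assemble a profile from a header and a list of rows (assumed input)."""
    # Transpose rows into columns; fall back to empty columns if no rows.
    columns = list(zip(*rows)) if rows else [() for _ in header]
    structural = {name: infer_type(col) for name, col in zip(header, columns)}
    return DatasetProfile(
        descriptive={"title": title, "records": len(rows)},
        structural=structural,
        quality={},  # IQ scores would be stored here; criteria not assumed
    )
```

Calling build_profile with a title, a header such as ["name",
"population"], and the corresponding rows would yield a profile whose
structural part maps each attribute to a guessed type. A real DSP
would additionally annotate these entries with vocabulary terms.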
In order to evaluate the proposed approach, a
prototype has been implemented. It provides an
automatic DSP generation process. The tool assists
data producers who wish to make DSPs available for
certain datasets. Dataset consumers can also
generate a DSP without the need for prior knowledge
about the data.
The experiments used datasets from different
knowledge domains. They demonstrated that the
proposed strategy produces good results, allowing
new metadata to be generated. Improvements in the
quality of the datasets were also observed after
the DSP generation.
As future work, we consider including user
feedback and other IQ criteria (e.g., completeness,
correctness), linking the approach to an existing
dataset catalog, and including in the DSP the
recommendation of vocabularies for each identified
structural metadata element. New experiments with
expert users and datasets belonging to a wider range
of domains will also be carried out.
REFERENCES
Abele, A., 2016. Linked Data Profiling: Identifying the
Domain of Datasets Based on Data Content and
Metadata, In: 25th International Conference
Companion on World Wide Web. Canada, p. 287-291.
Assaf, A., Senart, A., Troncy, R., 2016. An Objective
Assessment Framework & Tool for Linked Data:
Enriching Dataset Profiles with Quality Indicators, In:
IJSWIS, International Journal on Semantic Web and
Information Systems, Special Issue on Dataset
Profiling and Federated Search for Linked Data, Vol.
12, N°3, 2016, ISSN: 1552-6283
Assaf, A., Troncy, R., Senart, A., 2015. Roomba: An
extensible framework to validate and build dataset
profiles, In: 24th International Conference on World
Wide Web, Italy, p. 159-162.
Baeza-Yates, R., Ribeiro-Neto, B., 1999. Modern
Information Retrieval. Addison-Wesley, First Edition.
Clarke, M., Harley, P., 2014. How smart is your content?
Using semantic enrichment to improve your user
experience and your bottom line, Science Editor, Vol.
37, N° 2, p. 40–44.
Ellefi, M. B., Bellahsene, Z., Scharffe, F., Todorov, K.,
2014. Towards Semantic Dataset Profiling, In:
International Workshop on Dataset Profiling &
Federated Search for Linked Data co-located with the
11th Extended Semantic Web Conference. Greece.
Ellefi, M. B., Bellahsene, Z., Todorov, K., 2015.
Datavore: a vocabulary recommender tool assisting
Linked Data modeling, In: 14th International
Semantic Web Conference, Posters and
Demonstrations Track, United States.
Flemming, A., 2011. Quality Characteristics of Linked
Data Publishing Datasources. Master's Thesis,
Humboldt-Universität zu Berlin, Institut für
Informatik.
Heath, T., Bizer, C., 2011. Linked Data: Evolving the Web
into a Global Data Space, 1st edition. Synthesis
Lectures on the Semantic Web: Theory and
Technology, 1:1, 1-136. Morgan & Claypool.
Kaggle platform, 2018. Available at
https://www.kaggle.com. Last access on June 20th.
Lalithsena, S., Hitzler, P., Sheth, A. P., Jain, P., 2013.
Automatic Domain Identification for Linked Open
Data. In: IEEE/WIC/ACM International Joint
Conferences on Web Intelligence and Intelligent Agent
Technologies, United States, p. 205-212.
Lóscio, B. F., Burle, C., Calegari, N., 2017. Data on the
Web Best Practices. W3C, Version:
https://www.w3.org/TR/2017/REC-dwbp-20170131/. Last
access on March 20, 2018.
LOV, 2018. Linked Open Vocabulary Repository.
Available at https://lov.okfn.org/dataset/lov/. Last
access on June 20th.
Naumann, F., Rolker, C., 2000. Assessment methods for
information quality criteria, In: IQ, 5th Conference on
Information Quality. United States, p. 148-162.
Ouksili, H., Kedad, Z., Lopes, S., 2014. Theme
Identification in RDF Graphs, In: MEDI, International
Conference on Model and Data Engineering. Cyprus,
p. 321-329.
Pipino, L. L., Lee, Y. W., Wang, R. Y., 2002. Data
Quality Assessment. In: Communications of the ACM