validating this on further use-cases, e.g. with more
detailed application profiles, would be interesting to
reveal the limitations and the computational efficiency
of the method. Comparing the “FAIRness” of datasets
with and without this method could additionally show
its effectiveness. Furthermore, the current similarity
determination is very simple, so it would be interesting
to see how different similarity measures compare
against each other within the approach. Checking
whether a label contains only certain words, or going
further and determining similarity based on a word
embedding, might be valid options which could improve
the method and ease its use. The method could also be
extended to utilize RML mapping definitions; this was,
however, out of scope for this work. Lastly, the current
implementation and results rely on labels that match
the name defined in the HDF5 file, which could be
improved by a custom mapping property.
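To illustrate how such a comparison of similarity measures might be set up, the following minimal Python sketch scores an extracted name against candidate labels with three interchangeable measures. It is not the implementation used in this work: the function names, the example names such as `sample_temperature`, and the threshold of 0.8 are hypothetical, the exact match only stands in for a very simple baseline, and the "only contains certain words" check is one possible reading of that idea. An embedding-based measure is omitted because it would require a trained model.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Split a label or extracted name into lowercase word tokens."""
    # Separate camelCase words, then split on any non-alphanumeric character.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return [t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t]


def exact_match(name, label):
    """1.0 if both strings are equal ignoring case and outer whitespace."""
    return 1.0 if name.strip().lower() == label.strip().lower() else 0.0


def only_contains_words(name, label):
    """1.0 if the label consists only of words that occur in the name."""
    label_tokens = tokens(label)
    name_tokens = set(tokens(name))
    if not label_tokens:
        return 0.0
    return 1.0 if all(t in name_tokens for t in label_tokens) else 0.0


def character_ratio(name, label):
    """Character-level similarity in [0, 1] computed by difflib."""
    return SequenceMatcher(None, name.lower(), label.lower()).ratio()


def best_label(name, labels, measure, threshold=0.8):
    """Return the highest-scoring label, or None if none reaches the threshold."""
    if not labels:
        return None
    scored = [(measure(name, label), label) for label in labels]
    score, label = max(scored, key=lambda pair: pair[0])
    return label if score >= threshold else None


if __name__ == "__main__":
    # Hypothetical labels and extracted name, for illustration only.
    labels = ["Pressure", "Temperature"]
    for measure in (exact_match, only_contains_words, character_ratio):
        print(measure.__name__, best_label("sample_temperature", labels, measure))
```

Because each measure has the same signature, further candidates can be added and evaluated on the same set of names without changing the selection logic.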
ACKNOWLEDGEMENTS
The work was partially funded with resources granted
by Deutsche Forschungsgemeinschaft (DFG, German
Research Foundation) – Project-ID 432233186 – AIMS.
Automatic General Metadata Extraction and Mapping in an HDF5 Use-case