Authors:
Muhammad Asif Suryani 1; Ewa Burwicz-Galerne 2; Brigitte Mathiak 1; Klaus Wallmann 3 and Matthias Renz 4
Affiliations:
1 GESIS - Leibniz-Institute for the Social Sciences, 50667 Cologne, Germany
2 MARUM - Center for Marine Environmental Sciences, University of Bremen, 28359 Bremen, Germany
3 GEOMAR Helmholtz Centre for Ocean Research Kiel, 24148 Kiel, Germany
4 Institute of Informatik, Christian-Albrechts-Universität zu Kiel, 24118 Kiel, Germany
Keyword(s):
Information Extraction, Tabular Data, Research Data Management, Marine Science Publication, Heterogeneous Information Network, Data Modeling.
Abstract:
Scientific publications disseminate information across research communities and often contain diverse data elements such as plain text, tables, and figures. Tables in particular offer a structured presentation of essential research data, enabling efficient information access. Automatic extraction of tabular data alongside contextual information from scientific publications can significantly enhance research workflows and integrate more research data into the scholarly research cycle, particularly supporting Research Data Management (RDM). In marine geology, researchers conduct expeditions at oceanographic locations and accumulate substantial amounts of valuable data, such as Sedimentation Rate (SR) and Mass Accumulation Rate (MAR), alongside relevant contextual information; these data are often reported in publication tables enriched with spatio-temporal context. These expeditions are costly and time-intensive, which underscores the value of making such data more accessible and reusable.
This paper introduces an end-to-end approach to extract and model heterogeneous tabular data from marine geology publications. Our approach extracts metadata and tabular content from publications and models them as a Heterogeneous Information Network (HIN). The network uncovers hidden relationships and patterns across multiple documents, offering new insights and facilitating enhanced data referencing. Experimental results and exploration on marine geology datasets demonstrate the effectiveness of our approach, showcasing its potential to support research data management and data-driven scientific exploration.
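As a rough illustration of what modeling extracted tables as a HIN can look like, the following minimal Python sketch builds a small network with typed nodes and typed edges and then follows edges to find publications that report on the same location. It is not the authors' implementation: the node types (Publication, Table, Location), the edge types (contains, reports_on), the helper names, and the placeholder records are all assumptions made for this example.

```python
from collections import defaultdict


class HIN:
    """Toy heterogeneous information network: typed nodes and typed edges."""

    def __init__(self):
        self.node_type = {}            # node id -> type label
        self.edges = defaultdict(set)  # (source id, edge type) -> target ids

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, src, edge_type, dst):
        # Store the edge and its inverse so it can be followed both ways.
        self.edges[(src, edge_type)].add(dst)
        self.edges[(dst, "inv_" + edge_type)].add(src)

    def neighbors(self, node_id, edge_type):
        return self.edges.get((node_id, edge_type), set())


def build_hin(records):
    """Build a HIN from extracted records (hypothetical record schema)."""
    hin = HIN()
    for r in records:
        pub, table, site = r["publication"], r["table"], r["location"]
        hin.add_node(pub, "Publication")
        hin.add_node(table, "Table")
        hin.add_node(site, "Location")
        hin.add_edge(pub, "contains", table)
        hin.add_edge(table, "reports_on", site)
    return hin


def publications_sharing_location(hin, site):
    """Follow Location -> Table -> Publication to find linked publications."""
    pubs = set()
    for table in hin.neighbors(site, "inv_reports_on"):
        pubs |= hin.neighbors(table, "inv_contains")
    return pubs


# Placeholder records, not real data from any publication.
records = [
    {"publication": "pub_A", "table": "pub_A/table_1", "location": "site_X"},
    {"publication": "pub_B", "table": "pub_B/table_2", "location": "site_X"},
    {"publication": "pub_B", "table": "pub_B/table_3", "location": "site_Y"},
]
hin = build_hin(records)
shared = publications_sharing_location(hin, "site_X")
```

In this sketch, two publications become connected only through the location node they share, which is the kind of cross-document relationship a HIN is meant to surface. A fuller model would presumably add further node types, for example for measured quantities such as SR or MAR and for temporal context.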