Extracting and Modeling Tabular Data from Marine Geology Publications into a Heterogeneous Information Network
Muhammad Asif Suryani, Ewa Burwicz-Galerne, Brigitte Mathiak, Klaus Wallmann, Matthias Renz
2025
Abstract
Scientific publications serve as a means of disseminating information across research communities and often contain diverse data elements such as plain text, tables, and figures. Tables in particular offer a structured presentation of essential research data, enabling efficient information access. Automatically extracting tabular data alongside contextual information from scientific publications can significantly enhance research workflows and integrate more research data into the scholarly research cycle, particularly in support of Research Data Management (RDM). In marine geology, researchers conduct expeditions at oceanographic locations and accumulate substantial amounts of valuable data, such as Sedimentation Rate (SR) and Mass Accumulation Rate (MAR), together with relevant contextual information that is often enriched with spatio-temporal context in the tables of publications. These expeditions are costly and time-intensive, which underscores the value of making such data more accessible and reusable. This paper introduces an end-to-end approach to extract and model heterogeneous tabular data from marine geology publications. Our approach extracts metadata and tabular content from publications and models them into a Heterogeneous Information Network (HIN). The network uncovers hidden relationships and patterns across multiple documents, offering new insights and facilitating enhanced data referencing. Experimental results and exploration on marine geology datasets demonstrate the effectiveness of our approach, showcasing its potential to support research data management and data-driven scientific exploration.
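To make the idea of modeling extracted tables as a Heterogeneous Information Network concrete, the following is a minimal sketch, not the authors' implementation. The schema is an assumption chosen for illustration: node types `Publication`, `Table`, `Location`, and `Parameter` (e.g. SR, MAR), linked by hypothetical relations `contains`, `located_at`, and `reports`. The identifiers `paper_A`, `site_1`, and so on are placeholders, not real data. Cross-document relationships are found by following a meta-path such as Publication → Table → Location → Table → Publication.

```python
from collections import defaultdict


class HIN:
    """Minimal typed graph: each node has a type, each edge a relation label.

    Hypothetical schema for illustration only; not the paper's actual model.
    """

    def __init__(self) -> None:
        self.node_type: dict[str, str] = {}
        # node -> relation -> neighbours; inverse edges are stored as "inv_<rel>"
        self.adj: defaultdict = defaultdict(lambda: defaultdict(set))

    def add_node(self, node_id: str, ntype: str) -> None:
        self.node_type[node_id] = ntype

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        self.adj[src][rel].add(dst)
        self.adj[dst]["inv_" + rel].add(src)

    def follow(self, start: set[str], rel: str) -> set[str]:
        """Return all nodes reachable from `start` via one `rel` edge."""
        out: set[str] = set()
        for node in start:
            out |= self.adj[node][rel]
        return out


def add_table(hin: HIN, pub_id: str, table_id: str,
              location: str, parameters: list[str]) -> None:
    """Insert one extracted table and its context (hypothetical helper)."""
    hin.add_node(pub_id, "Publication")
    hin.add_node(table_id, "Table")
    hin.add_node(location, "Location")
    hin.add_edge(pub_id, "contains", table_id)
    hin.add_edge(table_id, "located_at", location)
    for param in parameters:
        hin.add_node(param, "Parameter")
        hin.add_edge(table_id, "reports", param)


def related_publications(hin: HIN, pub_id: str, via: str) -> set[str]:
    """Follow the meta-path Publication-Table-X-Table-Publication,
    where X is reached through relation `via` (e.g. shared location)."""
    tables = hin.follow({pub_id}, "contains")
    shared = hin.follow(tables, via)
    other_tables = hin.follow(shared, "inv_" + via)
    pubs = hin.follow(other_tables, "inv_contains")
    pubs.discard(pub_id)
    return pubs


# Placeholder inputs standing in for tables extracted from three papers.
hin = HIN()
add_table(hin, "paper_A", "paper_A/table_1", "site_1", ["SR", "MAR"])
add_table(hin, "paper_B", "paper_B/table_2", "site_1", ["SR"])
add_table(hin, "paper_C", "paper_C/table_1", "site_2", ["MAR"])

print(sorted(related_publications(hin, "paper_A", "located_at")))  # ['paper_B']
print(sorted(related_publications(hin, "paper_A", "reports")))     # ['paper_B', 'paper_C']
```

Storing inverse edges explicitly keeps meta-path traversal to a chain of set lookups; a production system would more likely use a graph database or a library such as NetworkX.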
Paper Citation
in Harvard Style
Suryani M., Burwicz-Galerne E., Mathiak B., Wallmann K. and Renz M. (2025). Extracting and Modeling Tabular Data from Marine Geology Publications into a Heterogeneous Information Network. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-730-6, SciTePress, pages 453-460. DOI: 10.5220/0013389900003905
in Bibtex Style
@conference{icpram25,
author={Muhammad Suryani and Ewa Burwicz-Galerne and Brigitte Mathiak and Klaus Wallmann and Matthias Renz},
title={Extracting and Modeling Tabular Data from Marine Geology Publications into a Heterogeneous Information Network},
booktitle={Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2025},
pages={453-460},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013389900003905},
isbn={978-989-758-730-6},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI  - Extracting and Modeling Tabular Data from Marine Geology Publications into a Heterogeneous Information Network
SN  - 978-989-758-730-6
AU  - Suryani M.
AU  - Burwicz-Galerne E.
AU  - Mathiak B.
AU  - Wallmann K.
AU  - Renz M.
PY  - 2025
SP  - 453
EP  - 460
DO  - 10.5220/0013389900003905
PB  - SciTePress
ER  - 