Data Discovery and Indexing for Semi-Structured Scientific Data

Kaushik Jagini, Yifan Zhang, Yichen Guo, Julian Goddy, Dale Stansberry, Joshua Agar, Jeff Heflin

2024

Abstract

There is a need for powerful, user-friendly tools for scientific data management and discovery. We present an architecture based on DataFed and Elasticsearch that allows scientists to easily share data they produce and a novel interface that allows other scientists to easily discover data of interest. This interface supports summary-level information about a collection of datasets that can be easily refined using schema-free search. We extend the recent idea of cell-centric search to semi-structured data, describe the architecture of the system, present a use case from the context of materials science, and evaluate the efficacy of the system.

Paper Citation


in Harvard Style

Jagini K., Zhang Y., Guo Y., Goddy J., Stansberry D., Agar J. and Heflin J. (2024). Data Discovery and Indexing for Semi-Structured Scientific Data. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 2: ICEIS; ISBN 978-989-758-692-7, SciTePress, pages 264-271. DOI: 10.5220/0012706000003690


in Bibtex Style

@conference{iceis24,
author={Kaushik Jagini and Yifan Zhang and Yichen Guo and Julian Goddy and Dale Stansberry and Joshua Agar and Jeff Heflin},
title={Data Discovery and Indexing for Semi-Structured Scientific Data},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2024},
pages={264-271},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012706000003690},
isbn={978-989-758-692-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - Data Discovery and Indexing for Semi-Structured Scientific Data
SN - 978-989-758-692-7
AU - Jagini K.
AU - Zhang Y.
AU - Guo Y.
AU - Goddy J.
AU - Stansberry D.
AU - Agar J.
AU - Heflin J.
PY - 2024
SP - 264
EP - 271
DO - 10.5220/0012706000003690
PB - SciTePress