Metadata Management for Textual Documents in Data Lakes

Pegdwendé N. Sawadogo; Tokio Kibata; Jérôme Darmont

Topics: Coupling and Integrating Heterogeneous Data Sources; Data Mining; Data Warehouses and OLAP; Databases in the Cloud; Non-Relational Databases

In Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS, pages 72-83, 2019, Heraklion, Crete, Greece

Authors: Pegdwendé N. Sawadogo ¹ ; Tokio Kibata ² and Jérôme Darmont ¹

Affiliations: ¹ Université de Lyon, Lyon 2, ERIC EA 3083, 5 avenue Pierre Mendès France, F69676, Bron, France; ² Université de Lyon, Ecole Centrale de Lyon, 36 avenue Guy de Collongue, F69134, Ecully, France

Keyword(s): Data Lakes, Textual Documents, Metadata Management, Data Ponds.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Coupling and Integrating Heterogeneous Data Sources ; Data Mining ; Data Warehouses and OLAP ; Databases and Information Systems Integration ; Enterprise Information Systems ; Non-Relational Databases ; Sensor Networks ; Signal Processing ; Soft Computing

Abstract: Data lakes have emerged as an alternative to data warehouses for the storage, exploration and analysis of big data. In a data lake, data are stored in a raw state and bear no explicit schema. Hence, an efficient metadata system is essential to prevent the data lake from turning into a so-called data swamp. Existing work on managing data lake metadata mostly focuses on structured and semi-structured data, with little research on unstructured data. Thus, we propose in this paper a methodological approach to build and manage a metadata system that is specific to textual documents in data lakes. First, we make an inventory of usual and meaningful metadata to extract. Then, we apply specific techniques from the text mining and information retrieval domains to extract, store and reuse these metadata within the COREL research project, in order to validate our proposals.

CC BY-NC-ND 4.0

Paper citation in several formats:

Sawadogo, P. N., Kibata, T. and Darmont, J. (2019). Metadata Management for Textual Documents in Data Lakes. In Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-372-8; ISSN 2184-4984, SciTePress, pages 72-83. DOI: 10.5220/0007706300720083

@conference{iceis19,
author={Pegdwendé N. Sawadogo and Tokio Kibata and Jérôme Darmont},
title={Metadata Management for Textual Documents in Data Lakes},
booktitle={Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2019},
pages={72-83},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007706300720083},
isbn={978-989-758-372-8},
issn={2184-4984},
}

TY - CONF
JO - Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Metadata Management for Textual Documents in Data Lakes
SN - 978-989-758-372-8
IS - 2184-4984
AU - Sawadogo, P.
AU - Kibata, T.
AU - Darmont, J.
PY - 2019
SP - 72
EP - 83
DO - 10.5220/0007706300720083
PB - SciTePress