Fuzzy Metadata Strategies for Enhanced Data Integration

Hiba Khalid, Esteban Zimanyi, Robert Wrembel

2018

Abstract

The problem of data integration is one of the most debated issues in the general field of data management. Data integration is typically accompanied by conflict management. The root of the problem lies in the differences between data sources and in the probability with which each data source corresponds to another. Metadata is another important yet highly overlooked concept in these research areas. In this paper we propose leveraging metadata as a binding source in the process of integration. The technique exploits textual metadata from different sources, using fuzzy logic as a coherence measure. A framework methodology has been devised for understanding the power of textual metadata. The framework operates on multiple data sources; a data source set can typically contain n datasets. When two data sources are considered, they can be termed primary and secondary. The primary source is the accepting data source and thus contains more enriched metadata. The secondary sources are the requesting sources for integration and are also guided by textual data summaries, keywords, analysis reports, etc. The Fuzzy MD framework finds similarities between primary and secondary metadata sources using fuzzy matching and string exploration. The model then provides the probable answer for each set's association with the primary accepting source. The framework relies on the origin of words and relative associativity rather than the common approach of manual metadata enrichment. This not only addresses the issue of manual metadata enrichment, it also implicitly provides a way of generating metadata from scratch as part of the integration and analysis process.
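The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' Fuzzy MD implementation: the abstract does not specify the fuzzy measure, so the character-level similarity ratio from Python's standard `difflib` module is used here as a stand-in, and the function names, sample metadata terms, and averaging rule are all hypothetical.

```python
from difflib import SequenceMatcher


def term_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] between two metadata terms.

    Assumption: difflib's ratio stands in for the paper's fuzzy measure.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def association_score(primary_terms, secondary_terms) -> float:
    """Mean best-match similarity of each secondary term against the primary terms.

    Hypothetical aggregation rule; returns 0.0 if either term list is empty.
    """
    if not primary_terms or not secondary_terms:
        return 0.0
    best = [
        max(term_similarity(s, p) for p in primary_terms)
        for s in secondary_terms
    ]
    return sum(best) / len(best)


# Illustrative textual metadata (keywords) for one primary (accepting) source
# and two secondary (requesting) sources. All values are made up.
primary = ["customer", "address", "postal code", "order date"]
secondary_sets = {
    "crm_export": ["customers", "adress", "postcode"],
    "sensor_log": ["voltage", "temperature", "timestamp"],
}

# Score each secondary set's association with the primary source,
# then rank the sets from most to least associated.
scores = {
    name: association_score(primary, terms)
    for name, terms in secondary_sets.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:.2f}")
```

In this sketch the secondary set whose keywords are near-misspellings of the primary keywords ranks first, which mirrors the idea of estimating each set's probable association with the primary source from textual metadata alone.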


Paper Citation


in Harvard Style

Khalid H., Zimanyi E. and Wrembel R. (2018). Fuzzy Metadata Strategies for Enhanced Data Integration. In Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-318-6, pages 83-90. DOI: 10.5220/0006905200830090


in Bibtex Style

@conference{data18,
author={Hiba Khalid and Esteban Zimanyi and Robert Wrembel},
title={Fuzzy Metadata Strategies for Enhanced Data Integration},
booktitle={Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2018},
pages={83-90},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006905200830090},
isbn={978-989-758-318-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Fuzzy Metadata Strategies for Enhanced Data Integration
SN - 978-989-758-318-6
AU - Khalid H.
AU - Zimanyi E.
AU - Wrembel R.
PY - 2018
SP - 83
EP - 90
DO - 10.5220/0006905200830090