Schema Matching with Frequent Changes on Semi-Structured Input Files: A Machine Learning Approach on Biological Product Data

Oliver Schmidts, Bodo Kraft, Ines Siebigteroth, Albert Zündorf

Abstract

For small to medium-sized enterprises, matching schemas is still a time-consuming manual task. Even expensive commercial solutions perform poorly if the context is not suitable for the product. In this paper, we provide an approach based on concept name learning from known transformations to discover correspondences between two schemas. We solve schema matching as a classification task. Additionally, we provide a named entity recognition approach to analyze how the classification task relates to named entity recognition. Benchmarking against other machine learning models shows that, with a well-chosen learning model, schema matching based on concept name similarity can outperform other approaches and complex algorithms in terms of precision and F1-measure. Hence, our approach can lay the foundation for improved automation of complex data integration applications for small to medium-sized enterprises.
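To make the idea of schema matching as a classification task concrete, the following is a minimal sketch and not the model evaluated in the paper. It assumes a nearest-neighbour classifier over character trigram overlap (Jaccard similarity) as a stand-in for a learned concept name model. The column names, the target labels and the helper names `trigrams`, `jaccard` and `classify` are hypothetical and chosen for illustration only.

```python
def trigrams(name: str) -> set:
    """Return the character trigrams of a normalised column name."""
    # Normalise: keep letters and digits only, lower-cased.
    cleaned = "".join(ch.lower() for ch in name if ch.isalnum())
    # Pad so that the start and end of the name yield their own trigrams.
    padded = f"  {cleaned} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical "known transformations": source column names from
# previously integrated files, each mapped to a target schema concept.
KNOWN = [
    ("Cat. No.", "catalog_number"),
    ("Catalogue Number", "catalog_number"),
    ("Product Name", "product_name"),
    ("Name of product", "product_name"),
    ("Host species", "host"),
    ("Host", "host"),
]


def classify(column: str, known=KNOWN):
    """Assign an unseen column name to the target concept of its most
    similar known source name. Returns (label, score); the label is
    None when no known name shares a trigram with the input."""
    features = trigrams(column)
    best_label, best_score = None, 0.0
    for source, target in known:
        score = jaccard(features, trigrams(source))
        if score > best_score:
            best_label, best_score = target, score
    return best_label, best_score


if __name__ == "__main__":
    for col in ["Catalog No.", "Host Organism", "xyz"]:
        print(col, "->", classify(col))
```

In this sketch every target concept acts as a class, and each new input file contributes column names to be classified. A learned model would replace the fixed similarity measure, while the framing of the task stays the same.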


Paper Citation


in Harvard Style

Schmidts O., Kraft B., Siebigteroth I. and Zündorf A. (2019). Schema Matching with Frequent Changes on Semi-Structured Input Files: A Machine Learning Approach on Biological Product Data. In Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-372-8, pages 208-215. DOI: 10.5220/0007723602080215


in Bibtex Style

@conference{iceis19,
author={Oliver Schmidts and Bodo Kraft and Ines Siebigteroth and Albert Zündorf},
title={Schema Matching with Frequent Changes on Semi-Structured Input Files: A Machine Learning Approach on Biological Product Data},
booktitle={Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2019},
pages={208-215},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007723602080215},
isbn={978-989-758-372-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Schema Matching with Frequent Changes on Semi-Structured Input Files: A Machine Learning Approach on Biological Product Data
SN - 978-989-758-372-8
AU - Schmidts O.
AU - Kraft B.
AU - Siebigteroth I.
AU - Zündorf A.
PY - 2019
SP - 208
EP - 215
DO - 10.5220/0007723602080215