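Paper: Scoping: Towards Streamlined Entity Collections for Multi-Sourced Entity Resolution with Self-Supervised Agents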

Authors: Leonard Traeger 1, 2; Andreas Behrend 2 and George Karabatis 1

Affiliations: 1 Department of Information Systems, University of Maryland, Baltimore County, U.S.A.; 2 Institute of Computer and Communication Technology (ICCT), Technical University of Cologne, Germany

Keyword(s): Data Cleaning and Integration, Entity Resolution, Entity Linkage, Data Quality, Deep Learning.

Abstract: Linking multiple entities to a real-world object is a time-consuming and error-prone task. Entity Resolution (ER) includes techniques for vectorizing entities (signature), grouping similar entities into partitions (blocking), and matching entity pairs based on specified similarity thresholds (filtering). This paper introduces scoping as a new and integral phase in multi-sourced ER, where heterogeneity and the number of unlinkable entities are potentially higher. Scoping reduces the space of candidate entity pairs by ranking, detecting, and removing unlinkable entities through outlier algorithms and reusable self-supervised autoencoders, leaving the set of true linkages intact. Evaluations on multi-sourced schemas show that autoencoders perform best on schemas that are relevant to each other, where they reduce entity collections to 77% of their original size while retaining all linkages.
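
The rank, detect, and remove idea behind scoping can be illustrated with a minimal sketch. It is not the authors' method: instead of the outlier algorithms and autoencoders evaluated in the paper, it assumes a simple nearest-neighbor distance as the outlier score, and the names `outlier_scores`, `scope`, and `n_remove`, as well as the toy signature vectors, are hypothetical.

```python
import math

def outlier_scores(entities):
    """Score each entity by the distance to its nearest entity from a
    different source. A large score suggests the entity has no close
    counterpart elsewhere (assumed proxy for 'unlinkable')."""
    scores = []
    for src_i, vec_i in entities:
        dists = [math.dist(vec_i, vec_j)
                 for src_j, vec_j in entities if src_j != src_i]
        scores.append(min(dists) if dists else math.inf)
    return scores

def scope(entities, n_remove):
    """Rank entities by outlier score and drop the n_remove highest."""
    scores = outlier_scores(entities)
    ranked = sorted(range(len(entities)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:n_remove])
    return [e for i, e in enumerate(entities) if i not in removed]

# Toy signatures: (source id, vector). Values are made up for illustration.
entities = [
    ("A", (0.0, 0.0)), ("B", (0.1, 0.0)),   # plausible linkage
    ("A", (1.0, 1.0)), ("B", (1.0, 1.1)),   # plausible linkage
    ("A", (5.0, 5.0)),                      # no close counterpart in B
]
scoped = scope(entities, n_remove=1)
```

In this toy setup the isolated entity from source A is removed before any pairwise matching, so fewer candidate pairs remain while both plausible linkages are kept. How many entities to remove is an open parameter here; the sketch does not reproduce the thresholds or results reported in the paper.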

CC BY-NC-ND 4.0

Paper citation in several formats:
Traeger, L., Behrend, A. and Karabatis, G. (2024). Scoping: Towards Streamlined Entity Collections for Multi-Sourced Entity Resolution with Self-Supervised Agents. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7; ISSN 2184-4992, SciTePress, pages 107-115. DOI: 10.5220/0012607500003690

@conference{iceis24,
author={Leonard Traeger and Andreas Behrend and George Karabatis},
title={Scoping: Towards Streamlined Entity Collections for Multi-Sourced Entity Resolution with Self-Supervised Agents},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={107-115},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012607500003690},
isbn={978-989-758-692-7},
issn={2184-4992},
}

TY - CONF
JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Scoping: Towards Streamlined Entity Collections for Multi-Sourced Entity Resolution with Self-Supervised Agents
SN - 978-989-758-692-7
IS - 2184-4992
AU - Traeger, L.
AU - Behrend, A.
AU - Karabatis, G.
PY - 2024
SP - 107
EP - 115
DO - 10.5220/0012607500003690
PB - SciTePress