Matching Entities from Multiple Sources with Hierarchical Agglomerative Clustering
Alieh Saeedi, Lucie David, Erhard Rahm
2021
Abstract
We propose extensions to Hierarchical Agglomerative Clustering (HAC) to match and cluster entities from multiple sources, each of which can be either duplicate-free or dirty. We compare the proposed scheme with standard HAC and with other entity clustering approaches in terms of efficiency and effectiveness. All proposed algorithms can run in parallel on a distributed cluster, which improves scalability to large data volumes. An evaluation on diverse datasets shows that the new approach can exploit duplicate-free sources and achieves better match quality than previous methods.
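To make the idea of exploiting duplicate-free sources concrete, the following is a minimal sketch and not the authors' algorithm. It assumes average linkage, a fixed similarity threshold as the stopping criterion, and one simple constraint: two clusters are never merged if both contain an entity from the same duplicate-free source. The function name `constrained_hac`, its parameters, and the toy data are all hypothetical.

```python
from itertools import combinations


def constrained_hac(entities, sim, threshold, clean_sources=frozenset()):
    """Illustrative HAC with a source constraint (assumed, not from the paper).

    entities:      dict mapping entity id -> source name
    sim:           dict mapping frozenset({id_a, id_b}) -> similarity in [0, 1];
                   missing pairs are treated as similarity 0
    threshold:     merging stops once no allowed pair reaches this similarity
    clean_sources: sources assumed to be duplicate-free
    """
    clusters = [[eid] for eid in entities]
    clean = set(clean_sources)

    def linkage(c1, c2):
        # Average linkage: mean similarity over all cross-cluster pairs.
        total = sum(sim.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    def violates(c1, c2):
        # A merge is forbidden if both clusters hold an entity
        # from the same duplicate-free source.
        s1 = {entities[e] for e in c1}
        s2 = {entities[e] for e in c2}
        return bool(s1 & s2 & clean)

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            if violates(clusters[i], clusters[j]):
                continue
            s = linkage(clusters[i], clusters[j])
            if s >= threshold and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return sorted(sorted(c) for c in clusters)


# Toy data: source "A" contributes two entities, source "B" one.
entities = {"a1": "A", "a2": "A", "b1": "B"}
sim = {
    frozenset(("a1", "b1")): 0.9,
    frozenset(("a1", "a2")): 0.85,
    frozenset(("a2", "b1")): 0.8,
}

print(constrained_hac(entities, sim, 0.6, clean_sources={"A"}))
print(constrained_hac(entities, sim, 0.6))
```

When source `A` is declared duplicate-free, `a1` and `a2` are kept in separate clusters even though their similarity exceeds the threshold. Without the constraint, all three entities end up in one cluster. This naive loop rescans all cluster pairs on every iteration and is meant only for illustration; it says nothing about the parallel execution described in the abstract.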
Paper Citation
in Harvard Style
Saeedi A., David L. and Rahm E. (2021). Matching Entities from Multiple Sources with Hierarchical Agglomerative Clustering. In Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 2: KEOD; ISBN 978-989-758-533-3, SciTePress, pages 40-50. DOI: 10.5220/0010649600003064
in Bibtex Style
@conference{keod21,
author={Alieh Saeedi and Lucie David and Erhard Rahm},
title={Matching Entities from Multiple Sources with Hierarchical Agglomerative Clustering},
booktitle={Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 2: KEOD},
year={2021},
pages={40-50},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010649600003064},
isbn={978-989-758-533-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 2: KEOD
TI - Matching Entities from Multiple Sources with Hierarchical Agglomerative Clustering
SN - 978-989-758-533-3
AU - Saeedi A.
AU - David L.
AU - Rahm E.
PY - 2021
SP - 40
EP - 50
DO - 10.5220/0010649600003064
PB - SciTePress