SJClust: Towards a Framework for Integrating Similarity Join Algorithms and Clustering

Leonardo Andrade Ribeiro; Alfredo Cuzzocrea; Karen Aline Alves Bezerra; Ben Hur Bahia do Nascimento

Topics: Coupling and Integrating Heterogeneous Data Sources; Performance Evaluation and Benchmarking; Query Languages and Query Processing

In Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS, pages 75-80, 2016, Rome, Italy

Authors: Leonardo Andrade Ribeiro ¹; Alfredo Cuzzocrea ²; Karen Aline Alves Bezerra ³ and Ben Hur Bahia do Nascimento ³

Affiliations: ¹ Universidade Federal de Goiás, Brazil; ² University of Trieste and ICAR-CNR, Italy; ³ Universidade Federal de Lavras, Brazil

Keyword(s): Data Integration, Data Cleaning, Duplicate Identification, Set Similarity Joins, Clustering.

Related Ontology Subjects/Areas/Topics: Coupling and Integrating Heterogeneous Data Sources; Databases and Information Systems Integration; Enterprise Information Systems; Performance Evaluation and Benchmarking; Query Languages and Query Processing

Abstract: A critical task in data cleaning and integration is the identification of duplicate records representing the same real-world entity. A popular approach to duplicate identification employs a similarity join to find pairs of similar records, followed by a clustering algorithm to group together records that refer to the same entity. However, the clustering algorithm is used strictly as a post-processing step, which slows down overall performance and produces results only at the end of the whole process. In this paper, we propose SJClust, a framework that integrates similarity join and clustering into a single operation. Our approach smoothly accommodates a variety of cluster representation and merging strategies within set similarity join algorithms, while fully leveraging state-of-the-art optimization techniques.
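
To make the two-step baseline described in the abstract concrete, the following is a minimal Python sketch of a similarity join followed by clustering as a separate post-processing step. It is not the SJClust framework itself. The whitespace tokenization, the Jaccard measure, the threshold of 0.6, the naive all-pairs join, the transitive-closure clustering, and all function names and sample records are assumptions chosen for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_join(records, threshold):
    """Naive all-pairs set similarity join (illustrative, unoptimized)."""
    token_sets = [set(r.lower().split()) for r in records]
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    ]


def cluster_pairs(n, pairs):
    """Group record ids by transitive closure over similar pairs (union-find)."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    clusters = {}
    for x in range(n):
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())


# Hypothetical sample records; the last one has no duplicate.
records = [
    "john smith new york",
    "john smith new york ny",
    "jon smith new york ny",
    "mary jones boston",
]

# Step 1: the join must finish before step 2 can start.
pairs = similarity_join(records, threshold=0.6)
# Step 2: clustering runs only as post-processing over the join output.
clusters = cluster_pairs(len(records), pairs)
print(pairs)     # [(0, 1), (1, 2)]
print(clusters)  # [[0, 1, 2], [3]]
```

In this sketch no cluster is available until the join has produced every pair, which is the limitation the abstract attributes to the post-processing design. Records 0 and 2 fall below the threshold when compared directly (similarity 0.5) and end up together only through record 1.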

CC BY-NC-ND 4.0

Paper citation in several formats:

Ribeiro, L. A., Cuzzocrea, A., Bezerra, K. A. A. and Nascimento, B. H. B. (2016). SJClust: Towards a Framework for Integrating Similarity Join Algorithms and Clustering. In Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-187-8; ISSN 2184-4992, SciTePress, pages 75-80. DOI: 10.5220/0005868700750080

@conference{iceis16,
author={Leonardo Andrade Ribeiro and Alfredo Cuzzocrea and Karen Aline Alves Bezerra and Ben Hur Bahia do Nascimento},
title={SJClust: Towards a Framework for Integrating Similarity Join Algorithms and Clustering},
booktitle={Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2016},
pages={75-80},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005868700750080},
isbn={978-989-758-187-8},
issn={2184-4992},
}

TY - CONF
JO - Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - SJClust: Towards a Framework for Integrating Similarity Join Algorithms and Clustering
SN - 978-989-758-187-8
IS - 2184-4992
AU - Ribeiro, L.
AU - Cuzzocrea, A.
AU - Bezerra, K.
AU - Nascimento, B.
PY - 2016
SP - 75
EP - 80
DO - 10.5220/0005868700750080
PB - SciTePress