Authors: Taoxin Peng 1 and Calum Mackay 2

Affiliations: 1 Edinburgh Napier University, United Kingdom ; 2 KANA, United Kingdom

Keyword(s): String Matching, Data Quality, Record Matching, Record Linkage, Data Warehousing.

Related Ontology Subjects/Areas/Topics: Coupling and Integrating Heterogeneous Data Sources ; Data Warehouses and OLAP ; Databases and Information Systems Integration ; Enterprise Application Integration ; Enterprise Information Systems ; Performance Evaluation and Benchmarking

Abstract: Data quality is key to success for any business that relies on information applications, such as data integration for data warehouses, text and web mining, information retrieval, and search engines for web applications. In such applications, string matching is a common task. A number of approximate string matching techniques are available. However, one problem remains unanswered: for a given dataset, how should one select an appropriate technique and the threshold value that technique requires for string matching? To address this problem, this paper analyses and evaluates a set of popular token-based string matching techniques on several carefully designed datasets. A thorough experimental comparison confirms that there is no clear overall best technique. However, some techniques do perform significantly better in some cases. The paper presents suggestions that researchers and practitioners can use as guidance when selecting an appropriate string matching technique and a corresponding threshold value for a given dataset.

CC BY-NC-ND 4.0

Paper citation in several formats:
Peng, T. and Mackay, C. (2014). Approximate String Matching Techniques. In Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 3: ICEIS; ISBN 978-989-758-027-7; ISSN 2184-4992, SciTePress, pages 217-224. DOI: 10.5220/0004892802170224

@conference{iceis14,
author={Taoxin Peng and Calum Mackay},
title={Approximate String Matching Techniques},
booktitle={Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 3: ICEIS},
year={2014},
pages={217-224},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004892802170224},
isbn={978-989-758-027-7},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 3: ICEIS
TI - Approximate String Matching Techniques
SN - 978-989-758-027-7
IS - 2184-4992
AU - Peng, T.
AU - Mackay, C.
PY - 2014
SP - 217
EP - 224
DO - 10.5220/0004892802170224
PB - SciTePress