SSTR: Set Similarity Join over Stream Data
Lucas Pacífico, Leonardo Ribeiro
2020
Abstract
In modern application scenarios, large volumes of data are continuously generated at high speed. Delivering timely analysis results over such massive data streams imposes challenging requirements on current systems. Worse still, data inconsistencies may call for similarity matching, which is computationally much more expensive than simple equality comparison. In this context, this paper presents SSTR, a novel similarity join algorithm for streams of sets. We adopt the concept of temporal similarity and exploit its properties to improve efficiency and reduce memory usage. We provide an extensive experimental study on several synthetic and real-world datasets. Our results show that the proposed techniques significantly improve scalability and lead to substantial performance gains in most settings.
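The abstract does not define temporal similarity. One common formulation in the streaming similarity join literature scales a set similarity by an exponential time decay. The sketch below assumes that definition with Jaccard similarity. It shows why such a measure can reduce memory: because Jaccard never exceeds 1, two records whose timestamps differ by more than ln(1/threshold)/decay can never reach the threshold, so older records can be evicted. The function names, the `decay` parameter, and the naive nested-loop join are hypothetical and illustrative. This is not SSTR's actual algorithm.

```python
import math
from collections import deque
from typing import Deque, FrozenSet, Iterable, List, Tuple

Record = Tuple[float, FrozenSet[str]]  # (timestamp, set of tokens)


def jaccard(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Plain Jaccard similarity |x & y| / |x | y|."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def temporal_similarity(x: FrozenSet[str], tx: float,
                        y: FrozenSet[str], ty: float, decay: float) -> float:
    """Assumed time-decayed similarity: Jaccard scaled by exp(-decay * |tx - ty|)."""
    return jaccard(x, y) * math.exp(-decay * abs(tx - ty))


def time_horizon(threshold: float, decay: float) -> float:
    """Largest time gap at which a pair can still reach `threshold`.

    Assumes 0 < threshold <= 1 and decay > 0. Since Jaccard <= 1, the decayed
    similarity is at most exp(-decay * dt), which drops below `threshold`
    once dt > ln(1 / threshold) / decay.
    """
    return math.log(1.0 / threshold) / decay


def stream_join(stream: Iterable[Record], threshold: float,
                decay: float) -> List[Tuple[float, float]]:
    """Naive streaming self-join over records arriving in timestamp order."""
    horizon = time_horizon(threshold, decay)
    window: Deque[Record] = deque()
    results: List[Tuple[float, float]] = []
    for t, s in stream:
        # Evict records too old to match this or any later arrival.
        while window and t - window[0][0] > horizon:
            window.popleft()
        # Compare the new record against every record still in the window.
        for t_old, s_old in window:
            if temporal_similarity(s_old, t_old, s, t, decay) >= threshold:
                results.append((t_old, t))
        window.append((t, s))
    return results
```

For example, `stream_join([(0, frozenset("abc")), (1, frozenset("abd")), (10, frozenset("abc"))], 0.4, 0.1)` yields `[(0, 1)]`. The record at time 0 is evicted before time 10 arrives (horizon ≈ 9.16), and that eviction is safe: even an identical set would score only exp(-1) ≈ 0.37 after a gap of 10. A real algorithm would replace the nested loop with indexing and filtering. The eviction bound shown here follows only from the assumed decay function.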
in Harvard Style
Pacífico L. and Ribeiro L. (2020). SSTR: Set Similarity Join over Stream Data. In Proceedings of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-423-7, pages 52-60. DOI: 10.5220/0009420400520060
in Bibtex Style
@conference{iceis20,
author={Lucas Pacífico and Leonardo Ribeiro},
title={SSTR: Set Similarity Join over Stream Data},
booktitle={Proceedings of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2020},
pages={52-60},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009420400520060},
isbn={978-989-758-423-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - SSTR: Set Similarity Join over Stream Data
SN - 978-989-758-423-7
AU - Pacífico L.
AU - Ribeiro L.
PY - 2020
SP - 52
EP - 60
DO - 10.5220/0009420400520060