SSTR: Set Similarity Join over Stream Data

Lucas Pacífico, Leonardo Ribeiro

2020

Abstract

In modern application scenarios, large volumes of data are continuously generated over time at high speeds. Delivering timely analysis results from such massive streams of data imposes challenging requirements on current systems. Even worse, similarity matching can be needed owing to data inconsistencies, and it is computationally much more expensive than simple equality comparisons. In this context, this paper presents SSTR, a novel similarity join algorithm for streams of sets. We adopt the concept of temporal similarity and exploit its properties to improve efficiency and reduce memory usage. We provide an extensive experimental study on several synthetic as well as real-world datasets. Our results show that the proposed techniques significantly improve scalability and lead to substantial performance gains in most settings.
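
The abstract does not define temporal similarity, so the following is only an illustrative sketch and not the SSTR algorithm itself. It assumes one formulation found in the streaming-join literature: Jaccard similarity scaled by an exponential decay of the timestamp difference. Under that assumption, two sets whose timestamps differ by more than ln(1/threshold)/decay can never reach the threshold, so older sets can be evicted from memory. All names here (`temporal_similarity`, `time_horizon`, `stream_join`, `decay`) are hypothetical.

```python
import math


def jaccard(a, b):
    """Plain Jaccard similarity between two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def temporal_similarity(a, ta, b, tb, decay):
    """Assumed definition: Jaccard damped by exponential time decay."""
    return jaccard(a, b) * math.exp(-decay * abs(ta - tb))


def time_horizon(threshold, decay):
    """Largest time gap at which even identical sets still reach the threshold."""
    return math.log(1.0 / threshold) / decay


def stream_join(stream, threshold, decay):
    """Naive nested-loop join over (timestamp, set) pairs ordered by timestamp.

    Sets older than the time horizon are dropped before each comparison,
    which bounds the amount of state kept in memory.
    """
    horizon = time_horizon(threshold, decay)
    window = []   # recent (timestamp, set) pairs still able to match
    results = []
    for t, s in stream:
        # Evict sets that can no longer qualify, whatever their content.
        window = [(tw, sw) for tw, sw in window if t - tw <= horizon]
        for tw, sw in window:
            if temporal_similarity(s, t, sw, tw, decay) >= threshold:
                results.append(((tw, sw), (t, s)))
        window.append((t, frozenset(s)))
    return results
```

For example, with `threshold=0.5` and `decay=0.1` the horizon is about 6.9 time units, so a set arriving at time 100 is never compared with sets that arrived at times 0 and 1. A practical algorithm would replace the nested loop with index-based candidate filtering; the sketch only shows how the time bound limits stored state.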


Paper Citation


in Harvard Style

Pacífico L. and Ribeiro L. (2020). SSTR: Set Similarity Join over Stream Data. In Proceedings of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-423-7, pages 52-60. DOI: 10.5220/0009420400520060


in Bibtex Style

@conference{iceis20,
author={Lucas Pacífico and Leonardo Ribeiro},
title={SSTR: Set Similarity Join over Stream Data},
booktitle={Proceedings of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2020},
pages={52-60},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009420400520060},
isbn={978-989-758-423-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - SSTR: Set Similarity Join over Stream Data
SN - 978-989-758-423-7
AU - Pacífico L.
AU - Ribeiro L.
PY - 2020
SP - 52
EP - 60
DO - 10.5220/0009420400520060