Adaptive Granulation: Data Reduction at the Database Level

Hossein Haeri, Niket Kathiriya, Cindy Chen, Kshitij Jerath

2023

Abstract

In an era where data volume is growing exponentially, effective data management techniques are more crucial than ever. Traditional methods typically manage the size of large datasets by reducing or aggregating data at a pre-specified granularity. However, these methods often struggle to retain vital information when dealing with large and complex datasets, especially when such datasets reside in databases. We propose a novel approach, Adaptive Granulation, that addresses this issue by performing data reduction or aggregation at the database level itself. A key concern in data reduction is the potential trade-off between reducing data volume and preserving prediction accuracy. This is particularly relevant when the primary goal is to use the reduced dataset for predictive modeling. Our method employs Allan variance, originally developed for frequency stability analysis of atomic clocks, to dynamically adjust the granularity of data aggregation based on the inherent structure and characteristics of the dataset. By minimizing bias across different scales, Adaptive Granulation manages trade-offs between diverse aspects of the data such as underlying patterns, noise levels, and sampling density. This paper outlines the algorithmic strategies for implementing Adaptive Granulation at the database level and assesses its performance by reducing the training set size for a downstream regression task on a variety of real-world and synthetic datasets. The results indicate that our method can adaptively optimize granule sizes to balance data patterns, noise levels, and sample densities across the entire data space. Adaptive Granulation thus represents a significant advancement for efficient data management and reduction in the big data era.
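To make the role of Allan variance concrete, the following is a minimal sketch of the standard non-overlapping Allan variance of a one-dimensional series at a given window size: half the mean squared difference between the averages of consecutive windows. It is not the authors' database-level algorithm. The function names `allan_variance` and `pick_window`, and the rule of choosing the candidate window with the smallest Allan variance, are assumptions made purely for illustration.

```python
from statistics import fmean


def allan_variance(values, window):
    """Non-overlapping Allan variance of `values` at a window of `window` samples.

    Computed as half the mean squared difference between the averages
    of consecutive, non-overlapping windows.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    n_windows = len(values) // window  # only complete windows are used
    if n_windows < 2:
        raise ValueError("need at least two complete windows")
    means = [
        fmean(values[i * window:(i + 1) * window])
        for i in range(n_windows)
    ]
    squared_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return 0.5 * fmean(squared_diffs)


def pick_window(values, candidates):
    """Hypothetical selection rule (an assumption, not taken from the paper):
    return the candidate window size with the smallest Allan variance."""
    return min(candidates, key=lambda w: allan_variance(values, w))


if __name__ == "__main__":
    # Alternating series: averaging over pairs removes all variation.
    series = [0, 1, 0, 1, 0, 1, 0, 1]
    for w in (1, 2):
        print(w, allan_variance(series, w))
    print("chosen window:", pick_window(series, [1, 2]))
```

In this toy series, a window of one sample sees only the alternation, while a window of two samples averages it out, so the selection rule favours the coarser granule. A practical system would evaluate many candidate windows and work over multidimensional data.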


Paper Citation


in Harvard Style

Haeri H., Kathiriya N., Chen C. and Jerath K. (2023). Adaptive Granulation: Data Reduction at the Database Level. In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS; ISBN 978-989-758-671-2, SciTePress, pages 29-39. DOI: 10.5220/0012190700003598


in Bibtex Style

@conference{kmis23,
author={Hossein Haeri and Niket Kathiriya and Cindy Chen and Kshitij Jerath},
title={Adaptive Granulation: Data Reduction at the Database Level},
booktitle={Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS},
year={2023},
pages={29-39},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012190700003598},
isbn={978-989-758-671-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS
TI - Adaptive Granulation: Data Reduction at the Database Level
SN - 978-989-758-671-2
AU - Haeri H.
AU - Kathiriya N.
AU - Chen C.
AU - Jerath K.
PY - 2023
SP - 29
EP - 39
DO - 10.5220/0012190700003598
PB - SciTePress