Adaptive Granulation: Data Reduction at the Database Level
Hossein Haeri, Niket Kathiriya, Cindy Chen, Kshitij Jerath
2023
Abstract
In an era where data volume is growing exponentially, effective data management techniques are more crucial than ever. Traditional methods typically manage the size of large datasets by reducing or aggregating data at a pre-specified granularity. However, these methods often struggle to retain vital information in large and complex datasets, especially when such datasets reside in databases. We propose a novel approach, Adaptive Granulation, that addresses this issue by performing data reduction or aggregation at the database level itself. A key concern in the data reduction process is the potential trade-off between reducing data volume and preserving prediction accuracy. This is particularly relevant when the primary goal is to use the reduced dataset for predictive modeling. Our method employs Allan variance, originally developed for frequency stability analysis of atomic clocks, to dynamically adjust the granularity of data aggregation based on the inherent structure and characteristics of the dataset. By minimizing bias across different scales, Adaptive Granulation manages trade-offs between diverse aspects of the data such as underlying patterns, noise levels, and sampling density. This paper outlines the algorithmic strategies for implementing Adaptive Granulation at the database level and assesses its performance by reducing the training set size for a downstream regression task on a variety of real-world and synthetic datasets. The results indicate that our method can adaptively optimize granule sizes to balance data patterns, noise levels, and sample densities across the entire data space. Adaptive Granulation thus offers an efficient approach to data management and reduction in the big data era.
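To make the role of Allan variance in choosing a granularity more concrete, the following is a minimal sketch, not the authors' algorithm. It computes the standard non-overlapping Allan variance of a one-dimensional, evenly sampled series at several window lengths and picks the window with the smallest value. The function names `allan_variance` and `pick_window`, the synthetic signal, the candidate window lengths, and the "take the minimum" selection rule are all assumptions made for illustration; the paper's database-level procedure may differ.

```python
import numpy as np


def allan_variance(y, m):
    """Non-overlapping Allan variance of series y at window length m (in samples)."""
    y = np.asarray(y, dtype=float)
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two full windows")
    # Average each consecutive block of m samples (one candidate "granule" each).
    block_means = y[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
    # Half the mean squared difference between adjacent block averages.
    return 0.5 * np.mean(np.diff(block_means) ** 2)


def pick_window(y, candidates):
    """Hypothetical selection rule: the candidate with the smallest Allan variance."""
    scores = {m: allan_variance(y, m) for m in candidates}
    return min(scores, key=scores.get), scores


# Synthetic example (assumed data): a slow pattern plus white noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 4096)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.5, size=t.size)

best_m, scores = pick_window(y, [2 ** k for k in range(1, 10)])
for m, s in scores.items():
    print(f"window={m:4d}  allan_variance={s:.6f}")
print("selected window:", best_m)
```

In this sketch, short windows are dominated by noise, while long windows begin to average over the underlying pattern; the selected window is one simple way to trade the two off. Applying a separate window choice to different regions of the data space, as the abstract describes, would require running such a selection locally rather than once for the whole series.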
Paper Citation
in Harvard Style
Haeri H., Kathiriya N., Chen C. and Jerath K. (2023). Adaptive Granulation: Data Reduction at the Database Level. In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS; ISBN 978-989-758-671-2, SciTePress, pages 29-39. DOI: 10.5220/0012190700003598
in Bibtex Style
@conference{kmis23,
author={Hossein Haeri and Niket Kathiriya and Cindy Chen and Kshitij Jerath},
title={Adaptive Granulation: Data Reduction at the Database Level},
booktitle={Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS},
year={2023},
pages={29-39},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012190700003598},
isbn={978-989-758-671-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS
TI - Adaptive Granulation: Data Reduction at the Database Level
SN - 978-989-758-671-2
AU - Haeri H.
AU - Kathiriya N.
AU - Chen C.
AU - Jerath K.
PY - 2023
SP - 29
EP - 39
DO - 10.5220/0012190700003598
PB - SciTePress