RELATIONAL SAMPLING FOR DATA QUALITY AUDITING AND DECISION SUPPORT
Bruno Cortes, José Nuno Oliveira
2004
Abstract
This paper presents a strategy for applying sampling techniques to relational databases in the context of data quality auditing and decision support processes. Fuzzy cluster sampling is used to survey sets of records for compliance with business rules. Relational algebra estimators are presented as a data quality auditing tool.
Paper Citation
in Harvard Style
Cortes B. and Nuno Oliveira J. (2004). RELATIONAL SAMPLING FOR DATA QUALITY AUDITING AND DECISION SUPPORT. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 972-8865-00-7, pages 376-382. DOI: 10.5220/0002630403760382
in Bibtex Style
@conference{iceis04,
author={Bruno Cortes and José Nuno Oliveira},
title={RELATIONAL SAMPLING FOR DATA QUALITY AUDITING AND DECISION SUPPORT},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2004},
pages={376-382},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002630403760382},
isbn={972-8865-00-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - RELATIONAL SAMPLING FOR DATA QUALITY AUDITING AND DECISION SUPPORT
SN - 972-8865-00-7
AU - Cortes B.
AU - Nuno Oliveira J.
PY - 2004
SP - 376
EP - 382
DO - 10.5220/0002630403760382