A COMPLETENESS-AWARE DATA QUALITY PROCESSING APPROACH FOR WEB QUERIES

Sandra de F. Mendes Sampaio, Pedro R. Falcone Sampaio

2008

Abstract

Internet Query Systems (IQS) are information systems used to query the World Wide Web: they find data sources relevant to a given query and retrieve data from the identified sources. They differ from traditional database management systems in that the data to be processed must first be found by a search engine, then fetched from remote data sources and processed in a way that accounts for unpredictable access and transfer rates, infinite streams of data, and the need to produce partial results. Although internet query systems offer more powerful query functionality than traditional search engines, their uptake has been slow, partly because of the difficulty of assessing and filtering low-quality data returned by internet queries. In this paper we investigate how an internet query system can be extended to support data-quality-aware query processing. In particular, we describe the metadata support, the XML-based data quality measurement method, the algebraic query processing operators, and the query plan structures of a query processing framework that helps users identify, assess, and filter out data regarded as having low completeness for the intended use.
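As a rough illustration of what a completeness-aware filter over XML data might look like, the Python sketch below scores each XML record by the fraction of expected child elements that are present and non-empty (a simple ratio of the form 1 - missing/expected), then drops records that fall below a threshold. The expected element names, the threshold value, the sample records, and the function names `completeness` and `completeness_filter` are all hypothetical; this is not the paper's metadata model or its actual operators.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Sequence


def completeness(record: ET.Element, expected: Sequence[str]) -> float:
    """Return 1 - (missing elements / expected elements) for one record.

    'expected' is a hypothetical, use-dependent list of child tags that a
    record must carry; an element counts as missing if it is absent or empty.
    """
    if not expected:
        return 1.0
    missing = 0
    for tag in expected:
        child = record.find(tag)
        if child is None or not (child.text or "").strip():
            missing += 1
    return 1.0 - missing / len(expected)


def completeness_filter(
    records: Iterable[ET.Element], expected: Sequence[str], threshold: float
) -> Iterator[ET.Element]:
    """Hypothetical filter operator: yield only records meeting the threshold."""
    for record in records:
        if completeness(record, expected) >= threshold:
            yield record


# Hypothetical sample data: the second record lacks a date of birth and a phone.
doc = ET.fromstring(
    "<patients>"
    "<patient><name>Ana</name><dob>1970-01-01</dob><phone>555</phone></patient>"
    "<patient><name>Bo</name><dob></dob></patient>"
    "</patients>"
)
expected = ["name", "dob", "phone"]
kept = list(completeness_filter(doc.findall("patient"), expected, threshold=0.75))
print([p.findtext("name") for p in kept])
```

Because the filter is written as a generator, it consumes and emits one record at a time. It can therefore sit inside a pipelined plan over a stream whose end may never arrive, loosely in the spirit of iterator-based query operators, and still produce partial results.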


Paper Citation


in Harvard Style

de F. Mendes Sampaio S. and R. Falcone Sampaio P. (2008). A COMPLETENESS-AWARE DATA QUALITY PROCESSING APPROACH FOR WEB QUERIES. In Proceedings of the Third International Conference on Software and Data Technologies - Volume 3: ICSOFT, ISBN 978-989-8111-53-1, pages 234-239. DOI: 10.5220/0001894802340239


in BibTeX Style

@conference{icsoft08,
author={Sandra de F. Mendes Sampaio and Pedro R. Falcone Sampaio},
title={A COMPLETENESS-AWARE DATA QUALITY PROCESSING APPROACH FOR WEB QUERIES},
booktitle={Proceedings of the Third International Conference on Software and Data Technologies - Volume 3: ICSOFT},
year={2008},
pages={234-239},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001894802340239},
isbn={978-989-8111-53-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Software and Data Technologies - Volume 3: ICSOFT
TI - A COMPLETENESS-AWARE DATA QUALITY PROCESSING APPROACH FOR WEB QUERIES
SN - 978-989-8111-53-1
AU - de F. Mendes Sampaio S.
AU - R. Falcone Sampaio P.
PY - 2008
SP - 234
EP - 239
DO - 10.5220/0001894802340239