queries based on previous knowledge about the data
to be queried, considering quality dimensions such
as completeness, timeliness and accuracy. The
described approach, however, does not use XML as
the canonical data model and does not address
physical algebraic query plan implementation issues.
8 CONCLUSIONS AND FUTURE WORK
With the rapid growth in the availability and use
of data on the web, addressing data quality
requirements in web queries is emerging as a key
priority for database research (Gertz, M., Ozsu, T.,
Saake, G., and Sattler, K., 2003). There are two
established approaches to addressing data quality
issues in web data. In the data warehouse-based
approach, relevant data is reconciled, cleansed and
warehoused prior to querying. In the mediator-based
approach, quality metrics and thresholds relating to
cooperative web data sources are evaluated "on the
fly" at query processing and execution time. In this
paper we illustrate the query processing extensions
being engineered into the Niagara Internet query
system to support mediator-based, quality-aware
query processing for the completeness data quality
dimension. We are also addressing the timeliness
dimension (Sampaio, S. F. M., Dong, C., and
Sampaio, P. R. F., 2005) and extending SQL with
data quality constructs for expressing data quality
requirements (Dong, C., Sampaio, S. F. M., and
Sampaio, P. R. F., 2006). The quality-aware query
processing extensions encompass metadata support,
an XML-based data quality measurement method,
algebraic query processing operators, and query plan
structures. Together they form a query processing
framework that helps users identify, assess, and
filter out data whose completeness is too low for the
intended use. As future work, we intend to
incorporate support for the accuracy data quality
dimension into the framework and to benchmark the
quality/cost query optimiser in a health care
application (Dong, C., Sampaio, S. F. M., and
Sampaio, P. R. F., 2005).
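The idea of measuring completeness and filtering out low-completeness data can be illustrated with a minimal sketch. Purely for illustration, it assumes that the completeness of an XML element is the fraction of its expected child elements that are present and non-empty. This is a simple-ratio measure in the spirit of Pipino et al. (2002). It also assumes that low-completeness data is removed by a pull-based filter over a stream of elements, in the iterator style of Graefe (1996). The function names `completeness` and `completeness_filter`, the 0.6 threshold and the sample patient records are all hypothetical. They are not the Niagara operators or metadata structures described in this paper.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Sequence


def completeness(elem: ET.Element, expected: Sequence[str]) -> float:
    """Return the fraction of expected child elements that are present
    and carry non-whitespace text (hypothetical simple-ratio measure)."""
    if not expected:
        return 1.0  # nothing is required, so treat the element as complete
    present = 0
    for tag in expected:
        child = elem.find(tag)
        # A child counts only if it exists and its text is non-empty.
        if child is not None and child.text is not None and child.text.strip():
            present += 1
    return present / len(expected)


def completeness_filter(
    source: Iterable[ET.Element],
    expected: Sequence[str],
    threshold: float,
) -> Iterator[ET.Element]:
    """Hypothetical pull-based filter operator: lazily yields only the
    elements whose completeness meets the user-supplied threshold."""
    for elem in source:
        if completeness(elem, expected) >= threshold:
            yield elem


# Hypothetical health-care style records; <dob></dob> has no text.
doc = ET.fromstring(
    "<patients>"
    "<patient><id>1</id><name>Ann</name><dob>1970-01-01</dob></patient>"
    "<patient><id>2</id><name>Bob</name><dob></dob></patient>"
    "<patient><id>3</id></patient>"
    "</patients>"
)
expected = ["id", "name", "dob"]
kept = list(completeness_filter(doc.findall("patient"), expected, threshold=0.6))
print([p.findtext("id") for p in kept])  # ['1', '2']
```

In this example, record 2 scores 2/3 and passes the threshold, while record 3 scores 1/3 and is filtered out. Because the filter is a generator, it consumes its input one element at a time and so could sit above a scan in a pipelined plan. A mediator might instead derive the score from source-level quality metadata rather than per element. This sketch does not attempt that.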
REFERENCES
Naughton, J., DeWitt, D., Maier, D., et al., 2001. The
Niagara Internet Query System. IEEE Data Eng. Bull.
24(2), 27-33.
Olson, J., 2003. Data Quality: The Accuracy Dimension,
Morgan Kaufmann, 1st edition.
The UK e-Science Programme.
http://www.rcuk.ac.uk/escience.
Wiederhold, G., 1992. Mediators in the Architecture of
Future Information Systems. IEEE Computer 25(3).
Helfert, M., and von Maur, E., 2001. A Strategy for
Managing Data Quality in Data Warehouse Systems.
In Proc. of Information Quality, 62-76.
Wang, R., and Madnick, S. E., 1989. The Inter-Database
Instance Identification Problem in Integrating
Autonomous Systems. In Proc. of ICDE, 46-55.
Wang, R. Y., Reddy, M. P., and Kon, H. B., 1995. Toward
Quality Data: An Attribute-Based Approach. Decision
Support Systems, 13(3-4), 349-372.
Sampaio, S. F. M., Dong, C., and Sampaio, P. R. F., 2005.
Incorporating the Timeliness Quality Dimension in
Internet Query Systems. WISE 2005 Workshops,
LNCS 3807, 53-62.
Dong, C., Sampaio, S. F. M., and Sampaio, P. R. F., 2006.
Expressing and Processing Timeliness Quality Aware
Queries: The DQ2L Approach. International
Workshop on Quality of Information Systems, ER
2006 Workshops, LNCS 4231, 382-392.
Naumann, F., Leser, U., and Freytag, J., 1999. Quality-driven
Integration of Heterogeneous Information
Systems. In Proc. of the 25th VLDB, 447-458.
Mecella, M., Scannapieco, M., et al. The DaQuinCIS
Broker: Querying Data and Their Quality in
Cooperative Information Systems. LNCS 2800.
Dong, C., Sampaio, S. F. M., and Sampaio, P. R. F., 2005.
Building a Data Quality Aware Internet Query System
for Health Care Applications. In Proceedings of IRMA
Conference - Databases Track, San Diego, USA.
Graefe, G., 1996. Iterators, Schedulers, and Distributed-memory
Parallelism. Software: Practice and
Experience, 26(4), 427-452.
Gertz, M., Ozsu, T., Saake, G., and Sattler, K., 2003. Data
Quality on the Web. Germany, Dagstuhl Seminar.
Pipino, L. L., Lee, Y. W., and Wang, R. Y., 2002. Data
Quality Assessment. Communications of the ACM, 45(4)
(virtual extension).
A COMPLETENESS-AWARE DATA QUALITY PROCESSING APPROACH FOR WEB QUERIES