Authors:
Sandra de F. Mendes Sampaio (1) and Pedro R. Falcone Sampaio (2)
Affiliations:
(1) School of Computer Science, University of Manchester, United Kingdom
(2) Manchester Business School, University of Manchester, United Kingdom
Keyword(s):
Data Quality, Internet Query Systems, Completeness, Query Processing.
Related Ontology Subjects/Areas/Topics:
Data Engineering; Data Management and Quality; Information Quality
Abstract:
Internet Query Systems (IQS) are information systems used to query the World Wide Web by finding data sources relevant to a given query and retrieving data from the identified sources. They differ from traditional database management systems in that the data to be processed must be located by a search engine, fetched from remote data sources, and processed in a way that accounts for unpredictable access and transfer rates, infinite data streams, and the ability to produce partial results. Although internet query systems offer more powerful query functionality than traditional search engines, their uptake has been slow, partly because low-quality data returned by internet queries is difficult to assess and filter. In this paper we investigate how an internet query system can be extended to support data-quality-aware query processing. In particular, we illustrate the metadata support, the XML-based data quality measurement method, the algebraic query processing operators, and the query plan structures of a query processing framework aimed at helping users to identify, assess, and filter out data regarded as having low completeness quality for the intended use.
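The abstract does not define how completeness is measured, so what follows is only a minimal sketch under stated assumptions, not the authors' method. It assumes that the completeness of an XML record is the fraction of a hypothetical list of expected child elements that are present and non-empty, and it models completeness-based filtering as a streaming operator with a user-chosen threshold. The names `completeness` and `completeness_filter`, the sample `<book>` records, and the 0.7 threshold are all illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def completeness(record: ET.Element, expected: list[str]) -> float:
    """Assumed measure: share of expected child elements that exist and hold non-blank text."""
    if not expected:
        return 1.0
    present = 0
    for tag in expected:
        child = record.find(tag)
        if child is not None and child.text is not None and child.text.strip():
            present += 1
    return present / len(expected)


def completeness_filter(records: Iterable[ET.Element], expected: list[str],
                        threshold: float) -> Iterator[ET.Element]:
    """Hypothetical filter operator: lazily yields records whose completeness meets the threshold."""
    for rec in records:
        if completeness(rec, expected) >= threshold:
            yield rec


# Illustrative input: the second record has an empty <author> and no <year>.
xml_doc = """<results>
  <book><title>A</title><author>X</author><year>2001</year></book>
  <book><title>B</title><author></author></book>
</results>"""
root = ET.fromstring(xml_doc)
expected = ["title", "author", "year"]
kept = list(completeness_filter(root.findall("book"), expected, 0.7))
print([b.findtext("title") for b in kept])  # ['A']
```

The operator is written as a generator so that, in keeping with the stream-oriented setting the abstract describes, it can consume a potentially unbounded sequence of records and emit qualifying results incrementally rather than waiting for the full input.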