For many years, the importance of data quality
has been ignored in the design and development of
the databases in which organizations store their data.
Our proposal integrates data quality notions
into a DB development methodology in order to
open a line of research that fills this gap.
In addition, technologies related to
XML have spread so widely, owing to the success of
Service Oriented Architectures, that XML has
become the de facto standard for data exchange
among agents. This situation has given rise to new
approaches to optimizing semi-structured data
storage. Within this field, XML DBs have
been created with the goal of improving the massive
storage of XML documents.
Our research focuses on developing new
strategies for data quality treatment during the XML
DB development phase. To reach this target, we
have drawn on some Key Area Processes from the
CALDEA framework to define a methodology that
considers data quality a basic aspect of the
creation of an XML DB.
The approach described addresses
user quality requirements management, data source
quality assessment, data quality management during
the XML DB design phase, and the measurement of
characteristics of the data stored in an XML DB.
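
As an illustration of the last of these aspects, the measurement of one characteristic of stored XML data could be sketched as follows. The sketch assumes completeness as the characteristic of interest, defined here as the fraction of records in which a given field is present and non-empty. The sample document, the element names and the `completeness` function are hypothetical and are not part of the proposed methodology.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """
<customers>
  <customer><name>Ana</name><email>ana@example.org</email></customer>
  <customer><name>Luis</name><email></email></customer>
  <customer><name>Eva</name></customer>
</customers>
"""


def completeness(xml_text, record_tag, field_tag):
    """Return the fraction of records whose field is present and non-empty."""
    root = ET.fromstring(xml_text)
    records = root.findall(f".//{record_tag}")
    if not records:
        return 0.0
    filled = 0
    for record in records:
        field = record.find(field_tag)
        # A missing element and an empty element both count as incomplete.
        if field is not None and (field.text or "").strip():
            filled += 1
    return filled / len(records)


if __name__ == "__main__":
    print("name completeness:", completeness(SAMPLE, "customer", "name"))
    print("email completeness:", completeness(SAMPLE, "customer", "email"))
```

Other characteristics would need their own measures. This one only shows how a measure can be computed directly over the stored documents.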
ACKNOWLEDGEMENTS
This research is part of the FAMOSO and ESFINGE
projects supported by the Dirección General de
Investigación of the Spanish Ministerio de Ciencia y
Tecnología (Ministry of Science and
Technology) (TIC2003-07804-C05-03).
ICSOFT 2007 - International Conference on Software and Data Technologies