because the quality information embedded in the cell
elements of the DQXML is divided among three tables in
the database. If we add up the values of every
table of the schema, we get the same result, 4.
For the measure number of attributes (NA), the result is
different as well. The explanation is that the
database, as a storage medium, does not differentiate
between quality data and raw data, whereas DQXSD treats
quality data explicitly, adding semantic value that it did
not have when stored in a database.
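The arithmetic behind this comparison can be checked in a few lines. The following is only a sketch that copies the RD and NA values reported in Table 6; the variable names are illustrative and are not part of DQXSD or of the measurement tool.

```python
# Values copied from Table 6; all names below are illustrative only.
rd_database = {"Comp": 2, "EE1": 2, "EE2": 0}  # RD of each relational table
rd_dqxml = 4                                   # RD(R1) measured on the DQXML

na_database = {"Comp": 4, "EE1": 4, "EE2": 3}  # NA of each relational table
na_dqxml = 4                                   # NA(D) measured on the DQXML

rd_total = sum(rd_database.values())
na_total = sum(na_database.values())

print(rd_total == rd_dqxml)  # True: 2 + 2 + 0 = 4
print(na_total, na_dqxml)    # 11 4
```

Summing RD over the three tables reproduces the DQXML value, whereas the NA values do not coincide, which is consistent with the explanation given above.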
Table 6. Measurement results
The results of the measures NEE and NEA
show that the DQXML has high quality because it
has no empty elements or attributes that waste
bandwidth. Lastly, the low values of NN and NArc
indicate that the DQXML is not excessively
complex, a statement confirmed by SC_XML.
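To make these measures more concrete, the sketch below counts them on a small XML document with Python's standard library. The definitions used here are assumptions, since this section does not restate them: NN counts element nodes only, NArc counts parent-child links, NEE counts elements with no children, attributes, or text, and NEA counts attributes with an empty value. The sample document and the function name `xml_measures` are hypothetical and are not the DQXML or the tool used in this paper.

```python
import xml.etree.ElementTree as ET


def xml_measures(xml_text):
    """Count NN, NArc, NEE and NEA under the assumed definitions above."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())          # every element, root included
    nn = len(elements)                    # NN: element nodes only (assumption)
    narc = sum(len(e) for e in elements)  # NArc: one arc per parent-child link
    # NEE: no children, no attributes, no non-blank text (assumption)
    nee = sum(
        1 for e in elements
        if len(e) == 0 and not e.attrib and not (e.text or "").strip()
    )
    # NEA: attributes whose value is empty or whitespace (assumption)
    nea = sum(1 for e in elements for v in e.attrib.values() if not v.strip())
    return {"NN": nn, "NArc": narc, "NEE": nee, "NEA": nea}


# Hypothetical sample, not the DQXML measured in the paper.
sample = """
<table>
  <row>
    <cell quality="0.9">42</cell>
    <cell quality="">7</cell>
  </row>
  <row/>
</table>
"""
print(xml_measures(sample))  # {'NN': 5, 'NArc': 4, 'NEE': 1, 'NEA': 1}
```

Under these assumed definitions a well-formed document is a tree, so NArc is always NN - 1, which agrees with the values 73 and 72 reported in Table 6.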
4 CONCLUSIONS AND FUTURE WORK
Traditionally, data quality has been applied only to
data stored in databases, regarded as the raw material for
manufacturing data products. This approach is
clearly out of date because data exchange is
steadily becoming more important, in parallel with
the consolidation of Service Oriented Architectures.
Data quality issues of static data must also be
propagated when that data is transmitted. To support
this goal, we define a new document
structure, DQXSD, based on XML, the most important
technology for information exchange. DQXSD itself
is defined with XML Schema.
DQXSD helps to capture quality data stored in a
database schema and to translate it into a format
suitable for transmission.
To show that data quality is preserved through
that process, several measures for DQXML
documents have been developed and compared with their
database equivalents, with satisfactory results.
Although the results presented in this paper are
oriented to capturing quality data stored in relational
databases, DQXSD could be easily adapted to other
storage models thanks to the flexibility of the
technologies used for its definition.
ACKNOWLEDGMENTS
This research is part of the FAMOSO and ESFINGE
projects supported by the Dirección General de
Investigación of the Spanish Ministerio de Ciencia y
Tecnología (Ministry of Science and
Technology) (TIC2003-07804-C05-03).
Measure   Relational database                    DQXML
DRT       3
DDQT                                             3
RD        RD(Comp)=2, RD(EE1)=2, RD(EE2)=0       RD(R1)=4
NA        NA(Comp)=4, NA(EE1)=4, NA(EE2)=3       NA(D)=4
COS       0                                      0
NEE       -                                      0
NEA       -                                      0
NN        -                                      73
NArc      -                                      72
SC_XML    -                                      0
ICEIS 2007 - International Conference on Enterprise Information Systems