sis techniques for interpreting the assessments. The
first technique compares an organisation’s DQ to a
benchmark from a best-practices organisation. The
second technique measures the distances between
the assessments of different stakeholders.
Despite its novel contributions, the AIMQ method
can be criticised for measuring DQ on the basis of
subjective estimations collected via a questionnaire.
This approach precludes an automated, objective and
repeatable DQ measurement. Moreover, it provides
no possibility to adapt the measurement to a particular
scope (R3). Instead, it combines the subjective
DQ estimations of several users who normally use
data for different purposes.
The approach by (Helfert, 2002) distinguishes,
based upon (Juran, 1999), two quality factors: quality
of design and quality of conformance (see also
Heinrich & Helfert, 2003). Quality of design denotes
the degree of correspondence between the users'
requirements and the information system's specification
(e.g. specified by means of data schemata).
Helfert focuses on quality of conformance, which
represents the degree of correspondence between the
specification and the information system. This
distinction is important within the context of measuring
DQ: it separates the subjective estimation of
the correspondence between the users' requirements
and the specified data schemata from the measurement
(which can be objectivised) of the correspondence
between the specified data schemata and the
existing data values. Helfert's main concern is the
integration of DQ management into the meta data
administration, which shall enable an automated and
tool-based DQ management. To this end, the DQ
requirements have to be represented by means of a set
of rules that is verified automatically for measuring
DQ. However, Helfert does not propose any metrics.
This is due to his goal of describing DQ management
on a conceptual level.
Besides these scientific approaches, two practical
concepts by English and Redman are presented
in the following. English describes the total quality
data management method (English, 1999), which follows
the concepts of total quality management. He
introduces techniques for measuring the quality of data
schemata and architectures (of an information system)
and the quality of attribute values. Although
these techniques have been applied within several
projects, a general, well-founded procedure for
measuring DQ is missing. In contrast, Redman
chooses a process-oriented approach and combines
measurement procedures for selected parts of an
information flow with the concept of statistical quality
control (Redman, 1996). He does not present
any particular metrics either.
From a conceptual view, the approach by
(Hinrichs, 2002) is very interesting, since he develops
metrics for selected DQ dimensions in detail.
His technique is promising, because it aims at an
objective, goal-oriented measurement. Moreover, this
measurement is supposed to be automated. A closer
look reveals, however, that major problems arise when
applying Hinrichs' metrics in practice, since their
results are hard to interpret. This makes it difficult
to justify the metrics' results (cf. requirement R5).
For example, some of the metrics proposed by Hinrichs, such as
the one for consistency, are based on a quotient of the
following form:
\[ \frac{1}{\text{result of the distance function} + 1} \]
An example of such a distance function is
\[ \sum_{s=1}^{|\Re|} r_s(w), \]
where $w$ denotes an attribute value within
the information system and $\Re$ is a set of consistency
rules (with $|\Re|$ as the number of set elements) that
shall be applied to $w$. Each consistency rule $r_s \in \Re$
($s = 1, 2, \ldots, |\Re|$) returns the value 0 if $w$ fulfils the
consistency rule; otherwise the rule returns the value 1.
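The quotient and the distance function described above can be illustrated with a minimal Python sketch. It is not Hinrichs' implementation: the function names `distance` and `consistency` and the two postal-code rules are hypothetical and were chosen only for illustration.

```python
from typing import Callable, Sequence

# A consistency rule maps an attribute value w to 0 (fulfilled) or 1 (violated).
Rule = Callable[[str], int]


def distance(w: str, rules: Sequence[Rule]) -> int:
    """Number of consistency rules violated by w, i.e. the sum of r_s(w)."""
    return sum(rule(w) for rule in rules)


def consistency(w: str, rules: Sequence[Rule]) -> float:
    """Quotient 1 / (result of the distance function + 1)."""
    return 1.0 / (distance(w, rules) + 1)


# Hypothetical rules for a postal-code attribute (assumption, illustrative only).
rules = [
    lambda w: 0 if w.isdigit() else 1,   # value consists of digits only
    lambda w: 0 if len(w) == 5 else 1,   # value has exactly five characters
]

print(consistency("80333", rules))  # no rule violated   -> 1.0
print(consistency("8033", rules))   # one rule violated  -> 0.5
print(consistency("80-3", rules))   # two rules violated -> 0.333...
```

With a finite rule set the sketch never returns a value below 1 / (number of rules + 1), here 1/3, so the lower part of the interval [0; 1] remains unused.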
The distance function thus indicates how many
consistency rules are violated by the attribute
value w. In general, the distance function's value
range is [0; ∞], so the value range of the metric
(quotient) is limited to the interval [0; 1]. However,
building this quotient has two drawbacks. Firstly, the
values become hard to interpret with respect to (R1).
Secondly, the value range [0; 1] is normally not covered,
because a value of 0 results only if the value of the
distance function is ∞ (i.e. the number of consistency
rules violated by an attribute value would have to be
infinite). Moreover, the metrics are hardly applicable
within an economically oriented DQ management, since
neither absolute nor relative changes can be interpreted.
In addition, the required cardinality is not
given (R2), which hinders economic planning and the
ex post evaluation of the efficiency of realised DQ
measures.
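The interpretation problem can be made explicit with a short calculation. The symbols $d$ (value of the distance function) and $Q(d)$ (value of the metric) are introduced here only for illustration and are not part of Hinrichs' notation. Resolving a single rule violation changes the metric by

```latex
\[
Q(d) = \frac{1}{d+1}, \qquad
Q(d-1) - Q(d) = \frac{1}{d} - \frac{1}{d+1} = \frac{1}{d\,(d+1)} \quad (d \ge 1).
\]
```

The same improvement (one violation fewer) therefore raises the metric by 1/2 for d = 1, by 1/6 for d = 2 and by only 1/110 for d = 10, so equal differences in the metric do not correspond to equal differences in the number of violated rules.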
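Formally, each consistency rule is defined as
\[ r_s(w) := \begin{cases} 0 & \text{if } w \text{ fulfils the consistency rule } r_s \\ 1 & \text{else} \end{cases} \]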
Table 1 demonstrates this weakness: to improve
the value of consistency from 0 to 0.5, the
corresponding distance function has to decrease from
∞ to 1. In contrast, an improvement from 0.5 to 1
requires only a reduction from 1 to 0. Summing up, it
METRICS FOR MEASURING DATA QUALITY - Foundations for an Economic Data Quality Management