2.3.1 Test-Data Quality
The data quality factors reflect general data
quality aspects. Test data, and especially test data
for end-to-end testing, has its own specifics:
- it should cover all real production cases and
configurations;
- it should properly reflect the real production data
relations and stochastic distributions;
- it should support possible future functionality
that is not currently used in the production
system but is technically possible;
- it should quantitatively reflect the requirements
for non-functional testing, that is, volumes beyond
the current production volumes.
These specifics do not change the general
approach; however, they are definitely important for
the evaluation in the model presented below.
The test-data quality has one additional
characteristic: how well the data covers all
needs of the planned test activity, that is, how well
it supports the execution of all planned test cases.
Ultimately, that is the crucial thing about test data.
Test coverage is usually derived from
business and technical use cases and requirements.
Each use case is typically translated into several
test requirements. If you manage to build a purely
hierarchical test planning system, then each test
requirement is covered by several test cases. In
practice you may have more complicated, many-to-many
relations, where one test case supports several
test requirements. Besides, every test case may have
several configurations.
The quality factor reflects how well this
complicated structure is supported by the test data,
so that every test case can be executed with all its
configurations. This criterion complements the set of
quality factors.
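The coverage check described above can be sketched as follows. The mappings and identifiers below are illustrative assumptions, not taken from an actual test-management tool; the point is that every (test case, configuration) pair must be backed by suitable test data, even when the case-to-requirement relation is many-to-many.

```python
# Hypothetical sketch: verify that the test data supports every planned
# test case with all of its configurations.

# Many-to-many mapping: one test case may support several requirements.
case_to_requirements = {
    "TC-01": ["REQ-A", "REQ-B"],
    "TC-02": ["REQ-B"],
}

# Configurations planned for each test case.
case_configurations = {
    "TC-01": ["default", "leap-year"],
    "TC-02": ["default"],
}

# (test case, configuration) pairs for which suitable test data exists.
data_available = {("TC-01", "default"), ("TC-02", "default")}

def uncovered_pairs(cases, configs, available):
    """Return every (test case, configuration) pair lacking test data."""
    return sorted(
        (case, cfg)
        for case in cases
        for cfg in configs.get(case, [])
        if (case, cfg) not in available
    )

missing = uncovered_pairs(case_to_requirements, case_configurations, data_available)
```

A non-empty result means the planned test activity cannot be fully executed with the current test data.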
2.3.2 Correctness
Data accuracy is easily formalised both for the
interfaces and in the data store. The checks cover
formats, default values, initial load values against
system specifications, and empty fields. These are
purely technical checks; more business-oriented
verification is done along the data comprehension
dimension.
Precision for a data entity should be the same
throughout the system. It makes no sense to store a
field with, say, six decimal places, when the data
source delivers it with only four positions after the
decimal point. Checking the precision aspect is
complicated, since the same variables might have
different names in different components (see the
consistency dimension), and a proper analysis
requires a substantial analytical rather than formal effort.
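Once the name aliases between components have been resolved analytically, the formal part of the precision check is straightforward. The alias table and field names below are assumptions for illustration:

```python
from decimal import Decimal

# Hypothetical sketch: compare the precision (decimal places) of the same
# business variable across components, where field names differ between
# components (see the consistency dimension).

# Assumed alias table mapping component-local names to one entity name.
aliases = {"settle_px": "settlement_price", "settlPrice": "settlement_price"}

def decimal_places(value: str) -> int:
    """Number of digits after the decimal point of a textual value."""
    exp = Decimal(value).as_tuple().exponent
    return -exp if exp < 0 else 0

def precision_conflicts(samples):
    """samples: (component, field, value) triples. Return the entities
    whose observed precision differs between components."""
    seen = {}
    conflicts = set()
    for component, field, value in samples:
        entity = aliases.get(field, field)
        places = decimal_places(value)
        if entity in seen and seen[entity] != places:
            conflicts.add(entity)
        seen.setdefault(entity, places)
    return conflicts

samples = [
    ("source", "settle_px", "12.3456"),        # four decimal places
    ("warehouse", "settlement_price", "12.345600"),  # stored with six
]
conflicts = precision_conflicts(samples)
```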
Granularity is business-defined and is
covered at the data model level.
For SSE the major focus lies on default values,
initial load for new versions and migrations, and
NULLs. NULL values are traditionally the
source of many problems. They often play an
important role in the integration between
components: problems frequently arise when one
component, designed by an outsourcing company,
say in India, allows NULLs for a specific field,
while the next component, purchased from a
provider in Australia, expects no NULLs on the
interface. In general, NULL can denote a non-existing
value; a value that exists but is unknown (not
provided by the data source); or a case where it is
not known whether the value exists at all.
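The NULL mismatch between a producing and a consuming component can be guarded by a simple interface check. The contract and field names below are illustrative assumptions:

```python
# Hypothetical sketch: detect NULLs that a producing component emits for
# fields the consuming component declares as mandatory (NOT NULL).

# Assumed consumer-side contract: these interface fields must be populated.
consumer_not_null = {"trade_id", "price", "quantity"}

def null_violations(records, not_null_fields):
    """Yield (record index, field) for every NULL breaking the interface."""
    for i, record in enumerate(records):
        for field in not_null_fields:
            if record.get(field) is None:
                yield i, field

records = [
    {"trade_id": "T1", "price": 101.5, "quantity": 10},
    {"trade_id": "T2", "price": None, "quantity": 5},  # producer allowed a NULL
]
violations = list(null_violations(records, consumer_not_null))
```

Note that such a check only catches the technical mismatch; which of the three NULL meanings applies still requires analytical clarification.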
2.3.3 Completeness
Data completeness is likewise easily checked at
the database level. It is better when you have a
statistical reference from real production. The
source of the data in the test system is mainly test
automation scripts. They should be configured in
such a way that at least the data relations (if not
always the volumes) remain production-like.
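A production-like data relation can be verified with a simple ratio comparison against the statistical reference. The figures and tolerance below are illustrative assumptions:

```python
# Hypothetical sketch: check that a data relation (records per parent
# entity) in the test system stays close to the production reference,
# even when absolute volumes differ.

def relation_ok(child_count, parent_count, production_ratio, tolerance=0.25):
    """True if the test ratio stays within tolerance of production."""
    test_ratio = child_count / parent_count
    return abs(test_ratio - production_ratio) <= tolerance * production_ratio

# E.g. ~40 trades per participant in production; the test system holds
# 4,200 trades for 100 participants, a ratio of 42.
ok = relation_ok(child_count=4_200, parent_count=100, production_ratio=40.0)
```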
At the table level we should verify that there are
no empty tables. For instance, in the current SSE
data warehouse test system there are 36 empty tables
out of 187 in total. However, some of them relate to
functions which are not tested in the end-to-end
system; others are no longer populated but are still
kept in production for historic reporting. There are
also tables that have been introduced but are not
used in production, since the corresponding business
functionality is not yet activated.
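The table-level check itself is mechanical. The following minimal sketch runs against an in-memory SQLite database for illustration; a real data warehouse would use its own catalogue views, and the table names are invented:

```python
import sqlite3

# Sketch of the table-level completeness check: enumerate all user
# tables that contain no rows.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades (id INTEGER);
    CREATE TABLE legacy_report (id INTEGER);  -- kept for historic reporting
    INSERT INTO trades VALUES (1), (2);
""")

def empty_tables(conn):
    """Return the names of all user tables with zero rows."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return sorted(
        t for t in tables
        if conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] == 0
    )

empties = empty_tables(conn)
```

The resulting list then has to be triaged analytically, as in the SSE example above: untested functions, historic tables, and not-yet-activated functionality are acceptable reasons for emptiness.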
At the record level we check for empty, non-populated
fields. Here we should verify not simply
NULLs, but those fields that are filled in some
components but empty in others.
From the dynamic point of view it should be
verified that there is data for all days (in the SSE
context) or other relevant time entities. For instance,
the current test data warehouse holds data from
01.06.2015 to 01.05.2016, that is, 335 calendar days
(2016 is a leap year). That corresponds to 231
business days (minus weekends and bank holidays).
The data warehouse contains 220 loaded days, i.e.
11 days are missing for some reason.
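The dynamic check reduces to comparing the expected business days of the period against the days actually loaded. The sketch below uses a short illustrative week, an empty holiday calendar, and an invented loaded-day set, not the actual SSE period:

```python
from datetime import date, timedelta

# Sketch of the dynamic completeness check: expected business days
# (weekdays minus bank holidays) versus days loaded into the warehouse.

def business_days(start: date, end: date, holidays=frozenset()):
    """All weekdays in [start, end) that are not bank holidays."""
    day, days = start, []
    while day < end:
        if day.weekday() < 5 and day not in holidays:
            days.append(day)
        day += timedelta(days=1)
    return days

# Illustrative week: Mon 02.05.2016 .. Fri 06.05.2016, Thursday not loaded.
expected = business_days(date(2016, 5, 2), date(2016, 5, 7))
loaded = {date(2016, 5, 2), date(2016, 5, 3), date(2016, 5, 4), date(2016, 5, 6)}
missing = [d for d in expected if d not in loaded]
```

With the real bank-holiday calendar supplied, the same comparison yields the 11 missing days mentioned above.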
From the object point of view, all objects (such as
legal entities, participants or trading users in the
SSE case) should include the required data.
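The object-level check can be sketched by attaching a required-attribute set to each object type. The object types, attributes and identifiers below are illustrative assumptions, not the actual SSE data model:

```python
# Hypothetical sketch of the object-level completeness check: each
# object type carries its own set of required attributes.

required = {
    "legal_entity": {"lei", "name"},
    "trading_user": {"user_id", "participant"},
}

def incomplete_objects(objects):
    """Return (object id, missing attributes) for every incomplete object."""
    result = []
    for obj in objects:
        need = required.get(obj["type"], set())
        lacking = sorted(a for a in need if not obj.get(a))
        if lacking:
            result.append((obj["id"], lacking))
    return result

objects = [
    {"id": "LE-1", "type": "legal_entity", "lei": "529900ABC", "name": "Alpha"},
    {"id": "TU-7", "type": "trading_user", "user_id": "u7"},  # participant missing
]
broken = incomplete_objects(objects)
```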