The first step is to wrap existing database
systems and data marts using a cell structure (Karran
& Wright, 2007). Each cell manages its data flow
autonomously, provided that it has been initiated by a
role with the required access permissions. Each cell
can assess the quality of data which passes through its
‘cell membrane’ and return a quality value for the data
it uses. Data which does not meet predefined quality
settings is transferred to a holding bay for further
cleaning, and at predefined intervals (depending on
user settings) the dirty data is passed to a higher-level
cell for further cleaning. As well as dealing
with individual errors and their number, the higher-level
cell is able to assess and count the types of
error reaching that layer. If necessary, a new data
cleaning filter can be added to the lower-level cell to
prevent this type of error from occurring in future.
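As an illustration, the following minimal sketch (in Python, with hypothetical names throughout; it is not the published CODA implementation) shows how a cell membrane might score incoming records against its filters, divert failures to a holding bay, and escalate them at flush time to a higher-level cell that counts error types and can push a new filter back down:

    from collections import Counter

    class Cell:
        """A CODA-style cell: filters at the 'membrane', plus a holding bay."""

        def __init__(self, filters, quality_threshold=0.8, parent=None):
            self.filters = filters              # list of (name, predicate) checks
            self.quality_threshold = quality_threshold
            self.parent = parent                # higher-level cell, if any
            self.holding_bay = []

        def admit(self, record):
            """Assess a record at the membrane and return its quality value."""
            failed = [name for name, check in self.filters if not check(record)]
            quality = 1.0 - len(failed) / max(len(self.filters), 1)
            if quality < self.quality_threshold:
                self.holding_bay.append((record, failed))
            return quality

        def flush(self):
            """At a predefined interval, pass the dirty data up a layer."""
            if self.parent is not None:
                self.parent.receive_dirty(self, self.holding_bay)
            self.holding_bay = []

    class HigherLevelCell(Cell):
        def receive_dirty(self, source, batch):
            """Count the types of error reaching this layer and react."""
            error_counts = Counter(err for _, fails in batch for err in fails)
            for err, count in error_counts.items():
                if count > 10:                  # illustrative threshold
                    # Placeholder: a real filter would target this error type
                    source.filters.append((f"guard_{err}", lambda r: True))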
The layered cells allow the organization to
develop a strategy for managing day-to-day
transactions. At the same time, the internal
monitoring procedures of the cell membranes allow
the organization to respond to unforeseen situations
and to abnormal or incorrect data. At the higher
layers, the intelligence cells may base their decisions
on generalised versions of core information, but it is
also possible to drill through the underlying layers to
obtain the source data on which these decisions were
made. The rules for decision making are refined at
the control layer, depending on what is needed for
extrapolating rules about the environment,
monitoring patterns of behaviour, and adapting to
circumstances.
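This drill-through can be pictured with a small sketch, assuming (purely for illustration) that each generalised record retains the keys of the source rows from which it was derived:

    # Source layer: raw rows keyed by record id (illustrative data)
    source_layer = {
        101: {"region": "North", "sales": 120.0},
        102: {"region": "North", "sales": 80.0},
        103: {"region": "South", "sales": 95.0},
    }

    # Generalised view held by an intelligence cell, with provenance keys
    summary = {
        "North": {"total_sales": 200.0, "source_keys": [101, 102]},
        "South": {"total_sales": 95.0, "source_keys": [103]},
    }

    def drill_through(region):
        """Recover the source rows behind a generalised figure."""
        return [source_layer[key] for key in summary[region]["source_keys"]]

    print(drill_through("North"))   # the rows behind the 200.0 total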
Data quality is a fundamental property of good
intelligence and is managed in a layered way so that
the source of the generalised rules can be retrieved
and tested if necessary. Since data quality is also
vital to ensure a holistic and speedy adaptive
response once new information has been discovered,
there are efficient feedback loops linking the layers.
The whole process is dynamic, since the decisions made
must result in improvement and the intelligence cells
can test whether they do. The system will become increasingly
autonomous because it is possible for the top layer
to test the rules and predictions made by the lower
layers and to replace poorly performing strategies
with better adaptations based on analysis. This data
quality cycle holds true for any complex information
system, not only for business enterprises.
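One way to picture this feedback loop is the following sketch, in which a top-layer cell scores the predictive rules of a lower layer against recorded outcomes and swaps out underperformers (the scoring metric here is an assumption; the architecture does not fix one):

    def evaluate(strategy, history):
        """Fraction of past cases for which the strategy predicted correctly."""
        hits = sum(1 for case, outcome in history if strategy(case) == outcome)
        return hits / len(history)

    def review_layer(strategies, candidates, history, min_score=0.7):
        """Replace any poorly performing strategy with the best candidate."""
        revised = []
        for strategy in strategies:
            if evaluate(strategy, history) < min_score:
                strategy = max(candidates, key=lambda c: evaluate(c, history))
            revised.append(strategy)
        return revised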
4 ADDING AUTONOMOUS DATA QUALITY MONITORING TO AN ENTERPRISE ARCHITECTURE
Testing of the architecture has so far been confined
to specific architectural elements. It is important to
note that there are several different strands to the
automation process for data quality. In particular,
there is a distinction between architectural elements
which deal with the intrinsic quality of the data and those which
deal with the data requirements of the user/role. For
example, in figure 3 above, each user has a security
profile detailing access permissions to cells. At
the same time, the enterprise layer has its own data
quality requirements. It is anticipated that each
layer will include a data mining tool which analyses
user accesses and transactions. It is envisaged that
this may become part of the metadata catalogue
which stores details about the quality of the data. It
should help to ensure that users receive the right
data in a timely manner. At the same time, it is
possible to monitor for potential unauthorised
access.
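A minimal sketch of these two strands together, assuming a simple role-to-cell permission map (the profile format of figure 3 is not reproduced here), might look as follows: each access is checked against the security profile and logged, and the log can then be mined for patterns of denied attempts:

    import time
    from collections import Counter

    # Hypothetical security profiles: role -> cell -> permitted operations
    security_profiles = {
        "analyst": {"sales_cell": {"read"}},
        "dba": {"sales_cell": {"read", "write"}, "log_cell": {"read", "write"}},
    }

    access_log = []     # one entry per attempted access, mined per layer

    def access_cell(role, cell, operation):
        """Check the security profile, log the attempt, and allow or refuse."""
        allowed = operation in security_profiles.get(role, {}).get(cell, set())
        access_log.append({"time": time.time(), "role": role, "cell": cell,
                           "operation": operation, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{role} may not {operation} {cell}")

    def suspicious_roles(threshold=3):
        """Flag roles with repeated denied attempts, as a mining tool might."""
        denied = Counter(e["role"] for e in access_log if not e["allowed"])
        return [role for role, count in denied.items() if count >= threshold]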
4.1 Managing Database Autonomy
The concept of managing databases autonomously is
still in its early stages. This is because the data held
by the organization is a primary resource, and any
change is potentially dangerous to the integrity of
the existing systems. However, we have explored
how it is possible for a remote agent, initially the
database administrator but potentially an
autonomous cell, to administer the database logs in
terms of simple errors and other log conditions
(Karran & Wright, 2007). We have shown that it is
possible to design and build a CODA-type cell
which is able to provide advice to the database
administrator away from the console.
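The sketch below suggests, in simplified form, how such a cell might scan database log entries for simple error conditions and turn them into advice; the patterns and messages are illustrative only, not the rule set of the cell described in Karran & Wright (2007):

    import re

    # Illustrative mapping from known log conditions to advice for the DBA
    ADVICE = {
        r"deadlock detected": "Review transaction ordering on the affected tables.",
        r"unable to extend": "Add space to the tablespace or resize the datafile.",
        r"checkpoint not complete": "Increase redo log size or add log groups.",
    }

    def advise_from_log(log_lines):
        """Match simple error conditions and return (entry, advice) pairs."""
        notes = []
        for line in log_lines:
            for pattern, advice in ADVICE.items():
                if re.search(pattern, line, re.IGNORECASE):
                    notes.append((line.strip(), advice))
        return notes

    sample = ["ORA-00060: deadlock detected while waiting for resource"]
    for entry, advice in advise_from_log(sample):
        print(entry, "->", advice)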
4.2 Creating Autonomous Filters for Managing Data Quality at the ETL Layer
The methods and principles of autonomously
filtering data have been applied using Oracle and
Excel formats for project metrics in a large software
house maintaining third-party software (unpublished
research, Ragaven, 2005).
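A minimal sketch of such a filter, assuming hypothetical field names for the project-metric records (the study's actual schema is unpublished), would separate duplicates and malformed rows from clean ones:

    def etl_filter(rows):
        """Drop duplicate metric rows; divert malformed rows to a holding bay."""
        seen, clean, holding_bay = set(), [], []
        for row in rows:
            key = (row.get("project_id"), row.get("metric"), row.get("period"))
            if key in seen:
                continue                        # duplicated record: drop it
            seen.add(key)
            if None in key or not isinstance(row.get("value"), (int, float)):
                holding_bay.append(row)         # error-prone record: hold it
            else:
                clean.append(row)
        return clean, holding_bay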
The original source data was huge, error-prone
and duplicated. Each project had a separate metric