measure the severity level of registered problems is to
measure KPIs, which raises challenges such as
defining effective, valid, and standardized
performance indicators. For instance, a KPI based on a
character-level distance between two words, such as
the Hamming distance, can be ineffective: the words
“Netherlands” and “Holland” are semantically much
closer than their character-level distance suggests,
given the cultural background of both words.
Measuring semantic distances, however, is more
challenging than measuring character-level distances.
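To make this limitation concrete, the following Python sketch compares two character-level distances. It is only an illustration and not a KPI from our system. The Hamming distance is defined only for strings of equal length, so for “Netherlands” and “Holland” the sketch falls back on the Levenshtein (edit) distance. Levenshtein is our substitution here, not a measure taken from the text. Both measures ignore meaning, which is exactly why they misjudge such word pairs.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(hamming("Holland", "Hollond"))          # 1: a single differing position
    print(levenshtein("Netherlands", "Holland"))  # 7 edits, despite the close meaning
```

Applied to “Netherlands” and “Holland”, the edit distance is 7. That is a large value for two 11- and 7-letter words, even though “Holland” is commonly used to refer to the Netherlands.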
An underlying assumption in our proposal is that the
data analysts of an organization register the problems
they encounter in an ITS. In practice, however, users are
often reluctant to register problems effectively and
expressively. Organizations should therefore encourage
and train their employees to use such a logging system
so that the benefits of the proposed system can be
harvested. Using tags and labels to mark DQ problems
(see Canovas Izquierdo et al., 2015) is a further
option to explore to this end.
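As a rough illustration of label-based tagging, the sketch below assumes a hypothetical convention in which ITS issues carry labels of the form `dq:<attribute>` (e.g., `dq:completeness`). The `Issue` type, the label prefix and the example issues are our assumptions and do not belong to any particular ITS or to GiLA. Counting such labels per DQ attribute gives a simple first view of which attributes users associate with the problems they register.

```python
from collections import Counter
from dataclasses import dataclass, field

DQ_PREFIX = "dq:"  # hypothetical label convention marking DQ attributes


@dataclass
class Issue:
    """Minimal stand-in for an issue registered in an ITS (hypothetical)."""
    title: str
    labels: list[str] = field(default_factory=list)


def dq_label_counts(issues: list[Issue]) -> Counter:
    """Count how often each DQ attribute label occurs across the issues;
    labels without the DQ prefix (e.g. 'bug') are ignored."""
    counts: Counter = Counter()
    for issue in issues:
        for label in issue.labels:
            if label.startswith(DQ_PREFIX):
                counts[label[len(DQ_PREFIX):]] += 1
    return counts


# Illustrative issues only, not data from our organization.
issues = [
    Issue("Missing birth dates in dataset X", ["dq:completeness"]),
    Issue("Duplicate case records", ["dq:uniqueness", "bug"]),
    Issue("Empty municipality codes", ["dq:completeness", "dq:validity"]),
]
print(dq_label_counts(issues).most_common())
```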
We proposed a data quality management approach
that utilizes user-generated inputs about DQ problems
to carry out DQ management. For each functional
component, furthermore, we proposed some simple
(heuristic) methods to realize the component’s
functionality. Owing to the modular nature of the
proposed DQ management approach, one can replace
these methods with customized methods suited to
one’s own organization and problem domain.
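The modularity argument can be sketched with a structural interface. Each functional component depends only on an abstract method signature, so a simple heuristic can later be swapped for an organization-specific one without touching the code that uses it. The component name `SeverityEstimator`, both implementations and their scoring rules are hypothetical. They illustrate the plug-in pattern only and are not the methods of our system.

```python
from typing import Protocol


class SeverityEstimator(Protocol):
    """Interface of a hypothetical component that scores a problem description."""
    def estimate(self, description: str) -> float: ...


class KeywordHeuristic:
    """Simple default: severity grows with the number of alarm keywords found."""
    def __init__(self, keywords: set[str]) -> None:
        self.keywords = {k.lower() for k in keywords}

    def estimate(self, description: str) -> float:
        words = set(description.lower().split())
        return min(1.0, len(words & self.keywords) / 3)


class LengthHeuristic:
    """An alternative, organization-specific rule (equally hypothetical)."""
    def estimate(self, description: str) -> float:
        return min(1.0, len(description) / 200)


def triage(descriptions: list[str], estimator: SeverityEstimator) -> list[tuple[str, float]]:
    """Rank problem descriptions by estimated severity, highest first.
    Works with any object that provides a matching estimate() method."""
    scored = [(d, estimator.estimate(d)) for d in descriptions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    problems = ["Missing values in age column",
                "Wrong and missing dates, report blocked"]
    print(triage(problems, KeywordHeuristic({"missing", "wrong", "blocked"})))
    print(triage(problems, LengthHeuristic()))  # same caller, swapped method
```

Because `triage` only relies on the `estimate` signature, replacing the default heuristic amounts to passing a different object.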
5 CONCLUSIONS
In this contribution we presented the formal
description and the system architecture of an
integrated system for resolving the problems
observed in datasets based on DQ management. The
proposed architecture, moreover, results in a dynamic
DQ management system that relies on user-generated
data (i.e., descriptions by data users/analysts of the
DQ-related problems they encounter in their daily
practice). By managing the DQ-related problems
encountered in an organization at an operational
level, our proposal also manages the organization’s
DQ issues (i.e., realizes DQ management). To this
end, we semantically and dynamically map the
descriptions of DQ-related problems to DQ attributes.
The mapping provides a quantitative and dynamic
means to determine the relevant DQ attributes and
their degree of relevancy, given the operational setting
(i.e., the desired and momentary problem severity
levels).
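The following is a minimal sketch of how such a mapping could yield attribute relevancy. It assumes that each registered problem has already been mapped to DQ attributes with weights in [0, 1], and the semantic mapping step itself is not shown. It also assumes that each problem has a desired and a momentary severity level. The sketch weights each attribute by the severity gap max(0, momentary − desired) of the problems mapped to it and normalizes the result. This aggregation rule, the `Problem` type and the example values are our assumptions for illustration, not the formulas of the proposed system.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """A registered DQ problem after semantic mapping (hypothetical structure)."""
    attribute_weights: dict[str, float]  # DQ attribute -> mapping weight in [0, 1]
    desired_severity: float              # target severity level
    momentary_severity: float            # currently observed severity level


def attribute_relevancy(problems: list[Problem]) -> dict[str, float]:
    """Relevancy of each DQ attribute, normalized to sum to 1.

    Only problems whose momentary severity exceeds the desired level contribute,
    in proportion to that gap and to their mapping weight (assumed rule).
    """
    scores: dict[str, float] = {}
    for p in problems:
        gap = max(0.0, p.momentary_severity - p.desired_severity)
        for attr, weight in p.attribute_weights.items():
            scores[attr] = scores.get(attr, 0.0) + weight * gap
    total = sum(scores.values())
    if total == 0:
        return {attr: 0.0 for attr in scores}  # every problem is on target
    return {attr: s / total for attr, s in scores.items()}


# Illustrative values only.
problems = [
    Problem({"completeness": 0.8, "accuracy": 0.2}, desired_severity=1, momentary_severity=3),
    Problem({"timeliness": 1.0}, desired_severity=2, momentary_severity=2),  # on target
    Problem({"accuracy": 0.6, "consistency": 0.4}, desired_severity=0, momentary_severity=1),
]
print(attribute_relevancy(problems))
```

The scores are recomputed from the current severity levels, so the ranking of attributes shifts as problems are resolved or new ones are registered. This is one simple way to obtain the dynamic behaviour of the mapping.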
The realization of the proposed DQ management
in our organization has given us insightful feedback
on its advantages and limitations. As we envisioned,
the solution successfully bridged the gap between the
operational-level (e.g., data analysts) and strategic-level
(e.g., managers) DQ stakeholders within our
organization. To fully benefit from the potential of
the proposed architecture, however, it is necessary to
encourage the users of datasets (i.e., data analysts) to
report the DQ-related problems they encounter
proactively and expressively.
By improving the problem registration process,
one can reduce the number of untargeted problems
and ensure that registered problems actually inform
the dataset problem resolution and DQ management
processes. We leave it to future research to explore,
for example, user awareness and training solutions,
and to develop objective KPIs and problem-resolving
techniques (e.g., to determine the capabilities and
costs of candidate solutions).
REFERENCES
AHIMA, 2012. Data Quality Management Model
(Updated). Journal of the American Health Information
Management Association (Journal of AHIMA), Vol. 83,
No. 7, 62-67.
Bargh, M. S., Choenni, S., Meijer, R., 2015a. Privacy and
Information Sharing in a Judicial Setting: A Wicked
Problem. In Proceedings of DG.O, 97-106, ACM.
Bargh, M. S., van Dijk, J., Choenni, S., 2015b. Dynamic
data quality management using issue tracking systems.
In the IADIS International Journal on Computer
Science and Information Systems (IJCSIS, ISSN: 1646-
3692), ed. P. Isaias and M. Paprzycki, Vol. 10, No. 2,
pp. 32-51.
Bargh, M. S., Mbgong, F., Dijk, J. van, Choenni, S., 2015c.
A framework for Dynamic Data Quality Management.
In Proceedings of ISPCM, Las Palmas de Gran
Canaria, Spain.
Batini, C., Cappiello, C., Francalanci, C., Maurino, A., 2009.
Methodologies for Data Quality Assessment and
Improvement. ACM Computing Surveys, Vol. 41, No.
3, Article 16, ACM.
Birman, K. P., 2012. Consistency in Distributed Systems.
Book Chapter in Guide to Reliable Distributed Systems,
457-470.
Bugzilla Website, 2015. https://www.bugzilla.org
(retrieved on 31/10/2015).
Choenni, S., Leertouwer, E., 2010. Public Safety Mashups
to Support Policy Makers. In Electronic Government
and the Information Systems Perspective (EGOVIS),
Bilbao, 234-248, Springer.
Canovas Izquierdo, J. L., Cosentino, V., Rolandi, B.,
Bergel, A., Cabot, J., 2015. GiLA: GitHub Label
Analyzer. In IEEE 22nd International Conference on