Management of Data Quality Related Problems - Exploiting Operational Knowledge

Mortaza S. Bargh, Jan van Dijk, Sunil Choenni

Abstract

Dealing with data quality related problems is an important issue that all organizations face in realizing and sustaining data intensive advanced applications. Upon detecting these problems in datasets, data analysts often register them in issue tracking systems in order to address them later, categorically and collectively. As there is no standard format for registering these problems, data analysts often describe them in natural language and subsequently rely on ad hoc, non-systematic, and expensive solutions to categorize and resolve the registered problems. In this contribution we present a formal description of an innovative data quality resolving architecture that semantically and dynamically maps the descriptions of data quality related problems to data quality attributes. This mapping reduces complexity, because the dimensionality of data quality attributes is far smaller than that of the natural language space, and enables data analysts to directly use the methods and tools proposed in the literature. Furthermore, by managing data quality related problems, the proposed architecture offers dynamic data quality management based on user-generated inputs. The paper reports on a proof-of-concept tool and its evaluation.


Paper Citation


in Harvard Style

Bargh M. S., van Dijk J. and Choenni S. (2016). Management of Data Quality Related Problems - Exploiting Operational Knowledge. In Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-193-9, pages 31-42. DOI: 10.5220/0005982300310042


in Bibtex Style

@conference{data16,
author={Mortaza S. Bargh and Jan van Dijk and Sunil Choenni},
title={Management of Data Quality Related Problems - Exploiting Operational Knowledge},
booktitle={Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2016},
pages={31-42},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005982300310042},
isbn={978-989-758-193-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Management of Data Quality Related Problems - Exploiting Operational Knowledge
SN - 978-989-758-193-9
AU - Bargh M. S.
AU - van Dijk J.
AU - Choenni S.
PY - 2016
SP - 31
EP - 42
DO - 10.5220/0005982300310042