TOWARDS DATA AND DATA QUALITY MANAGEMENT FOR LARGE SCALE HEALTHCARE SIMULATIONS - Position Paper

Philipp Baumgärtel, Richard Lenz

Abstract

ProHTA (Prospective Health Technology Assessment) aims to understand the impact of medical processes and technologies as early as possible. To this end, simulation techniques are used to estimate the effects of innovative health technologies and to identify potential efficiency gains within the healthcare supply chain. Healthcare simulations require data management because heterogeneous data is needed both as simulation input and for validation. The main problems are the heterogeneity of the data and the initially unknown, continuously changing demands of the simulation. In addition, data quality must be considered in order to quantify the reliability of the simulation output. A solution has to address all of these aspects and must be extensible to cope with changing requirements. Because the structure of the data is not known in advance, a generic database schema is required. This paper proposes storing heterogeneous statistical data in an RDF triplestore. Semantic annotations based on conceptual models describe the datasets, and a special query language helps load the data into the simulation. The feasibility of the approach has been demonstrated in a prototype implementation. We discuss the benefits of this approach as well as remaining challenges and issues.
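
To make the idea of a generic, schema-free store more concrete, the following is a minimal sketch in plain Python. It is not the authors' prototype and does not use a real triplestore or their query language: the `TripleStore` class, the `ex:` identifiers, the `values_for_concept` helper, and all values are hypothetical and invented for illustration. It only shows how datasets with different dimensions can share one subject-predicate-object layout and be selected through an annotation that links each dataset to a concept.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Tiny in-memory stand-in for an RDF triplestore (illustrative only)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()

# Observation with two dimensions (age group, year); values are made up.
store.add("ex:obs1", "ex:dataset", "ex:incidenceByAge")
store.add("ex:obs1", "ex:ageGroup", "60-69")
store.add("ex:obs1", "ex:year", "2010")
store.add("ex:obs1", "ex:value", "0.012")

# Observation with a different structure (region only); values are made up.
store.add("ex:obs2", "ex:dataset", "ex:costByRegion")
store.add("ex:obs2", "ex:region", "regionA")
store.add("ex:obs2", "ex:value", "5400")

# Semantic annotations: each dataset is linked to a concept of a
# (hypothetical) conceptual model.
store.add("ex:incidenceByAge", "ex:describes", "ex:Incidence")
store.add("ex:costByRegion", "ex:describes", "ex:Cost")


def values_for_concept(ts: TripleStore, concept: str) -> Dict[str, float]:
    """Collect the values of all observations whose dataset describes `concept`."""
    result: Dict[str, float] = {}
    for dataset, _, _ in ts.match(p="ex:describes", o=concept):
        for obs, _, _ in ts.match(p="ex:dataset", o=dataset):
            for _, _, value in ts.match(s=obs, p="ex:value"):
                result[obs] = float(value)
    return result


incidence_values = values_for_concept(store, "ex:Incidence")
```

Because every fact is a triple, adding a dataset with new dimensions needs no schema change; only new predicates appear. A simulation could then request its input by concept rather than by table or column name, as `values_for_concept` does here.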


Paper Citation


in Harvard Style

Baumgärtel P. and Lenz R. (2012). TOWARDS DATA AND DATA QUALITY MANAGEMENT FOR LARGE SCALE HEALTHCARE SIMULATIONS - Position Paper. In Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2012) ISBN 978-989-8425-88-1, pages 275-280. DOI: 10.5220/0003871602750280


in Bibtex Style

@conference{healthinf12,
author={Philipp Baumgärtel and Richard Lenz},
title={TOWARDS DATA AND DATA QUALITY MANAGEMENT FOR LARGE SCALE HEALTHCARE SIMULATIONS - Position Paper},
booktitle={Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2012)},
year={2012},
pages={275-280},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003871602750280},
isbn={978-989-8425-88-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2012)
TI - TOWARDS DATA AND DATA QUALITY MANAGEMENT FOR LARGE SCALE HEALTHCARE SIMULATIONS - Position Paper
SN - 978-989-8425-88-1
AU - Baumgärtel P.
AU - Lenz R.
PY - 2012
SP - 275
EP - 280
DO - 10.5220/0003871602750280