An Ontology for Describing ETL Patterns Behavior

Bruno Oliveira, Orlando Belo

Abstract

The use of software patterns is a common practice in software design, providing reusable solutions for recurring problems. Patterns represent a general skeleton for solving common problems, offering a way to share established practices and reduce the resources needed to implement software systems. Data warehouse populating processes are a very particular type of software, used to migrate data from one or more data sources to a specific data schema that supports decision-making activities. The quality of such processes must be guaranteed; otherwise, the final system will have to deal with data inconsistencies and errors that compromise its suitability for supporting strategic business decisions. To minimize such problems, we propose a pattern-oriented approach to support the ETL lifecycle, from its conceptual representation to its execution primitives in a specific commercial tool. An ontology-based metamodel was designed and used to describe the internal specification of patterns and to provide the means to support and enable their configuration and instantiation using a domain-specific language.



Paper Citation


in Harvard Style

Oliveira B. and Belo O. (2016). An Ontology for Describing ETL Patterns Behavior. In Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-193-9, pages 102-109. DOI: 10.5220/0005974001020109


in Bibtex Style

@conference{data16,
author={Bruno Oliveira and Orlando Belo},
title={An Ontology for Describing ETL Patterns Behavior},
booktitle={Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2016},
pages={102-109},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005974001020109},
isbn={978-989-758-193-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - An Ontology for Describing ETL Patterns Behavior
SN - 978-989-758-193-9
AU - Oliveira B.
AU - Belo O.
PY - 2016
SP - 102
EP - 109
DO - 10.5220/0005974001020109