impact estimations. The algorithms and techniques
have been successfully employed in several large case
studies, leading to practical data lineage and
component dependency visualizations. We are continuing
this research by measuring performance on a
number of different large datasets, in order to present
practical examples and draw conclusions about our approach.
We are also considering a more abstract, conceptual
and business-level approach in addition to the current
physical/technical level of data lineage representation
and automation.
ACKNOWLEDGEMENTS
The research has been supported by the EU through the
European Regional Development Fund.