mation requirements. We proposed a data warehouse
architecture that includes data acquisition from various
sources, as well as ETL processes that transform data
into an integrated structure so that it can be loaded
into a data warehouse. The operation of the system is
based on metadata that describe the schemata of all data
sets involved in the system, as well as all changes
identified by the change discovery algorithm.
The main contribution of this paper is a mechanism
for processing both automatically discovered changes
and changes performed manually. As a proof of concept,
the proposed solution has been successfully applied
to the publication data warehouse.
The proposed approach has several benefits compared
to manual processing of changes in data sources and
information requirements. Changes of certain types are
discovered automatically, and comprehensive information
about the changes that occurred is available to the
developer. Evolution is managed with less human
participation. Change processing is transparent, as all
operations performed and conditions verified are
available to the developer. The proposed approach is
flexible and may be extended by defining additional
operations and conditions in the corresponding metadata
tables, then building new change adaptation scenarios
from them and assigning these scenarios to change types.
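The extension path described above (define operations and conditions, compose them into scenarios, assign scenarios to change types) can be illustrated with a minimal in-memory sketch. It is not the implementation used in the proposed system, which keeps these definitions in metadata tables. All names here (Step, Scenario, ScenarioRegistry, the change type "attribute_added", and the attribute "doi") are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical description of one change, e.g. produced by change discovery.
Change = Dict[str, str]


@dataclass
class Step:
    """One reusable condition or operation (stands in for a metadata table row)."""
    name: str
    kind: str  # "condition" or "operation"
    run: Callable[[Change], bool]


@dataclass
class Scenario:
    """An ordered sequence of conditions and operations for adapting to a change."""
    name: str
    steps: List[Step]

    def execute(self, change: Change) -> List[Tuple[str, str, bool]]:
        # The returned log records every condition verified and operation
        # performed, so the developer can inspect what happened.
        log = []
        for step in self.steps:
            ok = step.run(change)
            log.append((step.kind, step.name, ok))
            if not ok:
                break  # stop at the first failed condition or operation
        return log


class ScenarioRegistry:
    """Maps change types to the scenarios assigned to them."""

    def __init__(self) -> None:
        self._by_change_type: Dict[str, List[Scenario]] = {}

    def assign(self, change_type: str, scenario: Scenario) -> None:
        self._by_change_type.setdefault(change_type, []).append(scenario)

    def scenarios_for(self, change_type: str) -> List[Scenario]:
        return list(self._by_change_type.get(change_type, []))


# Assumed example: a new attribute appears in a data source.
applied: List[str] = []  # stands in for the updated schema metadata


def has_attribute_name(change: Change) -> bool:
    return bool(change.get("attribute"))


def add_attribute_to_metadata(change: Change) -> bool:
    applied.append(change["attribute"])
    return True


scenario = Scenario("propagate_new_attribute", [
    Step("attribute name is known", "condition", has_attribute_name),
    Step("add attribute to schema metadata", "operation", add_attribute_to_metadata),
])

registry = ScenarioRegistry()
registry.assign("attribute_added", scenario)

change = {"type": "attribute_added", "attribute": "doi"}
log = registry.scenarios_for(change["type"])[0].execute(change)
```

In this sketch, supporting a new change type requires only new Step and Scenario definitions and one call to assign. The execution logic is unchanged, which mirrors the flexibility claimed above.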
Possible directions of future work include allowing the
developer to define preferences regarding adaptation
scenarios for various change types, which would then be
used to choose scenarios automatically.
ACKNOWLEDGEMENTS
This work has been supported by the European
Regional Development Fund (ERDF) project No.
1.1.1.2./VIAA/1/16/057.
Managing Evolution of Heterogeneous Data Sources of a Data Warehouse