original solutions. We plan to use HDFS to store raw
source data and the intermediate levels of the data
highway, and Apache Hive for the data warehouse.
Apache Kylin is suitable for implementing the cube
engine, since it can build cubes from Hive data
structures. Cubes pre-computed by Kylin must be
stored in the Apache HBase database. We plan to
employ an RDBMS for the metastore and, possibly,
Pentaho software to implement ELT processes.
Finally, since no existing product fully supports big
data evolution, we will implement original solutions
for the metadata management tool and the adaptation
component.
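The planned mapping of architecture components to technologies can be summarised in a small, hypothetical configuration sketch in Python. The component labels, the `Assignment` class and the helper functions are our own illustrative names and are not part of any of the listed products. The sketch records which components rely on existing software, which choices are still tentative, and which components require original solutions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Assignment:
    """Hypothetical record pairing an architecture component with its planned technology."""
    component: str
    technology: Optional[str]  # None means an original (custom) solution is planned
    tentative: bool = False    # True where the choice is only "possible"


STACK: List[Assignment] = [
    Assignment("raw source data storage", "HDFS"),
    Assignment("data highway intermediate levels", "HDFS"),
    Assignment("data warehouse", "Apache Hive"),
    Assignment("cube engine", "Apache Kylin"),
    Assignment("cube storage", "Apache HBase"),
    Assignment("metastore", "RDBMS"),
    Assignment("ELT processes", "Pentaho", tentative=True),
    Assignment("metadata management tool", None),
    Assignment("adaptation component", None),
]


def custom_components(stack: List[Assignment]) -> List[str]:
    """Components for which no existing product is planned."""
    return [a.component for a in stack if a.technology is None]


def tentative_components(stack: List[Assignment]) -> List[str]:
    """Components whose technology choice is not yet fixed."""
    return [a.component for a in stack if a.tentative]
```

For example, `custom_components(STACK)` returns the metadata management tool and the adaptation component, matching the two parts of the architecture that have to be built from scratch.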
4 CONCLUSIONS
The main contribution of this paper is a data
warehouse architecture that, on the one hand, allows
performing different kinds of analytical tasks,
including OLAP-like analysis, on big data loaded
from multiple heterogeneous data sources with
different latencies. On the other hand, the proposed
architecture can handle changes in data sources as
well as evolving analysis requirements.
We described the components of the architecture and
the metadata it requires, and gave examples of
supported changes together with how each of them is
implemented within the architecture.
Our future research directions include constructing
metadata models that describe the schemata of the
data highway, data requirements, source data, and
changes. The main challenge here will be to
determine metadata for unstructured or
semi-structured big data; one possible solution is
to leverage meta-learning.
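As a minimal sketch of what such metadata models might contain, assuming structured sources only, the following Python code describes schema versions, data requirements and changes. It derives change metadata by comparing two schema versions and lists the requirements that those changes affect. All class and function names are hypothetical. Extracting metadata from semi-structured data and the meta-learning approach mentioned above are not shown.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ChangeType(Enum):
    ATTRIBUTE_ADDED = "attribute added"
    ATTRIBUTE_REMOVED = "attribute removed"
    TYPE_CHANGED = "type changed"


@dataclass
class SchemaVersion:
    """Hypothetical metadata describing one version of a data set's schema."""
    dataset: str
    version: int
    attributes: Dict[str, str]  # attribute name -> data type


@dataclass
class DataRequirement:
    """Hypothetical metadata linking an analysis requirement to the attributes it uses."""
    name: str
    dataset: str
    required_attributes: List[str] = field(default_factory=list)


@dataclass
class Change:
    """Hypothetical metadata describing one detected schema change."""
    dataset: str
    change_type: ChangeType
    attribute: str


def diff_schemas(old: SchemaVersion, new: SchemaVersion) -> List[Change]:
    """Derive change metadata by comparing two versions of the same data set."""
    changes: List[Change] = []
    for name in sorted(new.attributes.keys() - old.attributes.keys()):
        changes.append(Change(new.dataset, ChangeType.ATTRIBUTE_ADDED, name))
    for name in sorted(old.attributes.keys() - new.attributes.keys()):
        changes.append(Change(new.dataset, ChangeType.ATTRIBUTE_REMOVED, name))
    for name in sorted(old.attributes.keys() & new.attributes.keys()):
        if old.attributes[name] != new.attributes[name]:
            changes.append(Change(new.dataset, ChangeType.TYPE_CHANGED, name))
    return changes


def affected_requirements(changes: List[Change],
                          requirements: List[DataRequirement]) -> List[str]:
    """Names of requirements that use at least one changed attribute."""
    touched = {(c.dataset, c.attribute) for c in changes}
    return [r.name for r in requirements
            if any((r.dataset, a) in touched for a in r.required_attributes)]
```

Keeping changes as explicit metadata objects, rather than only applying them, lets an adaptation component decide per requirement whether a change can be handled automatically or needs a user's decision.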
Handling big data evolution also requires
algorithms for automatic and semi-automatic change
detection and treatment. Since changes may not be
detectable at the data source layer, one possible
solution is to specify constraints on data items
arriving from data sources and to discover evolution
by detecting violations of these constraints.
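The constraint-based idea can be sketched as follows, assuming that data items arrive in batches as Python dictionaries. The three constraint kinds (required attribute, attribute type, no unexpected attributes), all names, and the `threshold` heuristic that separates occasional data-quality errors from likely source evolution are our assumptions for illustration, not part of the proposed architecture.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class Constraint:
    """Hypothetical constraint on data items arriving from one data source."""
    name: str
    check: Callable[[Record], bool]  # True when the record satisfies the constraint


def required(attr: str) -> Constraint:
    return Constraint(f"{attr} present", lambda r: attr in r)


def of_type(attr: str, typ: type) -> Constraint:
    # Missing attributes are left to `required`; only present values are type-checked.
    return Constraint(f"{attr} is {typ.__name__}",
                      lambda r: attr not in r or isinstance(r[attr], typ))


def no_extra_attributes(allowed: Iterable[str]) -> Constraint:
    allowed_set = frozenset(allowed)
    return Constraint("no unexpected attributes", lambda r: set(r) <= allowed_set)


def detect_evolution(batch: List[Record], constraints: List[Constraint],
                     threshold: float = 0.5) -> Dict[str, float]:
    """Return the constraints violated by at least `threshold` of the batch.

    Assumption: a few violations are treated as data-quality errors, whereas a
    constraint violated by a large share of a batch suggests the source evolved.
    """
    if not batch:
        return {}
    suspected: Dict[str, float] = {}
    for c in constraints:
        rate = sum(not c.check(r) for r in batch) / len(batch)
        if rate >= threshold:
            suspected[c.name] = rate
    return suspected
```

In a batch where half of the records carry a new `channel` attribute and a string-typed `amount`, both the type constraint and the unexpected-attribute constraint are reported with a violation rate of 0.5, which could then be handed to the adaptation component as a candidate change.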
ACKNOWLEDGEMENTS
This work has been partly supported by the
European Regional Development Fund (ERDF)
project No. 1.1.1.2./VIAA/1/16/057 “Handling
Adaptation of Big Data Warehouse” and by
University of Latvia project No. AAP2016/B032
“Innovative information technologies”.
Towards a Data Warehouse Architecture for Managing Big Data Evolution