databases in a Data Lake. The paper (Diamantini et
al., 2018) proposes an approach to structure the data
of a Data Lake by linking the data sources in the form
of a graph composed of keywords. Other works
propose a metamodel unifying several data source
metamodels, thus allowing the unification of data
from multiple sources. The authors in (Candel, Ruiz,
& García-Molina, 2021) proposed a metamodel
unifying the logical schemas of the four most
common types of NoSQL systems and relational
systems. Our solution is based on extracting data from
a Data Lake and loading it into a Data Warehouse. To
do so, we first proposed two metamodels representing
the physical models of the databases involved: the
first metamodel describes relational databases, which
are the sources; the second describes document-oriented
NoSQL databases, which are the target of our solution.
We used EMF as our metamodeling tool. EMF
allowed us to formalize the transformation rules from
the source metamodel to the target metamodel. We
relied on the QVT standard to express these
transformation rules. The solution we propose allows
the data contained in a Data Lake to be queried
through the creation of a Data Warehouse.
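To make the idea of a metamodel-to-metamodel transformation rule concrete, the following is a minimal sketch in Python rather than in EMF/QVT. It does not reproduce our actual metamodels or rules: the classes `Column`, `Table`, `Property` and `DocumentClass`, the `TYPE_MAP` table and the function `table_to_class` are all hypothetical names introduced for illustration, and the type mapping is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified source metamodel (relational side).
@dataclass
class Column:
    name: str
    sql_type: str

@dataclass
class Table:
    name: str
    columns: List[Column]

# Hypothetical, simplified target metamodel (document-oriented side).
@dataclass
class Property:
    name: str
    doc_type: str

@dataclass
class DocumentClass:
    name: str
    properties: List[Property]

# Assumed mapping between SQL types and target types (illustrative only).
TYPE_MAP = {"INTEGER": "int", "VARCHAR": "string", "DATE": "date"}

def table_to_class(table: Table) -> DocumentClass:
    """One rule: each table becomes a class, each column a property."""
    properties = [
        # Unknown SQL types fall back to "string" in this sketch.
        Property(col.name, TYPE_MAP.get(col.sql_type.upper(), "string"))
        for col in table.columns
    ]
    return DocumentClass(table.name, properties)

# Made-up example input, not taken from the case study.
patient = Table("Patient", [Column("id", "INTEGER"), Column("name", "VARCHAR")])
patient_class = table_to_class(patient)
```

In a QVT setting, such a rule would be declared as a mapping between the two metamodels; the function above only mirrors its input/output behaviour.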
9 CONCLUSION AND PERSPECTIVES
This paper proposed a process to ingest data from a
Data Lake into a Data Warehouse. The Data Warehouse
consists of a single NoSQL database, whereas the Data
Lake contains several databases. We have limited the
content of the Data Lake to relational databases. Three
modules ensure the ingestion of the data. The CreateDW
module transforms the relational databases into a
single NoSQL database by applying MDA rules.
This mechanism will be used and extended to
transform other types of databases in the Data Lake.
The ConvertLinks module translates relational links
(keys) into references in accordance with the
principles of object databases supported by the
OrientDB system. Finally, the MergeClasses module
merges semantically equivalent classes from different
Data Lake databases; this merge is based on an
ontology provided by business experts.
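The behaviour of the last two modules can be illustrated with a minimal Python sketch. It is not the implementation of ConvertLinks or MergeClasses: records are plain dictionaries, the `rid` key is a hypothetical stand-in for a record identifier, the ontology is reduced to an assumed dictionary mapping a class name to its canonical equivalent, and the sample data are invented.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]

def convert_links(records: List[Record], fk_field: str,
                  targets: List[Record], target_key: str,
                  ref_field: str) -> List[Record]:
    """Replace a key-based link with a reference to the target record."""
    # Index the target records by the key value that the link points to.
    index: Dict[object, object] = {t[target_key]: t["rid"] for t in targets}
    converted = []
    for rec in records:
        new_rec = {k: v for k, v in rec.items() if k != fk_field}
        ref: Optional[object] = index.get(rec[fk_field])  # None if dangling
        new_rec[ref_field] = ref
        converted.append(new_rec)
    return converted

def merge_classes(classes: Dict[str, List[Record]],
                  ontology: Dict[str, str]) -> Dict[str, List[Record]]:
    """Group the records of semantically equivalent classes together."""
    merged: Dict[str, List[Record]] = {}
    for class_name, records in classes.items():
        canonical = ontology.get(class_name, class_name)
        merged.setdefault(canonical, []).extend(records)
    return merged

# Invented sample data from two source databases.
doctors = [{"rid": "#12:0", "doctor_id": 7, "name": "Dr. A"}]
visits = [{"rid": "#13:0", "visit_id": 1, "doctor_id": 7}]
linked = convert_links(visits, "doctor_id", doctors, "doctor_id", "doctor")

classes = {
    "Patient": [{"name": "Ann"}],
    "Patients": [{"name": "Bob"}],
    "Visit": linked,
}
merged = merge_classes(classes, {"Patients": "Patient"})
```

Here the `doctor_id` value of each visit is replaced by a `doctor` reference, and the classes `Patient` and `Patients` are merged because the assumed ontology declares them equivalent.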
Currently, we are continuing our work on the
ingestion of other types of data sources from a Data
Lake. Indeed, the Data Lake of our medical case study
contains various database types.
REFERENCES
Alotaibi, R., Cautis, B., Deutsch, A., Latrache, M.,
Manolescu, I., & Yang, Y. (2020). ESTOCADA:
Towards scalable polystore systems. Proceedings of the
VLDB Endowment, 13(12), 2949‑2952.
Bruel, J., Combemale, B., Guerra, E., Jézéquel, J., Kienzle,
J., Lara, J., Mussbacher, G., et al. (2019). Comparing
and classifying model transformation reuse approaches
across metamodels. Software and Systems Modeling.
Candel, C. J. F., Ruiz, D. S., & García-Molina, J. J. (2021).
A Unified Metamodel for NoSQL and Relational
Databases. ArXiv:2105.06494 [cs].
Chickerur, S., Goudar, A., & Kinnerkar, A. (2015).
Comparison of Relational Database with Document-
Oriented Database (MongoDB) for Big Data
Applications. 2015 8th International Conference on
Advanced Software Engineering & Its Applications
(ASEA) (p. 41‑47).
Diamantini, C., Lo Giudice, P., Musarella, L., Potena, D.,
Storti, E., & Ursino, D. (2018). A New Metadata Model
to Uniformly Handle Heterogeneous Data Lake
Sources. ADBIS 2018 Short Papers and Workshops,
AI*QA, BIGPMED, CSACDB, M2U, BigDataMAPS,
ISTREND, DC, Budapest, Hungary, September, 2-5,
2018, Proceedings (p. 165‑177).
Duggan, J., Kepner, J., Elmore, A. J., & Madden, S. (2015).
The BigDAWG Polystore System. SIGMOD Record,
44(2), 6.
El Malki, M., Kopliku, A., Sabir, E., & Teste, O. (2018).
Benchmarking Big Data OLAP NoSQL Databases. In
N. Boudriga, M.-S. Alouini, S. Rekhis, E. Sabir, & S.
Pollin (Eds.), Ubiquitous Networking, Lecture Notes in
Computer Science (Vol. 11277, p. 82‑94). Cham:
Springer International Publishing.
Erraissi, A., & Banane, M. (2020). Managing Big Data
using Model Driven Engineering: From Big Data Meta-
model to Cloudera PSM meta-model (p. 1235‑1239).
Hanine, M., Bendarag, A., & Boutkhoum, O. (2015). Data
Migration Methodology from Relational to NoSQL
Databases, 9(12), 6.
Khine, P. P., & Wang, Z. S. (2018). Data lake: A new
ideology in big data era. ITM Web of Conferences, 17.
Liyanaarachchi, G., Kasun, L., Nimesha, M., Lahiru, K., &
Karunasena, A. (2016). MigDB - relational to NoSQL
mapper. 2016 IEEE International Conference on
Information and Automation for Sustainability (ICIAfS)
(p. 1‑6).
Mahmood, A. A. (2018). Automated Algorithm for Data
Migration from Relational to NoSQL Databases. Al-
Nahrain Journal for Engineering Sciences, 21(1), 60.
Meehan, J., Tatbul, N., Aslantas, C., & Zdonik, S. (n.d.).
Data Ingestion for the Connected World, 11.
Nargesian, F., Zhu, E., Miller, R. J., Pu, K. Q., & Arocena,
P. C. (2019). Data lake management: Challenges and
opportunities. Proceedings of the VLDB Endowment,
12(12), 1986‑1989.
Stanescu, L., Brezovan, M., & Burdescu, D. D. (2016).
Automatic Mapping of MySQL Databases to NoSQL
MongoDB. Federated Conference on Computer Science
and Information Systems (p. 837‑840).