performance of complex join processes. Also,
storing all entities as embedded documents in a single
collection is not useful because it will produce a large
amount of unnecessary and inconsistent data.
Additionally, all of the data would have to be loaded
whenever it is updated, thereby reducing performance.
Therefore, document-oriented databases should be
designed using both embedded and reference document
techniques to improve synchronization when
updating redundant data (Atzeni et al., 2016). In
addition, Imam et al. (2018) identified issues that
still need to be addressed when implementing a
document-oriented database, such as how to represent
one-to-many relationships and how and when to use
reference documents instead of embedded documents.
Moreover, Oliveira, Oliveira, and Alturas (2018) found
that no investigations have examined the migration
process or the methodology for migrating from a
relational database to a document-oriented database.
The aim of this study is to facilitate the
transformation of a relational database schema into a
document-oriented data schema through two
concepts. The first is the embedded document
(de-normalization), which represents a relationship by
storing the sub-document inside a super-document
collection. The second is the reference document,
which normalizes the relationship by linking the
collections with a foreign key.
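
To make the two concepts concrete, the following is a minimal
sketch in plain Python dictionaries rather than a real database
driver. The department and employee tables, the field names, and
the helper functions embed and reference are hypothetical
illustrations and are not taken from the study.

```python
from copy import deepcopy

# Hypothetical relational rows: one department has many employees.
departments = [{"dept_id": 1, "name": "Research"}]
employees = [
    {"emp_id": 10, "name": "Ada", "dept_id": 1},
    {"emp_id": 11, "name": "Alan", "dept_id": 1},
]


def embed(parents, children, parent_key, foreign_key, field):
    """De-normalize: store each child row as a sub-document of its parent."""
    docs = []
    for parent in parents:
        doc = deepcopy(parent)
        # The foreign key is dropped because nesting already expresses the link.
        doc[field] = [
            {k: v for k, v in child.items() if k != foreign_key}
            for child in children
            if child[foreign_key] == parent[parent_key]
        ]
        docs.append(doc)
    return docs


def reference(parents, children):
    """Normalize: keep two collections linked by the child's foreign key."""
    return {"parents": deepcopy(parents), "children": deepcopy(children)}


embedded = embed(departments, employees, "dept_id", "dept_id", "employees")
referenced = reference(departments, employees)
```

In the embedded result a single read returns a department together
with its employees, whereas the referenced result keeps the
employees in their own collection and relies on dept_id to
reconnect them.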
2 BACKGROUND AND RELATED WORK
The previous model addresses how to transform a
strong entity by creating a new collection and how to
transform one-to-one relationships into embedded
documents. However, it does not take the size of the
datasets into consideration, nor does it address how
to apply embedded and reference documents to other
relationship types such as one-to-many,
many-to-many, and unary relationships. Additionally,
it transforms each weak entity into a new collection,
whereas a weak entity should belong to its strong
entity as an embedded document to avoid many join
operations between many collections.
Many researchers have proposed methods to
migrate relational databases to document-oriented
databases (Corbellini, Mateos, Zunino, Godoy, &
Schiaffino, 2017; El Alami & Bahaj, 2016; Goyal,
Swaminathan, Pande, & Attar, 2016; Győrödi,
Győrödi, Pecherle, & Olah, 2015; Hanine, Bendarag,
& Boutkhoum, 2016; Imam, Basri, Ahmad, Watada,
& González-Aparicio, 2018; Karnitis & Arnicans,
2015; Mason, 2015; Stanescu, Brezovan, &
Burdescu, 2016, 2017; Yoon, Jeong, Kang, & Lee,
2016). For instance, El Alami and Bahaj (2016),
Hanine et al. (2016), Mason (2015), and Stanescu et al.
(2016, 2017) have focused on migrating a relational
database to a document-oriented database based on
the concept of embedded and reference documents.
However, these migration methods face several
issues. The first is that no recognized specification
exists for defining a schema for a document-oriented
database, owing to the various ways of storage,
management, and implementation in document-oriented
databases (Goyal et al., 2016). The lack of a defined
schema leads to many challenges and complex problems
in migration, because designing a schema for the
document-oriented database is important for defining
the principles and overcoming the issues of
relationship types for document-oriented databases
(Truică, Apostol, Darmont, & Pedersen, 2021). It may
also lead to incorrect or inappropriate schema design,
especially when handling relationships based on
normalizing and de-normalizing data. For instance, the
method of Stanescu et al. (2017) did not properly
migrate all of the database properties, especially
multi-valued attributes, weak entities, and
relationship types. Some migration results are
embedded documents, whereas they should be migrated
as an array data type because they contain one field
with many values. In addition, the algorithm of
Stanescu et al. (2017) does not handle a table that is
referred to by more than two other tables and has
more than one foreign key.
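
The array case can be illustrated with a small sketch. It assumes
a hypothetical person table whose multi-valued phone attribute was
normalized into a separate person_phones table. The table names,
field names, and the helper fold_multivalued are illustrative
assumptions and are not part of the Stanescu et al. (2017)
algorithm.

```python
from collections import defaultdict

# Hypothetical relational rows: phone is a multi-valued attribute of person.
persons = [
    {"person_id": 1, "name": "Ada"},
    {"person_id": 2, "name": "Alan"},
]
person_phones = [
    {"person_id": 1, "phone": "555-0100"},
    {"person_id": 1, "phone": "555-0101"},
]


def fold_multivalued(parents, value_rows, key, value_field, array_field):
    """Collapse a one-column child table into an array field on the parent."""
    values = defaultdict(list)
    for row in value_rows:
        values[row[key]].append(row[value_field])
    # Parents without any value rows receive an empty array.
    return [
        {**parent, array_field: values.get(parent[key], [])}
        for parent in parents
    ]


docs = fold_multivalued(persons, person_phones, "person_id", "phone", "phones")
```

Each resulting document holds the phone numbers as a plain array
of values instead of a list of one-field sub-documents.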
Additionally, there is no established technique to
normalize or de-normalize data in order to implement
embedded and reference documents for handling the
various types of relationships (Hanine et al., 2016;
Mehmood et al., 2017). According to Mehmood,
Culmone, and Mostarda (2017), normalization
(reference document) and de-normalization
(embedded document) are the two techniques that
must be considered when designing a schema. These
techniques can significantly affect performance and
storage as databases grow rapidly. González-Aparicio
et al. (2017) observed that the normalization of the
data model is an important research issue and that
there are no standard principles of normalization in
the document-oriented database.
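
The trade-off between the two techniques can be sketched with a
toy example. It assumes a hypothetical customer who has placed
three orders, and it counts how many documents an update must
write under each design. The data and function names are
illustrative only and do not come from the cited studies.

```python
# Hypothetical data: three orders placed by the same customer.
customer = {"customer_id": 7, "city": "Oslo"}

# Embedded (de-normalized): every order carries its own copy of the customer.
embedded_orders = [
    {"order_id": i, "customer": dict(customer)} for i in range(3)
]

# Referenced (normalized): the customer is stored once; orders keep the key.
customers = {7: dict(customer)}
referenced_orders = [{"order_id": i, "customer_id": 7} for i in range(3)]


def update_city_embedded(orders, customer_id, city):
    """Rewrite every redundant copy; return the number of documents written."""
    writes = 0
    for order in orders:
        if order["customer"]["customer_id"] == customer_id:
            order["customer"]["city"] = city
            writes += 1
    return writes


def update_city_referenced(customer_docs, customer_id, city):
    """Update the single stored copy; return the number of documents written."""
    customer_docs[customer_id]["city"] = city
    return 1


def read_order_referenced(order, customer_docs):
    """Reading needs a second lookup, the counterpart of a relational join."""
    return {**order, "customer": customer_docs[order["customer_id"]]}


embedded_writes = update_city_embedded(embedded_orders, 7, "Bergen")
referenced_writes = update_city_referenced(customers, 7, "Bergen")
```

Under these assumptions the embedded design pays for the
redundancy at update time, while the referenced design pays at
read time through the extra lookup.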
The transformation rules of previous work have
mapped the relational database schema to the
document-oriented database directly without
considering any specification (Varga et al., 2016;
Mehmood et al., 2017; Mior et al., 2017; Imam et al.,