5.4 Validation
For schema-less NoSQL databases, our model extraction
approach displays to the developer, simultaneously, a
conceptual model and a physical model: the first to
understand the semantics of the database and the
second to write queries. To evaluate the relevance of
our approach, our prototype (section 4) was put to use
by three developers at Trimane, a digital services
company specialized in business intelligence and Big
Data. The three experienced developers (IT consulting
engineers) were tasked with maintaining three separate
applications. None of the developers had prior
knowledge of the data models of these applications.
For each application, each developer wrote ten queries
of increasing complexity in three different cases: (1)
without any data model, (2) with the physical data
model only, or (3) with both the conceptual and
physical models.
Figures 7(a) and 7(b) show, respectively, an example
of the conceptual and physical models corresponding
to one of the three applications. Note that, due to lack
of space, we present the data models (conceptual and
physical) of only one application.
We should also highlight that, for readability, the
models are presented to the user on the same screen,
each in an appropriate format: JSON for the physical
model and a graphical format for the conceptual one.
Each time a class is clicked in the conceptual model,
its equivalent is shown in the physical model. For
example, the part of the physical model written in
bold corresponds to the selected class (Trials).
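As an illustration only, the link between the two views
can be sketched as a lookup from a selected class to the
matching part of the physical model. The sketch assumes
that the physical model is a JSON object keyed by
collection name and that a class shares the name of its
collection. Only the class name Trials comes from the
text; the other collection, the field names and the
function physical_fragment are hypothetical and are not
taken from Figure 7 or from the prototype.

```python
import json

# Hypothetical physical model: collection name -> {field name: type}.
# Only the name "Trials" appears in the paper; all fields are illustrative.
physical_model = {
    "Trials": {"_id": "ObjectId", "title": "String", "start_date": "Date"},
    "Patients": {"_id": "ObjectId", "name": "String"},
}


def physical_fragment(model: dict, class_name: str) -> str:
    """Return the JSON fragment of the physical model for the selected class."""
    if class_name not in model:
        raise KeyError(f"no collection corresponds to class {class_name!r}")
    return json.dumps({class_name: model[class_name]}, indent=2)


if __name__ == "__main__":
    # Selecting the class "Trials" yields the fragment that would be highlighted.
    print(physical_fragment(physical_model, "Trials"))
```

A name-based lookup is the simplest possible choice; a
tool could equally keep an explicit trace between each
class and the physical elements it was derived from.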
Each database is associated with a set of queries
whose natural language statements are provided to the
three developers. Table 3 reports the average time
taken by the three developers to write the queries in
each situation: (1) without any data model, (2) with
the physical data model only, or (3) with both the
conceptual and physical models.
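The averaging step can be sketched as follows. This is a
minimal example under stated assumptions: the timings are
placeholder values, not the measurements of Table 3, and
the layout of the data (situation, then developer, then
one time per query) as well as the function name
average_time_per_situation are hypothetical. The text
does not specify how the times were aggregated; here all
queries of all developers are pooled within a situation.

```python
from statistics import mean

# Placeholder timings in minutes. These are NOT the values of Table 3.
# Layout (assumed): situation -> developer -> one writing time per query.
timings = {
    "no model": {"dev1": [10.0, 12.0], "dev2": [11.0, 13.0], "dev3": [9.0, 11.0]},
    "physical": {"dev1": [6.0, 8.0], "dev2": [7.0, 9.0], "dev3": [5.0, 7.0]},
    "both": {"dev1": [5.0, 7.0], "dev2": [6.0, 8.0], "dev3": [4.0, 6.0]},
}


def average_time_per_situation(data: dict) -> dict:
    """Average the writing time over all developers and queries, per situation."""
    averages = {}
    for situation, per_developer in data.items():
        all_times = [t for times in per_developer.values() for t in times]
        averages[situation] = mean(all_times)
    return averages


if __name__ == "__main__":
    for situation, avg in average_time_per_situation(timings).items():
        print(f"{situation}: {avg:.1f} min")
```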
Our initial hypothesis was verified in the
situations considered. This indicates that
knowledge of the semantics and the data structure
allows the developer to write queries faster on a
schema-less NoSQL database. The small difference
observed between using the physical model alone and
using the two models (conceptual and physical) is
probably due to the experience of the three developers.
6 CONCLUSION AND FUTURE WORK
Our work falls within the field of Big Data databases.
It currently addresses reverse engineering mechanisms
for schema-less NoSQL databases, in order to provide
users with models for manipulating NoSQL databases.
In this article, we have proposed an automatic
process, ToConceptualModel, which transforms a
physical model into a conceptual model, represented
as a UML class diagram, by applying a set of rules.
The resulting conceptual model makes it easier for
developers and decision-makers to understand the
database and write queries. To formalize and automate
our process, we use the Model Driven Architecture
(MDA) proposed by the OMG, which provides a formal
framework for automating model transformations.
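To give an intuition of what a rule-based transformation
of this kind looks like, the following sketch applies
three simplified rules to a toy physical model. It is not
the ToConceptualModel process itself, which is defined as
MDA model transformations: the rules, the "ref:" prefix
used to mark a reference field, the UmlClass structure and
the function to_conceptual_model are all assumptions made
for illustration, written in plain Python instead of a
model transformation language.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    """Simplified stand-in for a UML class (hypothetical structure)."""
    name: str
    attributes: dict = field(default_factory=dict)    # name -> type or nested dict
    associations: list = field(default_factory=list)  # names of linked classes


def to_conceptual_model(physical: dict) -> dict:
    """Apply three illustrative rules to a physical model.

    Rule 1: each collection becomes a class.
    Rule 2: a nested object becomes a structured attribute; an atomic
            field becomes a plain attribute.
    Rule 3: a field typed "ref:<Collection>" (assumed encoding) becomes
            an association to the class of that collection.
    """
    classes = {}
    for collection, fields in physical.items():
        cls = UmlClass(collection)
        for fname, ftype in fields.items():
            if isinstance(ftype, str) and ftype.startswith("ref:"):
                cls.associations.append(ftype[len("ref:"):])
            else:
                # Atomic types and nested dicts are both kept as attributes.
                cls.attributes[fname] = ftype
        classes[collection] = cls
    return classes


if __name__ == "__main__":
    # Toy input; apart from "Trials", every name here is illustrative.
    toy_physical = {
        "Trials": {
            "title": "String",
            "period": {"start": "Date", "end": "Date"},
            "patient": "ref:Patients",
        },
        "Patients": {"name": "String"},
    }
    for cls in to_conceptual_model(toy_physical).values():
        print(cls)
```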
The major contribution of our solution is that it
takes into account structured attributes, association
relationships, composition relationships and
association classes. We have tested our process on a
medical application concerning scientific programs
for the follow-up of pathologies; the database is
stored in a document-oriented NoSQL database.
As future work, we plan to extend our transformation
process to capture more semantics in the conceptual
model by considering other types of links, such as
inheritance, aggregation and n-ary associations.
REFERENCES
Angadi, A. B., & Gull, K. C. (2013). Growth of New
Databases & Analysis of NOSQL Datastores.
International Journal of Advanced Research in
Computer Science and Software Engineering, 3, 1307-
1319.
Baazizi, M. A., Lahmar, H. B., Colazzo, D., Ghelli, G., &
Sartiani, C. (2017, March). Schema inference for
massive JSON datasets. In Extending Database
Technology (EDBT).
Baazizi, M. A., Colazzo, D., Ghelli, G., & Sartiani, C.
(2019). Parametric schema inference for massive JSON
datasets. The VLDB Journal, 1-25.
Bondiombouy, C. (2015). Query processing in cloud
multistore systems. In BDA : Bases de Données
Avancées.
Budinsky, F., Steinberg, D., Ellersick, R., Grose, T. J., &
Merks, E. (2004). Eclipse modeling framework: a
developer's guide. Addison-Wesley Professional.
Chen, C. L. P., & Zhang, C.-Y. Data-intensive
applications, challenges, techniques and technologies: