for automatic model transformations. Our approach
starts from a document-oriented NoSQL database
and automatically extracts its physical model. As
discussed in the related work, few solutions have
dealt with NoSQL database model extraction. To
the best of our knowledge, none of the existing
contributions has treated the links between
collections.
The remainder of the paper is structured as
follows. Section 2 motivates our work using a case
study in the healthcare field. Section 3 introduces our
NoSQL database model extraction process. Section 4
reviews previous work. Section 5 details our
experiments as well as the validation of our process.
Finally, Section 6 concludes the paper and outlines
future work.
2 ILLUSTRATIVE EXAMPLE
To motivate and illustrate our work, we rely on a
case study in the healthcare field that we used
in previous work (Abdelhedi, 2017). This case study
concerns international scientific programs for
monitoring patients suffering from serious diseases.
The main goals of such a program are (1) to collect data
about disease development over time, (2) to study
interactions between different diseases and (3) to
evaluate the short- and medium-term effects of their
treatments. A medical program can last up to 3
years. Data collected from the establishments involved in
this kind of program have the features of Big Data
(the 3 Vs). Volume: the amount of data collected from
all the establishments in three years can reach several
terabytes. Variety: data created while monitoring
patients come in different types; they can be (1)
structured, such as the patient’s vital signs (respiratory rate,
blood pressure, etc.), (2) semi-structured,
such as the package leaflets of medicinal products, or
(3) unstructured, such as consultation summaries,
paper prescriptions and radiology reports. Velocity:
some data are produced continuously by
sensors; they require (near) real-time processing because they
may feed time-sensitive processes
(for example, some measurements, such as temperature,
call for emergency medical treatment if they cross
a given threshold).
This is a typical example in which the use of a
NoSQL system is suitable. On the one hand, in the
medical application briefly presented above, the
database contains structured data, data of various
types and formats (explanatory texts, medical
records, x-rays, etc.), and big tables (records of
variables produced by sensors). On the other hand,
NoSQL data stores are well suited to this kind of
application, which uses large amounts of disparate data.
Therefore, we are convinced that a NoSQL DBMS,
such as MongoDB, is the most suitable system for storing
the medical database.
As mentioned before, this kind of system operates
on a schema-less data model. Nevertheless, there is
still a need for the database model in order to know
how data is structured and related in the database and
then to express queries. Regarding the medical
application, doctors regularly enter measurements for a
cohort of patients. They can also record new data in
cases where the patient's state of health evolves over
time. A few months later, they will analyze the entered
data in order to follow the evolution of the pathology.
For this, they need the database model to express
their queries.
In our view, it is important to have a precise and
automatic solution that guides and facilitates the
database model extraction task within NoSQL
systems. To this end, we propose the ToNoSQLModel
process, presented in the next section, which extracts the
physical model of a database stored in MongoDB.
This model is expressed using the JSON format.
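As an illustration only, and not the ToNoSQLModel process itself, the following minimal Python sketch shows what recovering a model from schema-less documents can look like. It assumes the documents of one collection are already loaded as Python dictionaries; the function names (type_name, infer_model), the sample patients collection and the shape of the resulting JSON are all hypothetical.

```python
import json


def type_name(value):
    """Map a Python value to a coarse JSON-style type name."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    if value is None:
        return "null"
    return type(value).__name__


def infer_model(documents):
    """Return {field: sorted list of type names observed} over all documents."""
    model = {}
    for doc in documents:
        for field, value in doc.items():
            model.setdefault(field, set()).add(type_name(value))
    return {field: sorted(types) for field, types in sorted(model.items())}


# Hypothetical documents of a "patients" collection; the second document
# carries a field that the first one lacks, as a schema-less store allows.
patients = [
    {"_id": 1, "name": "Alice", "temperature": 37.2},
    {"_id": 2, "name": "Bob", "temperature": 38.9, "allergies": ["penicillin"]},
]

patients_model = infer_model(patients)
print(json.dumps(patients_model, indent=2))
```

The sketch only collects top-level field names and types; it does not describe nested structures or links between collections, which are the focus of the process presented next.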
3 ToNoSQLModel PROCESS
This article focuses on extracting the model from a
NoSQL database with the "schema-less" property.
We limit ourselves to the document-oriented type,
which is the most complete in terms of expression of
links (use of references and nesting). For this, we
propose the ToNoSQLModel process, which
automatically extracts the model from a
document-oriented NoSQL database.
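To make the two kinds of links concrete, the following Python sketch separates nested sub-documents from references within a single document. It is a simplified illustration under stated assumptions, not the extraction rules of ToNoSQLModel: references are assumed to follow the MongoDB DBRef convention (an object with "$ref" and "$id" keys), and the function classify_links and the sample consultation document are hypothetical.

```python
def classify_links(document):
    """Split a document's object-valued fields into nesting and references.

    Assumption: a reference is an object holding "$ref" (target collection)
    and "$id" (target identifier); any other object is a nested sub-document.
    """
    links = {"nested": [], "references": {}}
    for field, value in document.items():
        if isinstance(value, dict):
            if "$ref" in value and "$id" in value:
                # Record which collection the field points to.
                links["references"][field] = value["$ref"]
            else:
                links["nested"].append(field)
    return links


# Hypothetical consultation document: vital signs are nested,
# while the patient is referenced in another collection.
consultation = {
    "_id": 10,
    "date": "2019-03-14",
    "vitals": {"respiratory_rate": 16, "blood_pressure": "120/80"},
    "patient": {"$ref": "patients", "$id": 1},
}

consultation_links = classify_links(consultation)
```

Manual references, where a field simply stores the identifier of a document in another collection, cannot be recognized by this syntactic test alone and would need additional information.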
The ToNoSQLModel process is based on OMG's
Model Driven Architecture (Hutchinson, 2011). We
recall below the outline of this model transformation
approach. MDA is a framework for
formalizing and automating model transformations.
The purpose of this architecture is to describe
separately the functional specifications and the
implementation specifications of an application on a
given platform. For this, MDA uses three models
representing the abstraction levels of the application.
These are (1) the Computation Independent Model
(CIM), describing the services that the application
must provide to meet the needs of users, (2) the
analysis and design model (PIM for Platform
Independent Model), which defines the structure and
the behavior of the system without indicating the
execution platform and (3) the code model (PSM
for Platform Specific Model), which is the projection
KDIR 2019 - 11th International Conference on Knowledge Discovery and Information Retrieval