[Figure 1 diagram: (1) a book is inserted into the table Books (Book_Id, Book_Title); (2) an inconsistency is produced because the book was not inserted in Books_by_author (Author_Id, Book_Id).]
Figure 1: Logical integrity broken.
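The scenario in Figure 1 can be reproduced with a minimal in-memory sketch. The two dictionaries below only stand in for the Cassandra tables Books and Books_by_author. The helper names (`insert_book`, `find_missing_books`) and the sample values are hypothetical and serve purely as illustration.

```python
# In-memory stand-ins for two denormalized Cassandra tables (illustrative only).
books = {}            # Book_Id -> Book_Title
books_by_author = {}  # Author_Id -> set of Book_Id


def insert_book(book_id, title, author_id, update_all_tables=True):
    """Insert a book; optionally skip the second table to mimic Figure 1."""
    books[book_id] = title
    if update_all_tables:
        books_by_author.setdefault(author_id, set()).add(book_id)


def find_missing_books():
    """Return the Book_Ids stored in Books but absent from Books_by_author."""
    referenced = set()
    for ids in books_by_author.values():
        referenced |= ids
    return sorted(set(books) - referenced)


# A consistent insertion: both tables are updated.
insert_book(1, "First title", author_id=10)
# Step 1 of Figure 1 only: the write to Books_by_author is forgotten.
insert_book(2, "Second title", author_id=10, update_all_tables=False)

print(find_missing_books())  # [2]
```

The check flags book 2, the row that exists in one table but not in the other. This is the kind of logical inconsistency that the client application must otherwise detect and repair by itself.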
As the number of tables in a database increases,
so too does the difficulty of maintaining
consistency. This paper addresses this problem by
proposing a solution, based on a conceptual model
connected to the Cassandra data model, that
automatically maintains the integrity of the data. The
contributions of this paper are as follows:
1. A method that automatically identifies the
tables that need maintenance of the integrity
and proposes how they may be maintained.
2. An evaluation in a case study of the proposed
method.
This paper is organized as follows. In section 2,
we review the current state of the art. In section 3, we
describe our approach to keep the logical integrity of
the data. In section 4, we evaluate the results of
applying our method to keep the logical integrity in a
case study. The article finishes in section 5 with the
conclusions and the proposed future work.
2 RELATED WORK
Most works that study the integrity of the data are
focused on the physical integrity of the data
(Datastax, 2017). This integrity is related to the
consistency of a row replicated throughout all of the
replicas in the Cassandra cluster. In this
paper, however, we address the problem of logical
integrity, which is related to the integrity of the
information repeated among several tables.
The Cassandra development team has addressed the
problem of keeping data integrity by developing
the “Materialized Views” feature (Datastax, 2015).
Materialized views are table-like structures
in which the denormalization is handled automatically
on the server side, ensuring integrity. Usually, in
Cassandra data modelling, a table is created to satisfy
one specific query. With this feature, however, the
created tables (named base tables) are meant to store
data that will be queried in several ways through
materialized views, which are query-only tables.
Every modification of the data in a base table is
reflected in its materialized views, and it is not
possible to write data directly to a materialized view.
Each materialized view is synchronized with only one
base table, so it cannot contain information
from more than one table, unlike the
materialized views of relational databases. This
means that if a query involves
information stored in more than one base table,
materialized views cannot be used to satisfy it,
and a normal table must be created.
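The behaviour described above can be approximated by a small in-memory model. This is a sketch under stated assumptions, not Cassandra's implementation: the classes `BaseTable` and `MaterializedView`, their methods, and the column names are hypothetical. Each view simply re-indexes the rows of exactly one base table under a different key.

```python
class MaterializedView:
    """Query-only structure that re-indexes the rows of a single base table."""

    def __init__(self, key_column):
        self.key_column = key_column
        self._rows = {}  # view key -> {primary key -> row}

    def _apply(self, pk, old_row, new_row):
        # Called only by the base table; clients never write to a view.
        if old_row is not None:
            self._rows.get(old_row[self.key_column], {}).pop(pk, None)
        if new_row is not None:
            self._rows.setdefault(new_row[self.key_column], {})[pk] = new_row

    def select(self, key):
        """Return all rows stored under the given view key."""
        return list(self._rows.get(key, {}).values())


class BaseTable:
    """Table that accepts writes and propagates them to its views."""

    def __init__(self, pk_column):
        self.pk_column = pk_column
        self._rows = {}
        self._views = []

    def create_view(self, key_column):
        view = MaterializedView(key_column)
        for pk, row in self._rows.items():  # load rows that already exist
            view._apply(pk, None, row)
        self._views.append(view)
        return view

    def upsert(self, row):
        pk = row[self.pk_column]
        old = self._rows.get(pk)
        self._rows[pk] = dict(row)
        for view in self._views:
            view._apply(pk, old, self._rows[pk])

    def delete(self, pk):
        old = self._rows.pop(pk, None)
        for view in self._views:
            view._apply(pk, old, None)


books = BaseTable("book_id")
by_author = books.create_view("author_id")
books.upsert({"book_id": 1, "author_id": 10, "title": "First title"})
books.upsert({"book_id": 1, "author_id": 20, "title": "First title"})

print(by_author.select(10))  # []
print(by_author.select(20))  # the single row, now indexed under author 20
```

Because a view in this model is owned by one `BaseTable`, a query that combines rows from two base tables has no view to read from, which mirrors the limitation discussed above.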
Related to the aforementioned problem is the
absence of join operations in Cassandra. A study
(Peter, 2015) has researched the possibility of adding
a join operation to Cassandra. This work achieves
its objective of implementing the join by modifying
the source code of Cassandra 2.0, but the performance
of the implementation still has room for
improvement.
Other studies (Chebotko et al., 2015) have given
a great deal of importance to
the conceptual model, proposing a new
methodology for Cassandra data modelling
that uses a conceptual model, in addition to the
queries, to create the Cassandra tables. This is
achieved by defining a set of data modelling
principles, mapping rules, and mappings. Regarding
our problem, the work in (Chebotko et al., 2015)
introduces an interesting concept: the use of a
conceptual model that is directly related to the tables
of a Cassandra database, an idea that we use in
our approach.
The conceptual model is the core of the previous
work (Chebotko et al., 2015), but it is unusual to have
such a model in NoSQL databases. Regarding this,
there have been studies that propose the generation of
a conceptual model based on the database tables. One
of these works (Ruiz et al., 2015) is focused on
generating schemas for document databases but
claims that the research could be used for other types
of NoSQL databases. These schemas are obtained
through a process that, starting from the original
database, generates a set of entities, each one
representing the information stored in the database.
The final product is a normalized schema that
represents the different entities and relationships.
3 KEEPING THE LOGICAL INTEGRITY
In Cassandra there is no mechanism to ensure the
integrity of the data, and therefore it must be
controlled in the client application that works with the
Cassandra database. We have identified two types of
modifications that can break the logical integrity:
Modifications of the logical model: When
there is a modification regarding the tables,