Figure 1: Logical integrity broken. "Insert Book" writes to Books (Book_Id, Book_Title) but not to Books_by_author (Author_Id, Book_Id), producing an inconsistency.
As the number of tables in a database increases, 
so does the difficulty of maintaining their 
consistency. This article addresses this problem by 
proposing a solution based on a conceptual model, 
connected to the Cassandra data model, that 
automatically maintains the integrity of the data. The 
contributions of this paper are as follows: 
1.  A method that automatically identifies the 
tables whose logical integrity must be maintained 
and proposes how to maintain it. 
2.  An evaluation of the proposed method in a 
case study. 
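As a rough intuition for the first contribution, the following Python sketch flags conceptual attributes that are stored in more than one table, using the tables of Figure 1. It is a naive illustration under our own assumptions (the function name and the "Entity.attribute" notation are hypothetical), not the method described in section 3.

```python
from collections import defaultdict


def tables_sharing_attributes(table_columns):
    """Map each conceptual attribute to the tables that store a copy of it.

    `table_columns` maps a table name to the set of conceptual attributes
    (hypothetical "Entity.attribute" notation) that the table stores. Only
    attributes kept in more than one table are returned, because those are
    the copies that must be kept consistent with each other.
    """
    holders = defaultdict(set)
    for table, attributes in table_columns.items():
        for attribute in attributes:
            holders[attribute].add(table)
    return {attr: tables for attr, tables in holders.items() if len(tables) > 1}


# Tables of Figure 1, described in terms of the conceptual attributes they hold.
schema = {
    "Books": {"Book.Book_Id", "Book.Book_Title"},
    "Books_by_author": {"Author.Author_Id", "Book.Book_Id"},
}
shared = tables_sharing_attributes(schema)
print({attr: sorted(tables) for attr, tables in shared.items()})
# {'Book.Book_Id': ['Books', 'Books_by_author']}
```

Any insertion of a book must therefore touch both tables listed for Book.Book_Id, which is exactly the write that is missing in Figure 1.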
This paper is organized as follows. In section 2, 
we review the current state of the art. In section 3, we 
describe our approach to keeping the logical integrity of 
the data. In section 4, we evaluate our method by 
applying it to a case study. Finally, section 5 presents 
the conclusions and the proposed future work. 
2  RELATED WORK 
Most works that study data integrity focus on 
physical integrity (Datastax, 2017), which concerns 
the consistency of a row across all of its replicas 
in the Cassandra cluster. In this paper, however, we 
address the problem of logical integrity, which 
concerns information repeated across several tables. 
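To make the problem concrete, the following Python sketch reproduces the scenario of Figure 1 with two in-memory dictionaries standing in for the Books and Books_by_author tables. The function names and sample data are hypothetical; the sketch only shows that a client that writes to one table and not the other leaves a book visible in Books but missing from Books_by_author.

```python
# Toy in-memory stand-ins for the two tables of Figure 1 (hypothetical data).
books = {}            # Books: Book_Id -> Book_Title
books_by_author = {}  # Books_by_author: Author_Id -> set of Book_Id


def insert_book(book_id, title, author_id, write_both=True):
    """Insert a book; write_both=False reproduces the partial write of Figure 1."""
    books[book_id] = title
    if write_both:
        books_by_author.setdefault(author_id, set()).add(book_id)


def books_missing_from_author_table():
    """Return the Book_Ids stored in Books but in no Books_by_author row."""
    indexed = set().union(*books_by_author.values())
    return set(books) - indexed


insert_book(1, "Book A", author_id=10)
insert_book(2, "Book B", author_id=10, write_both=False)  # the client skips one table
print(books_missing_from_author_table())  # {2}
```

Nothing in the storage layer rejects the second call; the inconsistency is only detectable by comparing the two tables afterwards.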
The Cassandra development team has addressed the 
problem of keeping data integrity by developing the 
"Materialized Views" feature (Datastax, 2015). 
Materialized views are table-like structures in which 
denormalization is handled automatically on the 
server side, ensuring integrity. Usually, in Cassandra 
data modelling, a table is created to satisfy one 
specific query. With this feature, however, the 
created tables (named base tables) store data that 
will be queried in several ways through materialized 
views, which are query-only tables. 
Every modification of the data in a base table is 
reflected in its materialized views, and it is not 
possible to write data directly to a materialized view. 
Each materialized view is synchronized with only one 
base table, so it cannot contain information from 
more than one table, unlike materialized views in 
relational databases. This means that if a query 
involves information stored in more than one base 
table, materialized views cannot be used to satisfy it, 
and a normal table must be created. 
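The behaviour described above, where writes go only to the base table and the view is derived from it, can be illustrated with a toy in-memory model. This is not how Cassandra implements materialized views; the class and method names are hypothetical, and the sketch only mirrors the two properties stated in the text: a view is fed by exactly one base table, and it rejects direct writes.

```python
class BaseTable:
    """Toy base table: rows keyed by one column; pushes every write to its views."""

    def __init__(self, key_column):
        self.key_column = key_column
        self.rows = {}
        self.views = []

    def upsert(self, row):
        key = row[self.key_column]
        old = self.rows.get(key)
        self.rows[key] = dict(row)
        for view in self.views:
            view._refresh(old, self.rows[key])


class MaterializedView:
    """Toy query-only view, re-keyed on another column of exactly one base table."""

    def __init__(self, base, view_key):
        self.base = base
        self.view_key = view_key
        self._rows = {}  # view_key value -> {base key -> row}
        base.views.append(self)
        for row in base.rows.values():
            self._refresh(None, row)

    def _refresh(self, old, new):
        # Remove the previous version of the row, then index the new one.
        if old is not None:
            self._rows.get(old[self.view_key], {}).pop(old[self.base.key_column], None)
        self._rows.setdefault(new[self.view_key], {})[new[self.base.key_column]] = new

    def select(self, value):
        return list(self._rows.get(value, {}).values())

    def upsert(self, row):
        raise PermissionError("materialized views are query-only")


books = BaseTable("book_id")
by_author = MaterializedView(books, "author_id")
books.upsert({"book_id": 1, "title": "Book A", "author_id": 10})
books.upsert({"book_id": 1, "title": "Book A", "author_id": 20})  # author changed
print([r["book_id"] for r in by_author.select(10)])  # []
print([r["book_id"] for r in by_author.select(20)])  # [1]
```

Because MaterializedView takes a single base argument, a query combining Books with another table has no view that can serve it, which is the limitation that forces the creation of a normal table.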
Related to the aforementioned problem is the 
absence of join operations in Cassandra. A study 
(Peter, 2015) has investigated adding the join 
operation to Cassandra. This work achieves its 
objective by modifying the source code of Cassandra 
2.0, but the performance of the implementation still 
leaves room for improvement. 
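Without a join operation, a query that needs data from two tables must either read a denormalized table or combine rows in the client. The following Python sketch shows the latter with a plain in-memory hash join; the function name and row format are hypothetical, and the approach is unrelated to the server-side implementation of (Peter, 2015).

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Equi-join two lists of row dicts on `key` inside the client application."""
    index = defaultdict(list)
    for row in right:  # build phase: index the right-hand rows by the join key
        index[row[key]].append(row)
    joined = []
    for row in left:  # probe phase: look up matching right-hand rows
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


books = [{"book_id": 1, "title": "Book A"}, {"book_id": 2, "title": "Book B"}]
authorship = [{"author_id": 10, "book_id": 1}]
print(hash_join(books, authorship, "book_id"))
# [{'book_id': 1, 'title': 'Book A', 'author_id': 10}]
```

Doing this in the client means reading both tables over the network, which is why Cassandra data modelling usually prefers a denormalized table per query, and with it the integrity problem discussed here.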
Other studies have given great importance to the 
conceptual model. For instance, Chebotko et al. 
(2015) propose a new methodology for Cassandra 
data modelling that creates the Cassandra tables 
from a conceptual model together with the queries. 
This is achieved through the definition of a set of 
data modelling principles, mapping rules, and 
mappings. Regarding our problem, this work 
introduces an interesting concept that we adopt in 
our approach: a conceptual model directly related 
to the tables of a Cassandra database. 
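The idea of deriving each table from the conceptual model and a query can be sketched as follows. This is a heavily simplified illustration under our own assumptions, not the mapping rules of (Chebotko et al., 2015): the function name is hypothetical, and it only places the attributes a query filters by equality in the partition key and the listed key attributes in clustering columns.

```python
def table_for_query(table_name, columns, equality_attrs, clustering_attrs):
    """Build a CQL CREATE TABLE statement for one query (simplified assumption).

    Attributes the query filters by equality form the partition key; the given
    clustering attributes (e.g. attributes that make each row unique) follow
    as clustering columns. `columns` maps attribute names to CQL types.
    """
    if not equality_attrs:
        raise ValueError("a query needs at least one equality attribute")
    for attr in equality_attrs + clustering_attrs:
        if attr not in columns:
            raise ValueError(f"unknown attribute: {attr}")
    partition = ", ".join(equality_attrs)
    if len(equality_attrs) > 1:
        partition = f"({partition})"  # composite partition key
    primary_key = ", ".join([partition] + clustering_attrs)
    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    return f"CREATE TABLE {table_name} ({column_defs}, PRIMARY KEY ({primary_key}));"


columns = {"author_id": "int", "book_id": "int", "book_title": "text"}
print(table_for_query("books_by_author", columns, ["author_id"], ["book_id"]))
# CREATE TABLE books_by_author (author_id int, book_id int, book_title text, PRIMARY KEY (author_id, book_id));
```

Keeping the link between each generated table and the conceptual attributes it stores is what later allows the tables holding copies of the same data to be identified.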
The conceptual model is the core of the previous 
work (Chebotko et al., 2015), but such a model is 
rarely available for NoSQL databases. To address 
this, some studies propose generating a conceptual 
model from the database tables. One of these works 
(Ruiz et al., 2015) focuses on generating schemas for 
document databases but claims that the approach 
could be applied to other types of NoSQL databases. 
These schemas are obtained through a process that, 
starting from the original database, generates a set 
of entities, each one representing information stored 
in the database. The final product is a normalized 
schema that represents the different entities and 
relationships. 
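A very reduced version of the first step of such a process, going from stored documents to a set of entities, is sketched below in Python. It is an assumption-laden illustration rather than the method of (Ruiz et al., 2015): documents are assumed to carry a hypothetical `type` field naming their entity, each distinct set of attribute names and value types is recorded as one structure of that entity, and relationships are not inferred at all.

```python
from collections import defaultdict


def infer_entity_structures(documents, type_field="type"):
    """Group documents by entity and collect each distinct structure.

    A structure is a frozenset of (attribute, Python type name) pairs; an
    entity whose documents differ in shape ends up with several structures.
    """
    structures = defaultdict(set)
    for doc in documents:
        entity = doc[type_field]
        shape = frozenset(
            (attr, type(value).__name__) for attr, value in doc.items() if attr != type_field
        )
        structures[entity].add(shape)
    return dict(structures)


docs = [
    {"type": "Book", "title": "Book A", "year": 2015},
    {"type": "Book", "title": "Book B"},
    {"type": "Author", "name": "Ann"},
]
result = infer_entity_structures(docs)
print({entity: len(shapes) for entity, shapes in result.items()})
# {'Book': 2, 'Author': 1}
```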
3  KEEPING THE LOGICAL INTEGRITY 
Cassandra provides no general mechanism to ensure 
the logical integrity of the data; therefore, it must be 
controlled by the client application that works with the 
Cassandra database. We have identified two types of 
modifications that can break the logical integrity: 
• Modifications of the logical model: When 
there is a modification regarding the tables,