such as file security, availability of the storage service (Bo and Hong, 2015), etc. In the following subsection, we present some studies on customer file management in Cloud environments.
2.1 Current Techniques
The availability of customer files is a constraint that must be respected. To satisfy this constraint, the Google File System (GFS) duplicates customer data across multiple data centers in order to make it available from any location (Unuvar and Tantawi, 2015); moreover, if one of the data centers breaks down, the others ensure the performance, scalability, reliability, and availability of the stored files.
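As a rough illustration of this replication strategy, the following Python sketch copies a customer file to several data-center locations so that the file survives the loss of any single center. The mount-point paths and the flat copy logic are our own simplifications for illustration, not GFS internals.

import shutil
from pathlib import Path

# Hypothetical mount points standing in for three geographically
# separate data centers (illustrative names, not real GFS endpoints).
DATA_CENTERS = [Path("/mnt/dc_us"), Path("/mnt/dc_eu"), Path("/mnt/dc_asia")]

def replicate(source: Path) -> list[Path]:
    """Copy one customer file to every data center so that it stays
    available even if a single center breaks down."""
    replicas = []
    for dc in DATA_CENTERS:
        target = dc / source.name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        replicas.append(target)
    return replicas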
Currently, Google uses multiple GFS clusters; the largest ones have about 1,000 storage nodes and about 300 TB of storage space (Feng and Meng, 2012).
In the Hadoop Distributed File System (HDFS), a customer's file is broken into 64 MB chunks, and each chunk is replicated three times across the data centers (Kulkarni and Tracey, 2004).
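The short sketch below shows this chunking and replication scheme in Python: a file is split into 64 MB blocks and each block is assigned to three storage nodes. The round-robin placement is a toy policy of our own; real HDFS placement is rack-aware.

import itertools

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB, the HDFS block size cited above
REPLICATION_FACTOR = 3          # each chunk is stored on three nodes

def chunk_and_place(path, nodes):
    """Split a file into 64 MB chunks and assign each chunk to three
    storage nodes in round-robin order (illustrative placement only)."""
    placements = []
    node_cycle = itertools.cycle(nodes)
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            targets = [next(node_cycle) for _ in range(REPLICATION_FACTOR)]
            placements.append((index, len(chunk), targets))
            index += 1
    return placements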
Amazon Simple Storage Service (S3) takes snapshots of every data center in order to ensure the availability and security of customer files (Ramgovind and Smith, 2010).
2.2 Limitations of Current Techniques
We reviewed above the most widely used file storage techniques, namely GFS, HDFS, and S3. The providers of these techniques dominate the field of Cloud environments and serve the largest numbers of customers.
These methods share a common problem: if redundant files exist in a data center used by one of these providers, none of the methods can prevent the redundancy or save the corresponding storage space. Consequently, these methods are limited in terms of file management across the data centers.
3 PROPOSED MODEL
3.1 Problem Statement
The problem of file redundancy before the duplication phase in Cloud environments is a critical one for providers, because it represents a loss of storage space. The problem is aggravated by the growing number of customers around the world and by the popularity of the storage service.
Figure 1: File Redundancy.
Figure 1 shows the redundant files placed in a data center by different customers before duplication: many users may upload the same file to the Data Center (DC), and the resulting loss is then multiplied by the duplication.
More precisely, duplicating a redundant file causes a loss of storage space that can be calculated with the following formula:
L = S × N × M (1)
Equation (1) gives the loss of storage space incurred when a redundant file is duplicated, as illustrated in the sketch below:
• S: the size of the redundant file;
• N: the number of redundant copies;
• M: the number of data centers used for the duplication.
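As a concrete worked example with hypothetical values: a 2 GB file uploaded redundantly by 10 customers and duplicated across 3 data centers wastes L = 2 × 10 × 3 = 60 GB.

def storage_loss(size_gb: float, redundancy: int, data_centers: int) -> float:
    """Equation (1): L = S × N × M, the space wasted by duplicating
    a redundant file."""
    return size_gb * redundancy * data_centers

# Hypothetical example values, chosen only for illustration.
print(storage_loss(size_gb=2, redundancy=10, data_centers=3))  # 60 GB lost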
The optimal case is therefore to delete all the redundant copies before the duplication phase:
OC = M (2)
Equation (2) represents the Optimal Case (OC), in which only one file among all the redundant copies is duplicated and all the others are deleted, so that exactly M copies remain.
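To make the Optimal Case concrete, the sketch below keeps a single copy of each distinct file by comparing content hashes before duplication; only that copy would then be replicated to the M data centers. The SHA-256 comparison is our own illustration of the idea, not the exact mechanism proposed in this paper.

import hashlib

def deduplicate(files: dict[str, bytes]) -> dict[str, bytes]:
    """Keep one copy per distinct content: files whose SHA-256 digests
    match are treated as redundant and dropped before duplication,
    so each distinct file ends up stored only M times (Equation (2))."""
    kept, seen = {}, set()
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:       # first occurrence: keep it
            seen.add(digest)
            kept[name] = content
    return kept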
In fact, the duplication phase ensures the performance, scalability, reliability, and availability of the stored files. Moreover, the availability and security of customer files are constraints that must be taken into consideration by Cloud providers; consequently, any solution in our context should respect these constraints.
3.2 Proposed Solution
Firstly, we have to mention that our solution takes place just before the duplication process,