such as file security and availability of the storage service (Bo and Hong, 2015). In the following subsections, we present some studies on customer file management in clouds.
2.1 Current Techniques 
The availability of customer files is a constraint that must be respected. To address this constraint, the Google File System (GFS) duplicates customer data across multiple data centers so that the data remain accessible from any location (Unuvar and Tantawi, 2015). Moreover, if one of the data centers breaks down, the others ensure the performance, scalability, reliability, and availability of the stored files.
Currently, Google uses multiple GFS clusters; the largest ones contain about 1,000 storage nodes and about 300 TB of storage space (Feng and Meng, 2012).
In the Hadoop Distributed File System (HDFS), the customer's file is split into 64 MB chunks, and each chunk is replicated three times across the data centers (Kulkarni and Tracey, 2004).
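To make this layout concrete, the following minimal Python sketch (an illustration of the scheme described above, not part of HDFS itself) computes how many chunks a file produces and how much raw storage it occupies, assuming the 64 MB chunk size and replication factor of three cited above:

import math

CHUNK_SIZE_MB = 64      # chunk size cited above
REPLICATION_FACTOR = 3  # each chunk is replicated three times

def hdfs_footprint(file_size_mb):
    """Return (number of chunks, raw replicated storage in MB) for one file."""
    chunks = math.ceil(file_size_mb / CHUNK_SIZE_MB)
    raw_storage_mb = file_size_mb * REPLICATION_FACTOR  # the last chunk is not padded
    return chunks, raw_storage_mb

# Example: a 1 GB (1024 MB) customer file
print(hdfs_footprint(1024))  # -> (16, 3072): 16 chunks, 3 GB of raw storage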
Amazon Simple Storage Service (S3) creates snapshots in every data center in order to ensure the availability and security of customer files (Ramgovind and Smith, 2010).
2.2  Limitations of Current Techniques 
Above, we reviewed the most widely used file storage techniques, namely GFS, HDFS, and S3. The providers of these techniques dominate the field of Cloud environments and serve the largest numbers of customers.
What these methods have in common is a shared limitation: if redundant files exist in a data center used by one of these providers, none of these methods can prevent the redundancy or save the corresponding storage space. Consequently, they are limited in terms of file management across the data centers.
3 PROPOSED MODEL 
3.1 Problem Statement 
The problem of file redundancy before the duplication phase in Cloud environments is one of the critical problems for providers, because it represents a loss of storage space. This problem is due to the growing number of customers around the world and the popularity of the storage service.
 
Figure 1: File Redundancy. 
Figure 1 shows the redundant files created by different customers in a data center before duplication. Many users may upload the same file to the Data Center (DC); consequently, this scenario represents a loss that accumulates after duplication.
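One common way to detect such redundancy before storing a file is to fingerprint its content with a cryptographic hash and compare the fingerprint against an index of files already held by the DC. The sketch below is only an illustration of this idea under our own assumptions (SHA-256 fingerprints and an in-memory index); it is not the mechanism used by the providers discussed above.

import hashlib

def fingerprint(path, block_size=1 << 20):
    """Compute a SHA-256 digest of a file, reading it in 1 MB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()

# Hypothetical index of the fingerprints already stored in the data center
stored_fingerprints = set()

def is_redundant(path):
    """Return True if an identical file is already stored in the DC."""
    digest = fingerprint(path)
    if digest in stored_fingerprints:
        return True
    stored_fingerprints.add(digest)
    return False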
More precisely, the duplication of a redundant file represents a loss of storage space that can be calculated by the following formula:
L = S × N × M  (1)
Equation (1) gives the loss of storage space incurred when a redundant file is duplicated, where: 
•  S: the size of the redundant file. 
•  N: the number of redundant copies. 
•  M: the number of data centers used for the duplication.
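For illustration, the following short sketch evaluates equation (1) with hypothetical values; the numbers are ours and are chosen only to make the order of magnitude concrete.

def storage_loss(s_gb, n, m):
    """Loss L = S x N x M from equation (1), expressed here in GB."""
    return s_gb * n * m

# Hypothetical example: a 1 GB file with 100 redundant copies,
# each duplicated across 3 data centers
print(storage_loss(s_gb=1, n=100, m=3))  # -> 300 GB of wasted storage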
The optimal case is therefore to delete all the redundant files before the duplication phase:
OC = M  (2)
Equation (2) represents the Optimal Case (OC), in which only one file among all the redundant copies is duplicated and all the others are deleted.
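With the same hypothetical values as above (S = 1 GB, N = 100 redundant copies, M = 3 data centers), equation (1) yields a loss of 1 × 100 × 3 = 300 GB, whereas the optimal case of equation (2) keeps only M = 3 copies of the file, one per data center.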
The duplication phase ensures the performance, scalability, reliability, and availability of the stored files. Moreover, the availability and security of the customer's files are constraints that Cloud providers must take into consideration; consequently, any solution in our context should respect these constraints.
3.2 Proposed Solution 
First, we note that our solution takes place just before the duplication process,