Assuming that the encoding algorithm breaks a
document into blocks of the same size, an attacker
can also group blocks by size in order to easily
detect blocks belonging to the same document.
Privacy is, once again, threatened. To solve this
issue, encoding algorithms could produce blocks of
random sizes, but it is easier to design algorithms that
create blocks of equal size for all documents. As
the design of encoding algorithms is already compli-
cated, this option is rejected. Another option is to use
blocks with a fixed size, either by using padding or by
making small blocks. To obtain blocks of equal
size, the system fills blocks with useless data, called
padding, until they reach the required size. The main
disadvantage of padding is that it increases the size of
the data to upload for one document. An alternative to
padding is to build small blocks with a fixed size, which
increases the number of blocks of equal size on each
provider. In that case, the number of blocks generated
for one document increases so that the blocks associated
with one document are drowned in a sea of blocks. This
solution protects privacy without uploading
more data than necessary, but it dramatically increases
the number of blocks to upload. We choose the last
option, counting on simultaneous downloads and
uploads of blocks to reduce the loss of performance.
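The trade-off between padding and small blocks can be illustrated with a minimal sketch. It is not the encoding algorithm of the storage service: the function names, the zero-byte padding and the two block sizes (4096 and 64 bytes) are assumptions chosen for illustration, and the sketch does not show how padding would be removed at reconstruction time.

```python
import math


def split_fixed(data: bytes, block_size: int, pad: bytes = b"\x00") -> list[bytes]:
    """Cut data into blocks of exactly block_size bytes.

    The last block is filled with a padding byte (an assumption of this
    sketch) so that every block has the same size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        blocks = [b""]  # an empty document still yields one padded block
    blocks[-1] = blocks[-1].ljust(block_size, pad)
    return blocks


def upload_cost(doc_size: int, block_size: int) -> tuple[int, int]:
    """Return (number of blocks, padding bytes) for one document."""
    count = max(1, math.ceil(doc_size / block_size))
    return count, count * block_size - doc_size


document = bytes(10_000)              # hypothetical 10,000-byte document
large = split_fixed(document, 4096)   # few blocks, more padding
small = split_fixed(document, 64)     # many blocks, little padding
```

Under these assumptions, the 4096-byte setting yields 3 blocks with 2288 bytes of padding, while the 64-byte setting yields 157 blocks with only 48 bytes of padding: the extra data shrinks, but the number of transfers grows.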
3.2 Metadata Storage
Once users have encoded their documents, the metadata
of these documents contains all the information
required to reconstruct them. The protection of
metadata is therefore a determining factor in preserving
user privacy. Moreover, metadata must be kept
in safe storage because losing it prevents users
from reconstructing their documents. As a computer
or a mobile phone can be stolen, the backup
of metadata must be stored in remote storage. In
our approach, this remote storage is the sky storage
service itself. To minimize the potential damage of a
theft, metadata is never written to the local drive of
the user device; it remains only in the device memory.
When users close the application, the local copy of
their metadata is therefore destroyed.
The permanent storage of metadata is performed
by breaking it into blocks with the encoding
algorithm and sending the generated blocks to the cloud
storage service. Two pieces of information are
required to store the metadata blocks: a list of cloud
providers and a list of block names. To store the
metadata, all registered providers are used.
Consequently, users must obtain access to every provider
to retrieve their metadata from other devices: they
must manually connect to the right providers
before retrieving their metadata and, in turn, their
documents. This may seem restrictive, but it is necessary
for the security of the application, since it avoids
giving cloud provider credentials, mostly passwords, to
a third party. Credentials are, however, stored on user
devices, so users have to authenticate only once
with each cloud provider. An additional password is
requested when the storage service starts, to protect user
documents in case the computer or the phone is stolen.
To compute the list of metadata block names,
cryptographic functions derive the block names from
the password, the name of the provider and the email
address of the provider account. For example, the
SHA-1 function could be used to calculate the names of
the blocks that store the user's metadata from the user's
password as follows: SHA1(password + providerName +
email).
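The name derivation SHA1(password + providerName + email) can be sketched as follows. The UTF-8 encoding, the hexadecimal output and the sample accounts are assumptions made for illustration; the text does not specify them, nor how several block names per provider would be derived.

```python
import hashlib


def metadata_block_name(password: str, provider_name: str, email: str) -> str:
    """Compute SHA1(password + providerName + email) as a hex string.

    Plain string concatenation follows the formula in the text; the UTF-8
    encoding and the hex digest are assumptions of this sketch.
    """
    material = (password + provider_name + email).encode("utf-8")
    return hashlib.sha1(material).hexdigest()


# Hypothetical provider accounts, for illustration only.
accounts = [
    ("providerA", "alice@example.org"),
    ("providerB", "alice@example.org"),
]

names = {
    provider: metadata_block_name("user password", provider, email)
    for provider, email in accounts
}
```

Because the name depends only on the password and on the account details, any device that knows these three values can recompute the same block name without storing it locally.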
In conclusion, the retrieval of metadata relies
on (i) the correct list of cloud providers, (ii) the
credentials of these cloud accounts, and (iii) the block
names derived from the password, the emails and the
provider names of the cloud provider accounts.
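The three conditions above can be made concrete with a small simulation. The `InMemoryProvider` class is a hypothetical stand-in for a cloud provider account, not a real provider API; its `connected` flag plays the role of valid credentials, and the single block per provider is a simplification.

```python
import hashlib


class InMemoryProvider:
    """Hypothetical stand-in for one cloud provider account."""

    def __init__(self, name: str, email: str) -> None:
        self.name = name
        self.email = email
        self.connected = False  # True once the user has authenticated
        self._blocks = {}

    def _require_connection(self) -> None:
        if not self.connected:
            raise PermissionError(f"not connected to {self.name}")

    def upload(self, block_name: str, block: bytes) -> None:
        self._require_connection()
        self._blocks[block_name] = block

    def download(self, block_name: str) -> bytes:
        self._require_connection()
        return self._blocks[block_name]  # KeyError if the name is wrong


def block_name(password: str, provider: InMemoryProvider) -> str:
    """SHA1(password + providerName + email), hex-encoded (assumed format)."""
    material = (password + provider.name + provider.email).encode("utf-8")
    return hashlib.sha1(material).hexdigest()


def retrieve_metadata_blocks(providers, password: str) -> list:
    """Fetch one metadata block from every provider in the given list."""
    return [p.download(block_name(password, p)) for p in providers]


providers = [
    InMemoryProvider("providerA", "alice@example.org"),
    InMemoryProvider("providerB", "alice@example.org"),
]
for index, provider in enumerate(providers):
    provider.connected = True
    provider.upload(block_name("s3cret", provider), f"metadata-block-{index}".encode())

blocks = retrieve_metadata_blocks(providers, "s3cret")
```

In this sketch, a wrong password produces block names that do not exist on the providers, and a provider that the user has not connected to refuses the download, so retrieval fails unless all three conditions hold.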
3.3 Architecture Comparison
In the storage service proposed in this paper, users
control their data by choosing the encoding algorithm
and by preventing cloud providers from reading their
metadata. In TrustyDrive (Pottier and Menaud, 2016),
anonymity provides stronger data privacy by
hiding the owner of the data, but it is achieved through
a third party, and users must trust the cloud storage
service not to save metadata. In the remainder of
this section, we discuss the benefits and the
disadvantages of connecting users to cloud providers
directly or by means of a third party.
A direct connection between users and cloud
providers means that users create one account on each
cloud provider. For an architecture with three cloud
providers, creating the accounts and connecting to them
does not take long but, for seven or more cloud providers,
this operation can quickly become cumbersome and
tricky. Moreover, inexperienced users may find it
difficult to know exactly the features of each cloud
provider, for example, the transfer rates or the location
of the data. These features could be analyzed and
proposed by expert third parties. The last, but not
least, disadvantage is the number of blocks stored on
cloud providers. In the architecture without any third
party, each cloud provider account is used by a single
user. So, the number of blocks may not be sufficient to
ensure data privacy, and the credentials used to connect
to cloud providers can lead to the user's identity.
By using a third-party as a proxy to connect to
CLOSER 2017 - 7th International Conference on Cloud Computing and Services Science