Security-Aware Allocation of Replicated Data
in Distributed Storage Systems
Sabrina De Capitani di Vimercati
a
, Sara Foresti
b
, Giovanni Livraga
c
,
Pierangela Samarati
d
and Mauro Tedesco
Computer Science Department, Universit
`
a degli Studi di Milano, via Celoria 18, Milano, Italy
{sabrina.decapitani, sara.foresti, giovanni.livraga, pierangela.samarati}@unimi.it, mauro.tedesco@studenti.unimi.it
Keywords:
Security-Aware Data Allocation, Distributed Storage, Data Management, Replica Management, Optimal
Allocation.
Abstract:
Distributed storage systems offer scalable and cost-effective solutions for managing large data collections. A
critical factor for the adoption of these systems is the allocation of data (possibly including replicas) to the
storage nodes satisfying operational and security requirements, while ensuring economic effectiveness. Ap-
propriate data and replica management provides significant benefits, ranging from enhanced fault tolerance
and improved data availability, to reduced latency and optimized workload distribution. A suboptimal place-
ment of data and replicas can instead lead to excessive costs, security risks, and performance bottlenecks.
This paper proposes a novel model for permitting data owners to specify in a friendly manner complex data
and replica allocation constraints, and an approach for computing optimal (satisfying operational and security
requirements and minimizing costs) data allocations in distributed storage environments. Our work aims to
improve the reliability, security, and cost-effectiveness of distributed storage systems.
1 INTRODUCTION
The availability of distributed storage systems, pos-
sibly based on cloud/fog/edge paradigms, has revo-
lutionized how large and/or critical data collections
can be stored and managed, offering scalable, fault-
tolerant, and cost-effective solutions for different ap-
plication scenarios (e.g., (Russo Russo et al., 2024)).
To fully benefit from the features of these distributed
systems, it is critical to ensure the storage of data
in compliance with operational and security require-
ments, while maintaining economic effectiveness. An
essential aspect of this challenge concerns the opti-
mal allocation of data items (including their possi-
ble replicas) across the nodes. Careless or subop-
timal data placement can intuitively lead to exces-
sive costs, performance bottlenecks, and/or breaches
of security or operational requirements. These re-
quirements, often dictated by the data owner, may im-
pose constraints such as grouping related data items
(or data items that are frequently accessed together)
a
https://orcid.org/0000-0003-0793-3551
b
https://orcid.org/0000-0002-1658-6734
c
https://orcid.org/0000-0003-2661-8573
d
https://orcid.org/0000-0001-7395-4620
on the same node to facilitate joint visibility, ensur-
ing that sensitive collections are fragmented and dis-
tributed across separate nodes to avoid exposure, or
replicating data among multiple storage nodes. In
particular, replicating data can provide several ben-
efits. A first benefit is represented by enhanced fault
tolerance: replicas mitigate the risks of data loss due
to failures or network outages. Data replication can
also help improving data availability, as replicas per-
mit access to data even when some nodes become
unavailable, reducing downtime and ensuring service
continuity. It can also be beneficial to increase sys-
tem performance: for example, strategically placing
replicas closer to end-users can reduce latency and
improve data access speed, enhancing application re-
sponsiveness. Furthermore, it permits load balancing,
as replicas can help distributing workload across stor-
age nodes, preventing bottlenecks and optimizing re-
source utilization. When allocating data to the stor-
age nodes, a careful management of the replica plays
a critical role, as it directly influences system reliabil-
ity, performance, and cost.
To address this problem, we propose an approach
capable of computing optimal data allocations con-
sidering data replication, balancing two different ob-
jectives: minimizing economic cost on the one hand,
80
De Capitani di Vimercati, S., Foresti, S., Livraga, G., Samarati, P. and Tedesco, M.
Security-Aware Allocation of Replicated Data in Distributed Storage Systems.
DOI: 10.5220/0013484800003950
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 15th International Conference on Cloud Computing and Services Science (CLOSER 2025), pages 80-91
ISBN: 978-989-758-747-4; ISSN: 2184-5042
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
Figure 1: Reference scenario.
and satisfying all requirements imposed by the data
owner on the other hand. The contribution of this pa-
per is two-fold. First, we identify and formally define
possible requirements that a data owner may need to
impose to regulate the overall allocation of data and
replicas to the storage nodes. We then present an ap-
proach for computing optimal data allocations (i.e.,
allocations that respect all requirements while min-
imizing the overall economic cost) in a distributed
storage environment. We underline that our approach
is agnostic to (and does not focus on) specific stor-
age/processing frameworks (e.g., distributed file sys-
tems such as HDFS or IPFS) and considers a more
general distributed storage environment, interpreted
as a collection of nodes with storage capabilities that
can be connected and queried, making it applicable
to a broader range of distributed architectures beyond
traditional data processing ecosystems.
Our work aims at contributing to the body of liter-
ature on efficient and secure data outsourcing strate-
gies in distributed systems, with a specific focus on
the critical aspect of replica placement and manage-
ment. Our approach seeks to improve the reliability,
security, and cost-effectiveness of distributed storage
systems, making them more accessible and appealing
to data owners, across various application scenarios,
fostering trust and enabling efficient resource usage
in modern data outsourcing scenarios. The remainder
of this paper is organized as follows. Section 2 intro-
duces our modeling of nodes and data, and the notion
of allocation along with two families of requirements
to be considered when defining allocations. Section 3
discusses the first family of requirements, and char-
acterizes the concepts of acceptable nodes that fit the
characteristics and needs of the data items to be al-
located. Section 4 presents the second family of re-
quirements, which imposes restrictions on data and
replica allocation. Section 5 illustrates the problem of
computing optimal allocations, and an approach for
finding them. Section 6 discusses related works. Fi-
nally, Section 7 concludes the paper.
2 MODELING
We consider a scenario (see Figure 1) characterized
by a data owner who is willing to move their data
to the premises of external storage providers (e.g., on
cloud, edge, fog nodes). While interested in offload-
ing of storage and management of (possibly repli-
cated) data collections, the data owner needs to en-
force restrictions on the nodes storing the data in
terms of their characteristics (e.g., node location or
in-storage encryption algorithm adopted), as well as
in terms of which data items should be or cannot be
allocated to the same node. In the following, we in-
troduce our modeling of nodes and of data, together
with their replicas, and of data allocations.
Nodes Modeling. Our approach is designed to be
highly adaptable and not restricted to any specific dis-
tributed storage architecture. It is applicable to any
architecture that consists of multiple nodes, each po-
tentially having different characteristics (e.g., storage
capacity, performance, connectivity, security/privacy
features). The flexibility of the approach allows data,
including replicas, to be allocated dynamically across
nodes. This general applicability makes the proposed
Security-Aware Allocation of Replicated Data in Distributed Storage Systems
81
Attributes
Nodes prov type loc encr avail Price
v
1
prov1 cloud EU 3DES VH 40
v
2
prov1 cloud US DES H 30
v
3
prov3 cloud EU 3DES M 55
v
4
prov2 edge EU 3DES VH 95
v
5
prov2 edge US DES L 90
v
6
prov4 edge EU AES L 85
v
7
prov1 cloud EU 3DES VH 10
v
8
prov3 cloud EU 3DES M 25
v
9
prov2 edge EU 3DES VH 70
v
10
prov4 edge EU AES L 100
Figure 2: Abstract representation of a set of nodes and price
for their usages (USD/GB).
method suitable for diverse environments, including
cloud computing platforms (where centralized data
centers can host numerous virtualized storage nodes),
fog computing (with distributed devices closer to the
network edge for latency-sensitive applications), and
edge computing (where data are processed and stored
directly on edge devices such as IoT gateways).
In light of this generality, the notion of a “node”
is inherently dependent on the architecture considered
within the specific application scenario. For example,
in cloud computing, a “node” might correspond to a
virtual storage instance or even to a cloud plan, while
in edge computing, it could represent an individual
device or sensor. Regardless of its specific instanti-
ation, a “node” within our framework is defined as
the actual place where data items can be stored and
is characterized by attributes (e.g., capacity, perfor-
mance, accessibility, security features). This flexibil-
ity ensures that the proposed approach can accommo-
date various interpretations of a “node,” tailored to the
requirements of the considered application and archi-
tecture. We then define our problem as the problem
of determining which data item has to be allocated to
which node of the storage architecture.
For generality and in line with previous works
(e.g., (De Capitani di Vimercati et al., 2021a; De
Capitani di Vimercati et al., 2021b)), we assume an
abstract representation of nodes in terms of their at-
tributes: each node v is then represented as a vector
having, for each attribute of interest, one element with
the value assumed by it in v. Figure 2 illustrates an ex-
ample of 10 nodes v
1
, . . . , v
10
defined over attributes
prov (the provider managing the node), type (the
type of node), loc (the geographical location of the
node), encr (the encryption scheme supported by the
node), avail (the declared availability).
Data Modeling. A peculiarity of our scenario is the
consideration of the possibility of replicating data.
The computation of the data allocation to the nodes
must then consider both the original data items, as
well as their replicas. Each data item therefore has
Resource r Size (GB) Num. Replicas
clinical 1000 1
insurance 500 2
equipment 250 0
research 300 1
staff 100 1
admin 200 1
payroll 100 1
Figure 3: Resources of our running example with their size
and number of additional replicas.
an original version (i.e., the item itself), and 0 or
more additional replicas. Our approach is not re-
stricted to a specific type of data and, for general-
ity and to encompass a variety of different data mod-
els (e.g., structured/semi-structured data, documents,
files) in the reminder of this paper we refer to the data
items to be outsourced with the term resources. A
resource can then represent any single informative as-
set of the data owner, defined at a chosen granular-
ity level (from entire documents collections to par-
titioned data), which may (if so desired and done
by the owner) be replicated, and which needs to be
outsourced and hence allocated to a node (or a set
thereof).
Example 2.1. We refer our examples to the outsourc-
ing of the data assets of a hospital. The resources to
be outsourced are the following.
clinical: clinical information related to pa-
tient, such as their diagnosis, ongoing treatments,
past medical history, and so on.
insurance; information on patients’ insurance
and other fiscal data.
equipment: information about medical devices
and equipment maintained by the hospital for its
activities.
research: information related to internal and
external research activities.
staff: personal information of the staff working
in the hospital.
admin: administrative and management informa-
tion.
payroll: information related to the salary and
working position of hospital staff.
Figure 3 illustrates these resources along with
their size and the number of additional replicas
needed for each of them by the hospital. Each replica
of each resource is equal in size. For example, re-
source staff has 1 additional replica, meaning that
the hospital maintains (and wishes to outsource) two
copies of the resource, each of which is 100GB in size.
Resource equipment, instead, has 0 additional repli-
cas, meaning that only the original resource (250GB
in size) is maintained and has to be outsourced.
CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science
82
In our modeling, we use notation r to refer to a
resource in the set R of resources to be outsourced,
for which a certain number n 0 of replicas can exist,
without distinguishing whether reference is made to
the original version of the resource or to one of its
replicas. When explicit reference to a specific version
of a resource (i.e., its original version or one of its
replicas) is needed, we distinguish original resources
from replicas with superscripts: r
0
denotes the
original version of r, while r
i
denotes the i
th
replica
of r. We denote the original version and the set of
replicas existing for resource r with symbol R(r). For
example, with reference to the resources in Figure 3,
consider resource insurance, for which 2 replicas
are needed in addition to the original resource. When
writing insurance, we refer to a generic version
of the resource without explicit consideration of
whether we are referring to the original version or to
one replica. When writing insurance
0
we explicitly
refer to the original version. Also, R(insurance) =
{insurance
0
, insurance
1
, insurance
2
}, which
includes the original version (insurance
0
) and
the first (insurance
1
) and second (insurance
2
)
replica. For readability, we denote with R the
overall set of original resources and their replicas
R =
S
rR
R(r). For example, with reference to the
resources in Figure 3, R will include 14 resources:
the 7 original resources, as well as one replica for
each resource but insurance (for which 2 replicas
are defined), and equipment (for which no replica is
defined).
Allocation Modeling. Given a set R of resources and
a set V of nodes, our goal is to determine an alloca-
tion λ : R V that maps each resource (original or
replica) to a node.
Given a set of nodes, it may be that not every
node is suitable to store every resource. This can be
due to two main factors. The first concerns the fit
between the peculiarities of each node vs. the spe-
cific characteristics or requirements of different re-
sources. For instance, certain resources may require
high-performance nodes with fast processing capabil-
ities, while others might need high-capacity nodes op-
timized for bulk storage. The second factor reflects
the interplay between resource allocations, where the
placement of one resource may influence the suitabil-
ity of a node for storing another resource (e.g., due to
capacity limitations, performance constraints, or op-
erational dependencies).
These two factors motivate two distinct families
of requirements, which may be specified by data own-
ers to regulate allocation. The first family of require-
ments addresses the suitability of individual nodes
for specific resources: the enforcement of such con-
straints determines, for each resource, a set of accept-
able nodes. The second family of requirements ad-
dresses the interplay among allocations: the enforce-
ment of such requirements over acceptable nodes
guarantees that the overall allocation fits the desider-
ata of the owner. We illustrate the first family of re-
quirements along with the notion of acceptable node
in Section 3. We then illustrate the second family of
requirements in Section 4.
3 ACCEPTABLE NODES
The first kind of requirements that should be consid-
ered when computing an allocation of data to stor-
age nodes considers the fit between the characteris-
tics of the different nodes, and those of the resources
to be allocated. This kind of requirements is spec-
ified for each resource r independently, and defines
the node characteristics that make a node suitable or
unsuitable for r. For example, a resource requiring
high reliability might be restricted to nodes with ro-
bust failover mechanisms, while a resource involv-
ing latency-sensitive operations might be allocated to
nodes with low communication delay. By capturing
these requirements, data owners can ensure that their
resources can be allocated only to nodes that meet
their needs and expectations. For clarity, in the re-
mainder of this paper we will denote such require-
ments as resource-level requirements. There are sev-
eral approaches that can be used to specify, and en-
force, this kind of requirements (e.g., (De Capitani
di Vimercati et al., 2021b; De Capitani di Vimer-
cati et al., 2021a)). We aim at remaining general
and do not restrict our approach to operate with any
specific approach. For example, the proposal in (De
Capitani di Vimercati et al., 2021b) provides a lan-
guage for specifying complex resource-level require-
ments, building upon the concept of base require-
ment, denoted c. Given a set {w
1
, . . . , w
n
} of values
in the domain of a node attribute (characteristic) at,
a base requirement on at imposes that at can assume
(at(w
1
, . . . , w
n
)) or cannot assume (¬at(w
1
, . . . , w
n
))
such a set of values. For instance, a base require-
ment of the form prov(provA, provB, provC) states
that, to be acceptable, a node must be managed by
provider provA, provB, or provC. Starting from base
requirements, the specification language in (De Cap-
itani di Vimercati et al., 2021b) permits to express a
variety of complex requirements, summarized in Fig-
ure 4(a). Those complex requirements can model al-
ternatives among base requirements (ANY), sets of
base requirements that must be jointly satisfied (ALL),
conditional requirements (IF-THEN), prohibited char-
Security-Aware Allocation of Replicated Data in Distributed Storage Systems
83
ANY(c
1
, . . . , c
n
)
ALL(c
1
, . . . , c
n
)
IF ALL(c
1
, . . . , c
k
) THEN ANY(c
k+1
, . . . , c
n
)
FORBIDDEN(c
1
, . . . , c
n
)
AT LEAST(m, (c
1
, . . . , c
n
))
AT MOST(m, (c
1
, . . . , c
n
))
(a)
ANY({prov(prov2), type(cloud)})
ALL({loc(EU), avail(VH)})
IF(prov(prov1)) THEN (type(cloud))
FORBIDDEN({prov(prov3), type(cloud)})
(b)
Figure 4: Complex requirements supported by the language
in (De Capitani di Vimercati et al., 2021b) (a) and an ex-
ample of a set of requirements for resource clinical in
Figure 3 (b).
acteristic combinations (FORBIDDEN), and a mini-
mum (AT LEAST) or maximum (AT MOST) number
of base requirements to be satisfied.
Note that, while in principle each version (e.g.,
original and/or replica) of each resource may be as-
signed different requirements (and our approach fully
supports this scenario), in our examples we assume
that all versions of each resource are associated with
the same set of requirements and, for readability,
we associate resource-level requirements with the
generic resource names. Figure 4(b) illustrates a set
of 4 sample resource-level requirements for resource
clinical in Figure 3. The first requirement demands
that clinical is to be outsourced to a node that is
managed by prov2 or is of type cloud. The second re-
quirement demands that the node must be located in
EU and guarantee a very high availability. The third
requirement demands that if a node is managed by
prov1, it must be of type cloud. The last requirement
prohibits clinical to be stored at a node managed
by prov3 and that is of type cloud.
The resource-level requirements associated with a
resource r restrict the possible nodes that can be con-
sidered for allocating the original version of r and
its replicas, based on the values the nodes assume
for the attributes that characterize them. Regardless
of the approach adopted for specifying and enforcing
resource-level requirements, we call a node v that sat-
isfies all requirements specified for a resource r as an
acceptable node for r. For example, with reference to
the nodes in Figure 2, the resources in Figure 3 and
the requirements in Figure 4(b), the set of acceptable
nodes for clinical includes v
1
, v
4
, v
7
, and v
9
. On
the contrary, for example, node v
2
is not acceptable
for clinical since its characteristics do not satisfy
the second requirement of Figure 4(b) (it is not lo-
cated in EU nor has a very high availability).
Given a set of nodes and a set of resources with
resource-level requirements, the identification of the
Nodes
Resources v
1
v
2
v
3
v
4
v
5
v
6
v
7
v
8
v
9
v
10
clinical
insurance
equipment
research
staff
admin
payroll
Figure 5: An example of acceptable nodes for the resources
in Figure 3.
acceptable nodes for each resource can be automat-
ically computed in different ways (e.g., leveraging
a Boolean formulation of the problem as proposed
in (De Capitani di Vimercati et al., 2021b)). Figure 5
reports an example of acceptable nodes, among those
considered in our running example in Figure 2, for the
resources in Figure 3.
4 REPLICA AND ALLOCATION
CONSTRAINTS
Replica and allocation constraints are used to spec-
ify requirements that refer to the overall allocation of
sets of resources, and of the replicas. They are en-
forced to determine which resource will be allocated
to which node, among the acceptable ones for that re-
source. While the resource-level requirements illus-
trated in Section 3 specify, for each resource, a set
of conditions that identify acceptable nodes, the main
goal of the constraints introduced in this section is to
force the joint/separate allocation of sets of resources
(e.g., the replicas of a resource and its original ver-
sion) to nodes not based on the characteristics of the
nodes, bur rather on the resources themselves.
We introduce and define 8 kinds of replica
and allocation constraints: together, together
,
all together, not together, not together
,
split, all split, and alone. Constraints
together, together
, and all together force the
joint allocation of resources on the same node. Con-
straints not together and not together
force
fragmentation of resources across different nodes.
Constraints split, all split force different allo-
cations for the original version and the replica of a
resource. Constraint alone force the allocation of a
resource to be different from any other resource. In
the remainder of this section, we formally illustrate
and discuss these constraints. For simplicity, we will
denote with r and s two resources in R, with ρ an in-
stance of r in R(r), and with σ an instance of s in
R(s).
CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science
84
c
1
: together(equipment
0
,clinical
0
)
c
2
: together
(staff,research)
c
3
: all together(staff,payroll)
c
4
: not together(payroll
0
,insurance
0
)
c
5
: not together(payroll
0
,insurance
1
)
c
6
: not together(payroll
0
,insurance
2
)
c
7
: not together
(insurance,clinical)
c
8
: split(clinical)
c
9
: all split(staff)
c
10
: all split(insurance)
c
11
: alone(admin
0
)
c
12
: alone(admin
1
)
Figure 6: An example of a set of replica and allocation con-
straints for the resources in Figure 3.
together(ρ, σ). A together constraint is de-
fined over two specific versions ρ R(r) and
σ R(s) of two resources r and s, and demands
that they are both allocated to the same node (e.g.,
to account for the fact that those resources are of-
tentimes accessed together, or combined).
Formally, a together(ρ, σ) constraint is satisfied
by an allocation λ iff λ(ρ) = λ(σ). This constraint
can be formulated by the owner to restrict the allo-
cation of a pair of resources and, as such, demands
the identification of two specific versions ρ of r
and σ of s to be involved in the constraint. In other
words, this constraint demands the specification
of which versions of r and s are to be allocated
together. For example, constraint c
1
in Figure 6
requires the original versions of equipment and
of clinical to be allocated on the same node,
modeling the fact that these two resources are of-
ten to be accessed together.
together
(r,s). A together
constraint is de-
fined over two generic resources r and s, and de-
mands that at least one version of r be allocated
to the same node of at least one version of s. In
other words, it requires the allocation to include
a node that stores at least one version of r and
one version of s. We expect this constraint to be
formulated, similarly to the together constraint,
over resources that are expected to be frequently
accessed together. The main difference in seman-
tics with the together constraint is that, in this
case, it is not necessary to specify which versions
of the two resources must be placed together.
Formally, a together
(r,s) is satisfied by an allo-
cation λ iff ρ, σ s.t. ρ R(r), σ R(s) : λ(ρ) =
λ(σ). For example, constraint c
2
in Figure 6 re-
quires that at least one version of staff and at
least one version of research be allocated to the
same node.
all together(r,s). An all together con-
straint is defined over two generic resources r and
s, and demands that whenever a node stores a ver-
sion of r, it also stores a version of s. In other
words, it guarantees that r is always accompanied
by s. Note the difference in semantics with the
together
constraint, which is also formulated
over two generic versions of a pair of resources,
but is satisfied when at least one pair of versions
are allocated to the same node.
Formally, an all together(r,s) is satisfied by
an allocation λ iff ρ R(r), σ R(s) : λ(ρ) =
λ(σ). For example, constraint c
3
in Figure 6
requires that, when a node stores a version of
staff, it also stores at least one version of
payroll. We note that when |R(r)| > |R(s)|,
then the satisfaction of this constraint inevitably
implies that more than one replica of r be allo-
cated to the same node (as clearly no more than
|R(s)| nodes can be considered for satisfying this
constraint).
not together(ρ, σ). A not together con-
straint is defined over two specific versions ρ
R(r) and σ R(s) of two resources r and s, and
demands that they are not allocated to the same
node.
Formally, a not together(ρ, σ) constraint is sat-
isfied by an allocation λ iff λ(ρ) ̸= λ(σ). Like the
together constraint, this constraint can be for-
mulated by the owner to restrict the allocation of a
pair of resources and, as such, demands the identi-
fication of two specific versions ρ of r and σ of s to
be specified in the constraint. In other words, this
constraint demands the specification of which ver-
sions of r and s should not be allocated together.
This general constraint is expected to be formu-
lated for performance reasons: for example, to en-
sure concurrent access to both resources reducing
delays and bottlenecks that may be experienced
if ρ and σ are large and the available nodes do
not exhibit excellent performance. The constraint
can also be used to guarantee service continuity in
case of node failure: for example, if ρ and σ are
allocated at different nodes, the unavailability of ρ
compromises only the functions that directly de-
pend on ρ, while still permitting all the activities
that rely on σ. For example, constraints c
4
, c
5
and
c
6
in Figure 6 demand that the original version of
payroll is not allocated together with instances
of insurance.
not together
(r,s). A not together
con-
straint is defined over two generic resources r and
Security-Aware Allocation of Replicated Data in Distributed Storage Systems
85
s, and demands that no version of s be allocated to
a node that stores a version of s, and vice versa.
In other words, it requires the allocation not to
include a node that stores one version of r and
one version of s. Note the difference in seman-
tics with the not together constraint, which is
formulated over two specific versions of a pair of
resources. In particular, while the not together
constraint can be formulated for performance rea-
sons, a not together
constraint models confi-
dentiality constraints, sets (pairs in this case) of
resources that should never be visible together as
their association is considered sensitive. Note that
a not together
constraint can also be formu-
lated as a set of not together constraints (one
for each combination of a version of r and a ver-
sion of s).
Formally, a not together
(r,s) is satisfied by an
allocation λ iff ρ R(r), σ R(s) : λ(ρ) ̸=
λ(σ). For example, constraint c
7
in Figure 6
prevents having, on the same node, versions of
insurance and versions of clinical, account-
ing for the fact that these two resources should not
be visible together.
split(r). A split constraint is formulated over
a generic version of a resource r, and demands
that all replicas of r be not allocated to the node
to which the original version r
0
is allocated. This
constraint can be formulated for security reasons,
to improve availability and resilience to node fail-
ures. Note that a split constraint can also be for-
mulated as a set of not together constraints of
the form not together(r
0
, r
i
), for each r
i
R(r)
with i ̸= 0.
Formally, a split(r) constraint is satisfied by an
allocation λ iff ρ R(r) \ {r
0
} : λ(ρ) ̸= λ(r
0
).
For example, constraint c
8
in Figure 6 requires all
replicas of clinical to be allocated to a node dif-
ferent from that to which clinical
0
is allocated.
all split(r). An all split constraint is for-
mulated over a generic version of a resource r,
and extends the split constraint demanding that
no two versions (original nor replicas) of r be al-
located to the same node. The satisfaction of this
constraint can further increase availability and re-
silience to node failures (each version of a re-
source is allocated to a different node), while pos-
sibly requiring a large number of nodes to be in-
cluded in the allocation. A all split constraint
can also be formulated as a set of not together
constraints (one for each combination of two ver-
sions of r).
Formally, an all split constraint is satisfied by
an allocation λ iff ρ, σ R(r), ρ ̸= σ : λ(ρ) ̸=
λ(σ). For example, constraint c
9
(c
10
, resp.)
in Figure 6 requires all versions of staff (of
insurance, resp.) to be spread across different
nodes.
alone(ρ). An alone constraint is formulated
over a specific version of a resource ρ R(r)
and requires it is not allocated to a node where
other resources (be them original or replicas) as
well as replicas of r are also allocated. In other
words, it requires ρ to be allocated alone to a
node. An alone constraint can also be formulated
as a set of not together constraints of the form
not together(ρ, x), with x any possible instance
of any possible resource.
Formally, an alone(ρ) constraint is satisfied by
an allocation λ iff σ R \ {ρ} : λ(ρ) ̸= λ(σ).
For example, constraints c
11
and c
12
in Figure 6
require, respectively, the original version and the
replica of admin to be allocated to nodes where
no other resources are allocated.
5 OPTIMAL ALLOCATION
In this section, we illustrate our notion of optimal al-
location of resources to nodes (Section 5.1), and illus-
trate a binary programming-based modeling that per-
mits its computation (Section 5.2).
5.1 Problem Definition
We are interested in computing an allocation that sat-
isfies all the requirements specified for the resources
to be outsourced: in other words, we are interested in
an allocation that satisfies all resource-level require-
ments (and hence for which each original resource
and each replica ρ is allocated at a node that is accept-
able for ρ, Section 3), and which satisfies all replica
and allocation constraints (Section 4). We define such
an allocation as a correct allocation. Given a set C
of resource-level requirements and of replica and al-
location constraints, we denote the correctness of an
allocation λ w.r.t. C with notation λ |= C .
In principle, given a set of resources with a set
of resource-level requirements and replica and allo-
cation constraints, there may exist different correct
allocations, possibly characterized by different eco-
nomic costs (as different nodes/providers may charge
different costs for providing their services). We then
aim at computing an optimal allocation of resources
to nodes, meaning an allocation that, besides being
CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science
86
correct, also minimizes the overall economic cost of
the allocation.
Given a set R of resources and a set V of nodes,
the overall cost cost(λ) of an allocation λ : R V is
therefore estimated as cost(λ) =
ρR
cost(ρ, λ(ρ)).
The problem of determining an optimal allocation can
therefore be formulated as follows.
Problem 5.1 (Optimal allocation). Given a set R of
resources associated, a set C of resource-level re-
quirements and of replica and allocation constraints,
and a set V of nodes, determine an allocation λ :
R V of resources to nodes such that i) λ |= C (i.e.,
the allocation is correct); and ii) λ
: R V , λ
̸=λ,
such that λ
|= C and cost(λ
) < cost(λ).
5.2 Computing an Optimal Allocation
To compute an optimal allocation, we interpret Prob-
lem 5.1 as a binary programming problem, which is
formulated as follows: given a set of binary variables,
a set of constraints over them, and an objective func-
tion, determine an assignment of values to variables
that i) satisfies all the constraints; and ii) minimizes
the value of the objective function. Our interpretation
of Problem 5.1 therefore assumes: the objective func-
tion as the cost function of the allocation to be min-
imized; the constraints as the resource-level require-
ments and replica and allocation constraints, which
are to be satisfied; and the binary variables as a set of
variables, which we now illustrate, modeling the al-
location (or non-allocation) of each resource ρ R
to each node v V . In particular, for each resource
ρ
i
and each node v
z
we define a Boolean variable a
i,z
that has value 1 iff resource ρ
i
is allocated to node v
z
,
0 otherwise (i.e., a
i,z
= 1 λ(ρ
i
) = v
z
).
Having introduced our variables, we can now il-
lustrate the constraints, as well as the objective func-
tion, needed for the computation of a solution to our
binary programming interpretation of Problem 5.1. In
the following formulation, we use variable ¯a to model
acceptable allocations (i.e., allocations that do not vi-
olate any resource-level requirement, see Section 3):
given a resource ρ
i
and a node v
z
, ¯a
i,z
= 1 iff v
z
is an
acceptable node for ρ
i
.
Constraints. Our constraints need to guide assign-
ment of values to our binary variables a
i,z
, for all
ρ
i
R and v
z
V , such that the assignment: i) repre-
sents an allocation; ii) allocates each resource to one
among its acceptable nodes; and iii) satisfies all the
replica and allocation constraints.
Allocation. Each resource must be allocated to
exactly one node. We model this requirement with
the following constraint:
|R |
i=1
|V |
z=1
a
i,z
!
= 1
For this constraint to assume value 1, for each re-
source ρ
i
, the innermost sum (over all a
i,z
over all
nodes v
z
) must be equal to 1. This requires that,
for each resource ρ
i
, there is one and only one
node v
z
for which a
i,z
= 1, hence implying that
each resource is allocated to exactly one node.
Acceptable nodes. Each resource must be allo-
cated to an acceptable node. We model this re-
quirement with the following constraint:
|R |
i=1
|V |
z=1
¯a
i,z
· a
i,z
!
= 1
This constraint extends the allocation constraint
by considering for each resource, in the innermost
sum, the product ¯a
i,z
·a
i,z
. This product is equal to
1 iff ρ
i
is allocated to a node v
z
and v
z
is an ac-
ceptable node for ρ
i
. By requiring the sum over
all nodes of this product to be equal to 1, for each
resource, this constraint then implies that all re-
sources are allocated to acceptable nodes.
together requirement. Given two specific ver-
sions ρ
i
R(r) and ρ
j
R(s) of two resources
r, s R involved in a together requirement, an
allocation must place both resources on the same
node. We model this requirement with the fol-
lowing constraint, defined for each together re-
quirement:
t = together(ρ
i
, ρ
j
) :
|V |
z=1
(a
i,z
· a
j,z
) = 1
Intuitively, given a node v
z
and two resources ρ
i
and ρ
j
, we have that (a
i,z
· a
j,z
) equals 1 iff v
z
stores both ρ
i
and ρ
j
. Demanding the sum over
all nodes to be equal to 1 implies that there is ex-
actly one node that stores both ρ
i
and ρ
j
.
together
requirement. Given two generic re-
sources r and s involved in a together
require-
ment, an allocation must place at least one ver-
sion of r and at least one version of s on the same
node. We model this requirement with the fol-
lowing constraint, defined for each together
re-
quirement:
t
= together
(r, s) :
|V |
z=1
ρ
i
R(r),
ρ
j
R(s)
(a
i,z
· a
j,z
)
1
Security-Aware Allocation of Replicated Data in Distributed Storage Systems
87
Given a node v
z
, the sum over all versions ρ
i
and
ρ
j
of r and s, respectively, of (a
i,z
· a
j,z
) is greater
than or equal to 1 if v
z
stores at least one version
ρ
i
of r and ρ
j
of s.
all together requirement. Given two generic
resources r and s involved in an all together
requirement, whenever a node v
z
stores a version
of r, then it must also store at least one version
of s. We model this requirement with the follow-
ing constraint, defined for each all together re-
quirement:
at = all together(r, s) :
ρ
i
R(r)
|V |
z=1
a
i,z
· (
ρ
j
R(s)
a
j,z
)
1
For this constraint to hold, the parameter of the
outermost product must be greater than or equal
to 1 for all versions ρ
i
R(r) of r. Consider
a specific version ρ
i
of r and a node v
z
: a
i,z
·
(
ρ
j
R(s)
a
j,z
) is equal to or greater than 1 if v
z
stores ρ
i
and at least one version ρ
j
of s. The in-
termediate sum over all nodes ensures that there
is at least one node for which the condition holds.
The outermost product over all versions of r then
ensures that the overall condition is satisfied for
all versions of r.
not together requirement. Given two specific
versions ρ
i
R(r) and ρ
j
R(s) of two re-
sources r, s R involved in a not together re-
quirement, an allocation cannot place both re-
sources on the same node. We model this re-
quirement with the following constraint, defined
for each not together requirement:
nt = not together(ρ
i
, ρ
j
) :
|V |
z=1
(a
i,z
· a
j,z
) = 0
Given a node v
z
, we have that (a
i,z
· a
j,z
) equals
0 iff at least one among a
i,z
and a
j,z
is equal to 0
(i.e., at least one among ρ
i
and ρ
j
is not stored at
v
z
). Requiring the sum over all nodes to be equal
to 0 then implies that no node stores at the same
time ρ
i
and ρ
j
.
not together
requirement. Given two re-
sources r and s involved in a not together
requirement, an allocation cannot place a version
of r on a same node that stores a version of s. We
model this requirement with the following con-
straint, defined for each not together
re-
quirement:
nt
= not together
(r, s) :
|V |
z=1
ρ
i
R(r),
ρ
j
R(s)
a
i,z
· a
j,z
= 0
Given a node v
z
, requiring the sum over all ver-
sions of r and s of a
i,z
· a
j,z
(i.e., the innermost
sum) to be equal to 0 implies that no node v
z
stores
at the same time a version of r and a version of s.
The outermost sum enforces this guarantee for all
nodes.
split requirement. Given a resource r, an allo-
cation must not place the original version r
0
on
the nodes storing the replicas. We model this re-
quirement with the following constraint, defined
for each split requirement:
sp = split(r) :
|V |
z=1
a
r
0
,z
·
ρ
i
R(r)\{r
0
}
a
i,z
= 0
Given a node v
z
, a
r
0
,z
·
ρ
i
R(r)\{r
0
}
a
i,z
is equal
to 0 if the original r
0
is not allocated to the same
node as a replica ρ
i
R(r) \ {r
0
} of r. The outer-
most sum enforces this guarantee for all nodes.
all split requirement. Given a resource r, an
allocation must not place two versions of r on the
same node. We model this requirement with the
following constraint, defined for each all split
requirement:
as = all split(r) :
|V |
z=1
ρ
i
,ρ
j
R(r),
ρ
i
̸=ρ
j
a
i,z
· a
j,z
= 0
Similarly to the previous constraint, the innermost
sum guarantees that, given a node v
z
, no pair of
versions of r are allocated together at v
z
. The out-
ermost sum enforces this guarantee for all nodes.
alone requirement. Given a specific version ρ
i
of a resource r R, an allocation must not place
it on a node storing also another version of an-
other resource. We model this requirement with
the following constraint, defined for each alone
requirement:
CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science
88
Nodes
Resources v
1
v
2
v
3
v
4
v
5
v
6
v
7
v
8
v
9
v
10
clinical
0
clinical
1
insurance
0
insurance
1
insurance
2
equipment
0
research
0
research
1
staff
0
staff
1
admin
0
admin
1
payroll
0
payroll
1
Figure 7: Optimal allocation of the resources in Figure 3 to
the nodes in Figure 2 (acceptable nodes are denoted with a
colored background).
a = alone(α) :
|V |
z=1
a
i,z
·
ρ
j
∈{R \{ρ
i
}}
a
j,z
= 0
Given a node v
z
, a
i,z
·
ρ
j
∈{R \{ρ
i
}}
a
j,z
is equal to
0 if v
z
does not store both ρ
i
and a version of an-
other resource. The outermost sum enforces this
guarantee for all nodes.
Objective Function. The objective function of our
binary programming problem, which needs to be min-
imized, models the cost of the allocation represented
by the allocation variables a
i,z
. The allocation cost il-
lustrated in Section 5.1 is therefore computed taking
into account the values assumed by a
i,z
, to guarantee
equivalence between the binary programming prob-
lem and Problem 5.1. The objective function is de-
fined as follows:
min
|R |
i=1
|V |
z=1
a
i,z
· size(ρ
i
) · price(v
z
)
where size(ρ
i
) represents the size of resource ρ
i
(e.g.,
in GB) and price(v
z
) represents the unitary cost (e.g.,
in USD/GB) to be paid for using v
z
. Note that, for
simplicity and in line with previous works (e.g., (De
Capitani di Vimercati et al., 2021a)), in our exam-
ples we consider a linear cost function and estimate
the cost cost(ρ, v) of allocating resource ρ to node v
by multiplying the size of ρ (e.g., in GB) by the uni-
tary storage price (e.g., in USD/GB) that is applied
by v, with the note that such an estimate may con-
sider different cost factors (e.g., transfer costs) and/or
cost functions.
Figure 7 illustrates an optimal allocation com-
puted over the resources in Figure 3 and the accept-
able plans in Figure 5, considering the replica and al-
location constraints in Figure 6.
The problem can be efficiently solved using off-
the-shelf optimization solvers; for instance, we im-
plemented our approach using Google OR-Tools,
demonstrating its effectiveness in computing optimal
allocations respecting all specified constraints.
6 RELATED WORK
This work contributes to different lines of research at
the intersection between distributed storage systems
and data allocation methodologies, supporting owner-
specified requirements in storage selection.
The majority of works addressing the problem of
computing optimal allocations in distributed systems
do not take into consideration specific requirements
imposed by data owners, but rather focus on guar-
anteeing recovery possibility (e.g., (Jakoveti
´
c et al.,
2015; Leong et al., 2012; Roshandeh et al., 2017;
Bhattacharya et al., 2019; Peng et al., 2021)). For ex-
ample, the approach in (Leong et al., 2012) introduces
coding strategies to store data optimally for maximiz-
ing reliability in terms of recovery possibility. The
approach in (Roshandeh et al., 2017) focuses on het-
erogeneous data in distributed storage, proposing an
approach that assumes data of the same type are min-
imally spread across nodes. The approach in (Bhat-
tacharya et al., 2019) aims to balance storage and net-
work costs, focusing on bandwidth-constrained envi-
ronments. The approach in (Jakoveti
´
c et al., 2015)
proposes a distributed algorithm for optimizing data
allocations based on local communication between
nodes, ensuring data retrieval is achieved with nodes
only communicating with neighboring ones. The ap-
proach in (Peng et al., 2021) focuses instead on de-
termining an optimal subset of nodes for storing data.
A distributed storage system for edge and IoT devices
with proof of replication is proposed in (Wu et al.,
2022). Our work differs from those falling in this line
of research due to the fact that we are not concerned
with coding and fragmenting data across nodes, as we
aim at supporting heterogeneous architectures (e.g.,
cloud, fog, and edge) and complex owner-defined
constraints, regulating allocations of data and replica
to optimally place them in a distributed architecture.
Another related line of work addresses the prob-
lem of cloud plan selection, possibly in the context of
multi-cloud scenarios and possibly supporting owner-
specified requirements. Approaches that permit own-
ers to specify resource-level requirements are orthog-
Security-Aware Allocation of Replicated Data in Distributed Storage Systems
89
onal to our work and can be used to determine ac-
ceptable nodes on which then computing optimal al-
locations. Other approaches falling in this category
do not typically support arbitrary owner-specified re-
quirements or do not consider replica management,
which is a primary objective of our work. The ap-
proach in (Ruiz-Alvarez and Humphrey, 2012) pro-
poses an automated solution for selecting storage
providers. While sharing with our work some support
for user-specified desiderata, this approach does not
explicitly support the management of replicas and of
their allocation, nor the management of complex con-
straints regulating the joint/disjoint allocation of data
to the same provider, and has a specific focus on cost
and performance factors. The approaches proposed
in (Dastjerdi and Buyya, 2014; Esposito et al., 2016;
De Capitani di Vimercati et al., 2019) adopt fuzzy
logic and reasoning for cloud providers selection. In
particular, fuzzy logic is adopted in (De Capitani di
Vimercati et al., 2019) for language. The approach
in (De Capitani di Vimercati et al., 2021b) proposes
a specification language for permitting owners to for-
mulate requirements and preferences for cloud plan
selection and formal model and different strategies
for reasoning on requirements, preferences, and ac-
ceptable plans. These approaches are complementary
to ours, and can be used for the identification of ac-
ceptable nodes that are then used to compute our op-
timal allocations. The work presented in (Taha et al.,
2017), while aligning with our objective of facilitating
the selection of multiple providers or services, bases
its requirements on the importance levels assigned to
predefined Service Level Objectives (SLOs). Other
related efforts focus on solving various aspects of re-
source allocation optimization, such as considering
provider workloads (Wendell et al., 2010), address-
ing fault tolerance mechanisms (Jhawar et al., 2013),
leveraging multi-cloud solutions for application de-
velopment and management (Ferry et al., 2018), and
integrating multi-cloud storage systems with existing
NAS-based programs (Chen and Zadok, 2019).
Related yet orthogonal works have investigated
approaches for permitting owners effectively protect-
ing and securely deleting resources while relying on
decentralized cloud services for storage. For exam-
ple, the approach in (Bacis et al., 2020) combines
All-Or-Nothing-Transform for strong resource pro-
tection, and ad-hoc strategies for slicing resources
and for their decentralized allocation in the storage
network. Our work builds on existing research in
distributed storage optimization and multicloud data
placement, extending their applicability to modern ar-
chitectures (e.g., fog and edge computing) and ensur-
ing full support for obeying owner-specified require-
ments and managing replica, while ensuring eco-
nomic efficiency.
7 CONCLUSIONS
This paper presented an approach to the optimal allo-
cation of data and replicas in distributed storage sys-
tems, with a focus on balancing economic cost and
operational constraints. Our approach can fit diverse
architectural environments (e.g., cloud, fog, and edge
computing) by generalizing the concept of a “node”,
and permitting to accommodate varying requirements
and characteristics. A peculiarity of our approach
consists in permitting owners to specify in a friendly
way, and have enforced in the computation of the al-
location, complex requirements and constraints that
take into consideration both the characteristics of the
storage nodes, as well as the interplay among the al-
locations of different data items and replicas. To en-
able the computation of optimal allocations, satisfy-
ing all constraints while minimizing the overall eco-
nomic cost entailed by the allocation, this work pro-
posed a formulation of the problem in terms of a bi-
nary programming problem, which can then be effi-
ciently solved leveraging off-the-shelves solvers. Fur-
ther research will include the definition of additional
constraints that could be specified and taken into con-
sideration when computing allocations.
ACKNOWLEDGEMENTS
This work was supported in part by the EC under
project GLACIATION (101070141), by the Italian
MUR under PRIN projects POLAR (2022LA8XBH)
and KURAMi (20225WTRFN), and by project SER-
ICS (PE00000014) under the MUR NRRP funded by
the EU - NGEU. Views and opinions expressed are
however those of the authors only and do not neces-
sarily reflect those of the European Union or the Ital-
ian MUR. Neither the European Union nor the Italian
MUR can be held responsible for them.
REFERENCES
Bacis, E., De Capitani di Vimercati, S., Foresti, S., Para-
boschi, S., Rosa, M., and Samarati, P. (2020). Se-
curing resources in decentralized cloud storage. IEEE
TIFS, 15(1):286–298.
Bhattacharya, H., Chattopadhyay, S., Chattopadhyay, M.,
and Banerjee, A. (2019). Storage and bandwidth op-
CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science
90
timized reliable distributed data allocation algorithm.
IJACI, 10(1):78–95.
Chen, M. and Zadok, E. (2019). Kurma: Secure geo-
distributed multi-cloud storage gateways. In Proc. of
ACM SYSTOR 2019, Haifa, Israel.
Dastjerdi, A. V. and Buyya, R. (2014). Compatibility-aware
cloud service composition under fuzzy preferences of
users. IEEE TCC, 2(1):1–13.
De Capitani di Vimercati, S., Foresti, S., Livraga, G., Pi-
uri, V., and Samarati, P. (2019). A fuzzy-based bro-
kering service for cloud plan selection. IEEE ISJ,
13(4):4101–4109.
De Capitani di Vimercati, S., Foresti, S., Livraga, G., Pi-
uri, V., and Samarati, P. (2021a). Security-aware
data allocation in multicloud scenarios. IEEE TDSC,
18(5):2456–2468.
De Capitani di Vimercati, S., Foresti, S., Livraga, G., Piuri,
V., and Samarati, P. (2021b). Supporting user require-
ments and preferences in cloud plan selection. IEEE
TSC, 14(1):274–285.
Esposito, C., Ficco, M., Palmieri, F., and Castiglione, A.
(2016). Smart cloud storage service selection based
on fuzzy logic, theory of evidence and game theory.
IEEE TC, 65(8):2348–2362.
Ferry, N., Chauvel, F., Song, H., Rossini, A., Lushpenko,
M., and Solberg, A. (2018). CloudMF: Model-Driven
management of multi-cloud applications. ACM TOIT,
18(2):16:1–16:24.
Jakoveti
´
c, D., Minja, A., Bajovi
´
c, D., and Vukobra-
tovi
´
c, D. (2015). Distributed storage allocations for
neighborhood-based data access. In Proc. of IEEE
ITW 2015, Jerusalem, Israel.
Jhawar, R., Piuri, V., and Santambrogio, M. (2013). Fault
tolerance management in cloud computing: a system-
level perspective. IEEE ISJ, 7(2):288–297.
Leong, D., Dimakis, A. G., and Ho, T. (2012). Distributed
storage allocations. IEEE TIT, 58(7):4733–4752.
Peng, P., Noori, M., and Soljanin, E. (2021). Distributed
storage allocations for optimal service rates. IEEE
TCOM, 69(10):6647–6660.
Roshandeh, K. P., Noori, M., Ardakani, M., and Tellam-
bura, C. (2017). Distributed storage allocation for
multi-class data. In Proc. of IEEE ISIT 2017, Aachen,
Germany.
Ruiz-Alvarez, A. and Humphrey, M. (2012). A model and
decision procedure for data storage in cloud comput-
ing. In Proc. of IEEE/ACM CCGrid 2012, Ottawa,
Canada.
Russo Russo, G., Ferrarelli, D., Pasquali, D., Cardellini, V.,
and Lo Presti, F. (2024). QoS-aware offloading poli-
cies for serverless functions in the Cloud-to-Edge con-
tinuum. FGCS, 156:1–15.
Taha, A., Manzoor, S., and Suri, N. (2017). SLA-based ser-
vice selection for multi-cloud environments. In Proc.
of IEEE EDGE 2017, Honolulu, HI, USA.
Wendell, P., Jiang, J. W., Freedman, M. J., and Rexford, J.
(2010). DONAR: Decentralized server selection for
cloud services. In Proc. of ACM SIGCOMM 2010,
New Delhi, India.
Wu, C., Chen, Y., Qi, Z., and Guan, H. (2022). DSPR:
Secure decentralized storage with proof-of-replication
for edge devices. JSA, 125:102441.
Security-Aware Allocation of Replicated Data in Distributed Storage Systems
91