Security-Aware Allocation of Replicated Data

in Distributed Storage Systems

Sabrina De Capitani di Vimercati

, Sara Foresti

, Giovanni Livraga

Pierangela Samarati

and Mauro Tedesco

Computer Science Department, Universit

a degli Studi di Milano, via Celoria 18, Milano, Italy

Keywords:

Security-Aware Data Allocation, Distributed Storage, Data Management, Replica Management, Optimal

Allocation.

Abstract:

Distributed storage systems offer scalable and cost-effective solutions for managing large data collections. A

critical factor for the adoption of these systems is the allocation of data (possibly including replicas) to the

storage nodes satisfying operational and security requirements, while ensuring economic effectiveness. Ap-

propriate data and replica management provides signiﬁcant beneﬁts, ranging from enhanced fault tolerance

and improved data availability, to reduced latency and optimized workload distribution. A suboptimal place-

ment of data and replicas can instead lead to excessive costs, security risks, and performance bottlenecks.

This paper proposes a novel model for permitting data owners to specify in a friendly manner complex data

and replica allocation constraints, and an approach for computing optimal (satisfying operational and security

requirements and minimizing costs) data allocations in distributed storage environments. Our work aims to

improve the reliability, security, and cost-effectiveness of distributed storage systems.

1 INTRODUCTION

The availability of distributed storage systems, pos-

sibly based on cloud/fog/edge paradigms, has revo-

lutionized how large and/or critical data collections

can be stored and managed, offering scalable, fault-

tolerant, and cost-effective solutions for different ap-

plication scenarios (e.g., (Russo Russo et al., 2024)).

To fully beneﬁt from the features of these distributed

systems, it is critical to ensure the storage of data

in compliance with operational and security require-

ments, while maintaining economic effectiveness. An

essential aspect of this challenge concerns the opti-

mal allocation of data items (including their possi-

ble replicas) across the nodes. Careless or subop-

timal data placement can intuitively lead to exces-

sive costs, performance bottlenecks, and/or breaches

of security or operational requirements. These re-

quirements, often dictated by the data owner, may im-

pose constraints such as grouping related data items

(or data items that are frequently accessed together)

https://orcid.org/0000-0003-0793-3551

https://orcid.org/0000-0002-1658-6734

https://orcid.org/0000-0003-2661-8573

https://orcid.org/0000-0001-7395-4620

on the same node to facilitate joint visibility, ensur-

ing that sensitive collections are fragmented and dis-

tributed across separate nodes to avoid exposure, or

replicating data among multiple storage nodes. In

particular, replicating data can provide several ben-

eﬁts. A ﬁrst beneﬁt is represented by enhanced fault

tolerance: replicas mitigate the risks of data loss due

to failures or network outages. Data replication can

also help improving data availability, as replicas per-

mit access to data even when some nodes become

unavailable, reducing downtime and ensuring service

continuity. It can also be beneﬁcial to increase sys-

tem performance: for example, strategically placing

replicas closer to end-users can reduce latency and

improve data access speed, enhancing application re-

sponsiveness. Furthermore, it permits load balancing,

as replicas can help distributing workload across stor-

age nodes, preventing bottlenecks and optimizing re-

source utilization. When allocating data to the stor-

age nodes, a careful management of the replica plays

a critical role, as it directly inﬂuences system reliabil-

ity, performance, and cost.

To address this problem, we propose an approach

capable of computing optimal data allocations con-

sidering data replication, balancing two different ob-

jectives: minimizing economic cost on the one hand,

De Capitani di Vimercati, S., Foresti, S., Livraga, G., Samarati, P. and Tedesco, M.

Security-Aware Allocation of Replicated Data in Distributed Storage Systems.

DOI: 10.5220/0013484800003950

In Proceedings of the 15th International Conference on Cloud Computing and Services Science (CLOSER 2025), pages 80-91

ISBN: 978-989-758-747-4; ISSN: 2184-5042

Figure 1: Reference scenario.

and satisfying all requirements imposed by the data

owner on the other hand. The contribution of this pa-

per is two-fold. First, we identify and formally deﬁne

possible requirements that a data owner may need to

impose to regulate the overall allocation of data and

replicas to the storage nodes. We then present an ap-

proach for computing optimal data allocations (i.e.,

allocations that respect all requirements while min-

imizing the overall economic cost) in a distributed

storage environment. We underline that our approach

is agnostic to (and does not focus on) speciﬁc stor-

age/processing frameworks (e.g., distributed ﬁle sys-

tems such as HDFS or IPFS) and considers a more

general distributed storage environment, interpreted

as a collection of nodes with storage capabilities that

can be connected and queried, making it applicable

to a broader range of distributed architectures beyond

traditional data processing ecosystems.

Our work aims at contributing to the body of liter-

ature on efﬁcient and secure data outsourcing strate-

gies in distributed systems, with a speciﬁc focus on

the critical aspect of replica placement and manage-

ment. Our approach seeks to improve the reliability,

security, and cost-effectiveness of distributed storage

systems, making them more accessible and appealing

to data owners, across various application scenarios,

fostering trust and enabling efﬁcient resource usage

in modern data outsourcing scenarios. The remainder

of this paper is organized as follows. Section 2 intro-

duces our modeling of nodes and data, and the notion

of allocation along with two families of requirements

to be considered when deﬁning allocations. Section 3

discusses the ﬁrst family of requirements, and char-

acterizes the concepts of acceptable nodes that ﬁt the

characteristics and needs of the data items to be al-

located. Section 4 presents the second family of re-

quirements, which imposes restrictions on data and

replica allocation. Section 5 illustrates the problem of

computing optimal allocations, and an approach for

ﬁnding them. Section 6 discusses related works. Fi-

nally, Section 7 concludes the paper.

2 MODELING

We consider a scenario (see Figure 1) characterized

by a data owner who is willing to move their data

to the premises of external storage providers (e.g., on

cloud, edge, fog nodes). While interested in ofﬂoad-

ing of storage and management of (possibly repli-

cated) data collections, the data owner needs to en-

force restrictions on the nodes storing the data in

terms of their characteristics (e.g., node location or

in-storage encryption algorithm adopted), as well as

in terms of which data items should be or cannot be

allocated to the same node. In the following, we in-

troduce our modeling of nodes and of data, together

with their replicas, and of data allocations.

Nodes Modeling. Our approach is designed to be

highly adaptable and not restricted to any speciﬁc dis-

tributed storage architecture. It is applicable to any

architecture that consists of multiple nodes, each po-

tentially having different characteristics (e.g., storage

capacity, performance, connectivity, security/privacy

features). The ﬂexibility of the approach allows data,

including replicas, to be allocated dynamically across

nodes. This general applicability makes the proposed

Security-Aware Allocation of Replicated Data in Distributed Storage Systems

Attributes

Nodes prov type loc encr avail Price

prov1 cloud EU 3DES VH 40

prov1 cloud US DES H 30

prov3 cloud EU 3DES M 55

prov2 edge EU 3DES VH 95

prov2 edge US DES L 90

prov4 edge EU AES L 85

prov1 cloud EU 3DES VH 10

prov3 cloud EU 3DES M 25

prov2 edge EU 3DES VH 70

prov4 edge EU AES L 100

Figure 2: Abstract representation of a set of nodes and price

for their usages (USD/GB).

method suitable for diverse environments, including

cloud computing platforms (where centralized data

centers can host numerous virtualized storage nodes),

fog computing (with distributed devices closer to the

network edge for latency-sensitive applications), and

edge computing (where data are processed and stored

directly on edge devices such as IoT gateways).

In light of this generality, the notion of a “node”

is inherently dependent on the architecture considered

within the speciﬁc application scenario. For example,

in cloud computing, a “node” might correspond to a

virtual storage instance or even to a cloud plan, while

in edge computing, it could represent an individual

device or sensor. Regardless of its speciﬁc instanti-

ation, a “node” within our framework is deﬁned as

the actual place where data items can be stored and

is characterized by attributes (e.g., capacity, perfor-

mance, accessibility, security features). This ﬂexibil-

ity ensures that the proposed approach can accommo-

date various interpretations of a “node,” tailored to the

requirements of the considered application and archi-

tecture. We then deﬁne our problem as the problem

of determining which data item has to be allocated to

which node of the storage architecture.

For generality and in line with previous works

(e.g., (De Capitani di Vimercati et al., 2021a; De

Capitani di Vimercati et al., 2021b)), we assume an

abstract representation of nodes in terms of their at-

tributes: each node v is then represented as a vector

having, for each attribute of interest, one element with

the value assumed by it in v. Figure 2 illustrates an ex-

ample of 10 nodes v

, . . . , v

deﬁned over attributes

prov (the provider managing the node), type (the

type of node), loc (the geographical location of the

node), encr (the encryption scheme supported by the

node), avail (the declared availability).

Data Modeling. A peculiarity of our scenario is the

consideration of the possibility of replicating data.

The computation of the data allocation to the nodes

must then consider both the original data items, as

well as their replicas. Each data item therefore has

Resource r Size (GB) Num. Replicas

clinical 1000 1

insurance 500 2

equipment 250 0

research 300 1

staff 100 1

admin 200 1

payroll 100 1

Figure 3: Resources of our running example with their size

and number of additional replicas.

an original version (i.e., the item itself), and 0 or

more additional replicas. Our approach is not re-

stricted to a speciﬁc type of data and, for general-

ity and to encompass a variety of different data mod-

els (e.g., structured/semi-structured data, documents,

ﬁles) in the reminder of this paper we refer to the data

items to be outsourced with the term resources. A

resource can then represent any single informative as-

set of the data owner, deﬁned at a chosen granular-

ity level (from entire documents collections to par-

titioned data), which may (if so desired and done

by the owner) be replicated, and which needs to be

outsourced and hence allocated to a node (or a set

thereof).

Example 2.1. We refer our examples to the outsourc-

ing of the data assets of a hospital. The resources to

be outsourced are the following.

• clinical: clinical information related to pa-

tient, such as their diagnosis, ongoing treatments,

past medical history, and so on.

• insurance; information on patients’ insurance

and other ﬁscal data.

• equipment: information about medical devices

and equipment maintained by the hospital for its

activities.

• research: information related to internal and

external research activities.

• staff: personal information of the staff working

in the hospital.

• admin: administrative and management informa-

tion.

• payroll: information related to the salary and

working position of hospital staff.

Figure 3 illustrates these resources along with

their size and the number of additional replicas

needed for each of them by the hospital. Each replica

of each resource is equal in size. For example, re-

source staff has 1 additional replica, meaning that

the hospital maintains (and wishes to outsource) two

copies of the resource, each of which is 100GB in size.

Resource equipment, instead, has 0 additional repli-

cas, meaning that only the original resource (250GB

in size) is maintained and has to be outsourced.

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

In our modeling, we use notation r to refer to a

resource in the set R of resources to be outsourced,

for which a certain number n ≥ 0 of replicas can exist,

without distinguishing whether reference is made to

the original version of the resource or to one of its

replicas. When explicit reference to a speciﬁc version

of a resource (i.e., its original version or one of its

replicas) is needed, we distinguish original resources

from replicas with superscripts: r

denotes the

original version of r, while r

denotes the i

replica

of r. We denote the original version and the set of

replicas existing for resource r with symbol R(r). For

example, with reference to the resources in Figure 3,

consider resource insurance, for which 2 replicas

are needed in addition to the original resource. When

writing insurance, we refer to a generic version

of the resource without explicit consideration of

whether we are referring to the original version or to

one replica. When writing insurance

we explicitly

refer to the original version. Also, R(insurance) =

{insurance

, insurance

}, which

includes the original version (insurance

) and

the ﬁrst (insurance

) and second (insurance

)

replica. For readability, we denote with R the

overall set of original resources and their replicas

R =

r∈R

R(r). For example, with reference to the

resources in Figure 3, R will include 14 resources:

the 7 original resources, as well as one replica for

each resource but insurance (for which 2 replicas

are deﬁned), and equipment (for which no replica is

deﬁned).

Allocation Modeling. Given a set R of resources and

a set V of nodes, our goal is to determine an alloca-

tion λ : R → V that maps each resource (original or

replica) to a node.

Given a set of nodes, it may be that not every

node is suitable to store every resource. This can be

due to two main factors. The ﬁrst concerns the ﬁt

between the peculiarities of each node vs. the spe-

ciﬁc characteristics or requirements of different re-

sources. For instance, certain resources may require

high-performance nodes with fast processing capabil-

ities, while others might need high-capacity nodes op-

timized for bulk storage. The second factor reﬂects

the interplay between resource allocations, where the

placement of one resource may inﬂuence the suitabil-

ity of a node for storing another resource (e.g., due to

capacity limitations, performance constraints, or op-

erational dependencies).

These two factors motivate two distinct families

of requirements, which may be speciﬁed by data own-

ers to regulate allocation. The ﬁrst family of require-

ments addresses the suitability of individual nodes

for speciﬁc resources: the enforcement of such con-

straints determines, for each resource, a set of accept-

able nodes. The second family of requirements ad-

dresses the interplay among allocations: the enforce-

ment of such requirements over acceptable nodes

guarantees that the overall allocation ﬁts the desider-

ata of the owner. We illustrate the ﬁrst family of re-

quirements along with the notion of acceptable node

in Section 3. We then illustrate the second family of

requirements in Section 4.

3 ACCEPTABLE NODES

The ﬁrst kind of requirements that should be consid-

ered when computing an allocation of data to stor-

age nodes considers the ﬁt between the characteris-

tics of the different nodes, and those of the resources

to be allocated. This kind of requirements is spec-

iﬁed for each resource r independently, and deﬁnes

the node characteristics that make a node suitable or

unsuitable for r. For example, a resource requiring

high reliability might be restricted to nodes with ro-

bust failover mechanisms, while a resource involv-

ing latency-sensitive operations might be allocated to

nodes with low communication delay. By capturing

these requirements, data owners can ensure that their

resources can be allocated only to nodes that meet

their needs and expectations. For clarity, in the re-

mainder of this paper we will denote such require-

ments as resource-level requirements. There are sev-

eral approaches that can be used to specify, and en-

force, this kind of requirements (e.g., (De Capitani

di Vimercati et al., 2021b; De Capitani di Vimer-

cati et al., 2021a)). We aim at remaining general

and do not restrict our approach to operate with any

speciﬁc approach. For example, the proposal in (De

Capitani di Vimercati et al., 2021b) provides a lan-

guage for specifying complex resource-level require-

ments, building upon the concept of base require-

ment, denoted c. Given a set {w

, . . . , w

} of values

in the domain of a node attribute (characteristic) at,

a base requirement on at imposes that at can assume

(at(w

, . . . , w

)) or cannot assume (¬at(w

, . . . , w

))

such a set of values. For instance, a base require-

ment of the form prov(provA, provB, provC) states

that, to be acceptable, a node must be managed by

provider provA, provB, or provC. Starting from base

requirements, the speciﬁcation language in (De Cap-

itani di Vimercati et al., 2021b) permits to express a

variety of complex requirements, summarized in Fig-

ure 4(a). Those complex requirements can model al-

ternatives among base requirements (ANY), sets of

base requirements that must be jointly satisﬁed (ALL),

conditional requirements (IF-THEN), prohibited char-

Security-Aware Allocation of Replicated Data in Distributed Storage Systems

ANY(c

, . . . , c

)

ALL(c

, . . . , c

)

IF ALL(c

, . . . , c

) THEN ANY(c

k+1

, . . . , c

)

FORBIDDEN(c

, . . . , c

)

AT LEAST(m, (c

, . . . , c

))

AT MOST(m, (c

, . . . , c

))

(a)

ANY({prov(prov2), type(cloud)})

ALL({loc(EU), avail(VH)})

IF(prov(prov1)) THEN (type(cloud))

FORBIDDEN({prov(prov3), type(cloud)})

(b)

Figure 4: Complex requirements supported by the language

in (De Capitani di Vimercati et al., 2021b) (a) and an ex-

ample of a set of requirements for resource clinical in

Figure 3 (b).

acteristic combinations (FORBIDDEN), and a mini-

mum (AT LEAST) or maximum (AT MOST) number

of base requirements to be satisﬁed.

Note that, while in principle each version (e.g.,

original and/or replica) of each resource may be as-

signed different requirements (and our approach fully

supports this scenario), in our examples we assume

that all versions of each resource are associated with

the same set of requirements and, for readability,

we associate resource-level requirements with the

generic resource names. Figure 4(b) illustrates a set

of 4 sample resource-level requirements for resource

clinical in Figure 3. The ﬁrst requirement demands

that clinical is to be outsourced to a node that is

managed by prov2 or is of type cloud. The second re-

quirement demands that the node must be located in

EU and guarantee a very high availability. The third

requirement demands that if a node is managed by

prov1, it must be of type cloud. The last requirement

prohibits clinical to be stored at a node managed

by prov3 and that is of type cloud.

The resource-level requirements associated with a

resource r restrict the possible nodes that can be con-

sidered for allocating the original version of r and

its replicas, based on the values the nodes assume

for the attributes that characterize them. Regardless

of the approach adopted for specifying and enforcing

resource-level requirements, we call a node v that sat-

isﬁes all requirements speciﬁed for a resource r as an

acceptable node for r. For example, with reference to

the nodes in Figure 2, the resources in Figure 3 and

the requirements in Figure 4(b), the set of acceptable

nodes for clinical includes v

, v

, and v

. On

the contrary, for example, node v

is not acceptable

for clinical since its characteristics do not satisfy

the second requirement of Figure 4(b) (it is not lo-

cated in EU nor has a very high availability).

Given a set of nodes and a set of resources with

resource-level requirements, the identiﬁcation of the

Nodes

Resources v

clinical ✓ ✓ ✓ ✓

insurance ✓ ✓ ✓ ✓ ✓ ✓

equipment ✓ ✓ ✓ ✓

research ✓ ✓

staff ✓ ✓ ✓ ✓

admin ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓

payroll ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓

Figure 5: An example of acceptable nodes for the resources

in Figure 3.

acceptable nodes for each resource can be automat-

ically computed in different ways (e.g., leveraging

a Boolean formulation of the problem as proposed

in (De Capitani di Vimercati et al., 2021b)). Figure 5

reports an example of acceptable nodes, among those

considered in our running example in Figure 2, for the

resources in Figure 3.

4 REPLICA AND ALLOCATION

CONSTRAINTS

Replica and allocation constraints are used to spec-

ify requirements that refer to the overall allocation of

sets of resources, and of the replicas. They are en-

forced to determine which resource will be allocated

to which node, among the acceptable ones for that re-

source. While the resource-level requirements illus-

trated in Section 3 specify, for each resource, a set

of conditions that identify acceptable nodes, the main

goal of the constraints introduced in this section is to

force the joint/separate allocation of sets of resources

(e.g., the replicas of a resource and its original ver-

sion) to nodes not based on the characteristics of the

nodes, bur rather on the resources themselves.

We introduce and deﬁne 8 kinds of replica

and allocation constraints: together, together

∗

all together, not together, not together

∗

split, all split, and alone. Constraints

together, together

∗

, and all together force the

joint allocation of resources on the same node. Con-

straints not together and not together

∗

force

fragmentation of resources across different nodes.

Constraints split, all split force different allo-

cations for the original version and the replica of a

resource. Constraint alone force the allocation of a

resource to be different from any other resource. In

the remainder of this section, we formally illustrate

and discuss these constraints. For simplicity, we will

denote with r and s two resources in R, with ρ an in-

stance of r in R(r), and with σ an instance of s in

R(s).

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

: together(equipment

,clinical

)

: together

∗

(staff,research)

: all together(staff,payroll)

: not together(payroll

,insurance

)

: not together(payroll

,insurance

)

: not together(payroll

,insurance

)

: not together

∗

(insurance,clinical)

: split(clinical)

: all split(staff)

: all split(insurance)

: alone(admin

)

: alone(admin

)

Figure 6: An example of a set of replica and allocation con-

straints for the resources in Figure 3.

• together(ρ, σ). A together constraint is de-

ﬁned over two speciﬁc versions ρ ∈ R(r) and

σ ∈ R(s) of two resources r and s, and demands

that they are both allocated to the same node (e.g.,

to account for the fact that those resources are of-

tentimes accessed together, or combined).

Formally, a together(ρ, σ) constraint is satisﬁed

by an allocation λ iff λ(ρ) = λ(σ). This constraint

can be formulated by the owner to restrict the allo-

cation of a pair of resources and, as such, demands

the identiﬁcation of two speciﬁc versions ρ of r

and σ of s to be involved in the constraint. In other

words, this constraint demands the speciﬁcation

of which versions of r and s are to be allocated

together. For example, constraint c

in Figure 6

requires the original versions of equipment and

of clinical to be allocated on the same node,

modeling the fact that these two resources are of-

ten to be accessed together.

• together

∗

(r,s). A together

∗

constraint is de-

ﬁned over two generic resources r and s, and de-

mands that at least one version of r be allocated

to the same node of at least one version of s. In

other words, it requires the allocation to include

a node that stores at least one version of r and

one version of s. We expect this constraint to be

formulated, similarly to the together constraint,

over resources that are expected to be frequently

accessed together. The main difference in seman-

tics with the together constraint is that, in this

case, it is not necessary to specify which versions

of the two resources must be placed together.

Formally, a together

∗

(r,s) is satisﬁed by an allo-

cation λ iff ∃ ρ, σ s.t. ρ ∈ R(r), σ ∈ R(s) : λ(ρ) =

λ(σ). For example, constraint c

in Figure 6 re-

quires that at least one version of staff and at

least one version of research be allocated to the

same node.

• all together(r,s). An all together con-

straint is deﬁned over two generic resources r and

s, and demands that whenever a node stores a ver-

sion of r, it also stores a version of s. In other

words, it guarantees that r is always accompanied

by s. Note the difference in semantics with the

together

∗

constraint, which is also formulated

over two generic versions of a pair of resources,

but is satisﬁed when at least one pair of versions

are allocated to the same node.

Formally, an all together(r,s) is satisﬁed by

an allocation λ iff ∀ρ ∈ R(r), ∃σ ∈ R(s) : λ(ρ) =

λ(σ). For example, constraint c

in Figure 6

requires that, when a node stores a version of

staff, it also stores at least one version of

payroll. We note that when |R(r)| > |R(s)|,

then the satisfaction of this constraint inevitably

implies that more than one replica of r be allo-

cated to the same node (as clearly no more than

|R(s)| nodes can be considered for satisfying this

constraint).

• not together(ρ, σ). A not together con-

straint is deﬁned over two speciﬁc versions ρ ∈

R(r) and σ ∈ R(s) of two resources r and s, and

demands that they are not allocated to the same

node.

Formally, a not together(ρ, σ) constraint is sat-

isﬁed by an allocation λ iff λ(ρ) ̸= λ(σ). Like the

together constraint, this constraint can be for-

mulated by the owner to restrict the allocation of a

pair of resources and, as such, demands the identi-

ﬁcation of two speciﬁc versions ρ of r and σ of s to

be speciﬁed in the constraint. In other words, this

constraint demands the speciﬁcation of which ver-

sions of r and s should not be allocated together.

This general constraint is expected to be formu-

lated for performance reasons: for example, to en-

sure concurrent access to both resources reducing

delays and bottlenecks that may be experienced

if ρ and σ are large and the available nodes do

not exhibit excellent performance. The constraint

can also be used to guarantee service continuity in

case of node failure: for example, if ρ and σ are

allocated at different nodes, the unavailability of ρ

compromises only the functions that directly de-

pend on ρ, while still permitting all the activities

that rely on σ. For example, constraints c

, c

and

in Figure 6 demand that the original version of

payroll is not allocated together with instances

of insurance.

• not together

∗

(r,s). A not together

∗

con-

straint is deﬁned over two generic resources r and

Security-Aware Allocation of Replicated Data in Distributed Storage Systems

s, and demands that no version of s be allocated to

a node that stores a version of s, and vice versa.

In other words, it requires the allocation not to

include a node that stores one version of r and

one version of s. Note the difference in seman-

tics with the not together constraint, which is

formulated over two speciﬁc versions of a pair of

resources. In particular, while the not together

constraint can be formulated for performance rea-

sons, a not together

∗

constraint models conﬁ-

dentiality constraints, sets (pairs in this case) of

resources that should never be visible together as

their association is considered sensitive. Note that

a not together

∗

constraint can also be formu-

lated as a set of not together constraints (one

for each combination of a version of r and a ver-

sion of s).

Formally, a not together

∗

(r,s) is satisﬁed by an

allocation λ iff ∀ρ ∈ R(r), ∀σ ∈ R(s) : λ(ρ) ̸=

λ(σ). For example, constraint c

in Figure 6

prevents having, on the same node, versions of

insurance and versions of clinical, account-

ing for the fact that these two resources should not

be visible together.

• split(r). A split constraint is formulated over

a generic version of a resource r, and demands

that all replicas of r be not allocated to the node

to which the original version r

is allocated. This

constraint can be formulated for security reasons,

to improve availability and resilience to node fail-

ures. Note that a split constraint can also be for-

mulated as a set of not together constraints of

the form not together(r

, r

), for each r

∈ R(r)

with i ̸= 0.

Formally, a split(r) constraint is satisﬁed by an

allocation λ iff ∀ρ ∈ R(r) \ {r

} : λ(ρ) ̸= λ(r

For example, constraint c

in Figure 6 requires all

replicas of clinical to be allocated to a node dif-

ferent from that to which clinical

is allocated.

• all split(r). An all split constraint is for-

mulated over a generic version of a resource r,

and extends the split constraint demanding that

no two versions (original nor replicas) of r be al-

located to the same node. The satisfaction of this

constraint can further increase availability and re-

silience to node failures (each version of a re-

source is allocated to a different node), while pos-

sibly requiring a large number of nodes to be in-

cluded in the allocation. A all split constraint

can also be formulated as a set of not together

constraints (one for each combination of two ver-

sions of r).

Formally, an all split constraint is satisﬁed by

an allocation λ iff ∀ρ, σ ∈ R(r), ρ ̸= σ : λ(ρ) ̸=

λ(σ). For example, constraint c

, resp.)

in Figure 6 requires all versions of staff (of

insurance, resp.) to be spread across different

nodes.

• alone(ρ). An alone constraint is formulated

over a speciﬁc version of a resource ρ ∈ R(r)

and requires it is not allocated to a node where

other resources (be them original or replicas) as

well as replicas of r are also allocated. In other

words, it requires ρ to be allocated alone to a

node. An alone constraint can also be formulated

as a set of not together constraints of the form

not together(ρ, x), with x any possible instance

of any possible resource.

Formally, an alone(ρ) constraint is satisﬁed by

an allocation λ iff ∀σ ∈ R \ {ρ} : λ(ρ) ̸= λ(σ).

For example, constraints c

and c

in Figure 6

require, respectively, the original version and the

replica of admin to be allocated to nodes where

no other resources are allocated.

5 OPTIMAL ALLOCATION

In this section, we illustrate our notion of optimal al-

location of resources to nodes (Section 5.1), and illus-

trate a binary programming-based modeling that per-

mits its computation (Section 5.2).

5.1 Problem Deﬁnition

We are interested in computing an allocation that sat-

isﬁes all the requirements speciﬁed for the resources

to be outsourced: in other words, we are interested in

an allocation that satisﬁes all resource-level require-

ments (and hence for which each original resource

and each replica ρ is allocated at a node that is accept-

able for ρ, Section 3), and which satisﬁes all replica

and allocation constraints (Section 4). We deﬁne such

an allocation as a correct allocation. Given a set C

of resource-level requirements and of replica and al-

location constraints, we denote the correctness of an

allocation λ w.r.t. C with notation λ |= C .

In principle, given a set of resources with a set

of resource-level requirements and replica and allo-

cation constraints, there may exist different correct

allocations, possibly characterized by different eco-

nomic costs (as different nodes/providers may charge

different costs for providing their services). We then

aim at computing an optimal allocation of resources

to nodes, meaning an allocation that, besides being

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

correct, also minimizes the overall economic cost of

the allocation.

Given a set R of resources and a set V of nodes,

the overall cost cost(λ) of an allocation λ : R → V is

therefore estimated as cost(λ) =

∑

ρ∈R

cost(ρ, λ(ρ)).

The problem of determining an optimal allocation can

therefore be formulated as follows.

Problem 5.1 (Optimal allocation). Given a set R of

resources associated, a set C of resource-level re-

quirements and of replica and allocation constraints,

and a set V of nodes, determine an allocation λ :

R →V of resources to nodes such that i) λ |= C (i.e.,

the allocation is correct); and ii) ∄λ

′

: R →V , λ

′

̸=λ,

such that λ

′

|= C and cost(λ

′

) < cost(λ).

5.2 Computing an Optimal Allocation

To compute an optimal allocation, we interpret Prob-

lem 5.1 as a binary programming problem, which is

formulated as follows: given a set of binary variables,

a set of constraints over them, and an objective func-

tion, determine an assignment of values to variables

that i) satisﬁes all the constraints; and ii) minimizes

the value of the objective function. Our interpretation

of Problem 5.1 therefore assumes: the objective func-

tion as the cost function of the allocation to be min-

imized; the constraints as the resource-level require-

ments and replica and allocation constraints, which

are to be satisﬁed; and the binary variables as a set of

variables, which we now illustrate, modeling the al-

location (or non-allocation) of each resource ρ ∈ R

to each node v ∈ V . In particular, for each resource

and each node v

we deﬁne a Boolean variable a

i,z

that has value 1 iff resource ρ

is allocated to node v

0 otherwise (i.e., a

i,z

= 1 ⇐⇒ λ(ρ

) = v

Having introduced our variables, we can now il-

lustrate the constraints, as well as the objective func-

tion, needed for the computation of a solution to our

binary programming interpretation of Problem 5.1. In

the following formulation, we use variable ¯a to model

acceptable allocations (i.e., allocations that do not vi-

olate any resource-level requirement, see Section 3):

given a resource ρ

and a node v

, ¯a

i,z

= 1 iff v

is an

acceptable node for ρ

Constraints. Our constraints need to guide assign-

ment of values to our binary variables a

i,z

, for all

∈ R and v

∈ V , such that the assignment: i) repre-

sents an allocation; ii) allocates each resource to one

among its acceptable nodes; and iii) satisﬁes all the

replica and allocation constraints.

• Allocation. Each resource must be allocated to

exactly one node. We model this requirement with

the following constraint:

|R |

∏

i=1

|V |

∑

z=1

i,z

= 1

For this constraint to assume value 1, for each re-

source ρ

, the innermost sum (over all a

i,z

over all

nodes v

) must be equal to 1. This requires that,

for each resource ρ

, there is one and only one

node v

for which a

i,z

= 1, hence implying that

each resource is allocated to exactly one node.

• Acceptable nodes. Each resource must be allo-

cated to an acceptable node. We model this re-

quirement with the following constraint:

|R |

∏

i=1

|V |

∑

z=1

¯a

i,z

· a

i,z

= 1

This constraint extends the allocation constraint

by considering for each resource, in the innermost

sum, the product ¯a

i,z

·a

i,z

. This product is equal to

1 iff ρ

is allocated to a node v

and v

is an ac-

ceptable node for ρ

. By requiring the sum over

all nodes of this product to be equal to 1, for each

resource, this constraint then implies that all re-

sources are allocated to acceptable nodes.

• together requirement. Given two speciﬁc ver-

sions ρ

∈ R(r) and ρ

∈ R(s) of two resources

r, s ∈ R involved in a together requirement, an

allocation must place both resources on the same

node. We model this requirement with the fol-

lowing constraint, deﬁned for each together re-

quirement:

∀t = together(ρ

, ρ

) :

|V |

∑

z=1

i,z

· a

j,z

) = 1

Intuitively, given a node v

and two resources ρ

and ρ

, we have that (a

i,z

· a

j,z

) equals 1 iff v

stores both ρ

and ρ

. Demanding the sum over

all nodes to be equal to 1 implies that there is ex-

actly one node that stores both ρ

and ρ

• together

∗

requirement. Given two generic re-

sources r and s involved in a together

∗

require-

ment, an allocation must place at least one ver-

sion of r and at least one version of s on the same

node. We model this requirement with the fol-

lowing constraint, deﬁned for each together

∗

re-

quirement:

∀t

∗

= together

∗

(r, s) :

|V |

∑

z=1







∑

∈R(r),

∈R(s)

i,z

· a

j,z

)







≥ 1

Security-Aware Allocation of Replicated Data in Distributed Storage Systems

Given a node v

, the sum over all versions ρ

and

of r and s, respectively, of (a

i,z

· a

j,z

) is greater

than or equal to 1 if v

stores at least one version

of r and ρ

of s.

• all together requirement. Given two generic

resources r and s involved in an all together

requirement, whenever a node v

stores a version

of r, then it must also store at least one version

of s. We model this requirement with the follow-

ing constraint, deﬁned for each all together re-

quirement:

∀at = all together(r, s) :

∏

∈R(r)





|V |

∑

z=1

i,z

· (

∑

∈R(s)

j,z

)





≥ 1

For this constraint to hold, the parameter of the

outermost product must be greater than or equal

to 1 for all versions ρ

∈ R(r) of r. Consider

a speciﬁc version ρ

of r and a node v

: a

i,z

(

∑

∈R(s)

j,z

) is equal to or greater than 1 if v

stores ρ

and at least one version ρ

of s. The in-

termediate sum over all nodes ensures that there

is at least one node for which the condition holds.

The outermost product over all versions of r then

ensures that the overall condition is satisﬁed for

all versions of r.

• not together requirement. Given two speciﬁc

versions ρ

∈ R(r) and ρ

∈ R(s) of two re-

sources r, s ∈ R involved in a not together re-

quirement, an allocation cannot place both re-

sources on the same node. We model this re-

quirement with the following constraint, deﬁned

for each not together requirement:

∀nt = not together(ρ

, ρ

) :

|V |

∑

z=1

i,z

· a

j,z

) = 0

Given a node v

, we have that (a

i,z

· a

j,z

) equals

0 iff at least one among a

i,z

and a

j,z

is equal to 0

(i.e., at least one among ρ

and ρ

is not stored at

). Requiring the sum over all nodes to be equal

to 0 then implies that no node stores at the same

time ρ

and ρ

• not together

∗

requirement. Given two re-

sources r and s involved in a not together

∗

requirement, an allocation cannot place a version

of r on a same node that stores a version of s. We

model this requirement with the following con-

straint, deﬁned for each not together

∗

re-

quirement:

∀nt

∗

= not together

∗

(r, s) :

|V |

∑

z=1







∑

∈R(r),

∈R(s)

i,z

· a

j,z







= 0

Given a node v

, requiring the sum over all ver-

sions of r and s of a

i,z

· a

j,z

(i.e., the innermost

sum) to be equal to 0 implies that no node v

stores

at the same time a version of r and a version of s.

The outermost sum enforces this guarantee for all

nodes.

• split requirement. Given a resource r, an allo-

cation must not place the original version r

the nodes storing the replicas. We model this re-

quirement with the following constraint, deﬁned

for each split requirement:

∀sp = split(r) :

|V |

∑

z=1





∑

∈R(r)\{r

}

i,z





= 0

Given a node v

, a



∑

∈R(r)\{r

}

i,z



is equal

to 0 if the original r

is not allocated to the same

node as a replica ρ

∈ R(r) \ {r

} of r. The outer-

most sum enforces this guarantee for all nodes.

• all split requirement. Given a resource r, an

allocation must not place two versions of r on the

same node. We model this requirement with the

following constraint, deﬁned for each all split

requirement:

∀as = all split(r) :

|V |

∑

z=1







∑

,ρ

∈R(r),

̸=ρ

i,z

· a

j,z







= 0

Similarly to the previous constraint, the innermost

sum guarantees that, given a node v

, no pair of

versions of r are allocated together at v

. The out-

ermost sum enforces this guarantee for all nodes.

• alone requirement. Given a speciﬁc version ρ

of a resource r ∈ R, an allocation must not place

it on a node storing also another version of an-

other resource. We model this requirement with

the following constraint, deﬁned for each alone

requirement:

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

Nodes

Resources v

clinical

✓

clinical

✓

insurance

✓

insurance

✓

insurance

✓

equipment

✓

research

✓

research

✓

staff

✓

staff

✓

admin

✓

admin

✓

payroll

✓

payroll

✓

Figure 7: Optimal allocation of the resources in Figure 3 to

the nodes in Figure 2 (acceptable nodes are denoted with a

colored background).

∀a = alone(α) :

|V |

∑

z=1





i,z

∑

∈{R \{ρ

}}

j,z





= 0

Given a node v

, a

i,z

∑

∈{R \{ρ

}}

j,z

is equal to

0 if v

does not store both ρ

and a version of an-

other resource. The outermost sum enforces this

guarantee for all nodes.

Objective Function. The objective function of our

binary programming problem, which needs to be min-

imized, models the cost of the allocation represented

by the allocation variables a

i,z

. The allocation cost il-

lustrated in Section 5.1 is therefore computed taking

into account the values assumed by a

i,z

, to guarantee

equivalence between the binary programming prob-

lem and Problem 5.1. The objective function is de-

ﬁned as follows:

min

|R |

∑

i=1

|V |

∑

z=1

i,z

· size(ρ

) · price(v

)

where size(ρ

) represents the size of resource ρ

(e.g.,

in GB) and price(v

) represents the unitary cost (e.g.,

in USD/GB) to be paid for using v

. Note that, for

simplicity and in line with previous works (e.g., (De

Capitani di Vimercati et al., 2021a)), in our exam-

ples we consider a linear cost function and estimate

the cost cost(ρ, v) of allocating resource ρ to node v

by multiplying the size of ρ (e.g., in GB) by the uni-

tary storage price (e.g., in USD/GB) that is applied

by v, with the note that such an estimate may con-

sider different cost factors (e.g., transfer costs) and/or

cost functions.

Figure 7 illustrates an optimal allocation com-

puted over the resources in Figure 3 and the accept-

able plans in Figure 5, considering the replica and al-

location constraints in Figure 6.

The problem can be efﬁciently solved using off-

the-shelf optimization solvers; for instance, we im-

plemented our approach using Google OR-Tools,

demonstrating its effectiveness in computing optimal

allocations respecting all speciﬁed constraints.

6 RELATED WORK

This work contributes to different lines of research at

the intersection between distributed storage systems

and data allocation methodologies, supporting owner-

speciﬁed requirements in storage selection.

The majority of works addressing the problem of

computing optimal allocations in distributed systems

do not take into consideration speciﬁc requirements

imposed by data owners, but rather focus on guar-

anteeing recovery possibility (e.g., (Jakoveti

c et al.,

2015; Leong et al., 2012; Roshandeh et al., 2017;

Bhattacharya et al., 2019; Peng et al., 2021)). For ex-

ample, the approach in (Leong et al., 2012) introduces

coding strategies to store data optimally for maximiz-

ing reliability in terms of recovery possibility. The

approach in (Roshandeh et al., 2017) focuses on het-

erogeneous data in distributed storage, proposing an

approach that assumes data of the same type are min-

imally spread across nodes. The approach in (Bhat-

tacharya et al., 2019) aims to balance storage and net-

work costs, focusing on bandwidth-constrained envi-

ronments. The approach in (Jakoveti

c et al., 2015)

proposes a distributed algorithm for optimizing data

allocations based on local communication between

nodes, ensuring data retrieval is achieved with nodes

only communicating with neighboring ones. The ap-

proach in (Peng et al., 2021) focuses instead on de-

termining an optimal subset of nodes for storing data.

A distributed storage system for edge and IoT devices

with proof of replication is proposed in (Wu et al.,

2022). Our work differs from those falling in this line

of research due to the fact that we are not concerned

with coding and fragmenting data across nodes, as we

aim at supporting heterogeneous architectures (e.g.,

cloud, fog, and edge) and complex owner-deﬁned

constraints, regulating allocations of data and replica

to optimally place them in a distributed architecture.

Another related line of work addresses the prob-

lem of cloud plan selection, possibly in the context of

multi-cloud scenarios and possibly supporting owner-

speciﬁed requirements. Approaches that permit own-

ers to specify resource-level requirements are orthog-

Security-Aware Allocation of Replicated Data in Distributed Storage Systems

onal to our work and can be used to determine ac-

ceptable nodes on which then computing optimal al-

locations. Other approaches falling in this category

do not typically support arbitrary owner-speciﬁed re-

quirements or do not consider replica management,

which is a primary objective of our work. The ap-

proach in (Ruiz-Alvarez and Humphrey, 2012) pro-

poses an automated solution for selecting storage

providers. While sharing with our work some support

for user-speciﬁed desiderata, this approach does not

explicitly support the management of replicas and of

their allocation, nor the management of complex con-

straints regulating the joint/disjoint allocation of data

to the same provider, and has a speciﬁc focus on cost

and performance factors. The approaches proposed

in (Dastjerdi and Buyya, 2014; Esposito et al., 2016;

De Capitani di Vimercati et al., 2019) adopt fuzzy

logic and reasoning for cloud providers selection. In

particular, fuzzy logic is adopted in (De Capitani di

Vimercati et al., 2019) for language. The approach

in (De Capitani di Vimercati et al., 2021b) proposes

a speciﬁcation language for permitting owners to for-

mulate requirements and preferences for cloud plan

selection and formal model and different strategies

for reasoning on requirements, preferences, and ac-

ceptable plans. These approaches are complementary

to ours, and can be used for the identiﬁcation of ac-

ceptable nodes that are then used to compute our op-

timal allocations. The work presented in (Taha et al.,

2017), while aligning with our objective of facilitating

the selection of multiple providers or services, bases

its requirements on the importance levels assigned to

predeﬁned Service Level Objectives (SLOs). Other

related efforts focus on solving various aspects of re-

source allocation optimization, such as considering

provider workloads (Wendell et al., 2010), address-

ing fault tolerance mechanisms (Jhawar et al., 2013),

leveraging multi-cloud solutions for application de-

velopment and management (Ferry et al., 2018), and

integrating multi-cloud storage systems with existing

NAS-based programs (Chen and Zadok, 2019).

Related yet orthogonal works have investigated

approaches for permitting owners effectively protect-

ing and securely deleting resources while relying on

decentralized cloud services for storage. For exam-

ple, the approach in (Bacis et al., 2020) combines

All-Or-Nothing-Transform for strong resource pro-

tection, and ad-hoc strategies for slicing resources

and for their decentralized allocation in the storage

network. Our work builds on existing research in

distributed storage optimization and multicloud data

placement, extending their applicability to modern ar-

chitectures (e.g., fog and edge computing) and ensur-

ing full support for obeying owner-speciﬁed require-

ments and managing replica, while ensuring eco-

nomic efﬁciency.

7 CONCLUSIONS

This paper presented an approach to the optimal allo-

cation of data and replicas in distributed storage sys-

tems, with a focus on balancing economic cost and

operational constraints. Our approach can ﬁt diverse

architectural environments (e.g., cloud, fog, and edge

computing) by generalizing the concept of a “node”,

and permitting to accommodate varying requirements

and characteristics. A peculiarity of our approach

consists in permitting owners to specify in a friendly

way, and have enforced in the computation of the al-

location, complex requirements and constraints that

take into consideration both the characteristics of the

storage nodes, as well as the interplay among the al-

locations of different data items and replicas. To en-

able the computation of optimal allocations, satisfy-

ing all constraints while minimizing the overall eco-

nomic cost entailed by the allocation, this work pro-

posed a formulation of the problem in terms of a bi-

nary programming problem, which can then be efﬁ-

ciently solved leveraging off-the-shelves solvers. Fur-

ther research will include the deﬁnition of additional

constraints that could be speciﬁed and taken into con-

sideration when computing allocations.

ACKNOWLEDGEMENTS

This work was supported in part by the EC under

project GLACIATION (101070141), by the Italian

MUR under PRIN projects POLAR (2022LA8XBH)

and KURAMi (20225WTRFN), and by project SER-

ICS (PE00000014) under the MUR NRRP funded by

the EU - NGEU. Views and opinions expressed are

however those of the authors only and do not neces-

sarily reﬂect those of the European Union or the Ital-

ian MUR. Neither the European Union nor the Italian

MUR can be held responsible for them.

REFERENCES

Bacis, E., De Capitani di Vimercati, S., Foresti, S., Para-

boschi, S., Rosa, M., and Samarati, P. (2020). Se-

curing resources in decentralized cloud storage. IEEE

TIFS, 15(1):286–298.

Bhattacharya, H., Chattopadhyay, S., Chattopadhyay, M.,

and Banerjee, A. (2019). Storage and bandwidth op-

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

timized reliable distributed data allocation algorithm.

IJACI, 10(1):78–95.

Chen, M. and Zadok, E. (2019). Kurma: Secure geo-

distributed multi-cloud storage gateways. In Proc. of

ACM SYSTOR 2019, Haifa, Israel.

Dastjerdi, A. V. and Buyya, R. (2014). Compatibility-aware

cloud service composition under fuzzy preferences of

users. IEEE TCC, 2(1):1–13.

De Capitani di Vimercati, S., Foresti, S., Livraga, G., Pi-

uri, V., and Samarati, P. (2019). A fuzzy-based bro-

kering service for cloud plan selection. IEEE ISJ,

13(4):4101–4109.

De Capitani di Vimercati, S., Foresti, S., Livraga, G., Pi-

uri, V., and Samarati, P. (2021a). Security-aware

data allocation in multicloud scenarios. IEEE TDSC,

18(5):2456–2468.

De Capitani di Vimercati, S., Foresti, S., Livraga, G., Piuri,

V., and Samarati, P. (2021b). Supporting user require-

ments and preferences in cloud plan selection. IEEE

TSC, 14(1):274–285.

Esposito, C., Ficco, M., Palmieri, F., and Castiglione, A.

(2016). Smart cloud storage service selection based

on fuzzy logic, theory of evidence and game theory.

IEEE TC, 65(8):2348–2362.

Ferry, N., Chauvel, F., Song, H., Rossini, A., Lushpenko,

M., and Solberg, A. (2018). CloudMF: Model-Driven

management of multi-cloud applications. ACM TOIT,

18(2):16:1–16:24.

Jakoveti

c, D., Minja, A., Bajovi

c, D., and Vukobra-

tovi

c, D. (2015). Distributed storage allocations for

neighborhood-based data access. In Proc. of IEEE

ITW 2015, Jerusalem, Israel.

Jhawar, R., Piuri, V., and Santambrogio, M. (2013). Fault

tolerance management in cloud computing: a system-

level perspective. IEEE ISJ, 7(2):288–297.

Leong, D., Dimakis, A. G., and Ho, T. (2012). Distributed

storage allocations. IEEE TIT, 58(7):4733–4752.

Peng, P., Noori, M., and Soljanin, E. (2021). Distributed

storage allocations for optimal service rates. IEEE

TCOM, 69(10):6647–6660.

Roshandeh, K. P., Noori, M., Ardakani, M., and Tellam-

bura, C. (2017). Distributed storage allocation for

multi-class data. In Proc. of IEEE ISIT 2017, Aachen,

Germany.

Ruiz-Alvarez, A. and Humphrey, M. (2012). A model and

decision procedure for data storage in cloud comput-

ing. In Proc. of IEEE/ACM CCGrid 2012, Ottawa,

Canada.

Russo Russo, G., Ferrarelli, D., Pasquali, D., Cardellini, V.,

and Lo Presti, F. (2024). QoS-aware ofﬂoading poli-

cies for serverless functions in the Cloud-to-Edge con-

tinuum. FGCS, 156:1–15.

Taha, A., Manzoor, S., and Suri, N. (2017). SLA-based ser-

vice selection for multi-cloud environments. In Proc.

of IEEE EDGE 2017, Honolulu, HI, USA.

Wendell, P., Jiang, J. W., Freedman, M. J., and Rexford, J.

(2010). DONAR: Decentralized server selection for

cloud services. In Proc. of ACM SIGCOMM 2010,

New Delhi, India.

Wu, C., Chen, Y., Qi, Z., and Guan, H. (2022). DSPR:

Secure decentralized storage with proof-of-replication

for edge devices. JSA, 125:102441.

Security-Aware Allocation of Replicated Data in Distributed Storage Systems