Tracking Data Trajectories in IoT
Chiara Bodei¹ and Letterio Galletta²
¹ Dipartimento di Informatica, Università di Pisa, Italy
² IMT Institute for Advanced Studies Lucca, Italy
Keywords:
IoT, Static Analysis.
Abstract:
Internet of Things (IoT) devices access and process large amounts of data. Some of these data are sensitive
and can become a target for security attacks. As a consequence, it is crucial to be able to trace data and to
identify their paths. We start from the specification language IOT-LYSA, and propose a Control Flow Analysis
for statically predicting possible trajectories of data communicated in an IoT system and, consequently, for
checking whether sensitive data can pass through possibly dangerous nodes. Paths are also interesting from
an architectural point of view for deciding which are the points where data are collected, processed, commu-
nicated and stored and which are the suitable security mechanisms for guaranteeing a reliable transport from
the raw data collected by the sensors to the aggregation nodes and to servers that decide actuations.
1 INTRODUCTION
In the Internet of Things (IoT), things are smart
and interconnected devices that generate and trans-
mit huge amounts of data over the net. Managing
data is more complex than in traditional systems (see
e.g. (Abu-Elkheir et al., 2013)): it consists in a pro-
duction chain that starts from raw data collected by
sensors, continues with aggregation nodes and pos-
sibly ends with servers that process data and decide
consequent actuations. Secure communication of data
is even more crucial in IoT scenarios, especially in
multi-hop communications, where single nodes can
be physically attacked and data can be eavesdropped
or altered in passing. Thus it is important that IoT
systems (i.e. networks of nodes, where each node interacts
with the environment through sensors and actuators)
are aware of the provenance and the trajectories
of their data, especially when they are sensitive or
when they impact critical decisions, such as stopping
an industrial plant or the irrigation of a crop that uses
precision agriculture technologies.
Usually, formal methods offer designers tools to
support the development of systems and to reason
about their properties. We follow this line of re-
search by presenting preliminary results about using
static analysis to study data trajectories in IoT sys-
tems. Technically, we start from the formal specifica-
tion language IOT-LYSA, a process calculus recently
proposed for IoT systems (Bodei et al., 2016b; Bodei
et al., 2017). IOT-LYSA may help designers to adopt
a Security by Design development model. Indeed, de-
signers can exploit the calculus to model the structure
of the system and how its components (smart objects)
interact with each other. Furthermore, they can reason
about the system correctness and robustness by us-
ing the Control Flow Analysis (CFA) of IOT-LYSA.
This static analysis safely approximates the system
behaviour, by predicting how data from sensors may
spread across the system and how objects may inter-
act. Technically, it “mimics” the evolution of the sys-
tem, by using abstract values in place of concrete ones
and by modelling the consequences of each possible
action. Designers can detect possible security vulnerabilities
by inspecting this “abstract simulation” and
intervene as early as possible during the design phase.
Here, we propose a variant of this CFA for data
path analysis. This analysis predicts how data flow
from specific data sources and what their possible
trajectories across the network nodes are. It is then
possible to investigate whether the predicted trajecto-
ries include nodes considered potentially dangerous
from a security point of view. Moreover, it is possible
to observe the trajectories of data used to make de-
cisions in critical points of the system specification.
Consequently, our analysis results may help design-
ers in making educated decisions, on the exposure of
both raw and aggregated data.
Because the analysis over-approximates the system behaviour, if the predicted
trajectories do not include dangerous nodes, we can
be sure that such nodes will never be crossed at run time. If
instead they do, data only possibly pass through
risky nodes, but it can be worthwhile to investigate further.
Bodei, C. and Galletta, L.
Tracking Data Trajectories in IoT.
DOI: 10.5220/0007578305720579
In Proceedings of the 5th International Conference on Information Systems Security and Privacy (ICISSP 2019), pages 572-579
ISBN: 978-989-758-359-9
Copyright © 2019 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved
The paper is organised as follows. In Section 2,
we introduce our methodology with an illustrative ex-
ample. In Section 3 we briefly recall the process cal-
culus IOT-LYSA. In Section 4 we define the CFA, and
reason on data trajectories. We conclude in Section 5.
2 AN IRRIGATION SYSTEM
In this section we illustrate our methodology through
a simple yet realistic scenario similar to the one intro-
duced in (Bodei et al., 2018). We consider a smart
agricultural irrigation system, based on a Wireless
Sensor Network, for monitoring and irrigating grape
crops. The main task of the system is to regulate irrigation
according to evapo-transpiration (ET), a variable
parameter that measures the crop water demand,
which depends on several factors (e.g. weather, soil
moisture, kind of plant and stage of development).
Irrigation is needed when ET exceeds the supply of water
coming from the soil or from precipitation.
In our model, a combination of wired and wire-
less sensors collects soil data like pH, moisture, tem-
perature and so on. The collected data are sent in a
multi-hop manner to a base station node. The base
station node performs a first elaboration of data and
then transmits the results to the remote server. The
aggregated data are further processed and stored. The
server decides the suitable irrigating actions for each
sub-area in the crop: e.g. if, in one of them, the level
of soil moisture falls below a given threshold, then
sprinkler actuators are activated. Users can directly
access the server data at any location.
The corresponding model in IOT-LYSA, described in Table 1, is a pool of nodes running in parallel (this is the meaning of the parallel composition operator $\mid$). Some of the terms are enriched with labels and tags that support the CFA and whose meaning will be clarified in the next section. Each node, uniquely identified by a label $\ell$, consists of control processes and, possibly, of sensors and actuators. Communication is multi-party: each node can send information to a set of nodes, provided that they are in the same transmission range. The communication patterns are not too complicated, so the example can serve the aim of illustrating our framework. Outputs and inputs must match in order to communicate. In more detail, output is modelled as $\langle\langle E_1,\cdots,E_k\rangle\rangle \triangleright L.P$, meaning that the tuple $E_1,\cdots,E_k$ is sent to the nodes with labels in $L$. Input is instead modelled as $(E_1,\cdots,E_j;\ x_{j+1},\cdots,x_k).P$ and embeds pattern matching. In receiving an output tuple $E'_1,\cdots,E'_k$ of the same size (arity), the communication succeeds provided that the first $j$ elements of the output match the corresponding first elements of the input (i.e. $E_1 = E'_1,\cdots,E_j = E'_j$), and then the variables occurring in the input are bound to the corresponding terms in the output.
Each base station node $N_{bs_{ij}}$ is connected to a bunch of sensors $S^1_{bs_{ij}},\cdots,S^k_{bs_{ij}}$ that sense the environment in the crop sub-area controlled by the base station control process $P_{bs_{ij}}$ and write the sensed values on its store. The node also includes other components that we omit because they are irrelevant here. Similarly, the action $\tau$ denotes internal actions of the sensor we are not interested in. The node $N_{bs_{i}}$ collects data $z_l$ of its sensors, processes them with the help of filter and aggregation functions $f_1,\cdots,f_m$ and transmits their results to the Cluster Head node $N_{ch_j}$. The communication is performed as explained above: e.g. the output $\langle\langle 1, f_1(z_1 \cdots z_k),\cdots,f_m(z_1 \cdots z_k)\rangle\rangle \triangleright \{\ell_{ch_1}\}$ performed by $P_{bs_{11}}$ matches the input $(1;\ x^1_1 \cdots x^m_1)$ performed by $P_{ch_1}$, and therefore the variable $x^1_1$ is bound to $f_1(z_1 \cdots z_k)$, the variable $x^2_1$ is bound to $f_2(z_1 \cdots z_k)$ and so on. The construct $[\ldots]$ implements the iterative behaviour of processes and of sensors.
Each node $N_{ch_j}$ controls a subset of the base station nodes $N_{bs_{1j}},\cdots,N_{bs_{sj}}$ that send it their data ($x_i$ stands for the array $(x^1_i \cdots x^m_i)$). The Cluster Head node receives these data, aggregates them with the functions $aggr_1(),\cdots,aggr_r()$, and then sends the results to the Server $N_{as}$. The node $N_{as}$ processes the data sent by all the Cluster Head nodes $N_{ch_1},\cdots,N_{ch_n}$ and makes its decision on irrigation. If the server detects that some water is needed in the area controlled by the node $N_{ch_j}$ (i.e. if the water demand $w_{j1}$ exceeds a given threshold $th_{j1}$), it sends a “start irrigation” order, and conversely, when it detects that there is sufficient water (i.e. if the water demand $w_{j1}$ does not exceed a given threshold $th'_{j1}$), it transmits a “stop irrigation” order. The relevant Cluster Head node $N_{ch_j}$ transmits the received orders to the corresponding actuators $A_j$, triggering the irrigation sprinklers. Furthermore, users $N_{us_t}$ can access the data processed and stored by the server $N_{as}$, e.g. for checking whether the conditions hold for particular manual processes, like pruning. For the sake of simplicity, we omit the specification of the components of the users' nodes $N_{us_t}$.
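The matching discipline of inputs described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: terms are already evaluated to plain Python values, and the function name `match_input` and the variable names are hypothetical, not part of IOT-LYSA.

```python
from typing import Optional, Sequence


def match_input(pattern: Sequence[object],
                variables: Sequence[str],
                message: Sequence[object]) -> Optional[dict]:
    """Try to receive `message` with the input (pattern; variables).

    Returns the variable bindings on success, or None when the arity
    differs or one of the first j elements does not match.
    """
    j = len(pattern)
    if len(message) != j + len(variables):
        return None  # different arity: the tuple is not accepted
    if any(expected != received
           for expected, received in zip(pattern, message)):
        return None  # matching on the first j elements fails
    # Bind each variable to the corresponding remaining element.
    return dict(zip(variables, message[j:]))


# Output of P_bs_11 with symbolic payloads, received by the input (1; x1_1, x2_1).
message = (1, "f1(z1..zk)", "f2(z1..zk)")
print(match_input((1,), ("x1_1", "x2_1"), message))
print(match_input((2,), ("x1_1", "x2_1"), message))
```

The first call binds both variables, while the second is rejected because the leading element `1` of the message does not match the expected `2`.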
Wireless communication depends on the transmission range. Secure data transfer between the sensors and the server, through base station nodes, is crucial. Some nodes can be insecure and may therefore alter or tamper with data passing through them, thus potentially impacting the whole irrigation system. As a consequence, the grape crop can also be heavily damaged. Since our analysis identifies the possible trajectories of data in the system, we can check whether these trajectories include dangerous nodes.

Table 1: Irrigation Control System $N_{as} \mid N_{ch_1} \mid (N_{bs_{11}} \mid \cdots N_{bs_{s1}}) \mid \cdots N_{ch_n} \mid (N_{bs_{1n}} \mid \cdots N_{bs_{sn}}) \mid N_{us_1} \mid \cdots N_{us_h}$.

Base Station Node $i$ of Cluster Head $j$, with $i \in [1,s]$, $l \in [1,k]$, $j \in [1,n]$
$$\begin{array}{lcl}
N_{bs_{ij}} & = & \ell_{bs_{ij}} : [P_{bs_{ij}} \parallel S^1_{bs_{ij}} \cdots \parallel S^k_{bs_{ij}}] \\
P_{bs_{ij}} & = & [(z_1 := 1^{a_{1ij}}).\cdots.(z_k := k^{a_{kij}}).\langle\langle i^{b_{ij}}, f_1(z_1 \cdots z_k)^{f_{1ij}},\cdots,f_m(z_1 \cdots z_k)^{f_{mij}}\rangle\rangle \triangleright \{\ell_{ch_j}\}] \\
S^l_{bs_{ij}} & = & [(\tau.\,l_j := v_{lj}).\tau]
\end{array}$$

Cluster Head $j$, with $j \in [1,n]$
$$\begin{array}{lcl}
N_{ch_j} & = & \ell_{ch_j} : [P_{ch_j} \parallel P'_{ch_j} \parallel A_j] \\
P_{ch_j} & = & [(1;\ x^1_1 \cdots x^m_1)^{X_{j1}}.\cdots.(s;\ x^1_s \cdots x^m_s)^{X_{js}}.\langle\langle j^{c_j}, aggr_1(x_1,\cdots,x_s)^{g_{1j}},\cdots,aggr_r(x_1,\cdots,x_s)^{g_{rj}}\rangle\rangle \triangleright \{\ell_{as}\}] \\
P'_{ch_j} & = & [(j;\ x)^{X'_j}.\langle j, x\rangle] \\
A_j & = & [(\!| j, \{StartIrrigation, StopIrrigation\} |\!)]
\end{array}$$

Server
$$\begin{array}{lcl}
N_{as} & = & \ell_{as} : [\Sigma_{as} \parallel P_{as,1} \parallel \cdots \parallel P_{as,n} \parallel P_{as,us_t}] \\
P_{as,1} & = & [(1;\ w_{11},\cdots,w_{1r})^{Y_1} \parallel (w_{11} \geq th_{11})\ ?\ \langle 1, StartIrrigation\rangle : (w_{11} < th'_{11})\ ?\ \langle 1, StopIrrigation\rangle] \\
 & \cdots & \\
P_{as,n} & = & [(n;\ w_{n1},\cdots,w_{nr})^{Y_n} \parallel (w_{n1} \geq th_{n1})\ ?\ \langle n, StartIrrigation\rangle : (w_{n1} < th'_{n1})\ ?\ \langle n, StopIrrigation\rangle]
\end{array}$$

User $t$, with $t \in [1,h]$
$$N_{us_t} = \ell_{us_t} : [P_{us_t}]$$
3 THE CALCULUS IOT-LYSA
Here, we briefly review the process calculus
IOT-LYSA (Bodei et al., 2016b; Bodei et al., 2017).
IOT-LYSA is an adaptation of LYSA (Bodei et al.,
2005), a process algebra introduced to specify and
analyse cryptographic protocols and to check their
security properties (Gao et al., 2007; Gao et al., 2008).
Differently from other process calculus approaches
to IoT, e.g. (Lanese et al., 2013; Lanotte and Merro,
2016; Lanotte and Merro, 2018), IOT-LYSA aims at
providing a design framework that includes a static
semantics to support verification techniques and tools
for certifying properties of IoT applications.
Systems in IOT-LYSA consist of a finite number
of nodes, each of which hosts a store for internal com-
munication and a finite number of control processes
(representing the software), sensors and actuators. We
assume that each sensor (actuator) in a node with label
$\ell$ is uniquely identified by an index $i \in \mathcal{I}_\ell$ ($j \in \mathcal{J}_\ell$,
resp.). Data are represented by terms. Annotations
$a, a', a_i, \ldots$, ranged over by $\mathcal{A}$, which identify the
occurrences of terms, are used in the analysis and do
not affect the semantics. The syntax is presented in
Table 2.
We assume as given a finite set $\mathcal{K}$ of secret keys owned by nodes, exchanged at deployment time in a secure way, as is often the case. Terms come with annotations $a \in \mathcal{A}$. The encryption function $\{E_1,\cdots,E_r\}_{k_0}$ returns the result of encrypting values $E_i$ for $i \in [1,r]$ under the shared key $k_0$. We assume perfect cryptography. The term $f(E_1,\cdots,E_r)$ is the application of function $f$ to $r$ arguments; we assume given a set of primitive functions, typically for aggregating or comparing values. We assume the sets $\mathcal{V}$, $\mathcal{I}_\ell$, $\mathcal{J}_\ell$, $\mathcal{K}$ to be pairwise disjoint.
A node $\ell : [B]$ is uniquely identified by a label $\ell \in \mathcal{L}$ that may represent further characterising information (e.g. node location). Sets of nodes are described through the (associative and commutative) operator $\mid$ for parallel composition. The system $0$ has no nodes. Inside a node $\ell : [B]$ there is a finite set of components described by the parallel operator $\parallel$. We impose that there is a single store $\Sigma_\ell : \mathcal{X} \cup \mathcal{I}_\ell \to \mathcal{V}$, where $\mathcal{X}$, $\mathcal{V}$ are the sets of variables and of values, resp.
The store is essentially an array whose indexes are variables and sensor identifiers $i \in \mathcal{I}_\ell$. We assume that store accesses are atomic, e.g. through CAS instructions (Herlihy, 1991). The other node components are control processes $P$, sensors $S$ (less than $\#(\mathcal{I}_\ell)$), and actuators $A$ (less than $\#(\mathcal{J}_\ell)$), the actions of which are in $Act$.
The prefix $\langle\langle E_1,\cdots,E_r\rangle\rangle \triangleright L$ implements a simple form of multi-party communication: the tuple obtained by evaluating $E_1,\ldots,E_r$ is asynchronously sent to the nodes with labels in $L$ that are “compatible” (according, among other attributes, to a proximity-based notion). The input prefix $(E_1,\cdots,E_j;\ x_{j+1},\cdots,x_r)^X$ receives an $r$-tuple, provided that its first $j$ elements match the corresponding input ones, and then binds the variables (after “;”) to the received values. Otherwise, the $r$-tuple is not accepted. As in (Bodei et al., 2015), each input in the syntax of processes $P$ has a tag $X \in \mathbf{X}$, which is exploited to support the analysis and does not affect the dynamic semantics. A process repeats its behaviour when defined through the tail iteration construct $\mu h.P$ ($h$ is the iteration variable), intuitively rendered with $[\ldots]$ in the motivating example. The process $\mathsf{decrypt}\ E\ \mathsf{as}\ \{E_1,\cdots,E_j;\ x_{j+1},\cdots,x_r\}_{k_0}\ \mathsf{in}\ P$ tries to decrypt the result of the expression $E$ with the shared key $k_0 \in \mathcal{K}$. If the pattern matching succeeds, the process continues as $P$ and the variables $x_{j+1},\ldots,x_r$ are suitably assigned.
A sensor can perform an internal action $\tau$ or put the value $v$, gathered from the environment, into its store location $i$. An actuator can perform an internal action $\tau$ or execute one of its actions $\gamma$, received from its controlling process. Sensors and actuators can iterate.

Table 2: Syntax.

$\mathcal{E} \ni E ::=$ annotated terms
  $M^a$    annotated term, with $a \in \mathcal{A}$

$\mathcal{M} \ni M, N ::=$ terms
  $v$    value ($v \in \mathcal{V}$)
  $i$    sensor location ($i \in \mathcal{I}_\ell$)
  $x$    variable
  $\{E_1,\cdots,E_r\}_{k_0}$    encryption with key $k_0 \in \mathcal{K}$
  $f(E_1,\cdots,E_r)$    function on data

$\mathcal{N} \ni N ::=$ systems of nodes
  $0$    empty system
  $\ell : [B]$    single node ($\ell \in \mathcal{L}$)
  $N_1 \mid N_2$    par. composition

$\mathcal{B} \ni B ::=$ node components
  $\Sigma_\ell$    node store
  $P$    process
  $S$    sensor (label $i \in \mathcal{I}_\ell$)
  $A$    actuator (label $j \in \mathcal{J}_\ell$)
  $B \parallel B$    par. composition

$P ::=$ control processes
  $0$    inactive process
  $\langle\langle E_1,\cdots,E_r\rangle\rangle \triangleright L.\,P$    asynchronous multi-output, $L \subseteq \mathcal{L}$
  $(E_1,\cdots,E_j;\ x_{j+1},\cdots,x_r)^X.\,P$    input (with matching and tag)
  $\mathsf{decrypt}\ E\ \mathsf{as}\ \{E_1,\cdots,E_j;\ x_{j+1},\cdots,x_r\}_{k_0}\ \mathsf{in}\ P$    decryption with key $k_0$ (with match.)
  $E\ ?\ P : Q$    conditional statement
  $h$    iteration variable
  $\mu h.\,P$    tail iteration
  $x^a := E.\,P$    assignment to $x \in \mathcal{X}$
  $\langle j, \gamma\rangle.\,P$    output of action $\gamma$ to actuator $j$
The semantics is based on a standard structural congruence and a two-level reduction relation defined as the least relation on nodes and their components, where we assume the standard denotational interpretation $[\![E]\!]_\Sigma$ for evaluating terms. As examples of semantic rules, we show the rules (Ev-out) and (Multi-com) in Table 3, which drive asynchronous IOT-LYSA multi-communications. In the first rule, to send a message $\langle\langle v_1,\ldots,v_r\rangle\rangle$ obtained by the evaluation of $\langle\langle E_1,\ldots,E_r\rangle\rangle$, a node with label $\ell$ spawns a new process, running in parallel with the continuation $P$; this new process offers the evaluated tuple to all the receivers with labels in $L$. In the second rule, the message coming from $\ell_1$ is received by a node labelled $\ell_2$, provided that: (i) $\ell_2$ belongs to the set $L$ of possible receivers, (ii) the two nodes satisfy a compatibility predicate $Comp$ (e.g. when they are in the same transmission range), and (iii) the first $j$ values match the evaluations of the first $j$ terms in the input. Moreover, the label $\ell_2$ is removed from the set of receivers $L$ of the tuple. The spawned process terminates when all the receivers have received the message ($L = \emptyset$).
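The delivery step just described can be mimicked by a small Python sketch. It is not the formal semantics: the class `PendingOutput`, the function `multi_com` and the always-true predicate `in_range` are hypothetical names introduced for illustration, and the first `j` input terms are assumed to be already evaluated.

```python
from dataclasses import dataclass, field


@dataclass
class PendingOutput:
    """An evaluated tuple still to be delivered (the process spawned by Ev-out)."""
    sender: str
    values: tuple
    receivers: set = field(default_factory=set)


def in_range(l1: str, l2: str) -> bool:
    """Hypothetical stand-in for Comp: here every pair of nodes is compatible."""
    return True


def multi_com(out, receiver, pattern, variables, store, comp=in_range):
    """Attempt one (Multi-com) step; return True if `receiver` accepts the tuple."""
    j = len(pattern)
    if receiver not in out.receivers or not comp(out.sender, receiver):
        return False                                  # conditions (i) and (ii)
    if len(out.values) != j + len(variables):
        return False                                  # arity mismatch
    if tuple(out.values[:j]) != tuple(pattern):
        return False                                  # condition (iii)
    store.update(zip(variables, out.values[j:]))      # bind x_{j+1}, ..., x_r
    out.receivers.discard(receiver)                   # L becomes L \ {l2}
    return True


pending = PendingOutput("bs_11", (1, 20.5, 0.3), {"ch_1"})
store_ch1 = {}
print(multi_com(pending, "ch_1", (1,), ("x1", "x2"), store_ch1))
print(store_ch1, pending.receivers)
```

Once the set of receivers is empty, the pending output has nothing left to deliver, which corresponds to the termination of the spawned process.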
4 CONTROL FLOW ANALYSIS
Here we present a CFA for approximating the ab-
stract behaviour of a system of nodes and for tracking
the trajectories of data. This CFA follows the same
schema of the one in (Bodei et al., 2016b; Bodei et al.,
2016a) and in particular of the one in (Bodei and Gal-
letta, 2017) for IOT-LYSA. However, here we use
different abstract values. Intuitively, abstract values
“symbolically” represent runtime data so as to encode
where these data have been introduced. Finally, we
show how to use the CFA results to check which are
the possible trajectories of these data.
Table 3: Communication semantic rules.

(Ev-out)
$$\frac{\bigwedge_{i=1}^{r} v_i = [\![E_i]\!]_{\Sigma}}
{\Sigma \parallel \langle\langle E_1,\cdots,E_r\rangle\rangle \triangleright L.\,P \parallel B \ \to\ \Sigma \parallel \langle\langle v_1,\cdots,v_r\rangle\rangle \triangleright L.\,0 \parallel P \parallel B}$$

(Multi-com)
$$\frac{\ell_2 \in L \ \wedge\ Comp(\ell_1,\ell_2) \ \wedge\ \bigwedge_{i=1}^{j} v_i = [\![E_i]\!]_{\Sigma_2}}
{\begin{array}{l}
\ell_1 : [\langle\langle v_1,\cdots,v_r\rangle\rangle \triangleright L.\,0 \parallel B_1] \ \mid\ \ell_2 : [\Sigma_2 \parallel (E_1,\cdots,E_j;\ x^{a_{j+1}}_{j+1},\cdots,x^{a_r}_{r})^X.Q \parallel B_2] \ \to \\
\ell_1 : [\langle\langle v_1,\cdots,v_r\rangle\rangle \triangleright L \setminus \{\ell_2\}.\,0 \parallel B_1] \ \mid\ \ell_2 : [\Sigma_2\{v_{j+1}/x_{j+1},\cdots,v_r/x_r\} \parallel Q \parallel B_2]
\end{array}}$$

Abstract values correspond to concrete values for sensors, data, functions, and encryptions, and also record the annotations. Since the dynamic semantics may introduce encrypted terms with an arbitrary nesting level, we have the special abstract values $\top^a$ that denote all the terms with a depth greater than a given threshold $d$. During the analysis, to cut these values, we will use the function $\lfloor - \rfloor_d$, defined as expected. Formally, abstract values are defined as follows, where $a \in \mathcal{A}$.
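$$\begin{array}{lll}
\hat{\mathcal{V}} \ni \hat{v} ::= & & \text{abstract terms} \\
 & (\top, a) & \text{value denoting cut} \\
 & (\nu, a) & \text{value for clear data} \\
 & (f(\hat{v}_1,\cdots,\hat{v}_n), a) & \text{value for aggregated data} \\
 & (\{\hat{v}_1,\cdots,\hat{v}_n\}_{k_0}, a) & \text{value for encrypted data}
\end{array}$$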
For simplicity, hereafter we write them as $\top^a$, $\nu^a$, $\{\hat{v}_1,\cdots,\hat{v}_n\}^a_{k_0}$, and indicate with $\downarrow_i$ the projection function on the $i$-th component of the pair. We naturally extend the projection to sets, i.e. $\hat{V}_{\downarrow_i} = \{\hat{v}_{\downarrow_i} \mid \hat{v} \in \hat{V}\}$, where $\hat{V} \subseteq \hat{\mathcal{V}}$. In the abstract value $\nu^a$, $\nu$ abstracts the concrete value from sensors or computed by a function in the concrete semantics, while the first value of the pair $\{\hat{v}_1,\cdots,\hat{v}_n\}^a_{k_0}$ abstracts encrypted data. The second component records the annotation associated to the corresponding term. Note that once given the set of encryption functions occurring in a node $N$, the abstract values are finitely many.
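The paper leaves the cut function $\lfloor - \rfloor_d$ implicit ("defined as expected"). One plausible reading is sketched below in Python; the encoding of abstract values as an `Abs` record, the choice to never cut atomic values, and the application of the cut to aggregated data as well as encryptions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Abs:
    """Abstract value; `kind` is 'top', 'val', 'fun' or 'enc'."""
    kind: str
    ann: str                        # annotation a
    args: Tuple["Abs", ...] = ()    # nested abstract values, if any
    head: str = ""                  # function name or key (illustrative)


def cut(v: Abs, d: int) -> Abs:
    """Replace composite sub-values nested deeper than d with top^a."""
    if v.kind in ("top", "val"):
        return v                    # atomic values are never cut
    if d == 0:
        return Abs("top", v.ann)    # keep only the annotation
    return Abs(v.kind, v.ann, tuple(cut(arg, d - 1) for arg in v.args), v.head)


inner = Abs("enc", "a2", (Abs("val", "a1"),), "k0")
outer = Abs("enc", "a3", (inner,), "k0")
print(cut(outer, 1))
```

With threshold 1 the inner encryption is replaced by a cut value that still carries its annotation `a2`, so the abstract domain stays finite while annotations remain traceable.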
To extract all the annotations of an abstract value, including the ones possibly nested in it, we use the following auxiliary function.
Definition 4.1. Given an abstract value $\hat{v} \in \hat{\mathcal{V}}$, we define the set of annotations $A(\hat{v})$ as follows.
$$\begin{array}{lcl}
A(\top, a) = A(\nu, a) & = & \{a\} \\
A(f(\hat{v}_1,\cdots,\hat{v}_n), a) & = & \{a\} \cup \bigcup_{i=1}^{n} A(\hat{v}_i) \\
A(\{\hat{v}_1,\cdots,\hat{v}_n\}_{k_0}, a) & = & \{a\} \cup \bigcup_{i=1}^{n} A(\hat{v}_i)
\end{array}$$
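Definition 4.1 is a plain structural recursion, as the following Python sketch shows. The `Abs` record is a hypothetical encoding of abstract values chosen for this example, in which atomic values simply have no nested arguments.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Abs:
    """Abstract value; `kind` is 'top', 'val', 'fun' or 'enc'."""
    kind: str
    ann: str                        # annotation a
    args: Tuple["Abs", ...] = ()    # nested abstract values, if any


def annotations(v: Abs) -> frozenset:
    """A(v): the annotation of v together with those nested inside it."""
    result = {v.ann}
    for arg in v.args:
        result |= annotations(arg)
    return frozenset(result)


# f((v_i1, a_i1), (v_i2, a_i2)) annotated with a_i
aggregated = Abs("fun", "a_i", (Abs("val", "a_i1"), Abs("val", "a_i2")))
print(sorted(annotations(aggregated)))
```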
Trajectories. We now introduce the notion of data trajectories, composed of micro-trajectories representing single communication hops.
Definition 4.2. Given a set of labels $\mathcal{L}$ and a set of input tags $\mathbf{X}$, we define a micro-trajectory $\mu$ as a pair $((\ell,\ell'),X) \in (\mathcal{L} \times \mathcal{L}) \times \mathbf{X}$. A trajectory $\tau$ is a list of micro-trajectories $[\mu_1,\ldots,\mu_n]$, such that $\forall \mu_i, \mu_{i+1}$ with $\mu_i = ((\ell_i,\ell'_i),X_i)$ and $\mu_{i+1} = ((\ell_{i+1},\ell'_{i+1}),X_{i+1})$, $\ell'_i = \ell_{i+1}$.
In our analysis, trajectories can be obtained by starting from a set of micro-trajectories and suitably composing them in order. Two trajectories can be composed if the second starts from the node where the first ends. In this case the two trajectories can be merged. Technically, we use a closure of a set of micro-trajectories, the inductive definition of which follows.
Definition 4.3. Given a set of micro-trajectories $M$, its closure $Clos_X(M)$ is defined as follows, where $L$ and $L''$ denote lists of micro-trajectories.
• $\forall ((\ell,\ell'),X) \in M.\ [((\ell,\ell'),X)] \in Clos_X(M)$;
• $\forall [L,((\ell,\ell'),X)],\ [((\ell',\ell''),X'),L''] \in Clos_X(M).\ [L,((\ell,\ell'),X),((\ell',\ell''),X'),L''] \in Clos_X(M)$.
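The closure can be computed by repeatedly extending trajectories with a compatible hop, as in the Python sketch below. Two points are assumptions of this example rather than part of the definition: micro-trajectories are encoded as nested tuples of strings, and a hop is never repeated within one trajectory, so that the result stays finite even when the micro-trajectories form a cycle.

```python
from typing import Set, Tuple

Micro = Tuple[Tuple[str, str], str]      # ((l, l'), X)
Trajectory = Tuple[Micro, ...]


def closure(micro: Set[Micro]) -> Set[Trajectory]:
    """Trajectories obtained by chaining micro-trajectories hop by hop."""
    result = {(m,) for m in micro}       # base case: singleton trajectories
    frontier = set(result)
    while frontier:
        new = set()
        for traj in frontier:
            last_target = traj[-1][0][1]
            for m in micro:
                # Compose only if m starts where traj ends (l'_i = l_{i+1}).
                if m[0][0] == last_target and m not in traj:
                    extended = traj + (m,)
                    if extended not in result:
                        new.add(extended)
        result |= new
        frontier = new
    return result


hops = {(("bs_ij", "ch_j"), "X_ji"), (("ch_j", "as"), "Y_j")}
for trajectory in sorted(closure(hops), key=len):
    print(trajectory)
```

Extending by one hop at a time yields the same chains as merging two arbitrary trajectories, since every merged trajectory is a sequence of consecutive hops.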
We assume that designers provide the analysis
with a classification of the “dangerous” nodes and
links or of bad flows.
CFA Validation and Correctness. We now have all the ingredients to define our CFA to approximate communications and data stored and exchanged and, in particular, the micro-trajectories. We specify our analysis in a logical form through a set of inference rules expressing the validity of the analysis results. The analysis result or estimate is a tuple $(\hat{\Sigma},\kappa,\Theta,T,\rho)$ (a pair $(\hat{\Sigma},\Theta)$ when analysing a term), where $\hat{\Sigma}$, $\kappa$, $\Theta$, $T$, $\rho$ are the following abstract domains:
• the union $\hat{\Sigma} = \bigcup_{\ell \in \mathcal{L}} \hat{\Sigma}_\ell$ of the sets $\hat{\Sigma}_\ell : \mathcal{X} \cup \mathcal{I}_\ell \to 2^{\hat{\mathcal{V}}}$ of abstract values that may possibly be associated to a given location in $\mathcal{I}_\ell$ or a given variable in $\mathcal{X}$,
• a set $\kappa : \mathcal{L} \to \mathcal{L} \times \bigcup_{i=1}^{k} \hat{\mathcal{V}}^i$ of the messages that may be received by the node $\ell$,
• a set $\Theta : \mathcal{L} \to \mathcal{A} \to 2^{\hat{\mathcal{V}}}$ of the information of the actual values computed by each labelled term $M^a$ in a given node $\ell$, at run time,
• a set $\rho : \mathbf{X} \to \mathcal{L} \times \bigcup_{i=1}^{k} \hat{\mathcal{V}}^i$ of the output tuples that may be accepted by the input with tag $X$,
• a set $T : \mathcal{A} \to (\mathcal{L} \times \mathcal{L}) \times \mathbf{X}$ of possible micro-trajectories related to the abstract values.
The component T is new, and also the combined use
of these five components is new and allows us to po-
tentially integrate the present CFA with the previous
analyses of IOT-LYSA.
A proposed estimate has to be validated as correct. This requires that it satisfies the judgements defined according to the syntax of nodes, node components and terms. They are defined by a set of clauses. Here, we just show some examples. The judgements for labelled terms have the form $(\hat{\Sigma},\Theta) \models_\ell M^a$. For each term $M^a$ occurring in the node $\ell$, the corresponding judgement requires that $\Theta(\ell)(a)$ includes all the abstract values $\hat{v}$ associated to $M^a$, e.g. if the term is $x^a$, $\Theta(\ell)(a)$ includes the abstract values bound to $x$ collected in $\hat{\Sigma}_\ell$. The judgements for nodes have the form $(\hat{\Sigma},\kappa,\Theta,T,\rho) \models N$. The rule for a single node $\ell : [B]$ requires that $B$ is analysed with judgements $(\hat{\Sigma},\kappa,\Theta,T,\rho) \models_\ell B$. As examples of clauses, we consider the clauses for communication in Table 4.
An estimate is valid for multi-output if it is valid for the continuation $P$ and the set of messages communicated by the node $\ell$ to each node $\ell'$ in $L$ includes all the messages obtained by the evaluation of the $r$-tuple $\langle\langle M^{a_1}_1,\cdots,M^{a_r}_r\rangle\rangle$. More precisely, the rule (i) finds the sets $\Theta(\ell)(a_i)$ for each term $M^{a_i}_i$, and (ii) for all tuples of values $(\hat{v}_1,\cdots,\hat{v}_r)$ in $\Theta(\ell)(a_1) \times \cdots \times \Theta(\ell)(a_r)$ it checks whether they belong to $\kappa(\ell')$ for each $\ell' \in L$. Symmetrically, the rule for input requires that the values inside messages that can be sent to the node $\ell$, passing the pattern matching, are included in the estimates of the variables $x_{j+1},\cdots,x_r$. In more detail, the rule analyses each term $M^{a_i}_i$, and requires that for any message that the node with label $\ell$ can receive, i.e. $(\ell',\langle\langle\hat{v}_1,\cdots,\hat{v}_j,\hat{v}_{j+1},\ldots,\hat{v}_r\rangle\rangle)$ in $\kappa(\ell)$, provided that the two nodes can communicate (i.e. $Comp(\ell',\ell)$), the abstract values $\hat{v}_{j+1},\ldots,\hat{v}_r$ are included in the estimates of $x_{j+1},\cdots,x_r$. Furthermore, the micro-trajectory $((\ell,\ell'),X)$ is recorded in the $T$ component for each annotation related (via $A$) to the abstract value $\hat{v}_i$, to record that the abstract value $\hat{v}_i$ coming from the node $\ell$ can reach the node labelled $\ell'$, in the input with tag $X$, e.g. if $\hat{v}_i = (f((v_{i1},a_{i1}),(v_{i2},a_{i2})),a_i)$, then the micro-trajectory is recorded in $T(a_i)$, $T(a_{i1})$ and $T(a_{i2})$. Finally, the $\rho$ component records the sets of output tuples that can be bound in the input with tag $X$.
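To make the bookkeeping of the input clause concrete, the following Python sketch computes the entries that the clause forces into $\hat{\Sigma}$, $\rho$ and $T$ for one input. It is a simplification under stated assumptions: the function `analyse_input`, the `Abs` encoding and the dictionary-based estimate are hypothetical; the matching on the first $j$ components is not modelled; and micro-trajectories are stored as (sender, receiver), the orientation used in Example 4.4 and Corollary 4.6.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Abs:
    """Abstract value; `kind` is 'top', 'val', 'fun' or 'enc'."""
    kind: str
    ann: str
    args: Tuple["Abs", ...] = ()


def annotations(v: Abs) -> frozenset:
    """A(v): the annotation of v and all the nested ones."""
    return frozenset({v.ann}).union(*(annotations(a) for a in v.args))


def analyse_input(receiver, tag, j, variables, kappa, sigma_hat, rho, T,
                  comp=lambda sender, receiver: True):
    """Add the entries required for the input with tag `tag` at node `receiver`."""
    for sender, values in kappa[receiver]:
        if not comp(sender, receiver) or len(values) != j + len(variables):
            continue
        for x, v in zip(variables, values[j:]):
            sigma_hat[(receiver, x)].add(v)              # v in Sigma_hat(x)
            for a in annotations(v):
                T[a].add(((sender, receiver), tag))      # record the hop
        rho[tag].add((sender, values))                   # tuple accepted by tag


sensor = Abs("val", "a_1ij")
aggregate = Abs("fun", "f_1ij", (sensor,))
kappa = {"ch_j": {("bs_ij", (Abs("val", "b_ij"), aggregate))}}
sigma_hat, rho, T = defaultdict(set), defaultdict(set), defaultdict(set)
analyse_input("ch_j", "X_ji", 1, ("x1",), kappa, sigma_hat, rho, T)
print(dict(T))
```

The hop is recorded both for the annotation of the aggregate and for the annotation of the sensor value nested in it, which is what later allows the trajectory of raw data to be reconstructed.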
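Example 4.4. In our running example, every valid estimate $(\hat{\Sigma},\kappa,\Theta,T,\rho)$ must include at least the following entries, assuming $d = 4$.
$$\begin{array}{l}
\Theta(\ell_{bs_{ij}})(a_{lij}) \supseteq \{ i^{a_{lij}} \} \\
\hat{\Sigma}_{\ell_{bs_{ij}}}(z_i) \supseteq \{ i^{a_{ij}} \} \\
\rho(X_{ji}) \supseteq \{ (\ell_{ch_j}, \langle\langle i^{b_{ij}}, f_1(1^{a_{1ij}},\ldots,k^{a_{kij}})^{f_{1ij}}, \ldots \rangle\rangle) \} \\
\kappa(\ell_{ch_j}) \supseteq \{ (\ell_{ch_j}, \langle\langle i^{b_{ij}}, f_1(1^{a_{1ij}},\ldots,k^{a_{kij}})^{f_{1ij}}, \ldots \rangle\rangle) \} \\
\hat{\Sigma}_{\ell_{ch_j}}(x^{w}_{i}) \supseteq \{ f_i(i^{a_{lij}}) \} \\
\kappa(\ell_{as}) \supseteq \{ (\ell_{ch_j}, \langle\langle j^{c_j}, aggr_1(f_{1j})^{g_{1j}}, \ldots, aggr_r(f_{sj})^{g_{rj}} \rangle\rangle) \} \\
\hat{\Sigma}_{\ell_{ch_j}}(w_{jt}) \supseteq \{ aggr_1(f_{ij})^{g_{1j}} \} \\
T(a_{lij}) \ni ((\ell_{bs_{ij}}, \ell_{ch_j}), X_{ji}),\ ((\ell_{ch_j}, \ell_{as}), Y_j)
\end{array}$$
Indeed, an estimate must satisfy the checks of the CFA rules. The validation of the system requires, in particular, that $i^{a_{lij}}$ is in $\hat{\Sigma}_{\ell_{bs_{ij}}}(z)$ for the rule for variables, while the rule for output requires the inclusion in $\kappa(\ell_{ch_j})$, and so on.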
Our analysis respects the operational semantics of IOT-LYSA, as witnessed by the following subject reduction result. It is also possible to prove the existence of a (minimal) estimate, as in (Bodei et al., 2016b). The proofs follow the usual schema and benefit from an instrumented denotational semantics for expressions, the values of which are pairs $\langle v, \hat{v}\rangle$, where $v$ is a concrete value and $\hat{v}$ is the corresponding abstract value. The store ($\Sigma^i_\ell$, with an undefined $\bot$ value) is accordingly extended. The semantics uses the projection on the first component.
The following subject reduction theorem establishes the correctness of our CFA, by relying on the agreement relation $\bowtie$ between the concrete and the abstract stores. Its definition is immediate, since the analysis only considers the second component of the extended store, i.e. the abstract one: $\Sigma^i_\ell \bowtie \hat{\Sigma}_\ell$ iff for all $w \in \mathcal{X} \cup \mathcal{I}_\ell$, $\Sigma^i_\ell(w) \neq \bot$ implies $(\Sigma^i_\ell(w))_{\downarrow_2} \in \hat{\Sigma}_\ell(w)$.
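Theorem 4.5 (Subject Reduction). If $(\hat{\Sigma},\kappa,\Theta,T,\rho) \models N$ and $N \to N'$ and $\forall\, \Sigma^i_\ell$ in $N$ it is $\Sigma^i_\ell \bowtie \hat{\Sigma}_\ell$, then $(\hat{\Sigma},\kappa,\Theta,T,\rho) \models N'$ and $\forall\, {\Sigma^i_\ell}'$ in $N'$ it is ${\Sigma^i_\ell}' \bowtie \hat{\Sigma}_\ell$.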
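The agreement relation used in the theorem is easy to check on finite stores. The Python sketch below assumes, purely for illustration, that an instrumented store is a dictionary mapping each variable or sensor location to a (concrete, abstract) pair, with `None` standing for the undefined value, and that abstract values are represented by strings.

```python
BOTTOM = None  # stands for the undefined value of the instrumented store


def agrees(store: dict, abstract_store: dict) -> bool:
    """True iff every defined entry has its abstract component in the estimate."""
    return all(pair is BOTTOM or pair[1] in abstract_store.get(w, set())
               for w, pair in store.items())


store = {"z1": (21.5, "1^a_1ij"), "x": BOTTOM}
print(agrees(store, {"z1": {"1^a_1ij"}}))
print(agrees(store, {"z1": set()}))
```

Only the second component of each pair is inspected, mirroring the fact that the analysis considers just the abstract part of the extended store.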
Checking Trajectories. We now show that by in-
specting the results of our CFA, we detect all the pos-
sible micro-trajectories of the data produced in the
system of nodes that, put together, provide the overall
trajectories.
The following corollary shows that we do track the
trajectories of IoT data. The first item guarantees that
κ and ρ predict all the possible inter-node communi-
cations, while the second item shows that our analy-
sis records the micro-trajectory in the T component of
each abstract value possibly involved in the commu-
nication.
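Corollary 4.6. Let $N \xrightarrow{\langle\langle v_1,\ldots,v_r\rangle\rangle}_{\ell_1,\ell_2,X} N'$ denote a reduction in which the message sent by node $\ell_1$ is received by node $\ell_2$ with an input tagged $X$. If $(\hat{\Sigma},\kappa,\Theta,T,\rho) \models N$ and $N \xrightarrow{\langle\langle v_1,\ldots,v_r\rangle\rangle}_{\ell_1,\ell_2,X} N'$ then it holds:
• $(\ell_1,\langle\langle\hat{v}_1,\ldots,\hat{v}_r\rangle\rangle) \in \kappa(\ell_2) \ \wedge\ (\ell_1,\langle\langle\hat{v}_1,\cdots,\hat{v}_r\rangle\rangle) \in \rho(X)$, where $\hat{v}_i = v_{i\downarrow_2}$;
• $((\ell_1,\ell_2),X) \in T(a)$, for all $a \in A(\hat{v}_i)$, for all $i \in [j+1,r]$.

Table 4: Communication CFA rules.

$$\frac{\begin{array}{l}
\bigwedge_{i=1}^{r} (\hat{\Sigma},\Theta) \models_\ell M^{a_i}_i \ \wedge\ (\hat{\Sigma},\kappa,\Theta,T,\rho) \models_\ell P \ \wedge \\
\forall \hat{v}_1,\cdots,\hat{v}_r : \bigwedge_{i=1}^{r} \hat{v}_i \in \Theta(\ell)(a_i) \ \Rightarrow\ \forall \ell' \in L : (\ell,\langle\langle\hat{v}_1,\cdots,\hat{v}_r\rangle\rangle) \in \kappa(\ell')
\end{array}}
{(\hat{\Sigma},\kappa,\Theta,T,\rho) \models_\ell \langle\langle M^{a_1}_1,\cdots,M^{a_r}_r\rangle\rangle \triangleright L.\,P}$$

$$\frac{\begin{array}{l}
\bigwedge_{i=1}^{j} (\hat{\Sigma},\Theta) \models_\ell M^{a_i}_i \ \wedge \\
\forall (\ell',\langle\langle\hat{v}_1,\cdots,\hat{v}_r\rangle\rangle) \in \kappa(\ell) : Comp(\ell',\ell) \ \Rightarrow \\
\quad \big( \bigwedge_{i=j+1}^{r} \hat{v}_i \in \hat{\Sigma}_\ell(x_i) \ \wedge\ (\ell',\langle\langle\hat{v}_1,\cdots,\hat{v}_r\rangle\rangle) \in \rho(X) \ \wedge\ \forall a \in A(\hat{v}_i).\,((\ell,\ell'),X) \in T(a) \ \wedge \\
\quad\ \ (\hat{\Sigma},\kappa,\Theta,T,\rho) \models_\ell P \big)
\end{array}}
{(\hat{\Sigma},\kappa,\Theta,T,\rho) \models_\ell (M^{a_1}_1,\cdots,M^{a_j}_j;\ x^{a_{j+1}}_{j+1},\cdots,x^{a_r}_r)^X.\,P}$$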
Given a term $E$ annotated by $a$, the over-approximation of its possible trajectories is obtained by computing the trajectory closure of the set composed of all the pairs $((\ell,\ell'),X)$ in $T(a)$.
$$Trajectories(E^a) = Clos_X(T(a))$$
Example 4.7. Back to our example, we can now determine the possible trajectories of data, e.g. the ones of the term annotated with $a_{lij}$. By applying the closure $Clos_X$ to the entries in $T(a_{lij})$, we can easily obtain that $Trajectories(i^{a_{lij}})$ includes $[((\ell_{bs_{ij}},\ell_{ch_j}),X_{ji}),((\ell_{ch_j},\ell_{as}),Y_j)]$. This allows us to check which nodes the data may pass through, in this case $N_{bs_{ij}}$ and $N_{ch_j}$, and which are the corresponding inputs, here $X_{ji}$ and $Y_j$. This communication pattern is admittedly simple, but it suffices to illustrate our approach. It is easy to verify that the above CFA results reflect the dynamic behaviour.
Now, given a classification of the “dangerous” nodes and links, we can analyse the trajectories of each piece of data of the analysed system. We can therefore inspect the paths possibly followed by sensitive data and also be suspicious about data produced or passed by unreliable nodes. We can also detect possible illegal or bad flows from one point to another based on security levels. This is particularly crucial in a setting where encryption and other security mechanisms can be costly and power consuming. More generally, our analysis enables traceability of data. For every exchanged message $\langle\langle v_1,\ldots,v_r\rangle\rangle$, the CFA keeps track of the path of each of its components $v_i$ and, in turn, for each $v_i$ it recursively keeps track of the path of the data used to compose it.
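The checks discussed above can be sketched directly on the micro-trajectories recorded in $T(a)$. In the Python example below the function names, the node labels and the classification of dangerous nodes are hypothetical. A node is crossed by some trajectory exactly when it occurs in some micro-trajectory, and a flow from one node to another exists exactly when the second is reachable from the first by chaining hops, so the closure never needs to be enumerated.

```python
from collections import deque


def dangerous_crossed(micro, dangerous):
    """Dangerous nodes that occur in some trajectory built from `micro`."""
    crossed = {label for (hop, _tag) in micro for label in hop}
    return crossed & set(dangerous)


def may_flow(micro, src, dst):
    """True if some trajectory built from `micro` starts at src and ends at dst."""
    successors = {}
    for (l1, l2), _tag in micro:
        successors.setdefault(l1, set()).add(l2)
    seen, queue = set(), deque([src])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Entries of T(a_lij) from the running example.
t_a = {(("bs_ij", "ch_j"), "X_ji"), (("ch_j", "as"), "Y_j")}
print(dangerous_crossed(t_a, {"ch_j", "us_1"}))
print(may_flow(t_a, "bs_ij", "as"), may_flow(t_a, "as", "bs_ij"))
```

Because the analysis over-approximates, an empty result from `dangerous_crossed` is a guarantee, whereas a non-empty one only signals a possibility worth investigating.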
5 CONCLUSIONS
We proposed a CFA, based on IOT-LYSA, for track-
ing the propagation of data and for identifying their
possible trajectories, as illustrated by a motivating ex-
ample that offers a simple but non-trivial application
of our methodology.
The analysis lends itself to many investigations.
On the one hand, it can be used to evaluate the quality
of the data managed by the considered system, both in
the small and in the large. We can answer questions
such as how secure are certain data crucial for critical
decisions, or if the provenance of the data processed
in a particular node offers sufficient security guaran-
tees. We can also check whether a system respects
policies on information flows among nodes.
On the other hand, the collection of possible tra-
jectories of data allows us to discover patterns in gen-
eral movements of data. We could in fact determine
which data move together or in a similar way, thus
observing possible emerging patterns. Furthermore,
we can find which are the paths or segments of paths
that are more used, and therefore may need special
attention and suitable security mechanisms.
CFA results on the possible paths followed by data
can also be exploited in an early phase of system de-
sign, as a supporting technique. Designers can be
helped in understanding the potential vulnerabilities
related to the presence of dangerous nodes and in de-
termining in time possible modifications and validity
checks.
In future, we would like to understand if it is pos-
sible to ensure that the nodes continue to behave in
a reasonable way even in the presence of not com-
pletely reliable data, by linking our approach to that
used in (Nielson et al., 2013; Nielson et al., 2015).
There, the authors use the Quality Calculus, a process
calculus for programming software components with
a sort of backup plan in case the ideal behaviour fails
due to unreliable communication or data.
Our present analysis could also be integrated with
the taint CFA in (Bodei and Galletta, 2017), where
data are marked as tainted when sensitive, and as tam-
perable when coming from places where they can be
tampered.
REFERENCES
Abu-Elkheir, M., Hayajneh, M., and Ali, N. A. (2013). Data
management for the Internet of Things: Design prim-
itives and solution. Sensors, 13(11):15582–15612.
Bodei, C., Brodo, L., and Focardi, R. (2015). Static ev-
idences for attack reconstruction. In Programming
Languages with Applications to Biology and Security,
LNCS 9465, pages 162–182. Springer.
Bodei, C., Buchholtz, M., Degano, P., Nielson, F., and Niel-
son, H. R. (2005). Static validation of security proto-
cols. Journal of Computer Security, 13(3):347–390.
Bodei, C., Degano, P., Ferrari, G.-L., and Galletta, L.
(2016a). A step towards checking security in IoT. In
Procs. of ICE 2016, EPTCS 223, pages 128–142.
Bodei, C., Degano, P., Ferrari, G.-L., and Galletta, L.
(2016b). Where do your IoT ingredients come from?
In Procs. of Coordination 2016, LNCS 9686, pages
35–50. Springer.
Bodei, C., Degano, P., Ferrari, G. L., and Galletta, L.
(2017). Tracing where IoT data are collected and
aggregated. Logical Methods in Computer Science,
13(3).
Bodei, C., Degano, P., Ferrari, G.-L., and Galletta, L.
(2018). Sustainable precision agriculture from a pro-
cess algebraic perspective: A smart vineyard. Atti Soc.
Toscana di Sci. Nat., Memorie Serie B, 125:39–43.
Bodei, C. and Galletta, L. (2017). Tracking sensitive and
untrustworthy data in IoT. In Procs. of ITASEC 2017,
CEUR 1816, pages 38–52.
Gao, H., Bodei, C., and Degano, P. (2008). A formal analy-
sis of complex type flaw attacks on security protocols.
In Proc. of AMAST’08, LNCS 5140, pages 167–183.
Springer.
Gao, H., Bodei, C., Degano, P., and Nielson, H. (2007). A
formal analysis for capturing replay attacks in crypto-
graphic protocols. In Proc. of ASIAN’07, LNCS 4846,
pages 150–165. Springer.
Herlihy, M. (1991). Wait-free synchronization. ACM Trans.
Program. Lang. Syst., 13(1).
Lanese, I., Bedogni, L., and Felice, M. D. (2013). Internet
of Things: a process calculus approach. In Procs of
SAC ’13, pages 1339–1346. ACM.
Lanotte, R. and Merro, M. (2016). A semantic theory of the
Internet of Things. In Procs. of Coordination 2016,
LNCS 9886, pages 157–174. Springer.
Lanotte, R. and Merro, M. (2018). A semantic theory of the
Internet of Things. Inf. Comput., 259(1):72–101.
Nielson, H. R., Nielson, F., and Vigo, R. (2013). A calculus
for quality. In Proc. of FACS 2012, LNCS 7684, pages
188–204. Springer.
Nielson, H. R., Nielson, F., and Vigo, R. (2015). A calculus
of quality for robustness against unreliable communi-
cation. J. Log. Algebr. Meth. Program., 84(5):611–
639.