Tracking Dependent Information Flows
Zeineb Zhioua¹, Yves Roudier², Rabea Boulifa Ameur³, Takoua Kechiche⁴ and Stuart Short⁴
¹SAP Labs France/EURECOM, Mougins, France
²I3S-CNRS, Université de Nice Sophia Antipolis, Biot, France
³Telecom ParisTech, Biot, France
⁴SAP Labs France, Mougins, France
zeineb.zhioua@sap.com, yves.roudier@i3s.unice.fr, rabea.ameur-boulifa@telecom-paristech.fr, takoua.kechiche@sap.com, stuart.short@sap.com
Keywords: Security Guidelines, Formal Specification, Model Checking, Information Flow Analysis, Program Dependence Graph, Labeled Transition System.
Abstract: Ensuring the compliance of developed software with security requirements is a challenging task, due to the imprecision of security guideline definitions and to the lack of automatic and formal means to carry out this verification. In this paper, we present an approach that integrates the formal specification and verification of security guidelines into early stages of the development life cycle by combining model checking with information flow analysis. We formally specify security guidelines that involve dependent information flows as a basis for formal verification through model checking, and we provide precise feedback to the developer.
1 INTRODUCTION
Security guidelines mainly specify bad as well as good programming practices that provide guidance and support to developers in ensuring the security of their software, and consequently in reducing the program's exposure to vulnerabilities once it is delivered and running in its execution environment. Security guidelines are defined by security experts in the requirements phase of the development lifecycle, and include essential elements (Chen, 2011) such as the object or asset to be protected (for example, private user information), the goal (the required security property), and the security mechanisms to be applied in order to satisfy the requirement. Security guidelines have also been defined by organizations such as the CERT Coding Standard (CERT, b) and OWASP (OWASP, c) (OWASP, b). They introduce good programming practices to be followed by developers to ensure the security of sensitive assets. However, guidelines suffer from ambiguities and a lack of precision (Zhioua et al., 2016), and are usually presented in an informal and imprecise way. A huge effort was carried out to build the guideline catalogs, but not to provide the means for automatically verifying adherence to those guidelines, which still require security expertise to interpret, implement and verify.
We propose in this paper an approach that provides the means to formally specify security guidelines and to verify their satisfiability using formal proofs.
The paper is organized as follows: Section 2 provides the motivation behind this work. In Section 3, we present our approach and its main phases. Section 4 illustrates the enhancements we carried out on the Program Dependence Graph construction. In Section 5, we formalize a security guideline in the MCL formalism (Model Checking Language), and we validate our formal specification and verification on a concrete example. Section 6 discusses some limitations of our approach, followed by a discussion of existing approaches to guideline specification in Section 7. Section 8 concludes the paper.
2 MOTIVATION
2.1 Information Flow Analysis
Different security mechanisms, such as access control and encryption, make it possible to protect sensitive data, but
they fall short in providing assurance about where and how the data will propagate, where it will be stored, or where it will be sent or processed. This entails the need for controlling information flow using static code analysis. The same idea is emphasized by Andrei Sabelfeld and David Sands (Sabelfeld and Sands, 2009), who deem it necessary to analyze how information flows through the program. The main objective of information flow analysis (Denning and Denning, 1977) is to verify that the program satisfies data confidentiality and integrity policies.
2.2 Security Guidelines
The OWASP Foundation (OWASP, a) introduces a set of guidelines and rules to be followed in order to protect data at rest. However, the guidelines are presented in an informal style, and their interpretation and implementation require security expertise, as stressed in (Zhioua et al., 2016). In the OWASP Storage Cheat Sheet (OWASP, b), OWASP introduces the guideline "Store unencrypted keys away from the encrypted data"¹, explaining the risks encountered when the encryption key is stored in the same location as the encrypted data. This guideline recommends storing the encryption key and the encrypted data in different locations. As the reader can see, the guideline involves two dependent information flows (the encryption key and the encrypted data). Their specification and identification at the code level require an advanced information flow analysis capable of handling complex information flows, as is the case for this guideline. OWASP provides a set of security guidelines that should be met by developers, but does not provide the means to ensure their correct implementation. We aim at covering this gap through the formal specification of security guidelines and their formal verification using formal proofs.
¹https://www.owasp.org/index.php/Cryptographic_Storage_Cheat_Sheet#Rule_Store_unencrypted_keys_away_from_the_encrypted_data
2.3 Sample Code
Let us analyze the sample code in Figure 1 to verify whether the guideline "Store unencrypted keys away from the encrypted data" is met or not. The developer Bob encrypts the secret data creditCardNumber and stores the cipher text into a file. At line 115, Bob creates a byte array y used as parameter for the instantiation of a SecretKeySpec named k (line 116). At line 119, Bob stores key k in a file through the invocation of the method save_to_file (Figure 2). Once created, key k is provided as parameter to the method save_to_file(String data, String file) (Figure 2). Bob then encrypts the secret variable creditCardNumber using the method private static byte[] encrypt(Key k, String text), which uses key k as parameter. The encrypted data is then stored using the method save_to_file(String data, String file) (Figure 2). As the reader can see, key k and encrypted_cc are stored in the files keys.txt and encrypted_cards.txt, respectively. One may conclude that the guideline is met, as key k and encrypted_cc are stored in separate files. However, the two files are located in the same file system, which constitutes a violation of the guideline. The specification and identification of the dependent information flows between key k, encrypted_cc and the file locations is a challenging task that we achieve through the framework we propose in this paper.
Figure 1: Sample code.
Figure 2: save to file method.
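Since the sample code of Figures 1 and 2 appears only as images in the original layout, the following Java sketch reconstructs the scenario described above; the variable names, the key material and the exact line structure are assumptions based on the text, not the authors' verbatim code.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.security.Key;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class StorageExample {

    public static void main(String[] args) throws Exception {
        String creditCardNumber = "4556-7375-8689-9855";   // secret data (illustrative value)

        byte[] y = "0123456789abcdef".getBytes();            // ~line 115: key material (assumed)
        SecretKeySpec k = new SecretKeySpec(y, "AES");       // ~line 116: encryption key k

        // ~line 119: Bob stores key k in keys.txt (how the key is serialized is an assumption)
        save_to_file(Base64.getEncoder().encodeToString(k.getEncoded()), "keys.txt");

        // Bob encrypts the secret and stores the cipher text in a second file
        byte[] encrypted_cc = encrypt(k, creditCardNumber);
        save_to_file(Base64.getEncoder().encodeToString(encrypted_cc), "encrypted_cards.txt");
    }

    private static byte[] encrypt(Key k, String text) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, k);
        return cipher.doFinal(text.getBytes());
    }

    // ~line 148: PrintWriter.print(String) is the sink mapped to the label save_to_file
    private static void save_to_file(String data, String file) throws IOException {
        try (PrintWriter writer = new PrintWriter(new FileWriter(file))) {
            writer.print(data);
        }
    }
}
```

Although keys.txt and encrypted_cards.txt are distinct files, both PrintWriter.print calls write into the same file system, which is exactly the dependency targeted by the guideline.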
3 APPROACH
We propose a framework that enables the formalization of security guidelines and the exploitation of this specification in the implementation and verification and validation phases of the engineering process. Our framework is based on formal proofs for the translation of security requirements into good programming practices. We focus mainly on verifying whether those good programming practices are met or not. Figure 3 illustrates our approach and highlights the relevant phases for the transformation of security
guidelines from natural language into exploitable formulas that can be automatically verified over the program to analyze.
We aim at separating duties and making the distinction between the main stakeholders in our framework: the security expert(s) and the developer.
In a preliminary phase, the security expert(s) carries out the translation of security guidelines from natural language into a formal form (Section 3.1). This phase results in generic security guidelines that can be instantiated on different program logics.
The developer writes the code and invokes the framework, which verifies whether the rules specified by the security expert are correctly applied in the software. First, the framework constructs the program model (Section 3.2), which is used as a basis for the formal verification of the security guidelines (Section 3.3). We stress that the developer does not have to deal with the specified formulas.
We aim at reducing the intervention of the security expert for the identification in the code of the critical data that are at the heart of the security guidelines, such as the encryption key in our example. However, in some cases, the identification of sensitive data requires knowledge and awareness of the code logic and semantics.
One crucial step of the work is the explicit mapping between the abstract formal specification of security guidelines and concrete statements in the code. This is handled in the Security Knowledge Base (Section 3.4). In our framework, we do not aim at proving the program correct, but at verifying that it adheres to specific security guidelines written, formulated and formalized by security expert(s).
The proposed framework is depicted in Figure 3:
3.1 Step 1: Formal Specification of
Security Guidelines
We make the strong assumption that the security expert formally specifies the security guidelines by extracting the key elements (the labels), and builds upon them formulas/patterns based on a formalism that can be supported by standard model checking tools. One crucial operation in this phase is the specification of simple as well as dependent information flows, as is the case for the guideline we consider in this paper. The outcome of this phase is a set of generic security guidelines that can be instantiated on different codes. For instance, the action label save, which defines the operation of saving a given data item in a specific location, can be instantiated as save_to_DB, save_to_file, save_to_array, etc., depending on the invoked instruction and its parameters. This instantiation operation is also handled in our Security Knowledge Base (Section 3.4).
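As an illustration, the abstract action labels extracted from the guideline considered in this paper (the same names reappear in the MCL formula of Section 5.1) could be listed as follows; the enum is just one possible encoding and is not part of the authors' tooling.

```java
/** Abstract action labels extracted from "Store unencrypted keys away from the encrypted data". */
public enum GuidelineLabel {
    CREATE_KEY,   // an encryption key is created (e.g., SecretKeySpec, KeyGenerator.generateKey)
    ENCRYPT,      // data is encrypted with a key
    SAVE,         // data is stored in some location (save_to_file, save_to_DB, save_to_array, ...)
    DEPEND        // two locations (or variables) are implicitly dependent
}
```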
3.2 Step 2: Construction of Augmented
Program Dependence Graph
We aim at constructing a data structure enabling the representation and the extraction of multiple information flows at the same time. The outcome of this step is a Program Dependence Graph (PDG) augmented with information and details obtained from a deep dependency analysis of the program. The standard PDG contains both control and data dependencies between program instructions, and has the ability to represent information flows in the program.
We make use of the JOANA IFC tool (Graf et al., 2013) (Graf et al., 2015) to construct the standard PDG. The generated PDG is then augmented with details and information extracted from the verification of security guideline formulas and patterns.
In our framework, we carry out the information flow analysis using the JOANA tool (Graf et al., 2013) to capture the explicit as well as the implicit dependencies that can be a source of covert channels and may constitute a source of sensitive information leakage. JOANA analyzes Java byte code for the non-interference property, which demands that public events should not be influenced by secret data. The analysis performed by JOANA was formally proven using Isabelle (Wasserrab et al., 2009).
Information flow analysis aims at capturing the different dependencies that may occur between the different PDG nodes, hence augmenting the generated standard PDG with relevant details, such as annotations mapping the PDG nodes to the abstract labels of the security guidelines. Possible mappings between Java APIs and abstract labels are handled in the Security Knowledge Base (Section 3.4).
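For intuition only, the following sketch shows one possible in-memory shape for the augmented PDG described above: nodes carry the instruction line number plus a set of abstract labels, and edges carry their dependency kind. This is a hypothetical illustration, not JOANA's internal representation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical augmented PDG structure: labeled nodes plus typed dependency edges. */
public class AugmentedPdg {

    public enum EdgeKind { CONTROL, DATA, DEPEND }   // DEPEND is added by our analysis (Section 4.4)

    public static final class Node {
        final int line;                                    // source line of the instruction
        final Set<String> labels = new LinkedHashSet<>();  // abstract labels, e.g. "create_key", "save_to_file"
        Node(int line) { this.line = line; }
    }

    public static final class Edge {
        final Node from, to;
        final EdgeKind kind;
        Edge(Node from, Node to, EdgeKind kind) { this.from = from; this.to = to; this.kind = kind; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    public Node addNode(int line, String... labels) {
        Node n = new Node(line);
        for (String l : labels) n.labels.add(l);
        nodes.add(n);
        return n;
    }

    public void addEdge(Node from, Node to, EdgeKind kind) {
        edges.add(new Edge(from, to, kind));
    }
}
```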
3.3 Step 3: Formal Verification
This step aims at constructing from the augmented PDG a formal graph that is accepted by model checking tools: a Security Labeled Transition System (Sec_LTS), which is an augmented LTS (Labeled Transition System) accepted by a model checking tool. This step is depicted in detail in Section 5.2. As previously mentioned, security guidelines are modeled in the form of sequences of atomic propositions or statements representing the behavior of the system. The security guidelines are then verified over the Sec_LTS program representation model through model checking.
Figure 3: End-to-end Approach.
The verification phase can have the following outcomes:
- The security guideline is valid over all the feasible paths.
- The security guideline is violated, and the violation traces are returned.
The first case can be refined further, meaning that the verification can provide more details to the developer (or the tester) about the circumstances under which the security guideline is valid. In the second case, recommendations to make the necessary corrections to the program can then be proposed.
3.4 Security Knowledge Base
The Security Knowledge Base is a centralized repository gathering the labels of the formulas mapped to APIs, instructions, libraries or programs. This helps the automatic detection of the labels on the system model. We designed the Security Knowledge Base in a way that allows the different relationships between Java methods and the abstract labels used by the security expert to compose the security guideline formulas to be represented. We populated the Security Knowledge Base using a Java class parser that we developed²; for the different Java classes used in the program to analyze, we programmatically launch the parsing of the given class (HTML code, Javadoc), and we extract all the relevant details, such as the description, the attributes, the constructors, the method signatures and their parameters. A semi-automatic semantic analysis is then performed to detect key elements in the Java method details (return type, method description, etc.), such as the key words secure, key, print, input, etc. Then, the security expert establishes the mapping between those key words used to build the formulas and the possible Java language instructions. For example, the method (the constructor) SecretKeySpec(byte[] key, String algorithm), which constructs a secret key from a byte array, will be mapped to the abstract label "create_key"; this label is described in our Security Knowledge Base as "encryption key", a sensitive piece of information that should be kept secret. The Security Knowledge Base can also be perceived as an extensible dictionary gathering the labels with respect to their semantics and to the concepts they represent. For instance, the label "create_key" can also be mapped to the method invocation generateKey() of the Java class KeyGenerator, which allows a secret key to be generated. In our Security Knowledge Base, we offer a wide range of labels that can be used to build the security guideline formulas. The set of labels can be extended by the security expert if new security concepts are introduced.
²https://github.com/zeineb/Java-classes-parser
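To make the mapping concrete, the snippet below sketches how such knowledge-base entries could be encoded as a simple lookup table from Java method signatures to abstract labels; the two create_key entries and the save_to_file entry come from the text above, but this representation is ours and is not the actual implementation.

```java
import java.util.List;
import java.util.Map;

/** Minimal, illustrative encoding of Security Knowledge Base entries (hypothetical). */
public class SecurityKnowledgeBase {

    /** Maps a concrete Java API signature to the abstract labels it instantiates. */
    private static final Map<String, List<String>> ENTRIES = Map.of(
            // Constructing a secret key from a byte array is an instance of create_key.
            "javax.crypto.spec.SecretKeySpec(byte[],String)", List.of("create_key"),
            // Generating a secret key with KeyGenerator is another instance of create_key.
            "javax.crypto.KeyGenerator.generateKey()", List.of("create_key"),
            // Writing a string with PrintWriter.print is an instance of save_to_file (label class: save).
            "java.io.PrintWriter.print(String)", List.of("save", "save_to_file"));

    /** Returns the abstract labels for a method signature, or an empty list if unknown. */
    public static List<String> labelsFor(String methodSignature) {
        return ENTRIES.getOrDefault(methodSignature, List.of());
    }

    public static void main(String[] args) {
        System.out.println(labelsFor("javax.crypto.spec.SecretKeySpec(byte[],String)")); // [create_key]
    }
}
```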
4 AUGMENTED PDG
The starting key element for this step is the standard PDG generated by the JOANA tool (Graf et al., 2015) from the program bytecode. In this PDG, control and (explicit/implicit) data dependencies are captured, which constitutes a strong basis to perform a precise analysis. However, we tested JOANA on different sample codes presenting implicit violations, and JOANA failed to capture some of them. We noticed that there are implicit dependencies that are not captured (such as Java reflection dependencies), and they constitute the source of the undetected violations, as is the case for the storage location we consider in this paper. We carried out the effort of enhancing the PDG and capturing the missing dependencies, which we translate into edges on the PDG.
4.1 Automatic Annotations
The JOANA tool proposes two kinds of annotations, together with their security levels, specifying the source (SOURCE) and the target (SINK) of the information flow, in addition to the DECLASS annotation allowing the security level of the annotated node to be lowered. We made modifications to the source code of the JOANA tool and added customized annotations referring to abstract labels such as hash, userInput, isPassword, encrypt, save, etc., in addition to the predefined annotations SOURCE and SINK.
As a second step, the automatic detection of the labels on the PDG is performed; here we refer to the Security Knowledge Base (Section 3.4), which already contains the possible concrete mappings between known APIs, methods and method parameters and the abstract labels of the security guideline specification. For instance, the SecretKeySpec object k is instantiated at line 116. The constructor SecretKeySpec(byte[],String) stands out in our Security Knowledge Base as a method invocation mapped to the abstract label create_key. Hence, the variable k (line 116) is annotated create_key. In the sample code of Figure 1, the methods encrypt and save_to_file are implemented by the developer. Hence, the automatic annotation of those methods will fail, as this operation requires an advanced semantic knowledge base and a semantic analysis to be performed over the code in order to determine the method names matching an encryption or storage operation. This semantic analysis is not in the scope of this paper. We can also be faced with the case where methods are declared with insignificant names, which makes automatic annotation unfeasible.
Figure 4: Augmented Program Dependence Graph for the sample code. Strong edges represent the control flows, the dashed edges refer to explicit and implicit data flows. Nodes are labeled with their corresponding instruction line numbers.
In Figure 2, we show the method save_to_file. As the reader can see, at line 148, Bob invokes the method PrintWriter.print(String), which is mapped in the Security Knowledge Base to the label save_to_file. The augmented PDG is represented in Figure 4.
For a developer or a tester who is not aware of the semantics of the methods performing security operations, detecting possible sources (resp. sinks) and their respective sinks (resp. sources) is a tedious task.
4.2 Multiple Annotations on the Same
Node
The JOANA tool offers the possibility to annotate a node with one single kind of annotation: SOURCE, SINK or DECLASS. We added the possibility of having multiple annotations on the same node; this is useful in different cases, for example when the same data (the same node) is at the heart of more than one guideline.
4.3 Annotation Propagation
We augmented the PDG using the propagation of annotations when already annotated data are copied or concatenated. For example, if key k (annotated as create_key) is assigned to another variable g, then g is also annotated as create_key. This enables precise feedback on data propagation to be provided to the developer, and extends the analysis of guidelines to dependent data.
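A minimal sketch of the propagation rule described above, assuming labels are kept per variable as in the hypothetical structures shown earlier: when a labeled value is copied or concatenated into another variable, the target inherits the source's labels.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative propagation of abstract labels along copies/concatenations (hypothetical). */
public class LabelPropagation {

    private final Map<String, Set<String>> labels = new HashMap<>();

    public void annotate(String variable, String label) {
        labels.computeIfAbsent(variable, v -> new LinkedHashSet<>()).add(label);
    }

    /** Called when 'target = source' or 'target = ... + source': target inherits source's labels. */
    public void propagate(String source, String target) {
        labels.computeIfAbsent(target, v -> new LinkedHashSet<>())
              .addAll(labels.getOrDefault(source, Set.of()));
    }

    public static void main(String[] args) {
        LabelPropagation p = new LabelPropagation();
        p.annotate("k", "create_key");   // key k annotated create_key (Section 4.1)
        p.propagate("k", "g");           // g = k; g is now also annotated create_key
        System.out.println(p.labels);     // {k=[create_key], g=[create_key]}
    }
}
```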
4.4 File Location Dependency Detection
In the guideline Store unencrypted keys away from encrypted data, we encounter the imprecise and implicit notion of location, which can be expressed in different manners, such as a file location, an insertion into a database, an addition to an array, etc. We worked towards closing this gap through the arrangement of labels into classes. For example, the set save contains labels such as save_to_file, save_to_database, save_to_array, etc. We performed an advanced information flow analysis with the objective of capturing the implicit file location dependency: we captured the parameters (file names) of the specific PrintWriter.print(String) method invocations and compared their values. This comparison indicated that the two files are in the same file system. This results in the creation of a new edge of kind DEPEND between the nodes matching the invocations of PrintWriter.print(String). The new edge is represented as a red dashed edge on the augmented PDG in Figure 4.
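The comparison of the captured file-name parameters can be as simple as the sketch below, which only checks whether the two paths resolve under the same file system root, the condition that triggers the DEPEND edge in our example. The use of java.nio.file here is our own illustration of the idea, not the framework's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;

/** Illustrative check for the implicit "same storage location" dependency (hypothetical helper). */
public class LocationDependency {

    /** True when the two captured file names resolve under the same file system root. */
    static boolean sameFileSystem(String file1, String file2) {
        Path p1 = Paths.get(file1).toAbsolutePath();
        Path p2 = Paths.get(file2).toAbsolutePath();
        return p1.getFileSystem().equals(p2.getFileSystem())
                && Objects.equals(p1.getRoot(), p2.getRoot());
    }

    public static void main(String[] args) {
        // Parameters captured from the two PrintWriter.print(String) call sites of the sample code.
        boolean depend = sameFileSystem("keys.txt", "encrypted_cards.txt");
        System.out.println(depend ? "add DEPEND edge between the two save nodes" : "locations independent");
    }
}
```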
5 FORMAL VERIFICATION
With the objective of proposing a framework that provides help and guidance to developers in verifying that their programs satisfy given security guidelines, we translate Java programs into a formal description (e.g., finite state machines, process algebra, etc.), which is precise in meaning and amenable to formal analysis. As our main objective is to automatically verify programs, we need to construct from the augmented PDG a model that is accepted by a model checking tool and that can be verified automatically through model checking techniques. Indeed, model checking is an automatic technique for verifying behavioral properties of a system model by an exhaustive enumeration of its states. In order to carry out this operation, we first need to express security guidelines in a suitable formalism.
5.1 Formal Specification of Security
Guidelines
For expressing the properties, we use the MCL logic (Mateescu and Thivolle, 2008). MCL (Model Checking Language) is an extension of the alternation-free regular µ-calculus with facilities for manipulating data in a manner consistent with their usage in the system definition. MCL formulas are logical formulas built over regular expressions using boolean operators, modality operators (the necessity operator denoted by [ ] and the possibility operator denoted by ⟨ ⟩) and the minimal fixed point operator (denoted by µ).
For instance, the guideline "Store unencrypted keys away from the encrypted data" is encoded directly by the following MCL formula:
[true*.{create_key ?key:String}.true*.
({save !key ?loc1:String}.true*.
{encrypt ?data:String !key}.true*.
{save !data ?loc2:String}.true*.
{depend !loc1 !loc2}
|
{encrypt ?data:String !key}.true*.
{save !key ?loc1:String}.true*.
{save !data ?loc2:String}.true*
.{depend !loc1 !loc2})] false
This formula involves five actions: the action {create_key ?key:String} denoting that an encryption key key (of type String) is created; the actions {save !key ?loc1:String}, {save !data ?loc2:String} and {encrypt ?data:String !key} denoting respectively the storage of the corresponding key in location loc1, the storage of the corresponding data in location loc2, and the encryption of data using key; and the particular action true denoting any arbitrary action. Note that actions involving data variables are enclosed in braces ({ }). Another particular action that we make use of in this formula is {depend !loc1 !loc2}, denoting the implicit dependency between the file locations loc1 and loc2; we captured this implicit dependency through advanced information flow analysis of the code.
This formula means that, for all execution traces, the undesirable behavior never occurs (false). The unexpected behavior is expressed by the following sequence of actions: if encryption key k is saved in loc1, and k is used to encrypt data that is afterwards stored in loc2, then if loc1 and loc2 are dependent, the guideline is violated. The second undesirable behavior, expressed in the second sequence of the formula, means that if the encryption of data using k occurs before the storage of k in loc1, and if loc1 and loc2 are dependent, then the guideline is violated.
5.2 From Program Dependence Graph
to Labeled Transition System
We focus in this section on the transformation of the augmented PDG into a formal model, allowing us to proceed to formal verification using model checking techniques.
As usual in the setting of distributed and concurrent applications, we give the behavioral semantics of analyzed programs in terms of a set of interacting finite state machines, called LTSs (Arnold, 1994).
Definition 1 (Labeled Transition System). A Labeled Transition System (LTS for short) is a 4-tuple ⟨Q, q0, L, →⟩ where:
- Q is a finite set of states;
- q0 ∈ Q is the initial state;
- L is a countable set of labels;
- → ⊆ Q × L × Q is the transition relation.
An LTS is a structure consisting of states and transitions between them, labeled with actions. The states model the program states; the transitions encode the actions that a program can perform in a given state. We distinguish two types of actions: actions encoding the sequential program (representing standard sequential instructions, including branching and assignment, as well as calls to methods, local or remote), and actions encoding the result of tracking explicit and implicit dependencies between variables within the program.
The LTS labels can mainly be of three types: actions, data and dependencies.
- Actions: they refer mainly to all program instructions, representing standard sequential instructions, including branching and method invocations.
- Value passing: as the performed analysis involves data, the generated LTSs are parametrized, i.e., transitions are labeled by actions containing data values.
- Dependencies: in addition to program instructions, we added transitions that carry the (implicit and explicit) data dependencies between two statements, with the objective of tracking data flows. Indeed, these transitions show the dependencies between the variables in the code. We label this kind of transition by depend var1 var2, where var1 and var2 are two dependent variables.
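Mirroring Definition 1 and the three label types above, a Sec_LTS could be represented roughly as follows; again, this is an illustrative encoding of ours, not the representation consumed by CADP.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative Sec_LTS: states plus transitions labeled with actions, data values or dependencies. */
public class SecLts {

    public record Transition(int from, String label, int to) { }    // (q, l, q') with l in L

    private final Set<Integer> states = new LinkedHashSet<>();       // Q
    private final int initialState;                                   // q0
    private final List<Transition> transitions = new ArrayList<>();  // -> subset of Q x L x Q

    public SecLts(int initialState) {
        this.initialState = initialState;
        states.add(initialState);
    }

    public void addTransition(int from, String label, int to) {
        states.add(from);
        states.add(to);
        transitions.add(new Transition(from, label, to));
    }

    public static void main(String[] args) {
        SecLts lts = new SecLts(0);
        lts.addTransition(0, "create_key k", 1);      // action carrying a data value
        lts.addTransition(1, "save k keys.txt", 2);   // value-passing action
        lts.addTransition(2, "depend loc1 loc2", 3);  // dependency transition (Section 4.4)
        System.out.println("q0 = " + lts.initialState + ", |Q| = " + lts.states.size()
                + ", transitions = " + lts.transitions.size());
    }
}
```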
It is important to note that the augmented PDG and the LTS are isomorphic, in the sense that both graphs have the same number of edges and the same number of nodes. The LTS is similar to the augmented PDG, except that in the LTS the labels are on edges (transitions) and not on nodes. The actions (instructions) on the PDG nodes are translated into transitions on the LTS, which is a faithful representation of the captured dependencies translated into edges between nodes. This property is of paramount importance because of the faithfulness of the analysis it supports, and because it allows the analysis results (violating traces) to be exported back onto the PDG built from the source code. Figure 5 represents the LTS corresponding to the augmented PDG of Figure 4.
Figure 5: Labeled Transition System for the sample code
given in Figure 1.
5.3 Model Checking
We can carry out the model checking analysis directly on the LTS. For the sake of simplicity, hiding and renaming are used to compute a minimized Labeled Transition System, which is an "operable" model (see Figure 6). First, some actions that are irrelevant for the analyzed properties are "hidden"; they are replaced by τ actions (denoted i in Figure 6). Second, we rename the actions by their synonyms (entry points) in the Security Knowledge Base.
Figure 6: Minimized Labeled Transition System for the sample code given in Figure 1.
We made use of the EVALUATOR model checker of the CADP toolbox (Lang et al., 2002) to verify the property we formalized in Section 5.1.
The verification result is false, indicating that the guideline is violated due to the implicit file location dependency, which shows that the two file locations are in the same file system. In addition to returning false, the model checker produces a trace illustrating the violation from the initial state, as shown in Figure 7.
We are able to track all variables within our model; in fact, even if a variable changes name in the code, we can automatically generate from the original guideline an appropriate formula to check the property under the new name. For instance, the property can be formalized in MCL as follows:
[true*.{create_key ?key:String}.true*.
(({save !key ?loc1:String}.true*.
{encrypt ?data:String !key}.true*.
{save !data ?loc2:String}.true*.
{depend !loc1 !loc2}
|{depend !key ?key1:String}.{save !key1
?loc1:String}.true*.{encrypt ?data:String
!key1}.true*.{save !data ?loc2:String}.
true*.{depend !loc1 !loc2})
|
({encrypt ?data:String !key}.true*.
{save !key ?loc1:String}.true*.
{save !data ?loc2:String}.true*.
{depend !loc1 !loc2}
| {depend !key ?key1:String}.
{encrypt ?data:String !key1}.
true*.{save !key1 ?loc1:String}.true*.
{save !data ?loc2:String}.true*.
{depend !loc1 !loc2}))] false
This MCL formula encodes the same guideline "Store unencrypted keys away from the encrypted data". It allows further dependencies and their combinations to be specified; it also checks the case where key k is renamed (for example, by introducing the instruction String x = k.toString() between lines 116 and 118, and renaming k to x in the rest of the code).
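As a concrete illustration of that renaming case, the snippet below (hypothetical lines, consistent with the sample code description) introduces the alias x that the extended formula still relates to the original key k through the depend action.

```java
import javax.crypto.spec.SecretKeySpec;

public class RenamingExample {
    public static void main(String[] args) {
        byte[] y = "0123456789abcdef".getBytes();       // key material (assumed)
        SecretKeySpec k = new SecretKeySpec(y, "AES");   // line 116: annotated create_key
        String x = k.toString();                         // alias introduced between lines 116 and 118
        // From here on the code uses x instead of k; the analysis records a
        // "depend x k" transition, which the extended formula matches via
        // {depend !key ?key1:String}, so the guideline is still checked for x.
        System.out.println(x);
    }
}
```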
5.4 Feedback to the Developer
One crucial step in our framework is the presentation of feedback to the developer in a way that allows them to understand the source of the violation and to make the corrections needed to fix it. It is therefore necessary to export the model checking output onto the PDG built from the source code, as it is the closest representation of the program code.
The JOANA tool generates the PDG from the Java byte code and performs the analysis on that PDG. However, interpreting the results at the byte code level is not straightforward, and neither is its mapping to the source code. With the objective of covering this shortcoming, we built a PDG from the program source code and made possible the bidirectional mapping between the source-code PDG and the byte-code PDG. The two graphs are augmented as explained in Section 4.
The verification carried out on the Sec_LTS returns the trace(s) violating the security guideline; those results are then mapped back onto the PDG we constructed from the source code, to provide more details to the developer about where and why the violation occurred. The returned trace (Figure 7) shows the exact path where the guideline violation occurred.
Figure 7: Violation trace.
6 LIMITATION
The framework we proposed in this paper constitutes a proof of concept regarding the feasibility of our approach. However, our framework falls short in specifying guidelines involving imprecise and ambiguous notions, such as guideline IDS15-J: Do not allow sensitive information to leak outside a trust boundary (CERT, a) from the CERT Coding Standard (CERT, b). The notion of trust boundary requires well-defined system boundaries, which is not trivial when it comes to distributed applications.
Other guidelines involve quantitative information flow theories, and this is another limitation of the formal specification we cover in our framework. For instance, the guideline Limit quantity of data encrypted with one key recommends the use of a new encryption key when the amount of encrypted data goes beyond a certain threshold. It is obvious to the reader that our specification and verification method falls short in covering this guideline because of the dynamic aspect it involves, which cannot be captured statically.
For the guidelines involving semantic aspects, such as password security rules, the detection of the password data in the program cannot be performed automatically. The annotation of the password requires deep knowledge of the program logic, and the intervention of the developer or security expert to annotate the password is required.
7 RELATED WORK
The specification and verification of security guidelines have also been addressed in the literature.
In the technical report (Aderhold et al., 2010) of the joint work between TU Darmstadt and Siemens AG, the authors provide formalizations of secure coding guidelines with the objective of providing precise reference points. The authors make use of the LTL formalism to specify the guidelines; however, LTL leverages events and actions to model security policies, and puts more focus on actions than on data. The mapping between the labels of the LTL formulas and the program instructions is performed by the developer, which is an overhead for developers.
SecureDIS (Akeel et al., 2016) makes use of model checking together with theorem proving to verify and generate the proofs. The authors adopt the Event-B method, an extension of the B-Method, to specify the system and the security policies. The authors do not make clear how the policy parameters are mapped to the system assets, and they do not extend the policy verification and enforcement to the program level. The work targets one specific system type (Data Integration Systems), and is more focused on access control enforcement policies, specifying the subject, the permissions and the object of the policy. However, access control mechanisms are not sufficient for the confidentiality property, as they cannot provide assurance about where and how the data will propagate, where it will be stored, or where it will be sent or processed. The authors target system designers rather than developers or testers, and consider a specific category of policies focused on data leakage only.
GraphMatch (Wilander and Fak, 2005) (Wilander and Fak, ) is a code analysis tool/prototype for security policy violation detection. GraphMatch considers examples of security properties covering both positive and negative ones, matching good and bad programming practices. GraphMatch is more focused on control-flow security properties, mainly on the order and sequence of instructions, based on matching with security patterns. However, it does not seem to consider implicit information flows, which can be the source of back-doors and of the leakage of secret variables.
The Jif language (Myers and Liskov, 2000) implements type-checking based on the Decentralized Label Model (DLM) (Myers and Liskov, 1997); it allows a set of rules to be defined that programs must follow to prevent information leakage. Jif programs are type-checked at compile time, which ensures type safety as well as that the rules are applied. However, the labels, which define policies for the use of the data, apply only to a single data value, and are not checked at run time.
Dimitrova et al. (Dimitrova et al., 2012) proposed an approach that integrates dynamic analysis into the monitoring of information flow properties. The authors proposed SecLTL, an extension of LTL taking into account information flow properties such as non-interference and declassification. SecLTL was then used as a specification language for model checking. The authors assume that secret data is provided as input; however, sensitive data can originate from other
sources, such as reading from a database. In addition, there might be cases where multiple sensitive data items are provided as input; the monitoring of multiple sets of traces is then required, which can turn out to be too expensive and may lead to a loss of precision.
In their work "Detecting Temporal Logic Predicates in Distributed Programs Using Computation Slicing" (Sen and Garg, 2003), Alper Sen and Vijay K. Garg adopted an approach that models the possible executions of the program as finite traces of events, and performs "computation slicing", that is, slicing with respect to a global predicate. Their approach is based on the dynamic behavior of the program, which requires a sufficient number of test cases and is quite time-consuming, yet it cannot ensure a verification of the entire set of paths of the program to analyze.
The Aoraï plugin (Stouls and Prevosto, ) provides the means to automatically annotate C programs with LTL formulas that translate required properties. The tool provides proofs that the C program behavior can be described by an automaton. The mapping between states and code instructions is made based on transition properties that keep track of the pre- and post-conditions of method invocations; those conditions refer to the sets of authorized states respectively before and after the method call. The tool is only focused on the control dependencies between method calls, and the analysis is not extended to the data level.
PIDGIN (Andrew et al., 2015) introduces an approach similar to our work. The authors propose the use of PDGs to verify security guidelines. The specification and verification of security properties rely on a custom PDG query language that serves to express the policies, to explore the PDG and to verify the satisfiability of the policies. The parameters of the queries are PDG labels, which supposes that developers are fully aware of the complex structure of PDGs, and can identify the sensitive information and the possible sinks it might leak to. PIDGIN limits the verification to the paths between sources and sinks; however, there might be information leakage that occurs outside this limited search graph. The authors do not provide a proof that their specification is formally valid. It is also not explained how the feedback is presented to the developer, or how one might be guided through the correction phase.
8 CONCLUSION AND FUTURE
WORK
We presented in this paper a first proof of concept regarding the feasibility of our approach, which aims at extending guideline verification and validation across the different phases of the software development lifecycle. We proposed a first attempt to fill the gap of the formal verification of guidelines provided in an informal way. We stressed the difficulty encountered when a security guideline involves dependent information flows that cannot be specified separately; this requires security expertise to specify the dependent information flows. We make the strong assumption that the security expert extracts the key concepts from the guidelines' textual descriptions and builds upon them the formulas using the MCL formalism. Our framework makes use of this specification to carry out model checking on the Labeled Transition System we built from the Program Dependence Graph, which we have augmented with details such as the customized annotations and the implicit dependencies.
The output of the verification phase indicates whether the guideline is met or violated, and in the latter case the violation traces are returned. Using this output, we are able to provide precise and useful feedback that helps the developer understand the source of the violation, and possibly how to fix it. Future work includes the representation of the model checking output on the Program Dependence Graph and at the code level in the Integrated Development Environment. We also aim at covering a wider range of security guidelines, hence extending the Security Knowledge Base in order to capture more security concepts, and possibly at covering different programming languages.
REFERENCES
Aderhold, M., Cuéllar, J., Mantel, H., and Sudbrock, H.
(2010). Exemplary formalization of secure coding
guidelines. Technical report, TU Darmstadt and
Siemens AG.
Akeel, F., Salehi Fathabadi, A., Paci, F., Gravell, A., and
Wills, G. (2016). Formal modelling of data integra-
tion systems security policies. Data Science and En-
gineering, pages 1–10.
Andrew, J., Lucas, W., and Scott, M. (2015). Exploring
and enforcing security guarantees via program depen-
dence graphs. PLDI 2015 Proceedings of the 36th
ACM SIGPLAN Conference on Programming Lan-
guage Design and Implementation, pages 291–302.
Arnold, A. (1994). Finite transition systems. Semantics of communicating systems. Prentice-Hall. ISBN 0-13-092990-5.
CERT. Do not allow sensitive information to leak outside a
trust boundary.
CERT. SEI CERT Oracle coding standard for Java.
Chen, Z., editor (2011). Specification and Management
of Security Requirements for Service-Based Systems.
Proquest, Umi Dissertation Publishing.
ICISSP 2017 - 3rd International Conference on Information Systems Security and Privacy
188
Denning, D. E. and Denning, P. J. (1977). Certification
of programs for secure information flow. Commun.
ACM, 20(7):504–513.
Dimitrova, R., Finkbeiner, B., and Rabe, M. N. (2012).
Leveraging Applications of Formal Methods, Veri-
fication and Validation. Technologies for Mastering
Change: 5th International Symposium, ISoLA 2012,
Heraklion, Crete, Greece, October 15-18, 2012, Pro-
ceedings, Part I, chapter Monitoring Temporal Infor-
mation Flow, pages 342–357. Springer Berlin Heidel-
berg, Berlin, Heidelberg.
Graf, J., Hecker, M., and Mohr, M. (2013). Using JOANA for information flow control in Java programs - a practical guide. In Proceedings of the 6th Working Conference on Programming Languages (ATPS'13), Lecture Notes in Informatics (LNI) 215, pages 123-138. Springer Berlin / Heidelberg.
Graf, J., Hecker, M., Mohr, M., and Snelting, G. (2015). Checking applications using security APIs with JOANA. 8th International Workshop on Analysis of Security APIs.
Lang, F., Garavel, H., and Mateescu, R. (2002). An overview of CADP 2001. European Association for Software Science and Technology (EASST) Newsletter, 4.
Mateescu, R. and Thivolle, D. (2008). A model check-
ing language for concurrent value-passing systems.
In Proceedings of the 15th International Symposium
on Formal Methods, FM ’08, pages 148–164, Berlin,
Heidelberg. Springer-Verlag.
Myers, A. C. and Liskov, B. (1997). A decentralized model
for information flow control. SIGOPS Oper. Syst. Rev.,
31(5):129–142.
Myers, A. C. and Liskov, B. (2000). Protecting privacy us-
ing the decentralized label model. ACM Trans. Softw.
Eng. Methodol., 9(4):410–442.
OWASP. About the Open Web Application Security Project.
OWASP. Cryptographic storage cheat sheet.
OWASP. OWASP secure coding practices quick reference guide.
Sabelfeld, A. and Sands, D. (2009). Declassification:
Dimensions and principles. J. Comput. Secur.,
17(5):517–548.
Sen, A. and Garg, V. K. (2003). Detecting temporal
logic predicates in distributed programs using compu-
tation slicing. In Principles of Distributed Systems,
7th International Conference, OPODIS 2003 La Mar-
tinique, French West Indies, December 10-13, 2003
Revised Selected Papers, pages 171–183.
Stouls, N. and Prevosto, V. Aoraï plugin tutorial - Frama-C.
Wasserrab, D., Lohner, D., and Snelting, G. (2009). On
pdg-based noninterference and its modular proof. In
Proceedings of the 4th Workshop on Programming
Languages and Analysis for Security, pages 31–44.
ACM.
Wilander, J. and Fak, P. Modeling and visualizing security
properties of code using dependence graphs.
Wilander, J. and Fak, P. (2005). Pattern matching security
properties of code using dependence graphs.
Zhioua, Z., Roudier, Y., Short, S., and Boulifa Ameur, R. (2016). Security guidelines: Requirements engineering for verifying code quality. In ESPRE 2016, 3rd International Workshop on Evolving Security and Privacy Requirements Engineering, September 12th, 2016, Beijing, China, co-located with the 24th IEEE International Requirements Engineering Conference, Beijing, China.