MODINF: Exploiting Reiﬁed Computational Dependencies for

Information Flow Analysis

Jens Van der Plas, Jens Nicolay, Wolfgang De Meuter and Coen De Roover

Software Languages Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels, Belgium

Keywords:

Information Flow Control, Data Flow Analysis, Taint Analysis, Static Analysis, Modular Analysis.

Abstract:

Information Flow Control is important for securing applications, primarily to preserve the conﬁdentiality and

integrity of applications and the data they process. Statically determining the ﬂows of information for security

purposes helps to secure applications early in the development pipeline. However, a sound and precise static

analysis is difﬁcult to scale. Modular static analysis is a technique for improving the scalability of static analy-

sis. In this paper, we present an approach for constructing a modular static analysis for performing Information

Flow Control for higher-order, imperative programs. A modular analysis requires information about data de-

pendencies between modules. These dependencies arise as a result of information ﬂows between modules, and

therefore we piggy-back an Information Flow Control analysis on top of an existing modular analysis. Addi-

tionally, the resulting modular Information Flow Control analysis retains the beneﬁts of its modular character.

We validate our approach by performing an Information Flow Control analysis on 9 synthetic benchmark pro-

grams that contain both explicit and implicit information ﬂows.

1 INTRODUCTION

Information Flow Control (IFC) is the practice of

detecting and preventing undesirable ﬂows of infor-

mation in an application to preserve certain security

properties of the application and the systems it runs

on. IFC can be used to preserve confidentiality, integrity and availability, by preventing secret or sensitive information from flowing to public ‘sinks’ and by preventing untrusted data from ending up at sensitive sinks like a query evaluator. Unwanted flows can be detected

by a static information ﬂow analysis (e.g., (Zanotti,

2002; De Bleser et al., 2017)) that tracks informa-

tion as it moves between sources and sinks. The most

well-known static IFC analysis is taint analysis.

Static information ﬂow analysis can be ap-

plied early in a software development pipeline, but

analysing non-trivial applications is challenging with

respect to scalability and precision. To improve scal-

ability, modular analysis can be performed, where

instantiations of modules (or ‘components’) are analyzed separately. In non-trivial applications, components can be inter-dependent. For example, when treating functions as components, a function can call another, or functions can access and modify shared resources. The component dependency graph that arises results from data flow between the different components.

https://orcid.org/0000-0002-7475-576X

https://orcid.org/0000-0003-4653-5820

https://orcid.org/0000-0002-5229-5627

https://orcid.org/0000-0002-1710-1268

In this paper, we exploit the insight that modular

analyses depend on data (or information) ﬂow to han-

dle inter-component dependencies, and that this can

be the basis for a modular information ﬂow analysis.

We adapt and extend MODF (Nicolay et al., 2019), a

generic modular static analysis for higher-order, im-

perative programs, into an information ﬂow analy-

sis usable for IFC. MODF is function-modular, its

components correspond to function calls, and soundly

manages inter-component dependencies by tracking

interactions of components with the store (or heap).

IFC also requires tracking dependencies, not only

between components – like modular analysis – but

also inside components. In this paper, we propose to

reuse the dependency tracking mechanism of a mod-

ular analysis that we extend to track data ﬂows within

components and through the program under analy-

sis. We explain our approach by extending MODF

to a modular information flow analysis we call MODINF. MODF is generic and simple enough to represent other modular analyses that model inter-component dependencies based on heap or other effects.


Van der Plas, J., Nicolay, J., De Meuter, W. and De Roover, C.

MODINF: Exploiting Reiﬁed Computational Dependencies for Information Flow Analysis.

DOI: 10.5220/0011849900003464

In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2023), pages 420-427

ISBN: 978-989-758-647-7; ISSN: 2184-4895

© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

MODINF is capable of detecting information ﬂow

as a result of both data dependence (explicit informa-

tion ﬂow) and control dependence (implicit informa-

tion ﬂow). We implemented a prototype of MODINF

and validated it on benchmark programs containing

a mix of language features (assignment, higher-order functions, ...), and different explicit and implicit flows.

2 BACKGROUND

MODINF is situated at the confluence of IFC and (modular) static analysis, which we introduce here.

2.1 Information Flow Control

The goal of Information Flow Control (IFC) (Hedin

and Sabelfeld, 2012; Scull Pupo et al., 2018; Russo

and Sabelfeld, 2010) is to detect and prevent ﬂows of

information that decrease the security of an applica-

tion or the systems it runs on, e.g., to preserve proper-

ties such as conﬁdentiality, integrity, and availability.

Information enters an application at one or more

sources and leaves the application at various sinks.

Conceptually, the information that appears at sources

is tagged with a particular value, depending on the

type of source and the security properties to be verified. When preserving confidentiality, for example, sources tag their values with a label indicating their confidentiality level. Tags are drawn from a join semi-lattice (Denning, 1976); when multiple tags have to be joined, a unique least upper bound is always defined. The simplest example of such a lattice has only two elements: H, denoting highly sensitive information, and L, denoting information of low sensitivity. This lattice has the partial order L ⊑ H; the join operation is defined as L ⊔ L = L and H otherwise.
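As a minimal sketch (written in Python, which the paper itself does not use; all function names are illustrative assumptions), the two-element lattice described above could be encoded as follows.

```python
from functools import reduce

# Two-point confidentiality lattice: L (low) is below H (high).
ORDER = {"L": 0, "H": 1}

def leq(a, b):
    """Partial order: a ⊑ b."""
    return ORDER[a] <= ORDER[b]

def join(a, b):
    """Least upper bound: L ⊔ L = L, and H otherwise."""
    return a if ORDER[a] >= ORDER[b] else b

def join_all(tags):
    """Join of a collection of tags; the empty join is the bottom element L."""
    return reduce(join, tags, "L")

# Information derived from one low and one high input is tagged high.
derived = join_all(["L", "H"])
```

Richer lattices (more than two levels, or sets of taint kinds) would only change `ORDER` and `join`; the propagation rules stay the same.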

Tags are propagated as the program manipulates

information. There are two dimensions along which

this happens: data dependence (giving rise to explicit

information ﬂow) and control dependence (giving rise

to implicit information ﬂow).

2.1.1 Explicit Information Flow

When new information is derived from previous in-

formation, this information is data-dependent on the

existing information. In this case, the newly derived

information has as label the unique least upper bound

of the set of tags belonging to the existing informa-

tion. This ﬂow is called an ‘explicit’ information ﬂow.

Consider e.g., the following simple Scheme pro-

gram that reads two values from user input and prints

their sum. The values of x and y thus come from

sources and can be labeled accordingly. The value

of z carries as label the join of the labels of x and y.

(define x (read))
(define y (read))
(define z (+ x y))
(display z)

2.1.2 Implicit Information Flow

When the derivation of new information is dependent

on a condition (also information), then the new infor-

mation is control-dependent on that condition, and it

is tagged with the label of the condition. This ﬂow is

called an ‘implicit’ information ﬂow.

The following code exempliﬁes such a ﬂow.

Based on user input, the value of the variable result

may change. Thus, although no information ﬂows di-

rectly from input to result, the value of the latter

still depends on the value of the former.

(define result #f)
(define input (read))
(if (> input 0) (set! result #t))

2.1.3 Declassiﬁcation and Endorsement

Joining different tags of values due to data and con-

trol dependencies results in monotonically-increasing

tags: derived information can never be less sensitive,

more trustworthy, etc. than any of the information

from which it was derived. However, at some point

programs do have to release some information, and to

do so safely speciﬁc operations can be applied that re-

sult in a ‘decrease’ of a tag (Chong and Myers, 2004).

Take the example of preserving the conﬁdentiality

of information using tags that indicate the information

secrecy level. Before storing some data in a database,

an application could ﬁrst encrypt this data, resulting

in encrypted data that has a lower secrecy level than

the original data. In the context of preserving conﬁ-

dentiality, such an encryption function is called a ‘de-

classiﬁer’. Similarly, when protecting the integrity or

availability of systems and applications, ‘sanitizers’

or ‘endorsers’ in information ﬂows have the ability to

decrease the untrustworthiness of information.

2.2 Modular Static Analysis

Static analysis is used to determine semantic proper-

ties of programs without actually executing those pro-

grams (Cousot and Cousot, 1977). Typically, static

analyses over-approximate the actual run-time be-

haviour to be sound and terminate within a reason-

able time, where ‘reasonable’ depends on the context.

This, however, means that results may be imprecise:


the analysis models all possible run-time behaviour,

but potentially also behaviour that can never occur.

The relation between speed, precision, and sound-

ness is complex (Andreasen et al., 2017): small

changes in any of these aspects may result in large,

unpredictable changes in results and the ability of the

analysis to produce useful answers. The main chal-

lenge is to produce useful answers in a fair amount of

time. Much research has been focused on techniques

for increasing static analysis performance without

negatively impacting other criteria such as precision

too much or at all (e.g., (Might and Shivers, 2006)).

Modularization is one technique to improve anal-

ysis performance (Cousot and Cousot, 2002; Nicolay

et al., 2019). A modular analysis analyses parts of

the program separately, and composes the results to

obtain information about the entire program. These

parts are referred to as modules. At runtime, multi-

ple instantiations of these modules may exist, which

the analysis may distinguish as well. For example,

a given function can be called multiple times. The

reiﬁcations within the analysis of these instantiations,

called components, are analysed in isolation. Modules can vary from coarse-grained (e.g., a thread definition (Stiévenart et al., 2019)) to fine-grained (e.g., a

function deﬁnition (Nicolay et al., 2019)). The corre-

sponding components are then a thread and a function

call. A component contains the corresponding mod-

ule and a context that allows more components to be

distinguished, thereby increasing analysis precision.

Modular analysis can be more efﬁcient than

a whole-program counterpart for various reasons:

memory consumption is reduced, components can be

analysed in parallel (Van Es et al., 2020; Stiévenart et al., 2021), and components will not be reanalyzed

due to non-relevant changes (Van der Plas et al., 2020;

Van der Plas et al., 2023).

Ideally, components do not depend on each other.

Yet, in all but the most trivial cases, components do

depend on information computed by other compo-

nents. A function may e.g., use the return value of

another. Thus, some ordering must be obeyed so that

a component is only analyzed after all its dependen-

cies have been analyzed. Computing inter-component

dependence can happen before or during the analysis (Nicolay et al., 2019). In case of cyclic dependencies or self-dependence (e.g., due to recursion), no such ordering may exist; components may then be analysed multiple times to take into account information computed by the analyses of the components they depend on. In all cases, component dependencies can

be tracked using heap access: when a component

writes to a memory location, any component reading

this location depends on it.

2.3 Effect-Driven Modular Analysis

MODF (Nicolay et al., 2019) is a type of modular analysis, expressed as an abstract interpretation (Cousot and Cousot, 1977), computing essential control and value flow properties that underpin many other non-trivial static analyses. Its fixed-point computation consists of two alternating phases. An intra-

component analysis analyses a component in isola-

tion. Doing so, it infers effects representing its inter-

actions with the global store σ and the discovery of

other components. (MODF uses global store widen-

ing (Shivers, 1991), i.e., a single value store is shared

among all components.) An inter-component analy-

sis uses these effects to decide which components to

analyse using a worklist. For example, if a component

reads at an address in σ (indicated by a read effect), it

becomes dependent on it. If this address is later writ-

ten to and the value at the address changes (indicated

by a write effect), all dependent components must be

reanalysed. Thus, the inter-component analysis can

then add all dependent components to its worklist.

MODF is function-modular: modules are func-

tions and components represent function calls. When

a function call is encountered, the analysis does not

step into this function but generates a call effect for

the component representing the call in the analysis. If

this component has not yet been analysed, the inter-

component analysis schedules it for analysis. The

analysis then retrieves the return value of the com-

ponent from σ (or ⊥ if it had not yet been analysed).

MODF reaches a ﬁxed point as soon as the work-

list is empty. It can be used with different represen-

tations of abstract values and different context sensi-

tivities. The latter allows multiple components to be

used for a single function, increasing the precision of

the analysis. There can e.g., be a different component

for every calling location of a function. Effect-driven

analyses can also be created for other module granularities such as threads (Stiévenart et al., 2019).

2.3.1 Example of a MODF Analysis

We exemplify MODF using the program in Listing 1,

visualising its analysis in Figure 1. When represent-

ing values by their type and using no context sensitiv-

ity, the program in Listing 1 is analysed as follows:

1. The Main component, representing the program

entry point, is analysed. Upon the deﬁnition of

num, a write effect is generated. When the call

to plus-n is encountered, a new component is

created and added to the worklist. As no return

value for plus-n was computed yet, ⊥ is retrieved

from σ and a read effect is registered on this return

value. Finally, a return value is written to σ.


Listing 1: Example program.

(define num 0)
(define (plus-n n) (set! num (+ n num)) (print))
(define (print) (display (string-append "current: "
                                        (number->string num))))
(plus-n 10)

Figure 1: MODF analysis of the program in Listing 1.

2. The plus-n component is analysed. Reading n and

num generates read effects. Writing num generates

no write effect as the value of num in σ remains

unchanged (Int). As a call to print is encoun-

tered, a new component is created and added to

the worklist. After retrieving its return value (⊥)

from σ, the return value of plus-n is written to σ.

3. The print component is analysed. Reading num

generates a read effect. Then, the return value of

print (void) is written to σ. As a write effect is

generated, plus-n is added to the worklist again.

4. The analysis continues until the worklist is empty,

indicating that a ﬁxed-point has been reached.
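The walkthrough above can be approximated by a small worklist loop. The following Python sketch is not the MODF implementation; the `intra` function, the address names (`ret:plus-n`, etc.) and the encoding of effects as a `(reads, writes, calls)` triple are assumptions made for illustration only.

```python
from collections import defaultdict

def analyse(main, intra):
    """Run a MODF-style fixed-point loop.

    `intra(component, store)` is a hypothetical intra-component analysis
    returning (reads, writes, calls): the addresses read, a dict mapping
    addresses to abstract values (sets) to join into the store, and the
    components discovered through calls.
    """
    store = defaultdict(set)      # global store σ, shared by all components
    readers = defaultdict(set)    # address -> components that depend on it
    seen, worklist = {main}, [main]
    while worklist:
        component = worklist.pop()
        reads, writes, calls = intra(component, store)
        for addr in reads:        # read effect: register a dependency
            readers[addr].add(component)
        for callee in calls:      # call effect: schedule newly found components
            if callee not in seen:
                seen.add(callee)
                worklist.append(callee)
        for addr, value in writes.items():
            if not value <= store[addr]:   # write effect only if σ changes
                store[addr] |= value
                for dep in readers[addr]:
                    if dep not in worklist:
                        worklist.append(dep)
    return store

def intra(component, store):
    """Hand-written abstraction of Listing 1 in a type domain."""
    if component == "main":
        return ({"ret:plus-n"},
                {"num": {"Int"}, "ret:main": set(store["ret:plus-n"])},
                ["plus-n"])
    if component == "plus-n":
        return ({"num", "ret:print"},
                {"num": {"Int"}, "ret:plus-n": set(store["ret:print"])},
                ["print"])
    return ({"num"}, {"ret:print": {"Void"}}, [])   # the print component

store = analyse("main", intra)
```

In this sketch the empty set plays the role of ⊥: `plus-n` first reads an empty return value for `print` and is reanalysed once `print` writes `Void` to the store.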

3 APPROACH

We exploit the inter-component dependencies of a

modular analysis, which model the inter-component

information ﬂow of the application under analysis, as

a basis for an IFC analysis. We extend the modular

analysis to also compute intra-component informa-

tion ﬂow information, thus obtaining the complete in-

formation ﬂow, containing both explicit and implicit

ﬂows, that can be used to perform taint analysis.

The beneﬁt of our approach is that it does not re-

quire the design of an analysis with a speciﬁc taint

lattice. Instead, we use the data ﬂow information that

already exists within a modular analysis and complete

it with the intra-component data ﬂow. In particular,

we reuse the mechanism of the modular analysis for

tracking inter-component dependencies based on ac-

cess to shared resources, and extend it so that intra-

component dependencies are tracked as well. Typi-

cally, this means that only modest changes to the anal-

ysis are required. The resulting taint analysis also in-

herits (and beneﬁts from) the modularity, context sen-

sitivity, lattice implementations, or any other property

or mechanism of the underlying analysis from which

it was derived. Our approach thus does not depend

on these properties or mechanisms, nor does it require

additional effort in designing and implementing them.

MODF represents the essence of a modular analysis: it does not prescribe how the intra-component analysis must be performed, but only specifies two constraints which embody two essential ingredients of

any modular analysis: ﬁnite per-component analysis

and tracking of inter-component dependencies using

effects occurring on shared resources. Thus, while we

base our work on MODF, our approach can be instan-

tiated with any modular analysis that contains these

two ingredients and exposes information about them.

We assume the existence of three constructs to

provide information about taint in a program:

• (source x) returns the value of the variable x

and marks this as originating from a source;

• (sink x) indicates that the value of the variable

x has reached a sink, and also returns this value;

• (sanitize x) returns the value of the variable x

but indicates that this value has been sanitized.

In a real-world setting, existing (library) functions for

interacting with the outside world would play the role

of sources, sinks, and sanitizers. We abstract them

here to keep our approach general, and also for prac-

tical (testing) purposes (Section 6).

4 FROM MODF TO MODINF

We chose MODF as the analysis to instantiate

our approach with, because it is a straightforward,

generic and modular ﬂow analysis that models inter-

component dependencies. The resulting modular IFC

analysis is called MODINF.

In addition to inter-component data ﬂow, MOD-

INF also has to track intra-component data ﬂow.

It does so by extending the inter-component data

ﬂow tracking offered by MODF. MODF infers inter-

component data ﬂow by detecting how values ﬂow

between addresses in the analysis store σ. The store

maps addresses to values, and represents the heap of

the analysis. Consequently, MODINF infers intra-

component data ﬂow during the intra-component


analysis using the same mechanism of monitoring

store operations. In short, whenever the analysis of

a component reads a value from a certain address

in the store, this value is labeled with the address.

This means that addresses piggy-back on top of val-

ues as they ﬂow through the analyzed program. In

the next sections, we describe how MODINF handles

both types of information ﬂow (explicit and implicit)

in more detail.

4.1 Explicit Information Flow

Explicit information ﬂow arises due to value ﬂow in

the program. When a program is executed, values

are propagated through its operations, and values are

read from and stored in the heap. Similarly, during an

analysis, abstract values are propagated through ab-

stract operations, and abstract values are read from

and stored in the store.

During the analysis of a component, when values are propagated, the labels of these values need to be propagated together with the values flowing through the analysis. Doing so allows the analysis to keep

track of how values ﬂow between addresses in the

analysis store (recall that the labels indicate the ad-

dresses in the store from which a value originates).

When a value is used in a computation, its labels

are added to the result of the computation; when mul-

tiple values are used in the computation, the result

carries the labels of all the argument values. After

all, when one value originates from address a and an-

other value originates from an address b, then the re-

sult from an operation on both values originates from

information in both a and b. Similarly, when e.g., a

pointer is dereferenced, the resulting value inherits the

labels of the pointer, as the resulting value depends on

the value of the pointer. Thus, the labels attached to

a value indicate all addresses in the store the value is

inﬂuenced by/originating from.
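A minimal Python sketch of this label propagation is given below. It uses concrete values instead of abstract ones to stay short, and the names `Labeled`, `read`, `apply` and `deref` are assumptions for illustration, not part of MODINF.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Labeled:
    """A value paired with the store addresses it originates from."""
    value: object
    labels: frozenset = frozenset()

def read(store, addr):
    """Reading from the store labels the value with the address read."""
    return Labeled(store[addr], frozenset({addr}))

def apply(op, *args):
    """The result of an operation carries the labels of all its arguments."""
    labels = frozenset().union(*(a.labels for a in args))
    return Labeled(op(*(a.value for a in args)), labels)

def deref(store, pointer):
    """A dereferenced value also inherits the labels of the pointer."""
    target = read(store, pointer.value)
    return Labeled(target.value, target.labels | pointer.labels)

# Address "p" holds a pointer to address "x".
store = {"x": 1, "y": 2, "p": "x"}
z = apply(lambda a, b: a + b, read(store, "x"), read(store, "y"))
d = deref(store, read(store, "p"))
```

Here `z` is labeled with both `x` and `y`, and `d` is labeled with `p` as well as `x`, mirroring the two cases described above.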

4.2 Implicit Information Flow

Implicit information flows arise from conditions in

the program, such as branching in an if statement or

dynamic function calls: based on a condition, a piece

of code may or may not be executed, or the function

to be called depends on a value. A taint analysis must

take these implicit ﬂows into account, because the

conditional branching may depend on a tainted value,

and the choice of function to be executed as well.

To handle these implicit flows, extra flow information is needed. In general, this information originates at the condition on which the program branches (e.g., the predicate in an if statement): all data that impacts a condition also impacts its branches and result. Thus, when

a value in a branch is written to the store, the analysis

does not only take its labels into account but also the

labels of the condition. These labels are also added to

the result value of the conditional computation, as this

is dependent on the value of the condition as well.

Consider e.g., the following program. Explicit

data ﬂow in this program arises from the parameter

n, whose value ﬂows to the result of maybe-inc. As

cond also inﬂuences the result, the labels attached to

its value are added to the return value of maybe-inc.

(define (maybe-inc n cond) (if cond (+ n 1) n))
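The treatment of the conditional in maybe-inc could be sketched as follows in Python. The function `analyse_if` and the representation of a labeled abstract value as a `(values, labels)` pair are hypothetical; for simplicity the sketch always analyses both branches, whereas a real analysis might skip a branch when the abstract condition is known precisely.

```python
def analyse_if(cond, then_branch, else_branch, context=frozenset()):
    """Analyse a conditional in a context extended with the condition's labels.

    `cond` is a (values, labels) pair. Each branch is a function from the
    implicit-flow context to a (values, labels) pair. The result joins the
    branch values and carries the branch labels plus the implicit labels.
    """
    _, cond_labels = cond
    inner = context | cond_labels        # implicit flows inside the branches
    values, labels = set(), set(inner)
    for branch in (then_branch, else_branch):
        branch_values, branch_labels = branch(inner)
        values |= branch_values
        labels |= branch_labels
    return values, frozenset(labels)

# (if cond (+ n 1) n): both branches yield an Int originating from n.
n = ({"Int"}, frozenset({"n"}))
cond = ({"Bool"}, frozenset({"cond"}))
result = analyse_if(cond, lambda ctx: n, lambda ctx: n)
```

The `inner` context is also what a branch would pass along to any store write it performs, so that nested conditionals accumulate the labels of all enclosing conditions.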

A similar system is needed for dynamic func-

tion calls. To propagate implicit information ﬂow

across function boundaries, which are also compo-

nent boundaries, upon a function call, the current im-

plicit information ﬂow labels, as well as the explicit

ﬂow labels of the function value, are ‘attached’ to the

called component. When the component is analysed,

these labels are considered as well when a value is

written to the analysis store. However, this implies

that when new labels are encountered upon a func-

tion call, the corresponding component needs to be

reanalysed (so the newly added labels can be propagated during its analysis). Consider e.g., the program

in Listing 2. During the analysis of a, the analysis in-

fers that the argument of f is a boolean and discovers

no implicit ﬂows to be added to f. Thus, during the

analysis of f, no implicit ﬂows are taken into account.

If then, however, b is analysed, an implicit ﬂow for f

is found. However, since the abstract value for x re-

mains the same, a modular analysis would not reanal-

yse f, thus ignoring the implicit ﬂows for f.

Listing 2: Reanalysis of f is needed to propagate labels.

(define (f x) (sink x))
(define (a) (f #t))
(define (b) (define v #t)
            (define v-s (source v))
            (if v-s (f #t)))
(a)
(b)

When conditional ﬂows are nested (e.g., nested if

statements), implicit ﬂow information arising from all

conditions must be taken into account as they all con-

tribute to the path that is taken by the program under

analysis. Also, control ﬂow depends on value ﬂow

and vice versa. The value of a predicate (value ﬂow)

in a conditional statement such as if determines e.g.,

which branch of the conditional is executed (control flow). If a tainted value induces control flow, then any value flow that happens as a result of this control flow must also be tainted. We say that the branch or function body is executed in a tainted context.

Footnote (on the reanalysis of components when new labels are encountered upon a function call): The authors assume that implicit flows over function boundaries can also be added after the analysis by using the inferred write and call effects, thereby avoiding these extra component analyses, but have not explored this path further.

4.3 Interactions with the Global Store σ

When a value is propagated by the analysis, so are

its labels. Keeping these labels attached to the values

at all times is unnecessary, however, and would cause

the sets of labels of each value to keep growing. These

sets would also give little information on how values

actually ﬂow through the program, but only contain

all addresses that may have inﬂuenced the value.

Instead of keeping labels attached at all times, we

remove labels from values prior to writing to σ. As

such, the store only contains values but no labels. For

every address in σ, we keep track of the labels that

were attached to the values written, and we merge the

labels corresponding to the explicit and implicit infor-

mation ﬂow together at the time of the store write. We

thus obtain a kind of data ﬂow graph, where the edges

are directed backwards. As such, the analysis tracks

how values ﬂow between addresses in σ, and thus also

how values ﬂow within the analyzed program.
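The store discipline described in this section might look roughly like the Python sketch below. The class `FlowStore`, its method names and the label `user-input` are illustrative assumptions; the example encodes the implicit-flow program of Section 2.1.2.

```python
from collections import defaultdict

class FlowStore:
    """Store σ holding unlabeled abstract values, plus a backwards flow graph."""

    def __init__(self):
        self.values = defaultdict(set)   # address -> abstract value (set of types)
        self.deps = defaultdict(set)     # address -> addresses/labels it depends on

    def read(self, addr):
        """Return the value at addr together with the label {addr}."""
        return set(self.values[addr]), frozenset({addr})

    def write(self, addr, value, labels, context=frozenset()):
        """Strip the labels, join the value into σ, and record the merged
        explicit (labels) and implicit (context) dependencies of addr."""
        self.values[addr] |= value
        self.deps[addr] |= frozenset(labels) | frozenset(context)

sigma = FlowStore()
sigma.write("result", {"Bool"}, labels=set())            # (define result #f)
sigma.write("input", {"Int"}, labels={"user-input"})      # (define input (read))
_, cond_labels = sigma.read("input")                      # predicate (> input 0)
sigma.write("result", {"Bool"}, labels=set(), context=cond_labels)
```

After these writes, `sigma.deps` contains an edge from `result` back to `input`, even though no value flows directly from one to the other.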

4.4 Taint Derivation

The data ﬂow information, computed as just ex-

plained, can be used as a basis for taint analysis.

When a value is marked as originating from a

source, it is labeled with a speciﬁc label which can

e.g., carry information indicating the type of taint. A

sanitizer causes the value ﬂow to be threaded through

σ using a speciﬁc sanitization address to allow the

traversal (described next) to stop looking for tainted

values as sanitization removes taint. When encoun-

tering a sink, the analysis also threads the value ﬂow

through σ, using a speciﬁc sink address, allowing the

traversal to ﬁnd sinks as starting points for tracing.

At the end of the analysis, the data ﬂow infor-

mation can be used to detect harmful ﬂows by trac-

ing the data ﬂow backwards starting from sink ad-

dresses. A sanitization address causes the trace to be

abandoned as no tainted ﬂow can originate from it.

When a source label is found, however, there exists a

non-sanitized ﬂow from the corresponding source to

at least one sink. Hence, a security risk may exist in

the application. Our analysis then reports a tuple con-

taining the source and sink (but could e.g., also report

the entire ﬂow path).
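The backwards traversal can be sketched as a plain graph search. In the Python sketch below, the function name `harmful_flows`, the graph encoding and the address names (`src:1`, `san:1`, `sink:1`) are assumptions; the two example graphs are hand-written approximations of the sanitization-in-tainted-context and sanitized-flow scenarios, not output of the actual analysis.

```python
def harmful_flows(deps, sinks, sanitizers, sources):
    """Trace the backwards data flow graph from every sink address.

    `deps` maps an address to the set of addresses/labels it depends on.
    Returns a set of (source, sink) tuples for non-sanitized flows.
    """
    found = set()
    for sink in sinks:
        seen, stack = {sink}, [sink]
        while stack:
            node = stack.pop()
            if node in sanitizers:     # sanitization removes taint: abandon trace
                continue
            if node in sources:        # non-sanitized flow from a source
                found.add((node, sink))
                continue
            for dep in deps.get(node, ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
    return found

# Sanitized value reaches the sink, but inside a branch guarded by x-s.
tainted_context = {
    "x-s": {"src:1"},
    "san:1": {"x-s"},
    "san": {"san:1"},
    "sink:1": {"san", "x-s"},   # explicit dependency plus implicit context
}
# Source value passes a sanitizer and then flows to the sink.
sanitized = {"x-s": {"src:1"}, "san:1": {"x-s"}, "sink:1": {"san:1"}}

report_1 = harmful_flows(tainted_context, {"sink:1"}, {"san:1"}, {"src:1"})
report_2 = harmful_flows(sanitized, {"sink:1"}, {"san:1"}, {"src:1"})
```

Reporting the entire flow path instead of a tuple would only require remembering the predecessor of each visited node.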

5 IMPLEMENTATION

We have implemented MODINF in MAF (Van Es

et al., 2020), a framework for the construction of

modular analyses. Our implementation can analyse

Scheme programs that are enriched with the source,

sink, and sanitize constructs presented earlier.

Scheme, a dynamically-typed higher-order language,

is very difﬁcult to analyse since control ﬂow and data

ﬂow are intertwined. The concepts introduced in this

work can therefore be transferred to other highly-

dynamic languages, like JavaScript, Java, and C++.

Only minor changes were needed to extend an ex-

isting modular analysis in MAF with intra-component

data-ﬂow information. We extended the representa-

tion of abstract values so that labels can be piggy-

backed. When values are joined, the union of the label sets carried along with those values is computed. When an operation is applied to values, the result is labeled with

the union of the label sets of the arguments. When an

abstract pointer is dereferenced, the obtained value is

labeled with the labels of the pointer.

To propagate implicit ﬂow information across

function boundaries, we store for every component a

set of all implicit taints that were present during any

call to that component. The analysis of a component

considers this set as being part of the implicit ﬂows

and adds it to the implicit ﬂows caused by condition-

als upon every write to the analysis store.
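The per-component bookkeeping could be sketched as follows. This is a hypothetical Python illustration, not the MAF implementation (which is not shown in this paper); the class `ImplicitContexts` and its methods are assumed names, and the example replays the two calls to f in Listing 2.

```python
from collections import defaultdict

class ImplicitContexts:
    """Per-component set of implicit-flow labels seen at any call."""

    def __init__(self):
        self.taints = defaultdict(frozenset)

    def on_call(self, callee, context, function_labels, worklist):
        """Attach the caller's implicit labels and the labels of the function
        value to the callee; reschedule the callee if its set grew."""
        incoming = frozenset(context) | frozenset(function_labels)
        if not incoming <= self.taints[callee]:
            self.taints[callee] |= incoming
            if callee not in worklist:
                worklist.append(callee)

    def write_context(self, component, conditional_labels):
        """Implicit labels that apply to a store write inside `component`."""
        return self.taints[component] | frozenset(conditional_labels)

contexts = ImplicitContexts()
worklist = []
contexts.on_call("f", set(), set(), worklist)       # call from a: no implicit flow
after_a = list(worklist)
contexts.on_call("f", {"v-s"}, set(), worklist)     # call from b, guarded by v-s
after_b = list(worklist)
```

The first call leaves the worklist untouched (the initial scheduling of f is handled by the ordinary call effect), while the second call adds new labels and therefore forces f to be reanalysed.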

6 VALIDATION

We performed a preliminary validation of our work

using 9 hand-crafted programs that reﬂect the vari-

ous ways in which taints may ﬂow through a program.

The goal of this validation is to ensure that our anal-

ysis ﬁnds all vulnerable ﬂows, i.e., that it is sound, in

the presence of various complex value ﬂows. To facil-

itate our validation, we constructed the smallest pos-

sible programs that contain complex ﬂows. Exploring

properties of MODINF such as performance and scal-

ability is an interesting avenue of future work.

Concretely, we considered the following 9 small

programs (each between 5 and 10 LOC):

bad-ﬂow-retrigger-needed: Shown in Listing 2.

Contains a ﬂow from a source to a sink that, given

the worklist algorithm used by this validation (de-

scribed later), can only be found if a component is

reanalysed after new implicit ﬂows are found.

implicit-ﬂow: Contains conditional branching based

on a tainted value.

sanitization-in-tainted-context: Shown in Listing 3. Sanitizes a value and feeds it to a sink in a tainted context; the analysis should still report this as a harmful flow.

sanitized-ﬂow: A ﬂow originating from a source

passes a sanitizer and ﬂows to a sink. Hence, this

program should be considered safe.

side-effecting-function: Calls a function in a tainted

context. The called function changes the value of

a variable which ﬂows to a sink afterwards.

simple-ﬂow: A tainted value ﬂows to a sink.

sink-in-tainted-context: A tainted value is passed

through a sanitizer to a sink. The ﬂow of this value

to the sink depends on the original tainted value.

Hence, there is a harmful (implicit) ﬂow.

tainted-function-choice-1: A side-effecting func-

tion to be executed is selected from a list based

on a tainted value. The side-effect inﬂuences the

value ﬂowing to a sink.

tainted-function-choice-2: Shown in Listing 4. The

return value of a function call ﬂows to a sink.

However, the function might have been overrid-

den (depending on a tainted value).

Listing 3: The sanitization-in-tainted-context benchmark.

(define x #t)
(define x-s (source x))
(if x-s (let ((san (sanitize x-s))) (sink san)))

Listing 4: The tainted-function-choice-2 benchmark.

(define a #t)
(define a2 (source a)) ; Value comes from a source.
(define (b x) x)
(define (set-b) (set! b (lambda (x) #f)))
(if a2 (set-b)) ; b depends on a2.
(define res (b 10))
(sink res) ; Result of (b 10) flows to a sink.

We instantiated MODINF with a type domain

(representing values by their type except booleans

which are represented concretely when possible;

pointers, closures and primitive functions are represented using sets), without context sensitivity, and with a last-in-first-out worklist algorithm. (Other lattice representations, context sensitivities and worklist algorithms are possible as well.) Our analysis was able to

detect all harmful ﬂows within the programs. Thus,

our preliminary evaluation shows that our analysis

does not have false negatives on several small hand-

crafted programs containing challenging value ﬂow.

We also did not ﬁnd any false positives in the results

of the analysis, though this may be caused by the lim-

ited size of the programs used. Therefore, future work

should evaluate the precision of our analysis using

bigger programs, that are already supported by our

implementation, in which false positives may arise.

7 RELATED WORK

To the best of our knowledge, there are no related ap-

proaches that describe the relation or transition be-

tween modular static analysis and static IFC analysis.

Some related static analysis approaches also use the store to determine dependence, but for other purposes. Nicolay et al. (2011) attempt to parallelize binding expressions by computing dependencies between these expressions based on address reads and writes. Stiévenart et al. (2015) detect concurrency bugs based on conflicts involving shared addresses. Nicolay et al. (2017) investigate function purity based on reading and writing of store addresses visible from the point of view of all callers on the stack.

The majority of static IFC analysis approaches

and implementations (see e.g. (Pauck et al., 2018) for

an overview) are not modular by design. MODINF

differs from most of the existing work as it does not

use a state space in which a value is tagged with a

taint label from a lattice. Instead, MODINF tracks ad-

dress dependencies as values are read and written in

the store, without propagating taint labels explicitly.
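As a rough illustration of this difference, the sketch below records, for each address, the addresses it was computed from, and decides whether a sink is affected purely by reachability over these recorded dependencies. No value carries a taint label. The address names follow Listing 4; the table deps and the helpers depends-on and reaches? are hypothetical names introduced for this illustration and are not MODINF's actual data structures. Sanitizers and the distinction between explicit and implicit dependencies are deliberately left out.

```scheme
;; Hypothetical sketch: address dependencies as an association list.
;; Each entry maps an address to the addresses it was computed from.
(define deps
  '((a2  . (a))     ; a2 is defined as (source a)
    (b   . (a2))    ; b is overwritten under a condition on a2
    (res . (b))))   ; res is the result of calling b

;; Direct dependencies of an address (empty when none were recorded).
(define (depends-on addr)
  (let ((entry (assq addr deps)))
    (if entry (cdr entry) '())))

;; Does `from` transitively depend on `target`? Worklist traversal with
;; a visited list, so cyclic dependencies cannot cause non-termination.
(define (reaches? from target)
  (let loop ((todo (list from)) (seen '()))
    (cond ((null? todo) #f)
          ((eq? (car todo) target) #t)
          ((memq (car todo) seen) (loop (cdr todo) seen))
          (else (loop (append (depends-on (car todo)) (cdr todo))
                      (cons (car todo) seen))))))

;; res is the address read by the sink; a2 holds the value from the source.
(define harmful? (reaches? 'res 'a2))
```

Under these assumptions, harmful? is true because res depends on b, which in turn depends on a2, even though no label was ever attached to the value stored at res.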

Modular (static) IFC analyses are few and far between. A notable example of a modular static taint analysis is LGTM, a code analysis platform developed by GitHub that is capable of performing taint analysis of modules in JavaScript applications. It achieves modularity by either not stepping into other modules, or relying on a manually provided specification of taint flows, requiring a trade-off between accuracy and effort (Staicu et al., 2020). In contrast, our approach does not rely on manual specifications.

8 CONCLUSION AND FUTURE WORK

In this paper, we introduced MODINF, a novel approach to information flow analysis that leverages the inter-component data flow information inferred by a modular analysis. We extended this data flow information with intra-component data flow information and differentiated between ‘explicit’ information flow, indicating data dependence, and ‘implicit’ information flow, indicating control dependence. Using this flow information, we obtain an information flow analysis that can detect harmful flows in computer programs. We validated our approach using 9 hand-crafted programs with complex data flows and verified that all harmful flows were discovered by the analysis.

Footnote (LGTM): https://lgtm.com/, soon to be integrated in GitHub code scanning.

ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering

Our work shows that an information ﬂow analysis

can be obtained by making only minor changes to a

modular analysis. The resulting analysis is modular,

meaning that it scales well to large programs, and in-

dependent of a particular lattice or context sensitivity.

Future work may consider distinguishing different types of taint tags, e.g., to reflect levels of information sensitivity, so that low-sensitivity data may be allowed at some sinks in a program. Another improvement would be to extend our validation and evaluation to larger programs, which would allow evaluating the precision of the analysis (i.e., the number of false positives it reports). Our implementation already supports such larger programs.

ACKNOWLEDGEMENTS

This work was partially supported by the Research

Foundation – Flanders (FWO) (grant No. 11F4822N)

and by the Cybersecurity Initiative Flanders.

REFERENCES

Andreasen, E. S., Møller, A., and Nielsen, B. B. (2017).

Systematic approaches for increasing soundness and

precision of static analyzers. In SOAP 2017, Proc.,

pages 31–36.

Chong, S. and Myers, A. C. (2004). Security policies for

downgrading. In CCS 2004, Proc., pages 198–209.

Cousot, P. and Cousot, R. (1977). Abstract interpretation: a

uniﬁed lattice model for static analysis of programs by

construction or approximation of ﬁxpoints. In POPL

1977, Proc., pages 238–252.

Cousot, P. and Cousot, R. (2002). Modular Static Pro-

gram Analysis. In CC 2002, Proc., pages 159–179.

Springer.

De Bleser, J., Stiévenart, Q., Nicolay, J., and De Roover, C. (2017). Static Taint Analysis of Event-driven Scheme Programs. In ELS, pages 80–87.

Denning, D. E. (1976). A lattice model of secure informa-

tion ﬂow. Commun. ACM, 19(5):236–243.

Hedin, D. and Sabelfeld, A. (2012). A perspective on

information-ﬂow control. In Software safety and se-

curity, pages 319–347. IOS Press.

Might, M. and Shivers, O. (2006). Improving flow analyses via ΓCFA: Abstract garbage collection and counting. In ICFP 2006, Proc., pages 13–25.

Nicolay, J., De Roover, C., De Meuter, W., and Jonck-

ers, V. (2011). Automatic Parallelization of Side-

Effecting Higher-Order Scheme Programs. In SCAM

2011, Proc., pages 185–194. IEEE.

Nicolay, J., Stiévenart, Q., De Meuter, W., and De Roover, C. (2017). Purity analysis for JavaScript through abstract interpretation. Journal of Software: Evolution and Process, 29(12).

Nicolay, J., Stiévenart, Q., De Meuter, W., and De Roover, C. (2019). Effect-driven Flow Analysis. In VMCAI 2019, Proc., pages 247–274. Springer.

Pauck, F., Bodden, E., and Wehrheim, H. (2018). Do Android taint analysis tools keep their promises? In ESEC/FSE 2018, Proc., pages 331–341.

Russo, A. and Sabelfeld, A. (2010). Dynamic vs. static

ﬂow-sensitive security analysis. In CSF 2010, pages

186–199. IEEE.

Scull Pupo, A. L., Christophe, L., Nicolay, J., De Roover, C., and Gonzalez Boix, E. (2018). Practical information flow control for web applications. In RV 2018, Proc., pages 372–388. Springer.

Shivers, O. (1991). Control-Flow Analysis of Higher-Order

Languages. Doctoral dissertation, Carnegie Mellon

University, Pittsburgh, PA, USA.

Staicu, C.-A., Torp, M. T., Schäfer, M., Møller, A., and Pradel, M. (2020). Extracting taint specifications for JavaScript libraries. In ICSE 2020, Proc., pages 198–209.

Stiévenart, Q., Nicolay, J., De Meuter, W., and De Roover, C. (2015). Detecting Concurrency Bugs in Higher-Order Programs through Abstract Interpretation. In PPDP 2015, Proc., pages 232–243.

Stiévenart, Q., Nicolay, J., De Meuter, W., and De Roover, C. (2019). A general method for rendering static analyses for diverse concurrency models modular. Journal of Systems and Software, 147:17–45.

Stiévenart, Q., Van Es, N., Van der Plas, J., and De Roover, C. (2021). A parallel worklist algorithm and its exploration heuristics for static modular analyses. Journal of Systems and Software, 181:111042.

Van der Plas, J., Stiévenart, Q., and De Roover, C. (2023). Result Invalidation for Incremental Modular Analyses. In Dragoi, C., Emmi, M., and Wang, J., editors, VMCAI 2023, Proc., volume 13881 of Lecture Notes in Computer Science, pages 296–319. Springer.

Van der Plas, J., Stiévenart, Q., Van Es, N., and De Roover, C. (2020). Incremental Flow Analysis through Computational Dependency Reification. In SCAM 2020, Proc., pages 25–36. IEEE Computer Society.

Van Es, N., Stiévenart, Q., Van der Plas, J., and De Roover, C. (2020). A Parallel Worklist Algorithm for Modular Analyses. In SCAM 2020, Proc., pages 1–12. IEEE.

Van Es, N., Van der Plas, J., Stiévenart, Q., and De Roover, C. (2020). MAF: A Framework for Modular Static Analysis of Higher-Order Languages. In SCAM 2020, Proc. IEEE Computer Society.

Zanotti, M. (2002). Security typings by abstract interpre-

tation. In International Static Analysis Symposium,

pages 360–375. Springer.
