ITERATED TRANSFORMATIONS AND QUANTITATIVE METRICS FOR SOFTWARE PROTECTION

Mariusz H. Jakubowski, Chit W. (Nick) Saw and Ramarathnam Venkatesan

Microsoft Research, Redmond, WA, U.S.A.

Keywords:

Software protection, Tamper-resistance, Obfuscation, Security metrics.

Abstract:

This paper describes a new framework for design, implementation and evaluation of software-protection schemes. Our approach is based on the paradigm of iterated protection, which repeats and combines simple transformations to build up complexity and security. Based on ideas from the field of complex systems, iterated protection is intended as an element of a comprehensive obfuscation and tamper-resistance system, but not as a full-fledged, standalone solution. Our techniques can (and should) be combined with previously proposed approaches, strengthening overall protection.

A long-term goal of this work is to create protection methods amenable to analysis or estimation of security in practice. As a step towards this, we present security evaluation via metrics computed over transformed code. Indicating the difficulty of real-life reverse engineering and tampering, such metrics offer one approach to move away from ad hoc, poorly analyzable approaches to protection.

1 INTRODUCTION

Open systems, such as PCs and mobile devices, have

long suffered from malicious tampering and reverse

engineering by hackers. To facilitate more secure

code execution, researchers have devised and imple-

mented many approaches that complicate observa-

tion and modiﬁcation of software. These include

obfuscation (Collberg et al., 1997; Collberg et al.,

1998b), anti-tampering measures (Aucsmith, 1996;

Horne et al., 2001; Jacob et al., 2007), data hid-

ing (Collberg et al., 1998a; El-khalil and Keromytis,

2004), and other protective transformations (Wang

et al., 2000; Anckaert et al., 2004; Tan et al., 2006;

Anckaert et al., 2007a). Such techniques have served

well in various contexts, but few have offered a prac-

tical security analysis to estimate how long a pro-

tected program will remain uncracked in practice.

While solutions exist under certain engineering assumptions (Dedić et al., 2007), a current open problem is to develop practical protection techniques that support a realistic security evaluation.

In this paper, we propose a new general approach

to devise transformations that protect code while en-

abling practical assessment of security via quantita-

tive metrics (Anckaert et al., 2007b). The central

idea is to combine and iterate simple transforma-

tions, such as injection of opaque predicates (Coll-

berg et al., 1998b), oblivious hashing (Chen et al.,

2002; Jacob et al., 2007), and control-ﬂow transfor-

mations (Collberg et al., 1997). These transforma-

tions may be far simpler than traditionally applied

techniques, and need not create much obfuscation or

tamper-resistance on their own; the main idea is to

build up complexity by repeated application and re-

combination of simple operations, creating a cascad-

ing effect. Particularly simple yet useful transforma-

tions include injection of inert “chaff” code, as well

as conversion of variable references to be performed

via newly created pointers. Indeed, rather than rely-

ing on well known transformations, iterated protec-

tion may facilitate controllable security and simpler

tool implementation by repeatedly applying straight-

forward primitives.

Our approach was inspired by the ﬁeld of com-

plex systems, which studies how simple transforma-

tion rules affect the state of abstract systems over

time. For example, cellular automata (CA) such as

the Game of Life (Wolfram, 2002) are represented

as arrays or grids of cells, each in a particular state

(e.g., discrete binary 0 or 1), and updated in discrete

time steps. A function called an update rule is ap-

plied to each neighborhood (e.g., a 3x3 square) in the

grid to yield the state of the center cell after the next

time step. Some surprisingly simple CA can perform universal computation via emulation of Turing machines. In essence, iteration of very simple functions over binary arrays can lead to arbitrary (or emergent) behavior of the system. By analogy, iteration of simple transformations can lead to similar complexity in software code.

Jakubowski M., Saw C. and Venkatesan R. (2009).
ITERATED TRANSFORMATIONS AND QUANTITATIVE METRICS FOR SOFTWARE PROTECTION.
In Proceedings of the International Conference on Security and Cryptography, pages 359-368
DOI: 10.5220/0002220103590368
SciTePress

In general, complex systems cannot be “short-cut”

to predict states at arbitrary future times; the sys-

tem must actually be run to determine what happens.

Thus, we may not be able to model or predict the

effects of iterating simple transformations over soft-

ware. In terms of security analysis, we do not typi-

cally attempt to predict the outcomes of iterated trans-

formations; instead, we evaluate security via metrics

computed over the ﬁnal transformed code. In particu-

lar, we use metrics from a study on quantitative eval-

uation of obfuscation (Anckaert et al., 2007b), along

with additional metrics devised for analysis of iter-

ated and recombined transforms. Via heuristics, ex-

periments and analysis, such estimates of complexity

can be associated with actual security. A spectrum of

various metrics offers a means of evaluating complex-

ity and security in terms of real-life tampering and re-

verse engineering.

The rest of this paper is structured as follows. In

Section 2, we provide more explanation and back-

ground on the iteration approach. A list of some

practical protective transformations is found in Sec-

tion 3. Section 4 describes a metric-based approach

towards evaluating the security of iterated and recom-

bined transformations. A tool implementation and ex-

perimental results are the topic of Section 5. We pro-

vide a ﬁnal assessment and future directions in Sec-

tion 6.

2 ITERATED PROTECTION

We propose iterated protection as a general frame-

work for design, analysis, and implementation of

software-protection techniques. This methodology

involves the iterated application and recombination

of various obfuscating transformations (or primitives)

over code, with the output of each successive transfor-

mation serving as input to the next. Via this strategy,

even simple and easy to implement primitives can be

cascaded to yield effective obfuscation.
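The cascading idea can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the tool described in Section 5: the `Program` type (one string per statement), the two toy primitives, and the `iterate` function are all hypothetical names introduced here.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a program: one string per statement.
using Program = std::vector<std::string>;
// A protection primitive maps a program to a transformed program.
using Primitive = std::function<Program(const Program&)>;

// Toy primitive: insert an inert "chaff" statement after every statement.
Program injectChaff(const Program& in) {
    Program out;
    for (const auto& stmt : in) {
        out.push_back(stmt);
        out.push_back("chaff();");
    }
    return out;
}

// Toy primitive: wrap every statement in an always-true guard.
Program guardStatements(const Program& in) {
    Program out;
    for (const auto& stmt : in) {
        out.push_back("if (always_true()) { " + stmt + " }");
    }
    return out;
}

// Apply the primitives in order, `rounds` times. The output of each
// transformation is the input of the next one.
Program iterate(Program p, const std::vector<Primitive>& primitives, int rounds) {
    for (int r = 0; r < rounds; ++r) {
        for (const auto& primitive : primitives) {
            p = primitive(p);
        }
    }
    return p;
}
```

Even with these trivial primitives, two rounds turn a single statement into four nested and interleaved ones, since later rounds also transform the code injected by earlier rounds.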

As an example, the technique of oblivious hash-

ing (OH) (Chen et al., 2002; Jacob et al., 2007) can

serve as a tamper-resistance primitive. A single OH

transformation injects code to hash a program’s run-

time state (i.e., data and control ﬂow), thus ascertain-

ing execution integrity of the original code. Applying

OH again to the transformed program protects both

the original program and the ﬁrst OH round. In gen-

eral, each new OH round veriﬁes the integrity of both

the original program and all previous OH rounds.

To increase security further, arbitrary other prim-

itives can be combined and iterated with the OH

rounds. For ease of design and implementation, such

primitives can be quite simple; e.g., conversion of

variable references to pointer references, and even

source-to-source translation among different code di-

alects or languages. Via iteration, the interaction of

simple primitives can achieve the effect of far more

complex obfuscation operators.

An important general principle of the iteration ap-

proach is usage of transformations that appear to be

“weak” or not particularly obfuscating. It is not nec-

essary to eliminate all weaknesses from each trans-

formation operator; instead, we rely on the iterated,

combined effect of multiple operators to augment one

another’s security, essentially “ﬁlling in” both known

and unknown holes. To verify this, overall security

can be measured by quantitative metrics, as we dis-

cuss in detail later.

2.1 Related Areas

2.1.1 Cryptography

Iterated protection is related to the concept of rounds

in cryptographic schemes such as hash functions and

block ciphers (Menezes et al., 1996). Often chosen heuristically to resist known and unknown attacks, the number of rounds determines security and efficiency.

Each individual round may be easily breakable, and

a small number of iterated rounds can usually be at-

tacked successfully. However, once the number of

rounds becomes large enough, an algorithm may sur-

vive in practical use for many years, despite improved

cryptanalysis and more powerful computing systems.

In this spirit, any single obfuscation method can

be treated as a round of an obfuscation algorithm.

The individual techniques may be very simple and not

particularly secure when used alone, but allow us to

bootstrap to a desired security level when applied iteratively. Much like in cryptography, iteration of obfuscation primitives can achieve the “confusion” and “diffusion” effects necessary for security evaluation via quantitative metrics or heuristic arguments.

The analogy between round-based cryptography

and iterated obfuscation is not perfect, mainly be-

cause constructions like hash functions and block ci-

phers are highly specialized. In contrast, obfuscation

should be able to operate on universal programs, mak-

ing analysis and even heuristic arguments difﬁcult or

impossible. However, this also leads to more extensive possibilities for obfuscation, especially for specific purposes.

SECRYPT 2009 - International Conference on Security and Cryptography

2.1.2 Complex Systems

Another related area is the ﬁeld of complex sys-

tems (Wolfram, 2002), which studies aggregate be-

havior of systems of states controlled by iterative

evolution rules. Such rules are essentially functions whose inputs are sets of states at time t and whose outputs are individual states at time t + 1. A main theme

is the frequent emergence of complex, essentially un-

predictable behaviors over time in large systems of

simple agents governed by simple rules. Such emer-

gent behavior occurs in a variety of natural and ab-

stract scenarios, such as weather, vehicle trafﬁc, eco-

nomic markets, cellular automata and software sys-

tems.
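To make the notion of an evolution rule concrete, the following minimal sketch implements one update step of a one-dimensional binary cellular automaton. The choice of a one-dimensional automaton, the rule-number encoding, the function name `step`, and the zero boundary condition are assumptions made for this illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One synchronous update of a one-dimensional binary cellular automaton.
// `rule` is a rule number in 0..255: bit k of `rule` is the next state of a
// cell whose (left, center, right) neighborhood encodes the number k.
// Cells beyond either end are treated as 0 (an assumption of this sketch).
std::vector<int> step(const std::vector<int>& cells, std::uint8_t rule) {
    const std::size_t n = cells.size();
    std::vector<int> next(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        const int left = (i > 0) ? cells[i - 1] : 0;
        const int center = cells[i];
        const int right = (i + 1 < n) ? cells[i + 1] : 0;
        const int neighborhood = (left << 2) | (center << 1) | right;
        next[i] = (rule >> neighborhood) & 1;  // state at time t + 1
    }
    return next;
}
```

Each output cell depends only on a three-cell neighborhood at time t, yet repeated calls to `step` with a rule such as 110 produce patterns that are hard to predict without actually running the system.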

A program to be protected can be considered as

a system of states (e.g., program statements, vari-

ables, and objects), with protection primitives serv-

ing as evolution rules. Emergent program structure

and behavior can arise as a result of applying sim-

ple primitives iteratively. In such a program, we may

observe characteristics that are not easily explained

in terms of either the original program or the simple

nature of each individual obfuscation round. For ex-

ample, if an obfuscation primitive performs code out-

lining (i.e., transforming code sections into separate

functions), call graphs of arbitrary shapes and proper-

ties can arise via iteration.

2.2 Security Modeling

Recent theoretical work (Barak et al., 2001; Gold-

wasser and Kalai, 2005) has put obfuscation on a

formal foundation, and certain schemes (Lynn et al.,

2004; Wee, 2005) have been shown secure in this

framework. Earlier results on oblivious RAMs (Gol-

dreich and Ostrovsky, 1996) involved a somewhat

different obfuscation model based on randomizing

memory-access patterns. However, such theoretical

work has not yielded practical obfuscation schemes

for typical real-life programs, where an important

goal is often simply to provide a lower bound on

breaking complexity (e.g., preventing hacks for a new

game from appearing for at least a couple of weeks

or months). In practice, “ad hoc” approaches, such

as code encryption and integrity checks, are currently

most popular.

For efﬁciency of implementation, commonly used

cryptography is often not proved strictly secure in

a formal model. Perceived security is based on de-

sign heuristics and long-term cryptanalysis by the re-

search community, not on security proofs. Certain al-

gorithms with provable security are known, but tend

to be impractical and seldom used. Even such algo-

rithms may fail when attacks violate their models or

complexity assumptions turn out to be unfounded.

As with popular block ciphers and hash functions,

a formal security proof may be neither known nor

necessary for our iterative methods to be useful in

practice. In addition to security metrics and heuris-

tic arguments, practical security could be determined

by releasing a system to be attacked by security re-

searchers, both academic and commercial. In some

sense, this could allow us to put obfuscation on a

cryptographic foundation, at least in terms of estab-

lishing a new area of cryptanalysis devoted to obfus-

cation.

Formal security analysis of iterated protection

may still be possible, at least for speciﬁc instances

of transformations. However, given the current real-

ity of software protection, a heuristic approach like

that in practical cryptography may yield more imme-

diate and useful results. In addition, we suggest that

real-life security may be reasonably assessed through

quantitative metrics (Anckaert et al., 2007b) com-

puted over transformed code.

3 PROTECTIVE TRANSFORMATIONS

This section presents a number of transformations

(or protection operators) suitable for iteration and re-

combination. Practical application of iterated pro-

tection involves mainly selecting sequences of oper-

ators and their parameters, including the number of

iterations to be performed by each operator instance.

Some of the operators described here are simple trans-

formations derived from earlier work, while others

are geared speciﬁcally towards the iterated-protection

framework (and thus need not provide much protec-

tion or obfuscation when used standalone instead of

iteratively). We present these operators in the context

of our tool implementation, as described in Section 5.

The sections below group operators into several

categories, based on the main intended purpose of the

operators. Some functionality overlap exists among

various operators, but this grouping helps to classify

and organize different transformations.

3.1 Tamper-resistance Operators

These operators serve mainly to inject code that verifies runtime integrity of execution.


3.1.1 Oblivious Hashing

This operator is for injection of oblivious-hashing (OH) code (Chen et al., 2002; Jacob et al., 2007), including hash initialization, actual hashing, and hash verification. OH helps to provide tamper-resistance by verifying the integrity of both computations and control flow. The basic idea is to maintain special hash variables during runtime, updating these hashes upon every state change (e.g., after variable assignments and control-flow transfers). At chosen points in the program, hashes may be verified explicitly (e.g., by comparing against precomputed values) or implicitly (e.g., by using a hash to decrypt crucial data or code).

The implementation supports two main ap-

proaches to OH: (1) Hash pre-computation, which

computes and stores “correct” hashes for a set of user-

provided (or automatically generated) inputs that ex-

ercise all relevant code paths; and (2) the code-replica

method, which creates individualized copies of ba-

sic blocks (or other code sections) and compares the

hashes produced by independent execution of these

redundant code sections.

int x = 123;
if (GetUserInput() > 10)
{
    x = x + 1;
}
else
{
    printf("Hello\n");
}

Figure 1: Sample code before application of OH.

As an example, Figure 1 shows sample C++ code

before application of OH, and Figure 2 lists the same

code after injection of one OH round. Figure 3 shows

the code resulting from injection of two OH rounds.

Note that with two OH rounds, hash variables of the

second round are used to verify hash variables of the

ﬁrst round; i.e., the second OH round veriﬁes both the

original program variables and the ﬁrst OH round.
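The hash macros used in Figures 2 and 3 are not defined in the text. The sketch below shows one way they could be realized for a single OH round with explicit verification against a precomputed value. The mixing function (an FNV-1a-style xor and multiply), the branch identifiers, and the names `runProtected` and `verifyHash` are assumptions made for illustration; the user input is passed as a parameter so that the example is self-contained.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical macro definitions; the mixing function is an assumption.
#define INITIALIZE_HASH(h) std::uint32_t h = 2166136261u
#define UPDATE_HASH(h, v) ((h) = ((h) ^ static_cast<std::uint32_t>(v)) * 16777619u)

// Hypothetical identifiers that record which branch was taken.
constexpr std::uint32_t BRANCH_ID_1 = 0xA1u;
constexpr std::uint32_t BRANCH_ID_2 = 0xB2u;

// The code of Figure 2, returning the final hash instead of verifying it
// in place so that the value can be precomputed on a trusted run.
std::uint32_t runProtected(int userInput) {
    INITIALIZE_HASH(hash1);
    int x = 123;
    UPDATE_HASH(hash1, x);
    if (userInput > 10) {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        x = x + 1;
        UPDATE_HASH(hash1, x);
    } else {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        std::printf("Hello\n");
    }
    return hash1;
}

// Explicit verification: compare the observed hash with a precomputed one.
bool verifyHash(std::uint32_t observed, std::uint32_t precomputed) {
    return observed == precomputed;
}
```

Under these assumptions, two runs that follow the same path and compute the same values yield the same hash, while a run that takes the other branch (or in which `x` has been altered) yields a different one. A precomputed hash is therefore valid only for inputs that exercise the same path, which is why the pre-computation approach needs inputs covering all relevant code paths.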

INITIALIZE_HASH(hash1);
int x = 123;
UPDATE_HASH(hash1, x);
if (GetUserInput() > 10)
{
    UPDATE_HASH(hash1, BRANCH_ID_1);
    x = x + 1;
    UPDATE_HASH(hash1, x);
}
else
{
    UPDATE_HASH(hash1, BRANCH_ID_2);
    printf("Hello\n");
}
VERIFY_HASH(hash1);

Figure 2: Sample code after one round of OH.

3.1.2 State-change Verification

This tamper-checking operator injects code to verify the operation of individual instructions and small sets of instructions. For example, given an instruction that increments a variable by N, the difference between the new and old variable versions should be N; the operator injects code to check this explicitly at runtime. The intent is to introduce tamper-resistance at a low level without requiring the specific inputs or code replicas that OH needs.
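The increment example can be sketched at source level as follows. The function name and the tamper response (throwing an exception) are hypothetical. The `volatile` qualifiers are used only so that an optimizing compiler does not fold the check away in this sketch, and unsigned arithmetic avoids undefined behavior on overflow.

```cpp
#include <stdexcept>

// Original instruction: counter += n.
// Injected code: save the old value, then check that new - old == n.
unsigned incrementChecked(unsigned counter, unsigned n) {
    volatile unsigned before = counter;     // injected: remember the old value
    volatile unsigned after = before + n;   // original instruction
    if (after - before != n) {              // injected: verify the state change
        throw std::runtime_error("state-change check failed");
    }
    return after;
}
```

The check needs neither reference inputs nor a replica of the code; it relies only on the local relationship between the old and new values.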

3.2 Control-ﬂow Operators

These transformations serve mainly to increase the

complexity of control ﬂow in target code.

3.2.1 Opaque Predicates

This is a traditional operator that injects code to compute predicates and corresponding branches that are either never or always taken (Collberg et al., 1998b). Alternatively, both branch paths may be possible with varying probabilities. While the tool user knows the outcome of each predicate in advance, this property is difficult for other parties to deduce from the code. The effect is to add extra edges to the control-flow graphs (CFGs) of functions while only modestly impacting performance and code size.
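As a small illustration, the sketch below uses a textbook always-true predicate: the product of two consecutive integers is even. The function names are hypothetical, and a predicate this simple would likely be recognized by automated analysis; it is shown only to make the mechanism concrete.

```cpp
#include <cstdint>

// Opaquely true predicate: x * (x + 1) is a product of two consecutive
// integers and is therefore always even, also under unsigned wraparound.
bool opaqueTrue(std::uint32_t x) {
    return ((x * (x + 1u)) & 1u) == 0u;
}

// The real computation is guarded by the predicate; the else branch is
// never executed but adds an extra edge to the CFG.
int protectedAdd(int a, int b, std::uint32_t runtimeValue) {
    if (opaqueTrue(runtimeValue)) {
        return a + b;   // always taken
    } else {
        return a - b;   // never taken: decoy path
    }
}
```

The cost is one multiplication, one addition and one test per guarded site, consistent with the modest overhead noted above.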

3.2.2 Branch Transformations

This operator performs branch flattening, which moves one or more branch operations into a single dispatch block that performs the actual tests and jumps. The operator also implements branch diffusion, which arranges for a single branch to be scattered and merged with other branches. These operations are a form of control-flow flattening or obfuscation (Wang, 2000).
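The following sketch shows one common way to realize flattening at source level: each basic block sets a state variable, and a single switch inside a loop acts as the dispatch block. This is an illustrative assumption about the general technique, not the output of the tool; the function and block names are hypothetical.

```cpp
// Original, structured version.
int sign(int v) {
    if (v < 0) return -1;
    if (v == 0) return 0;
    return 1;
}

// Flattened version: every block returns control to one dispatch switch,
// and the variable `next` selects the block to run.
int signFlattened(int v) {
    enum Block { TEST_NEG, TEST_ZERO, RET_NEG, RET_ZERO, RET_POS, EXIT };
    Block next = TEST_NEG;
    int result = 0;
    while (next != EXIT) {
        switch (next) {  // dispatch block
        case TEST_NEG:  next = (v < 0) ? RET_NEG : TEST_ZERO;  break;
        case TEST_ZERO: next = (v == 0) ? RET_ZERO : RET_POS;  break;
        case RET_NEG:   result = -1; next = EXIT;              break;
        case RET_ZERO:  result = 0;  next = EXIT;              break;
        case RET_POS:   result = 1;  next = EXIT;              break;
        case EXIT:                                             break;
        }
    }
    return result;
}
```

In the flattened form, the original nesting of the two tests is no longer visible in the CFG; all blocks appear as siblings under the dispatcher.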


INITIALIZE_HASH(hash1);
INITIALIZE_HASH(hash2);
int x = 123;
UPDATE_HASH(hash1, x);
UPDATE_HASH(hash2, x);
UPDATE_HASH(hash2, hash1);
if (GetUserInput() > 10)
{
    UPDATE_HASH(hash1, BRANCH_ID_1);
    UPDATE_HASH(hash2, BRANCH_ID_1);
    UPDATE_HASH(hash2, hash1);
    x = x + 1;
    UPDATE_HASH(hash1, x);
    UPDATE_HASH(hash2, x);
    UPDATE_HASH(hash2, hash1);
}
else
{
    UPDATE_HASH(hash1, BRANCH_ID_2);
    UPDATE_HASH(hash2, BRANCH_ID_2);
    UPDATE_HASH(hash2, hash1);
    printf("Hello\n");
}
VERIFY_HASH(hash1);
VERIFY_HASH(hash2);

Figure 3: Sample code after two rounds of OH.

3.3 Generic Obfuscation Operators

These operators are geared towards various obfuscating transformations that increase the difficulty of understanding code.

3.3.1 Pointer Conversion

This transformation converts variable references to

pointer references (by creating a new pointer for each

variable and modifying variable references to use this

pointer). This conversion can also transform func-

tion calls to be performed via function pointers. This

is mainly a means of obfuscation via pointer indi-

rection, and is effective especially when iterated and

combined with operators that inject new variables.

int x = GetTickCount();
printf("%d\n", x);

Figure 4: Sample code before application of pointer conversion.

To illustrate, Figure 4 shows original sample C++ code, and Figure 5 lists the same code after one iteration of pointer conversion. Figure 6 shows the effects of two iterations. The latter two figures list the actual code output by the source-to-source transformation tool described in Section 5.

int * ptr_x_0;
int x;
ptr_x_0 = &x;
unsigned int tmp_151 =
    (* (unsigned int (__stdcall *)())
    &GetTickCount)();
int tmp_152 = (int) tmp_151;
*(int *) ptr_x_0 = tmp_152;
char * tmp_ptr_154 = (char *) "%d\n";
printf(tmp_ptr_154, * (int *) ptr_x_0);

Figure 5: Sample code after one iteration of pointer conversion (tool output).

int * ptr_x_2;
int ** ptr_ptr_x_0_1;
int * ptr_x_0;
int x;
ptr_ptr_x_0_1 = &ptr_x_0;
ptr_x_2 = &x;
*(int **) ptr_ptr_x_0_1 = ptr_x_2;
unsigned int tmp_151 =
    (* (unsigned int (__stdcall *)())
    &GetTickCount)();
int tmp_152 = (int) tmp_151;
int * tmp_ptr_159 = * (int **)
    ptr_ptr_x_0_1;
* (int *) tmp_ptr_159 = tmp_152;
char * tmp_ptr_154 = (char *) "%d\n";
int * tmp_ptr_160 = * (int **)
    ptr_ptr_x_0_1;
printf(tmp_ptr_154, * (int *)
    tmp_ptr_160);

Figure 6: Sample code after two iterations of pointer conversion (tool output).

3.3.2 Dataﬂow Stopping

This dataflow-obfuscation operator creates a copy of a variable at a random (or specified) point in a target function, overwriting the original variable and replacing all later references with the copy. This helps to hinder dataflow analysis.
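A minimal before-and-after sketch of this operator is given below. The example function, the name of the copy, and the overwrite constant are hypothetical; the injection point is chosen between the two uses of `x`.

```cpp
// Original function (hypothetical example).
int original(int x) {
    int y = x + 1;
    int z = x * 2;
    return y + z;
}

// After dataflow stopping at the point between the two statements.
int transformed(int x) {
    int y = x + 1;
    int x_copy_0 = x;       // injected: copy the variable
    x = 0x5EED;             // injected: overwrite the original
    int z = x_copy_0 * 2;   // later reference now uses the copy
    (void)x;                // the overwritten original is dead from here on
    return y + z;
}
```

A reader or tool tracking `x` from the function entry now reaches a dead end at the overwrite, and must notice the copy to follow the real value.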

3.4 Individualization Operators

These operators help primarily to diversify code, creating individualized copies that prevent easy reuse or retargeting of one particular break. This also alleviates the software “monoculture” problem, where malicious programs work more or less equally well on most systems that run the same installed code.

3.4.1 Code Replication

This is a code-duplication operator that implements

various methods to create redundant, individualized

copies of code sections. This is useful for the code-

replica approach of OH, as well as for other obfusca-

tion and tamper-resistance operations.

3.4.2 Random Code Generation

This operator injects random expressions generated

from a simple grammar. After recursive generation of

a random parse tree, the operator generates code from

the tree in a compiler-like manner. The main purpose

is to hide existing program code in tightly integrated,

randomized chaff code generated by this operator.

3.4.3 Chaff Code Generation

This code-injection operator inserts random expres-

sions that corrupt and restore existing program vari-

ables, resulting in more thorough integration with tar-

get code. A variable is corrupted after an assignment

(def) and restored prior to each reference (use). Cor-

ruption and restoration may occur at randomly se-

lected locations between defs and uses.

Our current approach creates shadow variables to

save correct values of corrupted program variables for

restoration. An alternative is to corrupt variables re-

versibly and un-corrupt prior to uses. However, com-

plex, unpredictable control ﬂow complicates the task

of matching up the corrupt and restore operations, un-

less the corruption is simple and generic (e.g., always

the same operations). Nonetheless, either shadow variables or simple corruption/un-corruption may suffice if additional operators obfuscate the code output by this chaff generator.

For corruption, this operator uses assignments to

random expressions produced by the random-code-

generation operator. Via randomly built parse trees,

such expressions may reference existing program

variables, helping to integrate chaff code more se-

curely. These expressions may read uninitialized vari-

ables, leading to compiler warnings; however, this is

intentional and helps with obfuscation.
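Both variants discussed in this section, shadow variables and reversible corruption, can be sketched as follows. The original computation (`a * 2 + 1`), the corrupting expressions, and all names are hypothetical; in the tool the expressions would come from the random-code-generation operator rather than being fixed.

```cpp
#include <cstdint>

// Shadow-variable variant. Original code: return a * 2 + 1.
std::uint32_t computeWithShadow(std::uint32_t a, std::uint32_t noise) {
    std::uint32_t shadow_a = a;         // injected: save the correct value
    a = (noise * 31u) ^ (a + 7u);       // injected: corrupt after the def
    std::uint32_t chaff = a + noise;    // chaff that consumes the corrupted value
    (void)chaff;
    a = shadow_a;                       // injected: restore before the use
    return a * 2u + 1u;
}

// Reversible variant: the corruption is undone by its own inverse.
std::uint32_t computeReversible(std::uint32_t a, std::uint32_t key) {
    a ^= key;                           // injected: corrupt reversibly
    std::uint32_t chaff = a * 3u;       // chaff that consumes the corrupted value
    (void)chaff;
    a ^= key;                           // injected: un-corrupt before the use
    return a * 2u + 1u;
}
```

The reversible variant needs no extra storage, but every path from the corruption to a use must apply exactly one matching inverse, which is the matching problem noted above for complex control flow.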

3.5 Non-Transformation Operators

These operators do not actually transform code, but

perform other useful functionality. In our tool design,

implementing various tasks is often best done simply

by representing them as operators inserted at the de-

sired positions in the transformation pipeline.

3.5.1 Source Generation

This operator generates source code from the tool in-

termediate representation (IR), but does not modify

the IR. This can be used to generate source code at

any point during processing, typically after all pro-

tection operators have ﬁnished. Since this operator

transforms the IR instructions into source, obfusca-

tion is naturally introduced into the output sources (in

the same spirit as “obfuscation” due to compiling C++

into x86 code, for example). However, while this op-

erator ﬁts naturally as a “protection operator” in our

implementation architecture (Section 5), it is not an

obfuscation primitive per se.

3.5.2 Metric Evaluation

This operator provides functionality to compute com-

plexity and security metrics over code. As with source

generation, metric evaluation ﬁts well into the archi-

tecture of our tool as just another operator, despite the

fact that no transformations occur for metric compu-

tation. We describe metrics in the next section.

3.6 Other Operators

The above listing is meant to provide only a sampling

of possible transformations. Depending on security

goals and application contexts, the possibilities for

other operators are virtually unlimited. We again em-

phasize that such operators may be nearly trivial and

very easily implemented; the combined effect of iter-

ating many such operators in different orders can cre-

ate far more complexity than typical individual trans-

formations.

4 COMPLEXITY EVALUATION VIA METRICS

In general, complex systems do not lend themselves

to accurate prediction of future state. Such systems

must be run forward or allowed to evolve, and state

can be inspected at any time. Thus, instead of predic-

tive modeling, we use a posteriori metrics to assess

complexity and security in a quantitative manner. In

other words, we evaluate various functions over code

to quantify its properties in terms of complexity and

security.

As a starting point, we use three specific metrics investigated in a quantitative study of obfuscation (Anckaert et al., 2007b). These are used in a relative fashion; i.e., the metrics are computed over both original and transformed code, and the differences between these metric values serve as indicators of how much complexity was added by the transformations.

The actual metrics are as follows:

• Instruction Count. This is simply the number of

instructions in code, and serves as a very rough

indicator of code complexity.

• Cyclomatic Number. This is equal to e − n + 2, where e and n are the numbers of edges and nodes, respectively, in a function’s control-flow graph (CFG). Intuitively, this indicates the number of decision points where control flow can take alternate paths.

• Knot Count. This is the number of crossings in

a function’s CFG, assuming the CFG is drawn in

a speciﬁc manner; i.e., with nodes laid out lin-

early in order of address, and with edges all drawn

on one side of the node list. Heuristically, this

estimates the lack of typically expected structure

in the CFG, as well as potential complexity of

control-ﬂow transfers in the CFG.
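The two CFG-based metrics can be computed directly from an edge list, as in the sketch below. It assumes that nodes are numbered in address order, that the CFG is a single connected component, and that every CFG edge (including fall-through edges) is drawn as an arc; these choices and all names are assumptions of this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// A CFG edge between two node indices; nodes are numbered in address order.
using Edge = std::pair<int, int>;

// Cyclomatic number e - n + 2 for a single connected CFG.
int cyclomaticNumber(int nodeCount, const std::vector<Edge>& edges) {
    return static_cast<int>(edges.size()) - nodeCount + 2;
}

// Knot count: number of edge pairs whose arcs cross when all arcs are drawn
// on one side of the linear node list. Two arcs cross when exactly one
// endpoint of one lies strictly inside the span of the other.
int knotCount(const std::vector<Edge>& edges) {
    int knots = 0;
    for (std::size_t i = 0; i < edges.size(); ++i) {
        for (std::size_t j = i + 1; j < edges.size(); ++j) {
            const int a = std::min(edges[i].first, edges[i].second);
            const int b = std::max(edges[i].first, edges[i].second);
            const int c = std::min(edges[j].first, edges[j].second);
            const int d = std::max(edges[j].first, edges[j].second);
            if ((a < c && c < b && b < d) || (c < a && a < d && d < b)) {
                ++knots;
            }
        }
    }
    return knots;
}
```

For the if/else of Figure 1, with nodes 0 (test), 1 (then), 2 (else), 3 (join) and edges 0→1, 0→2, 1→3, 2→3, this gives a cyclomatic number of 4 − 4 + 2 = 2 and a knot count of 1, since the arc 0→2 crosses the arc 1→3.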

We also use other metrics designed to capture var-

ious complexity properties of code. Some examples

are as follows:

• Number of Variables per Instruction. This is computed simply as v/c, where v is the number of variables in a function’s symbol table, and c is the number of instructions in the function’s intermediate representation. Intuitively, this measures the potential complexity of data handling within the function.

• Variable Indirection. This is measured as p/v,

where p is the number of pointers in a function,

and v is the total number of variables. This is de-

signed to capture the complexity of using pointers

to access data indirectly.

• Operational Indirection. This is computed as

r/R, where r is the number of pointer references

in a function, and R is the total number of variable

references; i.e., this is the fraction of references

performed through pointers.

• Code Homogeneity. This measures the unifor-

mity or “local indistinguishability” of instruction

sequences throughout functions. This could be

computed via histograms or frequency tables of

instructions in selected portions of code.

• Dataﬂow Complexity. This is a data-centric ana-

log of CFG-complexity metrics like cyclomatic

number and knot count. One means of measure-

ment is the complexity of a graph where each

node represents a variable, and a directed edge

between variables exists if the ﬁrst variable inﬂu-

ences the value of the other variable. If such a

graph is complete, all variables inﬂuence one an-

other. Thus, the metric may compute how close

the graph is to a complete graph, or how “random”

the graph appears to be.
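The ratio-based metrics in this list reduce to simple arithmetic over per-function counts, sketched below. The struct and function names are hypothetical, and the density measure is only one possible reading of how close the influence graph is to a complete graph.

```cpp
// Raw counts for one function; field names are hypothetical.
struct FunctionCounts {
    int instructions;       // c: instructions in the IR
    int variables;          // v: variables in the symbol table
    int pointers;           // p: pointer variables
    int references;         // R: all variable references
    int pointerReferences;  // r: references made through pointers
};

// Ratio that returns 0 when the denominator is 0.
double safeRatio(int numerator, int denominator) {
    return denominator == 0 ? 0.0 : static_cast<double>(numerator) / denominator;
}

// v / c
double variablesPerInstruction(const FunctionCounts& f) {
    return safeRatio(f.variables, f.instructions);
}

// p / v
double variableIndirection(const FunctionCounts& f) {
    return safeRatio(f.pointers, f.variables);
}

// r / R
double operationalIndirection(const FunctionCounts& f) {
    return safeRatio(f.pointerReferences, f.references);
}

// Fraction of the n(n-1) possible directed influence edges that are present;
// 1.0 corresponds to a complete graph.
double dataflowDensity(int variables, int influenceEdges) {
    return safeRatio(influenceEdges, variables * (variables - 1));
}
```

For example, a function with 8 instructions, 4 variables (1 of them a pointer) and 12 references (3 through pointers) has v/c = 0.5, p/v = 0.25 and r/R = 0.25.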

5 IMPLEMENTATION AND EXPERIMENTAL RESULTS

We have implemented a Phoenix-based (Microsoft

Corporation, 2008) toolkit that protects high-level

code by iterated transformations. As emphasized ear-

lier, this toolkit should be used to complement other

methods, not to create a standalone, all-inclusive ob-

fuscation solution. However, the source-based trans-

formations used by the tool may help to imple-

ment other techniques. Moreover, arbitrary protective

transformations in other tools can often be iterated (or

modiﬁed to make iteration possible and effective).

The current section describes the architecture and

usage of the tool, followed by experimental results on

several SPEC CPU2006 benchmarks.

5.1 Phoenix-based Implementation

The tool implementation relies on the following sys-

tems:

• Phoenix (Microsoft Corporation, 2008): A Mi-

crosoft compiler and analysis framework based on

a common intermediate representation (IR). We

use Phoenix mainly as a code-processing engine

that reads input code, both source and binary, and

passes this to our tools for processing.

• .NET and CLR (Common Language Runtime): A

next-generation Microsoft development and run-

time environment. The tool is implemented using

C# in the .NET framework.

Assuming the availability of some means to pro-

cess input source code, iterated obfuscation lends it-

self to straightforward implementation. Relying on

Phoenix as a code-processing engine, our basic tool

design is centered on the concept of protection op-

erators, which serve as primitives for iteration and

recombination. Such an operator is a class that im-

plements some protective transformation, such as OH

or conversion of variable references to pointer refer-

ences. Typically, each operator is derived from an ab-

stract base operator class, which encapsulates useful

basic functionality common to all operators.


Table 1: Metrics for one round of pointer conversion.

Benchmark   Code Size   Cyclomatic No.   Knot Count   Variable Density   Indirection   Performance
401.bzip2   1.035       1.000            1.000        1.513              1.663         1.572
429.mcf     1.063       1.000            1.000        1.646              1.325         1.055
458.sjeng   1.025       1.000            1.000        1.309              1.780         1.267

Table 2: Metrics for 5 rounds of pointer conversion.

Benchmark   Code Size   Cyclomatic No.   Knot Count   Variable Density   Indirection   Performance
401.bzip2   4.043       1.000            1.000        4.095              3.214         8.550
429.mcf     5.363       1.000            1.000        4.624              2.186         2.938
458.sjeng   3.114       1.000            1.000        2.731              3.703         5.379

At runtime, the tool applies a sequence of protec-

tion operators to each input function, as speciﬁed by

a user-created conﬁguration ﬁle. This text ﬁle con-

tains an ordered list of operators speciﬁed by name,

along with parameters for each operator (e.g., num-

ber of iterations to perform and names of functions to

obfuscate). The tool executes the operators in order, as listed in the configuration file. Alternatively, a secret key and additional user input may select a randomized subset of operators, number of iterations, order of application, and other parameters.
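The operator-based design can be sketched as follows. The actual tool is written in C# on top of Phoenix, and its configuration format is not specified here, so everything in this C++ analog is an assumption: the `FunctionIR` type, the two toy operators, and a configuration format with one `<name> <iterations>` pair per line.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one function's IR: a list of instruction strings.
using FunctionIR = std::vector<std::string>;

// Abstract base class shared by all operators.
class ProtectionOperator {
public:
    virtual ~ProtectionOperator() = default;
    virtual void apply(FunctionIR& fn) const = 0;
};

// Toy operator: appends one inert instruction.
class ChaffOperator : public ProtectionOperator {
public:
    void apply(FunctionIR& fn) const override { fn.push_back("nop"); }
};

// Toy operator: duplicates every instruction.
class ReplicateOperator : public ProtectionOperator {
public:
    void apply(FunctionIR& fn) const override {
        FunctionIR out;
        for (const auto& ins : fn) {
            out.push_back(ins);
            out.push_back(ins);
        }
        fn = out;
    }
};

// Runs the operators named in `config` in order, each for its iteration count.
void runPipeline(const std::string& config, FunctionIR& fn) {
    std::map<std::string, std::unique_ptr<ProtectionOperator>> registry;
    registry["Chaff"] = std::make_unique<ChaffOperator>();
    registry["Replicate"] = std::make_unique<ReplicateOperator>();

    std::istringstream in(config);
    std::string name;
    int iterations = 0;
    while (in >> name >> iterations) {
        auto it = registry.find(name);
        if (it == registry.end()) {
            throw std::invalid_argument("unknown operator: " + name);
        }
        for (int i = 0; i < iterations; ++i) {
            it->second->apply(fn);
        }
    }
}
```

Because non-transforming tasks such as source generation or metric evaluation can be expressed as further subclasses, they can be placed at any position in the same configuration list.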

Our current tool works as a Phoenix compiler backend (C2) plug-in, operating on input C++ source code. A compiler (Visual C++) parses input source code into a high-level intermediate form (CIL, or C Intermediate Language). Phoenix converts CIL into its own universal high-level intermediate representation (HIR). The Phoenix backend then passes each HIR function to our plug-in tool, which applies the transformations and passes the function back to Phoenix for further processing and eventual native-code generation. As described above, the tool uses configuration files to steer its operation.

5.2 Experimental Results

The tables in this section present experimental results from running the Phoenix-based tool on selected SPEC CPU2006 benchmarks (data compression, transportation scheduling, and chess). For each benchmark, we computed metrics on the SPEC source code both before and after sample sets of transformations; we then calculated the ratio of each transformed value to the corresponding original value. The results indicate how the metrics change as a result of applying the transformations. A value of 1.0 indicates that the corresponding metric was unaffected; values greater than 1.0 indicate higher complexity. Values less than 1.0 show lower complexity, which may occur as a result of higher complexity in other metrics. To estimate overall complexity, the metrics should be interpreted in combination.
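The normalization is simple arithmetic, sketched below. The raw values are made up for the example and are not measurements from the benchmarks; the function name `metric_ratios` is likewise hypothetical.

```python
def metric_ratios(before, after):
    """Return the transformed/original ratio for every metric in `before`."""
    return {name: after[name] / before[name] for name in before}


# Hypothetical raw metric values, chosen only to illustrate the arithmetic.
before = {"code_size": 2000, "cyclomatic_no": 150, "knot_count": 40}
after = {"code_size": 2400, "cyclomatic_no": 150, "knot_count": 30}

ratios = metric_ratios(before, after)
```

In this made-up case the code-size ratio is above 1.0, the cyclomatic number is unchanged at 1.0, and the knot count falls below 1.0, which is why no single column should be read in isolation.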

The metrics in the tables include code size (in terms of the number of IR instructions), cyclomatic number, knot count, variable density, and operational indirection. These metrics are cumulative over all benchmark source functions. In addition, the rightmost column in each table indicates the performance hit due to the transformations; e.g., a value of 1.5 indicates that the transformed code took 1.5 times as much time in our tests.
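For readers who want a concrete handle on the two control-flow metrics, the following sketch uses their textbook formulations: McCabe's cyclomatic number E - N + 2P, and a knot count that tallies pairs of jumps whose spans interleave in the linear code listing. The tool's exact counting rules may differ in detail, and the function names and input encodings are assumptions.

```python
from itertools import combinations


def cyclomatic_number(num_nodes, edges, components=1):
    """McCabe's measure E - N + 2P for a control-flow graph."""
    return len(edges) - num_nodes + 2 * components


def knot_count(jumps):
    """Count pairs of jumps whose spans interleave in the code listing.

    Each jump is a (source_line, target_line) pair.
    """
    spans = [tuple(sorted(jump)) for jump in jumps]
    knots = 0
    for (lo1, hi1), (lo2, hi2) in combinations(spans, 2):
        # Two spans form a knot when they overlap without nesting.
        if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
            knots += 1
    return knots


# An if-then-else diamond: condition, then-block, else-block, join.
diamond_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
diamond = cyclomatic_number(4, diamond_edges)

crossing = knot_count([(1, 5), (3, 8)])  # spans interleave
nested = knot_count([(1, 5), (2, 4)])    # one span lies inside the other
```

A single two-way branch gives a cyclomatic number of 2, and only jumps that straddle one another contribute to the knot count; properly nested jumps do not.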

As a first example, Table 1 shows the effect of applying a single round of pointer conversion. This increases the instruction count slightly, but does not affect any metrics related to the CFG. The extra pointer variables cause increases in the variable density and operational indirection. Finally, the performance hit varies by benchmark. This shows that pointer conversion should sometimes be applied selectively, avoiding performance-critical variables such as loop indices in compression algorithms (401.bzip2).

Table 2 shows the results of applying 5 rounds of

pointer conversion. Every round approximately dou-

bles the number of variables, including new pointers

to existing pointers from all earlier rounds. Thus, the

effect on some metrics is exponential.
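The doubling argument can be checked with a toy model. The sketch below tracks only variable names, under the simplifying assumption that each round introduces exactly one new pointer per existing variable; the function names and the naming scheme for pointers are hypothetical.

```python
def pointer_conversion_round(variables, round_no):
    """One toy round: add a pointer for every existing variable,
    including pointers introduced in earlier rounds."""
    return variables + [f"p{round_no}_{name}" for name in variables]


def variable_counts(variables, rounds):
    """Return the number of variables before and after each round."""
    counts = [len(variables)]
    for round_no in range(1, rounds + 1):
        variables = pointer_conversion_round(variables, round_no)
        counts.append(len(variables))
    return counts


counts = variable_counts(["i", "n", "buf"], rounds=5)
```

Under this assumption a function with n variables has n * 2^k variables after k rounds. The model counts raw variables only, whereas the tables report ratios of program-wide metrics, so the measured values need not reach 2^5.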

Table 3 shows the results of injecting 10 opaque

predicates into each function. This also creates some

additional non-pointer variables, increasing variable

density but reducing operational indirection. The

cyclomatic number and knot count also increase,

since randomly injected opaque branches may strad-

dle other branches. This comes at relatively little ex-

pense in code size and especially performance.
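As a small illustration of opaque predication, the sketch below guards a computation with a predicate that always evaluates to true but is not syntactically constant. The predicate (the product of two consecutive integers is even) is a standard textbook choice, not necessarily one the tool uses, and the functions `checksum` and `checksum_protected` are made up for the example.

```python
def checksum(values):
    """Original function: sum of squares modulo 65521."""
    total = 0
    for v in values:
        total = (total + v * v) % 65521
    return total


def opaquely_true(x):
    """x * (x + 1) is a product of consecutive integers, hence always even."""
    return (x * (x + 1)) % 2 == 0


def checksum_protected(values):
    """Same computation, with each update guarded by an opaque branch."""
    total = 0
    for v in values:
        if opaquely_true(v):
            total = (total + v * v) % 65521   # real path, always taken
        else:
            total = (total * 31 + 7) % 65521  # decoy path, never taken
    return total
```

The inserted branch adds a decision point, and therefore raises the cyclomatic number, without changing the function's output; the extra arithmetic per guard is small, which fits the modest performance ratios reported for this transformation.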

Table 4 adds a round of OH to 10 rounds of opaque predication. Since OH was applied to hash every relevant variable, including performance-critical loop indices, the results are expensive in space and time. This shows that OH may need to be applied selectively.
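The cost of hashing loop indices can be seen in a rough sketch of OH-style instrumentation, in which hash updates are interleaved with the assignments of the original code. This is a simplified illustration, not the OH construction itself: the mixing step is a toy 32-bit update loosely modeled on FNV-1a, and `sum_to_hashed` with its `hash_loop_index` switch is a hypothetical example.

```python
def mix(h, value):
    """Toy 32-bit hash update (an assumption chosen for illustration)."""
    return ((h ^ (value & 0xFFFFFFFF)) * 16777619) & 0xFFFFFFFF


def sum_to(n):
    """Original function: sum of the integers below n."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_to_hashed(n, hash_loop_index=True):
    """Same computation with hash updates interleaved after assignments.

    Returns the result, the final hash, and the number of hash updates.
    """
    h = 2166136261
    updates = 0
    total = 0
    for i in range(n):
        if hash_loop_index:
            h = mix(h, i)
            updates += 1
        total += i
        h = mix(h, total)
        updates += 1
    return total, h, updates


full = sum_to_hashed(100)
selective = sum_to_hashed(100, hash_loop_index=False)
```

Skipping the loop index halves the number of hash updates in this loop while the computed result stays the same, which is the kind of saving selective application aims for.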

Table 3: Metrics for 10 rounds of opaque predication.

Benchmark  Code Size  Cyclomatic No.  Knot Count  Variable Density  Indirection  Performance
401.bzip2  1.191      1.580           1.899       1.277             0.712        1.036
429.mcf    1.397      2.137           4.241       1.525             0.672        1.019
458.sjeng  1.192      1.409           1.713       1.149             0.734        1.068

Table 4: Metrics for 10 rounds of opaque predication plus a round of OH.

Benchmark  Code Size  Cyclomatic No.  Knot Count  Variable Density  Indirection  Performance
401.bzip2  2.979      1.580           1.862       0.903             1.331        12.727
429.mcf    3.543      2.137           3.952       0.692             0.886        6.403
458.sjeng  3.337      1.409           1.707       0.719             1.587        12.360

Table 5: Metrics for 10 rounds of opaque predication, 2 rounds of OH, 3 rounds of random-code injection, and 3 rounds of pointer conversion.

Benchmark  Code Size  Cyclomatic No.  Knot Count  Variable Density  Indirection  Performance
401.bzip2  48.788     3.285           2.258       0.696             3.919        68.164
429.mcf    61.310     4.147           5.234       0.427             2.347        41.966
458.sjeng  59.147     2.883           2.032       0.517             4.784        80.683

Table 5 shows an example where iterating multiple rounds of several transformations results in a protective code envelope that dwarfs the code size of the actual SPEC benchmarks. While this results in final code several dozen times larger and slower, such protection can still be used in areas where performance is not critical (e.g., typical DRM and license checks).

We note that all our experiments involved applying operators over the entire code of the selected benchmarks. Thus, effects on code size and performance are sometimes significant, especially when transformations impact performance-sensitive program elements. While the tables show such worst-case scenarios, typical usage may involve selective application of operators. For example, users may specify performance-critical variables and code sections where operators should limit or omit processing. Also, users may indicate which application sections should be protected, though transformations should be applied elsewhere as well (to avoid attracting attention to security-sensitive parts). Additional complexity and unpredictability are possible via a user-specified secret key used to select operators and parameters. Via these and other means, a balance between performance, code size and metric complexity may be achieved for different applications.
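One way such a policy might be expressed is sketched below. The function `plan_protection`, its parameters, and the specific iteration counts are hypothetical; the point is only that performance-critical functions are skipped, security-sensitive ones receive more rounds, and everything else still receives some protection so that sensitive code is not the only transformed code.

```python
def plan_protection(functions, sensitive, critical, heavy=5, light=1):
    """Assign an iteration count to every function (assumed policy).

    sensitive: functions that need strong protection
    critical:  performance-critical functions to leave untouched
    """
    plan = {}
    for name in functions:
        if name in critical:
            plan[name] = 0      # omit processing entirely
        elif name in sensitive:
            plan[name] = heavy  # security-sensitive code
        else:
            plan[name] = light  # cover code so sensitive parts blend in
    return plan


plan = plan_protection(
    functions=["check_license", "decode_frame", "parse_args"],
    sensitive={"check_license"},
    critical={"decode_frame"},
)
```

In this sketch a function listed as both sensitive and critical is treated as critical; a real policy would need an explicit rule for that conflict.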

6 CONCLUSIONS AND FUTURE WORK

This paper presented a framework for design and implementation of software protection via iteration and recombination of simple primitives. As in complex systems, such a process can lead to cascading complexity and emergent behavior via the interaction of multiple transformations. The nature of individual transformations, as well as the number of iterations and order of application, can make dramatic differences in the final output code. We demonstrated the use of quantitative metrics to evaluate the complexity and corresponding security of transformed code. Such an approach may be useful as part of a comprehensive software-protection system.

Future work will involve designing and implementing additional protection operators, as well as analyzing their security and benefits. A main goal is to position this work in a formal context, including analysis that accurately estimates the practical resistance of our methods against hacker attacks. We also plan to investigate how iterated obfuscation can help other approaches currently under development, perhaps combining various standalone methods into a more systematic, comprehensive solution for software protection.

REFERENCES

Anckaert, B., Jakubowski, M.H., Venkatesan, R., and Boss-

chere, K. D. (2007a). Run-time randomization to mit-

igate tampering. In 2nd International Workshop on

Security (IWSEC 2007), Nara, Japan.

Anckaert, B., Madou, M., De Sutter, B., De Bus, B.,

De Bosschere, K., and Preneel, B. (2007b). Program

obfuscation: a quantitative approach. In QoP ’07:

Proceedings of the 2007 ACM workshop on Quality of

protection, pages 15–20, New York, NY, USA. ACM.


Anckaert, B., Sutter, B. D., and Bosschere, K. D. (2004).

Software piracy prevention through diversity. In DRM

’04: Proceedings of the 4th ACM Workshop on Digi-

tal Rights Management, pages 63–71, New York, NY,

USA. ACM Press.

Aucsmith, D. (1996). Tamper resistant software: An im-

plementation. Information Hiding, Lecture Notes in

Computer Science, 1174:317–333.

Barak, B., Goldreich, O., Impagliazzo, R., Rudich, S., Sa-

hai, A., Vadhan, S., and Yang, K. (2001). On the

(im)possibility of obfuscating programs. In Electronic

Colloquium on Computational Complexity, volume

2139, pages 1–18.

Chen, Y., Venkatesan, R., Cary, M., Pang, R., Sinha, S.,

and Jakubowski, M. H. (2002). Oblivious hashing:

A stealthy software integrity veriﬁcation primitive.

In Information Hiding 2002, Noordwijkerhout, The

Netherlands.

Collberg, C., Thomborson, C., and Low, D. (1997). A tax-

onomy of obfuscating transformations. Technical Re-

port 148, Department of Computer Science, The Uni-

versity of Auckland, New Zealand.

Collberg, C., Thomborson, C., and Low, D. (1998a). Break-

ing abstractions and unstructuring data structures. In

International Conference on Computer Languages,

pages 28–38.

Collberg, C., Thomborson, C., and Low, D. (1998b). Man-

ufacturing cheap, resilient, and stealthy opaque con-

structs. In Principles of Programming Languages,

POPL’98, pages 184–196.

Dedić, N., Jakubowski, M. H., and Venkatesan, R. (2007).

A graph game model for software tamper protection.

In Proceedings of the 2007 Information Hiding Work-

shop.

El-khalil, R. and Keromytis, A. D. (2004). Hydan: Hid-

ing information in program binaries. In International

Conf. on Information and Communications Security

(ICICS).

Goldreich, O. and Ostrovsky, R. (1996). Software protec-

tion and simulation on oblivious RAMs. Journal of

the ACM, 43(3):431–473.

Goldwasser, S. and Kalai, Y. T. (2005). On the impossibil-

ity of obfuscation with auxiliary input. In FOCS ’05:

Proceedings of the 46th IEEE Symposium on Founda-

tions of Computer Science.

Horne, B., Matheson, L. R., Sheehan, C., and Tarjan, R. E.

(2001). Dynamic self-checking techniques for im-

proved tamper resistance. In Digital Rights Manage-

ment Workshop, pages 141–159.

Jacob, M., Jakubowski, M. H., and Venkatesan, R. (2007).

Towards integral binary execution: Implementing

oblivious hashing using overlapped instruction encod-

ings. In 2007 ACM Multimedia and Security Work-

shop, Dallas, TX.

Lynn, B., Prabhakaran, M., and Sahai, A. (2004). Positive

results and techniques for obfuscation. In Eurocrypt

’04.

Menezes, A. J., Vanstone, S. A., and Oorschot, P. C. V.

(1996). Handbook of Applied Cryptography. CRC

Press, Inc., Boca Raton, FL, USA.

Microsoft Corporation (2008). Phoenix compiler frame-

work.

Tan, G., Chen, Y., and Jakubowski, M. H. (2006). De-

layed and controlled failures in tamper-resistant soft-

ware. In Proceedings of the 2006 Information Hiding

Workshop.

Wang, C. (2000). A Security Architecture for Survivability

Mechanisms. PhD thesis, University of Virginia.

Wang, C., Hill, J., Knight, J., and Davidson, J. (2000). Soft-

ware tamper resistance: Obstructing static analysis of

programs. Technical Report CS-2000-12, University

of Virginia.

Wee, H. (2005). On obfuscating point functions. In STOC

’05: Proceedings of the Thirty-seventh Annual ACM

Symposium on Theory of Computing, pages 523–532,

New York, NY, USA. ACM Press.

Wolfram, S. (2002). A New Kind of Science. Wolfram Me-

dia Inc., Champaign, IL, USA.

SECRYPT 2009 - International Conference on Security and Cryptography
