An Extended Case Study about Securing Smart Home Hubs through
N-version Programming
Igor Zavalyshyn, Nuno O. Duarte and Nuno Santos
INESC-ID / Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
Keywords: Smart Home, Internet of Things, Privacy and Security, Smart Hub, N-version Programming.
Abstract: Given the proliferation of smart home devices and their intrinsic tendency to offload data storage and processing to cloud services, users' privacy has never been more at stake than today. An obvious approach to mitigate this issue is to keep that data under users' control, leveraging already existing smart hub frameworks. However, moving the storage and computation indoors does not necessarily solve the problem completely, as the pieces of software handling that data must also be trusted. In this paper, we present a thorough study to assess whether N-version programming (NVP) is a valid approach to bootstrapping trust in these data handling modules. Because there are considerable complexity differences among the modules that process home environment data, our study addresses both less complex modules that strictly follow an exact specification, and more complex, looser modules which, although not following an exact specification, compute the same high-level function. Our results shed light on this complexity and show that NVP can be a viable option for securing these modules.
1 INTRODUCTION
In recent years, several smart home platforms have
become mainstream, such as Samsung SmartThings,
Apple HomeKit and Amazon Echo. However, the
threat of privacy breaches constitutes a major source
of concern for users. Device misconfiguration is fre-
quent, which can lead to leakage of sensitive data,
e.g., camera feeds (Kelion, 2012), or unauthorized
home device monitoring (Forbes, 2013). Poor de-
sign and/or implementation of the software behind
these devices is also a major security issue (Com-
puterworld, 2016). SmartApps are often overprivi-
leged and can abuse permissions to leak sensitive user
data (Fernandes et al., 2016a).
A major difficulty in preventing unwanted sensor
data exfiltration lies in the fact that many IoT applica-
tions, even if they were to execute entirely at the home
environment, require both permissions to access sen-
sor data (e.g., IP camera’s frames) and to access the
network. These permissions are required to allow the
application to read and process the data, and send the
results to the cloud. However, unless the application
is correctly specified and implemented, its behavior
can deviate from what is expected, e.g., due to a bug
or an exploit, and release raw data to the cloud,
thus potentially causing a privacy breach.
Our goal is to investigate the adoption of N-
version programming (NVP) as part of the design of
smart hub platforms as a way to enhance security and
prevent leaking raw sensor data to the cloud. Building
on the shoulders of systems like FlowFence (Fernan-
des et al., 2016b) or Privacy Mediators (Davies et al.,
2016), we consider a smart hub where IoT applica-
tions run and process sensor data locally under the
constraint that applications cannot access such data
directly, but through a mediation interface consisting
of a set of trusted functions (TFs). TFs consist of
extensions to the base hub platform that are imple-
mented by third-party developers and that are deemed
to correctly implement common data handling oper-
ations (e.g., face recognition, anonymization func-
tions, etc.). The problem, however, is that if buggy
or even malicious TF implementations are installed
on the hub, serious security breaches can take place.
NVP can help alleviate this problem by leveraging N
different implementations of a single TF.
By using NVP, rather than depending on a single
implementation, each trusted function depends on N
different implementations (versions) that must con-
cur to produce the final result. The smart hub feeds
sensor data as input to each of the N function ver-
sions, and determines the overall output result based
on a particular decision policy. For example, with a to-
tal agreement policy, all partial outputs must be equal;
otherwise no output is released. A quorum policy re-
quires only a quorum of equal partial responses to be
reached. We envision different versions to be devel-
oped independently by an open community of devel-
opers. Insofar as the developers do not collude, N-
version trusted functions are no longer dependent on
the correctness of any specific function implementation, as is the case for existing smart hub solutions.
Although applying NVP to the smart hub architec-
ture is relatively straightforward, the degradation of
utility and performance can undermine the viability of
this technique. Utility is penalized if an N-version
module too often blocks output to the application
due to result divergence. Performance
of an N-version module tends to be bound by the
slowest sub-module involved in the output decision.
In our context, the impact to utility and performance
will greatly depend upon how sub-modules are imple-
mented. If sub-modules are developed from scratch,
we expect most of the negative effects to be caused
by implementation or performance bugs introduced
by the developers. On the other hand, if sub-modules
are built upon pre-existing code (e.g., libraries) such
effects may also stem from incoherent specifications.
The decision policy employed also plays a critical role
in determining the behavior of modules.
In this paper, we provide an extended case study
about the feasibility of NVP for securing smart home
hubs. It seeks to characterize the impact of NVP on the utility and performance of trusted functions. To this
end, we perform an in-depth study focusing primarily
on two main causes: software flaws and specification
incoherence. We built multiple test modules perform-
ing a variety of privacy-sensitive functions, such as
image blurring, voice scrambling, k-anonymization,
face recognition, and speech recognition, among oth-
ers. Then we tested them extensively in different N
settings and under different decision policies.
Our in-depth study reveals that NVP has consider-
able potential for practical application within a smart
home environment. In particular, we found that: (1)
for N-versions that implement the same algorithm and
follow the algorithm specification, it is possible to
provide an N-module offering high utility as long as
the number of software flaws is residual, (2) for N-
versions that do not follow the same algorithm but
perform the same task, we observe that although mod-
ule utility can be negatively affected by output diver-
gence, it can be increased by leveraging decision poli-
cies tailored to the problem domain space, and (3) N-
version trusted function module performance is typ-
ically bound by its slowest version, a condition that
can be mitigated by leveraging version redundancy.
Figure 1: Appified privacy-preserving home hub. A home app (e.g., TellWeather) runs on the home hub, which connects sensors and actuators in the home environment to external web services (e.g., a weather web service); the hub admin manages the hub directly or through a hub proxy.
Next, we provide a more extensive overview of
our motivation, approach, and goals. In Section 3,
we introduce a smart hub architecture based on NVP.
Then, we present the main contributions of this work:
a comprehensive study of the impact of NVP on TF
utility (Sections 4 and 5) and performance (Section 6).
2 OVERVIEW
2.1 Privacy-preserving Home Hubs
Figure 1 represents a privacy-preserving home hub
platform (Davies et al., 2016; Fernandes et al., 2016b)
in which security-sensitive sensor data can be aggre-
gated and processed according to the privacy prefer-
ences of the user. The home hub is designed as an “ap-
pified” platform that allows for third-party developers
to write home apps which users install on the home
hub. In the figure, a home app called TellWeather
waits for an audio command (e.g., “Tell weather in
LA”), issues an HTTP request to a weather service,
converts the response into an audio signal, and forwards
it to a speaker. The home hub provides an admin-
istration interface through which the homeowner can
access the hub directly or tunneled through a proxy
and manage it, e.g., install or uninstall apps, register
devices, and set up privacy policies.
The hub platform provides app developers with
API functions to interact with the devices. This API
allows a home app to perform numerous operations,
such as collecting data from sensor devices (e.g., au-
dio from microphones, images from cameras), send-
ing data to actuators (e.g., audio signal to speak-
ers, or video streams to displays), accessing Internet
services, and performing various data computations
(e.g., speech or face recognition, or data anonymiza-
tion). The operations that a home app is allowed to
execute are controlled by a security policy: the home
app must explicitly request the hub administrator for
permissions to perform certain operations, in particu-
lar access to device APIs.
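To make the programming model concrete, the following is a minimal sketch of what the TellWeather app could look like against such an API. All class and method names (HomeApp, Hub, requestPermissions, onVoiceCommand, httpGet, textToSpeech, playAudio) are hypothetical illustrations, not the API of any existing hub platform.

    // Hypothetical sketch of a home app written against an appified hub API.
    public class TellWeather extends HomeApp {

        @Override
        public void onInstall(Hub hub) {
            // The app must explicitly request permissions from the hub admin.
            hub.requestPermissions("microphone.listen", "network.http", "speaker.play");
            hub.onVoiceCommand("Tell weather in *", this::handleCommand);
        }

        private void handleCommand(Hub hub, String city) {
            // Query the external weather service over the permitted network API.
            String report = hub.httpGet("https://weather.example.com/api?city=" + city);
            // Convert the textual response into an audio signal and forward it.
            byte[] audio = hub.textToSpeech(report);
            hub.playAudio("living-room-speaker", audio);
        }
    }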
2.2 Trusted Functions: Goods and Ills
To prevent unlimited access to sensor devices,
privacy-preserving home hubs allow their APIs to be
extended with trusted functions (TFs), which implement high-level operations mediating access between the application and the raw data. In some cases
a TF interposes between the application and a data
source, e.g., a camera device. The motivation for such
a TF can be, for instance, to provide a face recogni-
tion service over raw image data collected from the
camera without revealing the raw data to client home
apps. TFs can also mediate access to data sinks, for
example to encrypt or anonymize sensitive data be-
fore sending it to a remote server. Some home hub
solutions support TFs at data sources (Davies et al.,
2016), others at data sinks (Mortier et al., 2016), and
others in both (Fernandes et al., 2016b). Once in-
stalled into the hub, trusted functions can be invoked
by local home apps running on the hub. TFs are developed by third parties and installed by the hub administrator; TF developers are fully trusted to implement them correctly. As long as TFs are correctly im-
plemented, they constitute an effective approach to se-
curely processing sensitive data. However, malicious
TF implementations can perform serious attacks:
A1. Incomplete results: during processing, a mali-
cious TF could intentionally omit parts of the results
in an effort to disturb users’ actions, e.g., hide the part
“and B” when recognizing the user voice command
“record game A and B”.
A2. Incorrect results: similarly to the previous at-
tack, a malicious TF could introduce incorrect re-
sults or replace correct with incorrect results, in order
to trick the user into performing harmful operations,
e.g., replace the name of the person the user wants
to call with a premium number, when recognizing the
user call request voice command.
A3. Data inferences: in collusion with a malicious
application, a malicious TF could not only perform the intended operation but also make inferences on
the raw data and disclose it to the application, e.g.,
identify the people in the room in addition to recog-
nizing the user voice command.
A4. Raw data leakage: the most devastating attack
is the one where a malicious TF colludes with a mali-
cious application and leaks raw data, e.g., send a raw
camera frame as face recognition output.
2.3 Leveraging N-Version Programming
While the effects of attacks A1 and A2 can also stem from naive implementations, which are difficult to distinguish from malicious ones, we argue that attacks A3 and A4 are the sole product of the platform's lack of control over TF outputs. As a result, we seek to understand whether re-
lying on multiple TF implementations can mitigate
these attacks. In particular, we aim to investigate the
feasibility of N-version programming (NVP) to pre-
vent malicious TF implementations from exfiltrating
sensitive data outside of the home premises without
the user’s knowledge or consent.
TF implementations are expected to follow a TF
specification. We assume that the TF specification is
publicly available among home app developers and
home hub users. As for a TF implementation, the TF
binary needs to be publicly released, possibly even
after being properly obfuscated. An NVP-based TF
system must be able to detect deviations in the functions' outputs and react accordingly.
The N-version decision algorithm used to merge
the outputs of multiple trusted function implemen-
tations must be efficient in terms of execution time
and utility. Too strict an algorithm will render the function useless, while too relaxed an algorithm might weaken the security guarantees. Overall, the overhead introduced by the N-version technique should not be significantly higher than that of executing a single version of the trusted function.
Our main adversary consists of the potentially
buggy or malicious code of a trusted function imple-
mentation. This implementation may try to output the
sensitive user data as is without processing it, but such
a result will not be consistent across the outputs of all
other implementations of the function, and will be ig-
nored by the decision algorithm. We assume that var-
ious implementations of the same trusted function do
not collude and are developed independently. We also
assume that the software and hardware platform of the
hub where the trusted function executes is secure, and
that home apps and TFs execute in sandboxed envi-
ronments. It is not our primary goal to secure against
side-channel attacks. The capabilities of the attacker
consist only of the ability to write arbitrary code as
part of trusted function implementations.
3 TRUSTED FUNCTION MODULES
In this section, we present a general security archi-
tecture for smart home hub platforms based on N-
version programming. In this architecture, home hub
extensions consist of N-version trusted function mod-
ules (henceforth called “modules”). A module pro-
vides the functionality of a single TF implemented
internally in an N-version fashion, with each of the N
Figure 2: N-version trusted function module (with N=3): an input preprocessor feeds the input arguments to the units, whose outputs are merged by a decision block according to the configured decision policy.
versions being provided by independent developers.
Each of these versions, called units, are required to
implement the same trusted function specification.
Whenever an application issues a request, the in-
put parameters are forwarded to all N units and their
outputs are compared with each other before a final
output is returned back to the application. Deciding
whether or not a final output result is provided and
what that output result will be depends on a decision
policy defined by configuration. Under one such policy, all N units must produce the same result, which is then returned as the output; otherwise the application is informed that no result was generated. Thus, if
any single unit implementation produces a malicious
output, this output will differ from the remaining N-1
units (assuming no collusion) causing the final result
to be suppressed, preventing the malicious unit from
propagating its effects to the application.
Figure 2 shows the internals of a module imple-
mented by 3 units. The input arguments are passed
by the client application and the output results are
returned to the application. The input preprocessor
feeds the input arguments to each unit and the deci-
sion block implements a decision algorithm according
to the provided decision policy. The decision policy is
a configuration parameter decided by the hub admin-
istrator. Each unit is implemented by a program that
runs in an independent sandbox. The input preprocessor
and the decision block logic must belong to the hub
platform, which must also be responsible for setting
up the units’ sandboxes and the datapaths represented
by arrows in Figure 2.
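To make this structure concrete, the following is a minimal sketch of such a module in Java, the language used for the units in our study. The Unit and DecisionPolicy interfaces are illustrative assumptions; a real platform would additionally run each unit inside its sandbox and enforce the datapaths of Figure 2.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Illustrative interfaces: one sandboxed version of the TF, and the
    // decision algorithm configured by the hub administrator.
    interface Unit { String run(String input) throws Exception; }
    interface DecisionPolicy { String decide(List<String> outputs); }

    class TrustedFunctionModule {
        private final List<Unit> units;
        private final DecisionPolicy policy;
        private final ExecutorService pool = Executors.newCachedThreadPool();

        TrustedFunctionModule(List<Unit> units, DecisionPolicy policy) {
            this.units = units;
            this.policy = policy;
        }

        // The input preprocessor forwards the same arguments to all N units;
        // the decision block then merges their outputs under the policy.
        String invoke(String input) throws Exception {
            List<Future<String>> futures = new ArrayList<>();
            for (Unit u : units)
                futures.add(pool.submit(() -> u.run(input)));
            List<String> outputs = new ArrayList<>();
            for (Future<String> f : futures)
                outputs.add(f.get());      // waits for the slowest unit
            return policy.decide(outputs); // null means "no result generated"
        }
    }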
3.1 Module Lifecycle
The lifecycle of each module comprises four stages.
In the specification stage, a cooperation between the
platform and community developers results in the
production and public release of module TF specifi-
cations. The decision on the creation of new modules
is based on community user needs. A specification
features either the algorithm or high level function to
be implemented, the input and output data formats, as
well as a group of custom decision policies.
Once the specification is out, the module enters
the development stage in which third-party develop-
ers independently implement their TF versions. This
approach is similar to existing community-based soft-
ware projects, e.g. Debian, where the members define
task requirements and control the development pro-
cess. Each TF version must be packaged and signed
by the developer, and uploaded to the platform repos-
itory. By using a key that is certified by a certificate
authority, it will be possible to assess the identity of
the developer and prevent Sybil attacks, i.e., the same
developer releasing and signing multiple malicious
versions of the module's TF. Once authenticated, the TF version is packaged into the TF module and subsequently either made available for users to install, in the case of a new module, or automatically pushed as a platform module update.
The next stage is installation of the module on the
home hub. Users can download the latest version of
the module from the repository and instantiate it lo-
cally on the hub. Default module settings work out
of the box; however, experienced users may add or
remove module units, and redefine the decision pol-
icy according to their needs. Once the module is in-
stalled, the module enters the execution stage in which
applications running on the hub are allowed to issue
requests to the module. Note that modules may be-
come temporarily out of service in order to perform
software updates (e.g., installing a new unit or updat-
ing an existing one) and may also be permanently re-
moved from the hub.
3.2 Detection of Unit Result Divergence
The decision-making process is at the core of what
makes N-version programming effective at counter-
ing adversarial units. In the perfect scenario, each
unit is assumed to execute one of two possible ver-
sions: benign or adversarial. A version is benign if it
consists of a flawless implementation of the module’s
trusted function specification. A version is adversarial
if it deviates from the intended specification in order
to tamper with or leak sensitive data. Thus, if de-
viations exist between unit outputs, then at least one
adversarial version is present. Since different security
properties can be attained depending on the number of
units in agreement, we define three decision policies
providing three agreement conditions:
Total agreement (TA) policy: This policy offers the
strongest security guarantees. All N units must agree
on the same output result in order for an output to be
returned. If this condition holds, the resulting value
is returned; otherwise an error is yielded. Thus, a single benign version suffices to sup-
press the return of a corrupted result. In fact, for an
attacker to be successful, all N versions must be both
adversarial and collude in producing the same output.
Quorum agreement (QA) policy: Only a quorum of Q = ⌊N/2⌋ + 1 units (i.e., a majority) needs to reach consensus on a common return value. If such a quorum is found, the module returns the agreed-upon value; otherwise it reports failure. The QA policy is weaker than the TA policy because Q benign units (rather than one) need to be present to thwart an attack. Conversely, a successful attack requires only Q (rather than N) colluding adversarial units.
Multiplex (Mux_i) policy: This policy is the weakest of all and can no longer be considered to provide N-versioning security benefits. Under a Mux_i policy, the decision block simply selects the output of unit i to be fed to the module output. The unit selection is parameterized by a number 1 ≤ i ≤ N. This policy is useful mostly for debugging purposes during the testing stage of the module's lifecycle.
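For illustration, the three policies can be expressed against the hypothetical DecisionPolicy interface sketched above. This covers only the standard equality-based decision algorithm; customized decision algorithms for loose modules are discussed in Section 3.5. Each method matches the DecisionPolicy shape and can be passed as a method reference, e.g., Policies::totalAgreement, or as a lambda such as outs -> Policies.mux(outs, i).

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Objects;

    // Sketch of the three equality-based decision policies. Returning
    // null means that no result is released to the application.
    class Policies {
        // Total agreement (TA): all N outputs must be equal.
        static String totalAgreement(List<String> outputs) {
            String first = outputs.get(0);
            for (String o : outputs)
                if (!Objects.equals(first, o)) return null;
            return first;
        }

        // Quorum agreement (QA): a majority of Q = floor(N/2) + 1 equal outputs.
        static String quorumAgreement(List<String> outputs) {
            int quorum = outputs.size() / 2 + 1;
            Map<String, Integer> counts = new HashMap<>();
            for (String o : outputs)
                if (counts.merge(o, 1, Integer::sum) >= quorum) return o;
            return null;
        }

        // Multiplex (Mux_i): pass through the output of unit i (1-based);
        // useful for debugging only, provides no N-versioning protection.
        static String mux(List<String> outputs, int i) {
            return outputs.get(i - 1);
        }
    }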
Ideally, divergence between unit outputs in a module would occur only due to the rational behavior of a malicious developer who intentionally did not implement some version according to the module's trusted function specification. However, other causes may lead to undesired output divergence and side effects, namely: nondeterministic inputs, software flaws, and module incoherence.
3.3 Nondeterministic Inputs
One cause of unit divergence is operational and oc-
curs whenever a specific trusted function depends on
nondeterministic inputs, e.g., a random number, the
system time or date, etc. If different units obtain
different readings for the same intended input value,
units’ computations will likely return different results
which may lead to failure in reaching the total or quorum agreement conditions and harm the module's utility.
To avoid this problem, all nondeterministic inputs
must be provided by the preprocessor. Sandboxes
must prevent units from issuing nondeterministic sys-
tem calls. If the version code depends on such calls,
the input preprocessor can execute those upon request
and pass the same value to all units. A request is de-
clared by overriding the init method of the class of
input parameters. The init method of this class is in-
voked by the input preprocessor and can be inherited
by a subclass with the purpose of prefetching nondeterministic values. To prefetch an input value in a
module, the trusted function specification only needs
to assign this subclass to the type of the respective
input argument. By constraining all units to receive
the same input, this approach prevents the aforemen-
tioned operational causes for divergence.
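As a small illustration of this mechanism, a specification that depends on the current time could prefetch it as follows. The InputParams base class and its init hook are assumed to be provided by the hub platform; all names are illustrative.

    // Base class of input parameters, assumed to be part of the platform;
    // init() is invoked once by the input preprocessor before the units run.
    class InputParams {
        void init() { }
    }

    // The specification assigns this subclass to the type of the respective
    // input argument so that all N units observe the same timestamp.
    class TimestampedParams extends InputParams {
        long timestampMillis;

        @Override
        void init() {
            // The single nondeterministic read happens here, outside the unit
            // sandboxes; the resulting value is then passed to all units alike.
            timestampMillis = System.currentTimeMillis();
        }
    }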
Image Blurring Module Specification

Description: To blur an image, compute the average of the RGB channels of the pixels surrounding each of the image's pixels. The pixel area affected by the blurring process depends on the input vicinity factor. For example, for factor 1 the average includes the pixel itself and the 8 immediately surrounding pixels. (The original figure also shows example outputs for vicinity factors 1, 2 and 3.)

Pseudocode:
    Func BLUR(imgname, factor)
        imageIn = inputImage(imgname)
        Foreach px In imageIn
            pxs = getNear(px, factor)
            rgb = RGBAvg(pxs)
            setPixel(imageOut, rgb)
        End For
        outputImage(imageOut)
    End Func

Interface:
    Input arguments:  imageIn: ArrayList<Integer[]>, factor: Integer
    Output results:   imageOut: ArrayList<Integer[]>

Testing: Download BlurTest.jar. To test the blur implementation My:
    java -jar BlurTest.jar -fn My

Figure 3: Image blurring module specification.
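For concreteness, a direct Java translation of this pseudocode might look as follows. This is our own illustrative sketch, assuming pixels packed as 0xRRGGBB integers in a row-major int[][] image rather than the ArrayList<Integer[]> type of the interface above.

    // Sketch of one possible unit implementation of the BLUR specification,
    // assuming pixels are packed 0xRRGGBB ints in a row-major int[][] image.
    class Blur {
        static int[][] blur(int[][] in, int factor) {
            int h = in.length, w = in[0].length;
            int[][] out = new int[h][w];
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    long r = 0, g = 0, b = 0, n = 0;
                    // Average over the (2*factor+1)^2 vicinity, clipped at borders.
                    for (int dy = -factor; dy <= factor; dy++) {
                        for (int dx = -factor; dx <= factor; dx++) {
                            int ny = y + dy, nx = x + dx;
                            if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                            int px = in[ny][nx];
                            r += (px >> 16) & 0xFF;
                            g += (px >> 8) & 0xFF;
                            b += px & 0xFF;
                            n++;
                        }
                    }
                    out[y][x] = (int) ((r / n) << 16 | (g / n) << 8 | (b / n));
                }
            }
            return out;
        }
    }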
3.4 Software Flaws
A second unintended cause for internal result discrep-
ancy is accidental in nature, and is caused by flaws
in versions’ software that cause the actual unit exe-
cution to deviate from the expected value as defined
in the trusted function specification. In addition to
harming module utility, flaws may negatively affect
the correctness of the module. As shown in past stud-
ies, programmers tend to commit the same flaws in
the same code regions, which may end up resulting
in the generation of incorrect results that can eventu-
ally appear at the module’s output depending on how
many units have reached consensus on the same in-
correct value and on the decision policy in place.
To reduce these negative effects, we define a for-
mat for trusted function specifications that aims to be
both unambiguous and human readable so as to re-
duce the chance of software flaws. Figure 3 depicts a
simplified version of the specification for an image
blurring trusted function. The specification format
comprises: a description of the intended functional-
ity, an algorithm representation in the form of pseu-
docode, the interface of the module indicating the in-
put and output parameters and respective types, and a
testing procedure which may include specific testing
code. While the description and the algorithm repre-
sentation aim to clarify misunderstandings about the
specification, the testing parts aim to help debugging.
Since the specification is public, the source code of
the testing classes and types of input arguments / out-
put results must be provided.
3.5 Module Incoherence
Module incoherence occurs if two or more units in-
side a module implement different trusted function
algorithms. For example, a face recognition module
may be based on software that implements face recog-
nition using different techniques. As a result, one ver-
sion may be able to identify a face that a second ver-
sion cannot. Speech recognition is another example
in which different algorithms may yield very diverse
outputs, for instance being able to detect some words
in a whole sentence, but not others.
A natural question that arises when the module is
incoherent is whether it can be used for countering
malicious version implementations. In fact, even as-
suming the absence of software flaws, it will be dif-
ficult to determine whether the divergence of results
is due to a malicious version or due to semantic dif-
ferences between versions themselves. Faced by this
challenge, we take two decisions.
First, we require that modules be explicitly
specified as strict or loose. A strict module is one
in which all versions must implement the same al-
gorithm. For this reason, all versions are expected
to strictly implement the algorithm described by the
trusted function specification. In contrast, a module is loose if its versions need not implement the same algorithm, as long as they compute the same high-level function. Version developers must clearly
indicate the type of a given version. Otherwise, in-
stalling a loose version on a strict module will cause
internal unit output divergence thereby severely de-
grading the module utility.
Second, to improve the utility of loose modules,
we allow for replacing the standard decision algo-
rithm of the decision block by a customized deci-
sion algorithm (which could be provided along with
the trusted function specification). Since the stan-
dard decision algorithm simply tests the equality of
units’ outputs, algorithms that generate slightly dif-
ferent outputs will immediately fail the test which will
considerably impair the module utility. On the other
hand, a customized decision algorithm may perform
domain-specific tests that may overcome small differ-
ences between outputs. The side-effect, however, is
that by relaxing the equality requirement, an adver-
sary may attempt to exploit that degree of freedom,
e.g., to encode sensitive data to a remote party. Thus,
by deciding whether or not to adopt a customized decision algorithm, an end user can trade off the module's utility against its security.
Until now, we have presented an architecture for
home hubs based on N-version trusted function mod-
ules. We have also seen that the utility and security of
each module can be affected by other factors, namely
software flaws and module incoherence. The next
sections focus on studying the impact of both these
factors and on performance evaluation.
4 IMPACT OF SOFTWARE FLAWS
In this section we study the impact of version software
flaws on the overall behavior of modules. We specifically focus on strict modules: since their units implement the same algorithm, we can concentrate on discrepancies due to software faults. For our study, we implemented several strict test modules that feature common privacy-preserving algorithms for smart home sensor data.
4.1 Experimental Methodology
We picked five different algorithms, and gathered
three different implementations for each of them, with
the help of five different volunteer developers. The
versions for each algorithm were developed indepen-
dently by different developers. For each developer,
we provided a complete specification and a testing
tool. The code was to be written in Java. Given the
simplicity of the algorithms involved, we requested
developers to submit their implementations before
and after using the testing tool for debugging. While
the implementations after testing recorded no bugs,
the implementations before testing featured some bugs.
Considering the purpose of this study, here we focus
on the pre-testing implementations. The algorithms
to implement were as follows:
Image Blurring Algorithm: An image blurrer can be
used to protect users’ privacy, namely by anonymiz-
ing the video data gathered by cameras (see Figure 3).
We ran a simple test battery consisting of blurring 10 different pictures with vicinity factors of 1, 2 and 3. Afterwards, we made a byte-wise comparison between the expected result and the files produced by each implementation, in order to assess the implementations'
correctness. In total, we executed 30 tests.
Voice Scrambling Algorithm: A voice scrambler
can be useful in mitigating attempts to identify the
speaker and other nearby individuals. This algorithm
receives an audio clip as input, and after applying
pitch shifting and distortion, it outputs a modified au-
dio clip where the voice sounds robotized. With re-
spect to testing, we exercised each implementation
with 30 different audio clips.
Data Encryption Algorithm: RC4 is a stream cipher
algorithm that can be used in encrypting certain home
environment data before transferring it to a certain
recipient. The final testing tool features 153K tests comprising tuples ⟨message, key, ciphertext⟩, where both message and key were randomly generated with increasingly longer sizes.
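For reference, a compact Java sketch of RC4 is shown below (key scheduling followed by keystream generation). This is our own illustration of the algorithm the developers were asked to implement, not one of the study's submitted versions.

    // Compact RC4 sketch (KSA + PRGA); encryption and decryption are the
    // same XOR operation. Illustrative only.
    class RC4 {
        static byte[] crypt(byte[] key, byte[] msg) {
            int[] s = new int[256];
            for (int i = 0; i < 256; i++) s[i] = i;
            // Key-scheduling algorithm (KSA).
            for (int i = 0, j = 0; i < 256; i++) {
                j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
                int t = s[i]; s[i] = s[j]; s[j] = t;
            }
            // Pseudo-random generation algorithm (PRGA).
            byte[] out = new byte[msg.length];
            for (int k = 0, i = 0, j = 0; k < msg.length; k++) {
                i = (i + 1) & 0xFF;
                j = (j + s[i]) & 0xFF;
                int t = s[i]; s[i] = s[j]; s[j] = t;
                out[k] = (byte) (msg[k] ^ s[(s[i] + s[j]) & 0xFF]);
            }
            return out;
        }
    }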
Table 1: Evaluation results of strict modules under total agreement (TA) and quorum agreement (QA) decision policies. For each decision policy, the resulting output can be: correct (✓), incorrect (✗), or silent (–).

    Module Function      | Image Blurring    | Voice Scrambling | Data Encryption              | Data Hashing           | K-Anonymization
                         | V1    V2    V3    | V1    V2    V3   | V1        V2      V3         | V1      V2      V3     | V1     V2      V3
    Single Tests Passed  | 30/30 30/30 30/30 | 30/30 0/30  0/30 | 153K/153K 0/153K  153K/153K  | 41K/41K 41K/41K 0/41K  | 0/210  210/210 210/210
    Number of Bugs       | 0     0     0     | 0     4     4    | 0         1       0          | 0       0       1      | 1      0       0
    N-mode Tests         | TA: ✓, QA: ✓      | TA: –, QA: ✗     | TA: –, QA: ✓                 | TA: –, QA: ✓           | TA: –, QA: ✓
Data Hashing Algorithm: MD5 is a well-known
hashing function useful in assessing the integrity of
data. The final testing tool featured 41K tests. These
tests consist of tuples hmessage,hashi, where every
message was randomly generated with increasingly
longer sizes.
K-anonymity Algorithm: Lastly, Mondrian is a top-
down greedy algorithm for strict multidimensional
partitioning, with the goal of achieving K-anonymity.
Such an algorithm could be used in anonymizing
home environment data (e.g., power consumption
readings), so that the user could, for example, supply
that information to an interested third party. The test-
ing tool features 210 tests. These tests comprise tuples ⟨dataTuples, k, qids, result⟩, where dataTuples are statically grouped in 5 files of 1 million entries each, and k and qids are automatically generated anonymity factors and quasi-identifiers of increasing size, respectively.
4.2 Main Findings
Table 1 summarizes the N-version study results,
where V1, V2 and V3 correspond to three different
version implementations. We highlight three main
findings. First, under the TA decision policy, only
the image blurring module yields an output. This is
possible because all unit implementations passed the
30 tests. Since all units produced the same result, the TA policy agrees on releasing that result. This
finding is consistent with the lack of bugs found in
the code which could compromise the resulting out-
put. For the remaining modules, however, faults have
caused some versions to fail individual tests thus un-
dermining the overall result.
Second, under the more relaxed QA decision pol-
icy, we observe that four modules can successfully
reach a consensus and produce an output: the image
blurring module—whose individual implementations
output consistent results—and three additional mod-
ules in which two out of three implementations gen-
erate the same result, thereby allowing a consensus
to be reached. In these cases, functional divergence
occurred due to the existence of bugs. In the data
encryption module, we identified a bug in V2 that
consisted of a wrong value swap between two vari-
ables. Regarding the data hashing module, we de-
tected one bug in V3, which was later found to be a poorly initialized variable. In the K-anonymization
module, V1 contained a coding error stemming from
a wrong pseudocode interpretation of the scope of a
variable. Specifically, a global variable used by sev-
eral functions was supposed to be initialized in a cer-
tain function, but V1’s developer declared the vari-
able as local to that function, leading to issues in the
other functions handling it. Lastly, in one case, the
voice scrambling module produced an incorrect re-
sponse under QA. This happened because two versions, namely V2 and V3, contained the same 4 bugs. More specifically, the bugs originated from the
wrong interpretation of a loop upper bound.
Given these numbers, we conclude that when ver-
sions yield different results, NVP actually detects
(except for side-channels) implementation deviations
created with rational intent. The exception is when a majority of the versions outputs the same er-
roneous result. Accidental mistakes can cause a re-
duction in the utility of the module. If a very conser-
vative decision policy is employed (TA) this loss will
be considerable (up to 80%). On the other hand, un-
der QA, the utility drop is smaller, as four out of five
modules can still produce the same result.
5 IMPACT OF MODULE INCOHERENCE
This section studies the impact of module incoherence
on the modules’ overall behavior and utility. For our
study, we implemented two loose test modules which do not strictly follow the same specification, yet compute the same high-level function: face recognition
and speech recognition.
The module implementing the face recognition
function uses three existing open source face recog-
nition libraries as building blocks: OpenCV (with the Face module), OpenBR, and OpenFace. The libraries' code remained unchanged but was wrapped to fit
Table 2: Success rates of face recognition (Recogn), measured as correct (✓), incorrect (✗) and no recognition (No Recogn).

                  OpenCV       OpenBR       OpenFace     Total Agree.  OpenFace+OpenBR  Quorum Agree.  MS Face API
    Recogn ✓      156 (62%)    219 (88%)    228 (91%)    137 (55%)     202 (81%)        220 (88%)      249 (99%)
    Recogn ✗      1 (1%)       1 (1%)       0 (0%)       0 (0%)        0 (0%)           1 (1%)         0 (0%)
    No Recogn     93 (37%)     30 (11%)     22 (9%)      113 (45%)     48 (19%)         29 (11%)       1 (1%)
    Total         250 (100%)   250 (100%)   250 (100%)   250 (100%)    250 (100%)       250 (100%)     250 (100%)
the N-version module’s API. Based on these libraries,
we defined several module configurations. We tested
the effectiveness of the face recognition module when
trained with a training set of 2250 images and a test-
ing set of 250 images. In total, we trained the recogni-
tion of 250 different people with 9 pictures each. All
these images were extracted from the UFI dataset.
The Microsoft Face API was used as a state-of-the-art face
recognition implementation. It was trained and tested
using the same dataset.
The speech recognition module uses three inde-
pendent speech recognition libraries—Sphinx, Julius,
and Kaldi—and was also tested in different module
settings. Every configuration was exercised with 130
sentence tests from CMU’s AN4 speech recognition
dataset. As with face recognition libraries, we devel-
oped an API wrapper for all the speech recognition li-
braries. We use the Google Speech API as a state-of-the-art speech recognition system, which requires no training.
5.1 Face Recognition Module Study
Table 2 presents the success rate of our tests for the
three face recognition functions evaluated individu-
ally, and three representative module configurations, namely total agreement, quorum agreement, and the intersection of the two functions that showed the best recognition results.
The first important observation is that the efficacy
of the open source libraries is smaller than Microsoft
Face’s, which reaches 99% success rate. OpenCV
stands out as the least effective library (only 62%
success rate). The difference between OpenCV and
OpenBR stems from the algorithms they implement,
namely Eigenfaces and 4SF respectively. The small
difference between OpenBR and OpenFace comes
as a surprise, given that the OpenFace implementation uses neural networks for face recognition, which are theoretically more effective than OpenBR's 4SF.
Table 2 then shows the success rate for three face
recognition module configurations. Configuration to-
tal agreement consists of a module that employs all
three libraries—OpenCV, OpenBR, and OpenFace—
and yields “success” if and only if all libraries iden-
tify the same individual. Here we can see that the face
recognition accuracy drops considerably to only 55%,
which is explained by the significant differences that
exist between the algorithms implemented by each li-
brary. In a second configuration, we used only two
libraries—OpenFace and OpenBR—and in this case
the success rate increased substantially to 81%. The
best results were achieved when we used three li-
braries, but with a merging policy function that out-
puts success every time at least two libraries produce
the same response. In this configuration (quorum), the
success rate reaches 88%, which represents a reduc-
tion of only 3% when compared to OpenFace alone.
Considering these results, we argue that the best mechanism for merging face recognition results in an N-version setting is to take the majority of the results given by a module's units. Note, however, that
result intersection is not always a sound solution. If
we consider the case where a module has fewer hon-
est units than intentionally ineffective ones, e.g., units
that produce wrong results with the goal of prevent-
ing face recognition, then the success and consequent
effectiveness of the module is compromised. In order
to address this issue, we believe a reputation-based
approach for unit selection could be used.
5.2 Speech Recognition Module Study
Although word error rate (WER) is the metric generally used to measure the accuracy of speech recognition, it cannot be applied to situations where there
are multiple recognition results. Moreover, in a smart
home scenario, voice commands can still be inter-
preted correctly even if some words are not recog-
nized or come in a wrong order. We therefore opted for sentence match and word intersection merging functions as the main performance metrics for speech recognition modules.
Table 4 shows the results for each library evalu-
ated based on two criteria: sentence match and word
intersection. Sentence matching consists of the ex-
act match between the entire original sentence and the
recognized result returned by each library. Word in-
tersection counts the number of words that exist in the
original sentence and are also present in the recogni-
tion results returned by the library (902 is the total
Table 3: N-version speech recognition confidence.

    Decision Policy     Total Agreement  Sphinx+Julius  Sphinx+Kaldi   Julius+Kaldi   Quorum Agreement
    Sentence Match      13/130 (10%)     13/130 (10%)   19/130 (15%)   34/130 (26%)   40/130 (31%)
    Word Intersection   455/902 (50%)    455/902 (50%)  554/902 (61%)  557/902 (62%)  666/902 (74%)
    Word Union          753/902 (83%)    706/902 (78%)  745/902 (83%)  735/902 (81%)  753/902 (83%)
Table 4: Speech recognition confidence.

    Implementation      Sphinx         Julius         Kaldi          Google
    Sentence Match      20/130 (15%)   36/130 (28%)   88/130 (68%)   103/130 (79%)
    Word Intersection   578/902 (64%)  570/902 (63%)  719/902 (80%)  722/902 (80%)
number of words present in all sentences). Table 4
shows that across both these dimensions, Sphinx and
Julius clearly fall behind Kaldi, which offers the high-
est success rates (68% sentence match and 80% word
intersection). At the same time, Kaldi's numbers are
not far off Google Speech’s.
Table 3 lists multiple module configurations that
we used to produce speech recognition functions
based on these libraries. Each entry of the table cor-
responds to a specific module configuration. The
columns indicate which libraries constitute the units
of the module, and the lines indicate the merging
function that was used to produce a successful speech
recognition output. We adopted three merging ap-
proaches: sentence match, which is similar to the cri-
teria used for the individual solutions and issues an
output if all units identified the same sentence; word
intersection, which returns only the words that all
units identified successfully; and union, which returns
the union of all words identified by all units.
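As an illustrative sketch, the three merging functions can be expressed over sets of recognized words as follows. This is our own simplification, assuming lowercase whitespace tokenization.

    import java.util.*;

    // Sketch of the three speech-recognition merging functions; each takes
    // the sentences recognized by the units (null meaning no output is
    // released to the application).
    class SpeechMerge {
        // Sentence match: all units must return the exact same sentence.
        static String sentenceMatch(List<String> sentences) {
            for (String s : sentences)
                if (!s.equals(sentences.get(0))) return null;
            return sentences.get(0);
        }

        // Word intersection: keep only the words every unit recognized.
        static Set<String> wordIntersection(List<String> sentences) {
            Set<String> common = new LinkedHashSet<>(words(sentences.get(0)));
            for (String s : sentences.subList(1, sentences.size()))
                common.retainAll(words(s));
            return common;
        }

        // Word union: keep every word any unit recognized (highest success
        // rate, but semantically and privacy-wise problematic; see text).
        static Set<String> wordUnion(List<String> sentences) {
            Set<String> all = new LinkedHashSet<>();
            for (String s : sentences) all.addAll(words(s));
            return all;
        }

        private static List<String> words(String s) {
            return Arrays.asList(s.toLowerCase().split("\\s+"));
        }
    }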
As shown in Table 3, sentence match tends to yield
very poor results, displaying a success rate between
10% and 26% between any pair of units. Even when
we consider quorum agreement, i.e., when at least two
out of the three units return the same result, the suc-
cess rate only reaches 31%, which is very far from
Kaldi's 68%. Moreover, given that most speech-controlled devices, e.g., Amazon Echo, use a grammar-based approach, where they ask users to repeat words when they cannot recognize some, sentence match is an unreasonable speech recognition metric.
With word intersection, the results improve sig-
nificantly up to 62% between any pair of units, and
up to 74% when we consider the quorum for the re-
sults produced among them. Because of the intersec-
tive nature of the merging functions sentence match
and word intersection, the adoption of an increasing
number of units does not necessarily yield better re-
sults. This happens because the overall success rate is
always bound by the performance of the worst units.
This can be seen in the last column of the table. For
instance, although the pair Julius and Kaldi yields a
62% success rate for the word intersection function,
the addition of Sphinx bounds the three units' overall success to the worst Sphinx pairing result, i.e., that of the pair Sphinx and Julius (50%). The table also shows that for this type of
functions the best approach is to use a quorum policy,
i.e., the consensus between at least two units, which
yielded success rates of 31% and 74% for sentence
match and word intersection respectively.
Overall the highest success rate is achieved when
word union is employed. As can be seen in the ta-
ble, the function word union yields success rates of
at least 78%, and 83% in the best case, surpassing
even Google Speech. Contrary to sentence match and
word intersection, the success rate of this function is
the same for the combination of all three units and
the quorum consensus (83%). This happens because
quorum also implies the output of all three units. As a
result, both functions produce the same output. Still,
we argue that union is not a fair result merging func-
tion for two reasons. On one hand, semantically, the
union of the output of two or more speech recogni-
tion units may differ significantly from a speech rec-
ognizer expected result. On the other hand, this union
function can potentially endanger the privacy of the
user. For instance, as long as there is one rogue unit
that extracts information from the audio source, e.g., a
voice detector that derives the number of people in the
room based on the background sound, the whole mod-
ule could be compromised, as its result would feature
that information.
After analysing these numbers we can draw three
conclusions: (1) exact sentence match is a poor
speech recognition N-version result merging func-
tion, (2) word intersection recognition success rates
are limited by the worst unit, but are reasonable
when used in a quorum consensus approach, and (3)
although word union success rates are the highest
among the configurations studied, its semantics and
privacy limitations render it unusable in merging N-
version results. Consequently, we argue that quorum-
based word intersection is the best of the three approaches for merging this type of result. Similarly to the
face recognition case, it can also be complemented
Figure 4: Strict and loose modules performance. The figure plots the normalized execution time (%) of each module's units (V1, V2, V3) and of the QA merge, taking the TA merge as the 100% baseline; the absolute times shown per module are 1.1s (Image Blurring), 294ms (Voice Scrambling), 341ms (Data Encryption), 19.5ms (Data Hashing), 648ms (K-Anonymity), 2.2s (Speech Recognition) and 274ms (Face Recognition).
with a reputation-based approach, in order to address the issue of intentionally ineffective sub-modules.
6 PERFORMANCE EVALUATION
This section aims at assessing the performance over-
head introduced by our approach as opposed to run-
ning a single instance of these algorithms.
6.1 Experimental Methodology
The performance evaluation comprises the execution
time measurements of each of the aforementioned N-
modules. These measurements feature the execution
time of each of the three units comprising these mod-
ules, and the execution time of the quorum and to-
tal agreement merges. Each of these measurements
consisted of computing the average of 50 tests, each
with the same input. More specifically, we chose a
1280x720 pixel image and a factor of 2 for the image
blurrer; a 10 second voice clip for the voice scram-
bler; a randomly generated 256-byte key and 1MB
plaintext for the data encryption module; 1MB worth
of randomly generated text for the data hashing mod-
ule; and a set of 100000 tuples and a K-anonymity of
500 for the K-Anonymization module. For the face
recognition module, we provided a training dataset of
150 pictures of three different people, and an addi-
tional picture as test input; and for speech recogni-
tion, we provided a general acoustic and custom lan-
guage models as knowledge base, and a voice clip as
input. The experiments were conducted on a laptop
equipped with an Intel i3-3217U 1.80GHz CPU and
4GB of RAM. Similar computing resources can be
provided by popular smart home hubs, e.g., Google
OnHub or Google Home, that feature dual- or quad-
core 1.5GHz CPUs with 512MB of RAM, which is
enough for running multiple versions of TFs.
6.2 Main Findings
Figure 4 presents the performance results of the strict
and loose modules. This figure shows the normalized
execution time of each of the modules’ units, as well
as the two merging approaches. For consistency, we take the TA policy as the baseline. Note that
the most significant performance differences among
the different strict modules' units relate to either inefficient loop implementations or recurrent use of data
type casts. However, for the loose modules, the main
performance difference stems from units’ underlying
algorithms diversity and implementations.
The first finding is the confirmation that the paral-
lel execution nature of our approach bounds the two
merging approaches’ execution times to the slowest
unit’s execution time. This is most evident for the
strict K-anonymization V3 unit. For loose modules
the difference between unit execution times is even
more noticeable. For the speech recognition module,
V1’s execution took a quarter of the time needed to
execute V3. The same is observed for the face recog-
nition module, where V3 outperformed V2.
Secondly, there is a significant execution time dif-
ference between loose module units. Note again that
loose modules rely on heterogeneous versions. As
a result, the underlying algorithms of units and their
complexity may vary, leading to performance differ-
ences. Unlike strict modules, where the performance
of units is usually similar, the impact of the slowest
units on loose modules’ performance is higher.
The third finding relates to the cost of the merging
approaches. While we defined the TA policy as base-
line to compare the performance of the three units and
merging approaches, we can see that quorum agree-
ment is sometimes more expensive than total agree-
ment. This happens because total agreement implies
at most two comparisons, i.e., between V1 and V2,
and between V2 and V3, while quorum agreement,
in the worst case, requires three comparisons to yield
a result. On the other hand, in the best case, quorum
agreement can be achieved with one comparison only.
7 DISCUSSION
Traditionally, NVP has raised two main objections.
First, N-version is regarded as demanding significant
human resources to implement the N different soft-
ware versions. However, considering our targeted
scenario, this concern may be alleviated by relying
on open source communities for the development
of TF implementations. In fact, such communities
have shown good results in maintaining large scale
projects, e.g., Debian packages, python modules, and
IoT specific ones, e.g., apps and automation recipes.
A second objection to NVP is the connotation
of poor failure diversity among independent ver-
sions (Knight and Leveson, 1986). In this respect,
it has also been shown (Knight and Leveson, 1986)
that statistically, the number of common errors is rela-
tively low and the diversity of implementations makes
the overall system robust to failures. Therefore, it
is hard for an adversary to exploit a common flaw
across all the N-version modules. Although at a small
scale, our software flaw study seems to confirm this
idea, since in five different TFs, common flaws oc-
curred only once. Even so, although this occurrence was detected by simple debugging tools, another reason behind the low flaw count could be the effectiveness of our specifications, which was not experimentally tested. Nevertheless,
NVP considerably raises the bar for adversaries since
the number of latent vulnerabilities would be smaller
compared to single version executions.
Our approach’s open source nature may also hin-
der TF utility, as the number of naive or malicious
TF units outputting incorrect results may be higher
than that of correct units. We propose two approaches
to address this issue. First, a TF developer reputa-
tion scheme could provide insights regarding the ef-
fectiveness of a TF unit. This information could then
be used to filter unwanted units when packaging mod-
ules. Second, at least for loose modules, their ef-
fectiveness could benefit from commercial software,
which, from our experience, requires little adaptation effort to work with our approach.
Performance-wise, the QA policy’s positive re-
sults seem to suggest that the impact of the slowest
unit for both loose and strict modules can be elimi-
nated by taking advantage of unit redundancy. Instead
of waiting for the slowest unit to finish, the decision
block may process unit outputs up until a majority
is formed. This approach addresses the performance
problem and provides a reasonable tradeoff between
module performance and user privacy.
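A sketch of this optimization, reusing the hypothetical Unit interface from Section 3, processes unit outputs in completion order and stops as soon as a majority of equal outputs is observed, without waiting for the slowest unit.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.CompletionService;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;

    // Sketch of the early-quorum optimization: return as soon as a
    // majority of equal outputs forms among the finished units.
    class EarlyQuorumModule {
        static String invoke(List<Unit> units, String input,
                             ExecutorService pool) throws Exception {
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);
            for (Unit u : units)
                cs.submit(() -> u.run(input));

            int quorum = units.size() / 2 + 1;
            Map<String, Integer> counts = new HashMap<>();
            for (int done = 0; done < units.size(); done++) {
                String out = cs.take().get();   // next unit to finish
                if (counts.merge(out, 1, Integer::sum) >= quorum)
                    return out;                 // majority reached early
            }
            return null;                        // no quorum: suppress output
        }
    }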
As for malicious behaviour, it is not in our scope
to prevent malicious application attacks. This holds
true for both attacks targeting hub security mecha-
nisms, e.g., sandboxing, and TF module security, e.g.,
bug exploitation by sending crafted inputs to modules.
Nevertheless, to address TF module security, our de-
sign could be complemented with unit address space
randomization techniques (Cox et al., 2006).
8 RELATED WORK
NVP (Chen and Avizienis, 1978) was originally
used to reduce the likelihood of errors and bugs in-
troduced during the software development. Multiple
independent teams of programmers developed several
versions of the same software and then ran these im-
plementations in parallel.
Since then, NVP has been used in several fields.
Veeraraghavan et al. (Veeraraghavan et al., 2011) pro-
pose multiple replicas of a program to be executed
with complementary thread schedules to identify and
eliminate data race bugs that can cause errors at run-
time. DieHard (Berger and Zorn, 2006) uses ran-
domized heap memory placement for each replica to
protect the software from memory errors, e.g. buffer
overflow or dangling pointers. Imamura et al. (Ima-
mura et al., 2002) apply N-version programming
in the context of genetics to reduce the number and
variance of errors produced in genetic programming.
Some systems (Cadar and Hosek, 2012; Giuffrida
et al., 2013), apply N-version to the process of updat-
ing software, in order to detect and recover from er-
rors and bugs introduced by the new versions. While
these approaches assume there is only one developer
of multiple software versions, we assume multiple in-
dependent developers and versions.
CloudAV (Oberheide et al., 2008) provides an-
tivirus capabilities as a network service and leverages
NVP to achieve better detection of malicious soft-
ware. However, nothing prevents it from exploiting
private user data. Demotek (Goirizelaia et al., 2008)
employs N-version to enhance the reliability and se-
curity of several components comprising an e-voting
system. Still, it assumes the modules are honest, and
its main goal is to make it difficult for an attacker to
compromise the whole system. Overall, none of the
aforementioned systems rely on N-version to boot-
strap trust in system components, focusing instead on
improving reliability and availability.
Additionally, NVP has been used to detect and
prevent system security attacks such as inadvertent
memory access (Cox et al., 2006; Salamat et al.,
2009). This, however, requires a custom memory al-
location manager and modifications to the OS kernel.
Moreover, these systems trust multiple versions of the
same software and assume only the input data to be
potentially malicious. NVP has also been leveraged to
ensure personal information confidentiality and pre-
vent information leaks. Most of these systems employ
techniques in which two replicas of the same soft-
ware are executed with different inputs (Yumerefendi
et al., 2007), under different restrictions (Capizzi
et al., 2008) or on different security levels (Devriese
and Piessens, 2010). To the best of our knowledge,
our work is the first to study the feasibility of NVP in
securing smart hub platforms.
9 CONCLUSIONS
In this paper, we performed an extensive study on
the use of NVP in order to enhance the security of
TF-based smart hub platforms, which deal with home
sensitive data. Our work comprises a thorough study
on both strict and loose trusted function specifica-
tions. The results provide insights on our approach’s
effectiveness, and foster discussion surrounding util-
ity, performance, and security issues associated with
naive and malicious implementation output results.
ACKNOWLEDGEMENTS
We thank the anonymous reviewers for their com-
ments and suggestions. This work was partially
supported by Fundação para a Ciência e Tecnologia (FCT) via projects UID/CEC/50021/2013 and
SFRH/BSAB/135236/2017.
REFERENCES
Berger, E. D. and Zorn, B. G. (2006). Diehard: probabilis-
tic memory safety for unsafe languages. In Proc. of
PLDI.
Cadar, C. and Hosek, P. (2012). Multi-version software up-
dates. In Proc. of ICSE.
Capizzi, R., Longo, A., Venkatakrishnan, V., and Sistla,
A. P. (2008). Preventing information leaks through
shadow executions. In Proc. of ACSAC.
Chen, L. and Avizienis, A. (1978). N-version programming:
A fault-tolerance approach to reliability of software
operation. In Proc. of FTCS-8.
Computerworld (2016). Chinese Firm Admits Its Hacked
Products Were Behind Friday’s DDOS Attack.
http://www.computerworld.com/article/3134097. Ac-
cessed May 2018.
Cox, B., Evans, D., Filipi, A., Rowanhill, J., Hu, W., David-
son, J., Knight, J., Nguyen-Tuong, A., and Hiser, J.
(2006). N-variant systems: A secretless framework
for security through diversity. In Proc. of Usenix Se-
curity.
Davies, N., Taft, N., Satyanarayanan, M., Clinch, S., and
Amos, B. (2016). Privacy Mediators: Helping IoT
Cross the Chasm. In Proc. of HotMobile.
Devriese, D. and Piessens, F. (2010). Noninterference
through secure multi-execution. In Proc. of SP.
Fernandes, E., Jung, J., and Prakash, A. (2016a). Security
Analysis of Emerging Smart Home Applications. In
Proc. of SP.
Fernandes, E., Paupore, J., Rahmati, A., Simionato, D.,
Conti, M., and Prakash, A. (2016b). FlowFence: Prac-
tical Data Protection for Emerging IoT Application
Frameworks. In Proc. of USENIX Security.
Forbes (2013). When ’Smart Homes’ Get Hacked.
http://www.forbes.com/sites/kashmirhill/2013/07/26/smart-homes-hack. Accessed May 2018.
Giuffrida, C., Iorgulescu, C., Kuijsten, A., and Tanenbaum,
A. S. (2013). Back to the future: Fault-tolerant live
update with time-traveling state transfer. In Proc. of
LISA.
Goirizelaia, I., Selker, T., Huarte, M., and Unzilla, J.
(2008). An Optical Scan E-Voting System Based on
N-Version Programming. IEEE Security & Privacy,
6(3):47–53.
Imamura, K., Heckendorn, R. B., Soule, T., and Foster, J. A.
(2002). N-Version Genetic Programming via Fault
Masking. In Proc. of EUROGP.
Kelion, L. (2012). Trendnet security flaw exposes
video feeds. http://www.bbc.com/news/technology-16919664. Accessed May 2018.
Knight, J. C. and Leveson, N. G. (1986). An Experimen-
tal Evaluation of the Assumption of Independence in
Multiversion Programming. IEEE Transactions on
Software Engineering, pages 96–109.
Mortier, R., Zhao, J., Crowcroft, J., Wang, L., Li, Q., Had-
dadi, H., Amar, Y., Crabtree, A., Colley, J. A., Lodge,
T., Brown, T., McAuley, D., and Greenhalgh, C.
(2016). Personal Data Management with the Databox:
What’s Inside the Box? In Proc. WCAN CoNEXT.
Oberheide, J., Cooke, E., and Jahanian, F. (2008). CloudAV:
N-Version Antivirus in the Network Cloud. In Proc.
of USENIX Security.
Salamat, B., Jackson, T., Gal, A., and Franz, M. (2009).
Orchestra: intrusion detection using parallel execution
and monitoring of program variants in user-space. In
Proc. of EuroSys.
Veeraraghavan, K., Chen, P. M., Flinn, J., and
Narayanasamy, S. (2011). Detecting and surviving
data races using complementary schedules. In Proc.
of SOSP.
Yumerefendi, A. R., Mickle, B., and Cox, L. P. (2007).
Tightlip: Keeping applications from spilling the
beans. In Proc. of NSDI.