Secure Grouping and Aggregation with MapReduce

Radu Ciucanu, Matthieu Giraud, Pascal Lafourcade and Lihua Ye

LIMOS, Université Clermont Auvergne, Aubière, France

Harbin Institute of Technology, Harbin, Weihai, Shenzhen, China

Keywords:

Database Queries, MapReduce, Security, Grouping, Aggregation.

Abstract:

The MapReduce programming paradigm allows processing big data sets in parallel on a large cluster. We focus on a scenario where the data owner outsources her data to an honest-but-curious server. Our aim is to evaluate grouping and aggregation with SUM, COUNT, AVG, MIN, and MAX operations for an authorized user. For each of these five operations, we assume that the public cloud provider and the user do not collude, i.e., the public cloud does not know the secret key of the user. We prove the security of our approach for each operation.

1 INTRODUCTION

We address the fundamental problem of how to group

and aggregate data from a relation in a privacy-

preserving manner using MapReduce. We assume

that the data is externalized in the cloud by the data

owner and there is a user that queries it. We consider

the following ﬁve aggregation operations, which are

precisely those included in the SQL standard: SUM,

COUNT, AVG, MIN, and MAX.

We start by a running example to present the con-

cepts of grouping and aggregation, and of MapRe-

duce computations. Then, we present our problem

statement and illustrate with the same example the

privacy issues related to grouping and aggregation

with MapReduce.

Example 1. Assume there is a university storing the

relation R corresponding to the list of professors with

their associated department and salary. The grouping

and aggregation operation on the relation R, in the

case where we assume one group attribute and one

aggregate function, is denoted by $\gamma_{A,\theta(B)}(R)$, where A is the grouping attribute and θ is one of the five aggregation operations applied on the attribute B, different from the grouping attribute. In this example (Figure 1), we consider the attribute “Department” as the grouping attribute, and SUM is the aggregation operation applied on attribute “Salary”. Hence, for each department we sum all the associated salaries. Since Alice and Bob are in the Computer Science department, the sum of salaries associated to the Computer Science department is 1900 + 1800 = 3700. In the same way, we sum the salaries of Mallory and Oscar from the Mathematics department. Since Eve is the only one in the Physics department, the sum corresponds to the salary of Eve, which is equal to 2000. For the query $\gamma_{\mathrm{Department},\mathrm{SUM}(\mathrm{Salary})}(R)$, we obtain the relation presented in Figure 2. Aggregation operations COUNT, AVG, MIN, or MAX work similarly.

Name      Department         Salary
Alice     Computer Science   1900
Mallory   Mathematics        1750
Bob       Computer Science   1800
Eve       Physics            2000
Oscar     Mathematics        1600

Figure 1: Relation R.

Department         SUM(Salary)
Computer Science   3700
Physics            2000
Mathematics        3350

Figure 2: Result of $\gamma_{\mathrm{Department},\mathrm{SUM}(\mathrm{Salary})}(R)$.
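To make the semantics of grouping and aggregation concrete, the following is a minimal plain-Python sketch (no MapReduce, no encryption) evaluated on the relation of Figure 1. The function name `group_aggregate` and the `AGGREGATES` table are illustrative assumptions, not notation from the paper.

```python
from collections import defaultdict

# Relation R from Figure 1: (Name, Department, Salary).
R = [
    ("Alice", "Computer Science", 1900),
    ("Mallory", "Mathematics", 1750),
    ("Bob", "Computer Science", 1800),
    ("Eve", "Physics", 2000),
    ("Oscar", "Mathematics", 1600),
]

# The five aggregation operations considered in the paper.
AGGREGATES = {
    "SUM": sum,
    "COUNT": len,
    "AVG": lambda values: sum(values) / len(values),
    "MIN": min,
    "MAX": max,
}

def group_aggregate(relation, theta):
    """Group on Department (attribute A) and apply theta to Salary (attribute B)."""
    groups = defaultdict(list)
    for _name, department, salary in relation:
        groups[department].append(salary)
    return {dept: AGGREGATES[theta](values) for dept, values in groups.items()}

result = group_aggregate(R, "SUM")
```

With `theta = "SUM"` this reproduces the relation of Figure 2; the other four operations only change the function applied to each group.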

Grouping-and-aggregation with MapReduce. An

algorithm to perform grouping and aggregation with

MapReduce is presented in Chapter 2 of (Leskovec

et al., 2014). First, a set of nodes has chunks of the

relation. The map function creates for each tuple a

key-value pair where key is equal to the value of the

grouping attributes in the considered tuple, and value

is equal to the value of the aggregation attribute of

the considered tuple. Then, the key-value pairs are

grouped by key, i.e., key-value pairs output by the

map phase which have the same key are sent to the

same reducer. For each key, the reduce function applies the aggregate function on the associated values of the considered key.

Ciucanu, R., Giraud, M., Lafourcade, P. and Ye, L. Secure Grouping and Aggregation with MapReduce. DOI: 10.5220/0006843803480355. In Proceedings of the 15th International Joint Conference on e-Business and Telecommunications (ICETE 2018) - Volume 2: SECRYPT, pages 348-355. ISBN: 978-989-758-319-3

Figure 3: The system architecture (participants: data owner, public cloud, and user; query $\gamma_{A,\theta(B)}(R)$).

Example 2. Following Example 1, we perform grouping and aggregation with MapReduce on the relation R where the grouping attribute is the attribute “Department”, the aggregation attribute is the attribute “Salary”, and the operation is SUM. We start grouping and aggregation with MapReduce by applying the map function. Since the grouping attribute is “Department” and the aggregation attribute is “Salary”, the map function emits the pairs (Computer Science, 1900), (Mathematics, 1750), (Computer Science, 1800), (Physics, 2000), and (Mathematics, 1600). Pairs sharing the same key (i.e., the same value of the grouping attribute) are sent to the same reducer. Then, the reduce function performs the aggregation on each reducer, consisting here of the sum, and we obtain the pair (Computer Science, 3700) since 1900 + 1800 = 3700, etc. We present the final result in Figure 2.
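The map, grouping-by-key, and reduce steps of Example 2 can be simulated in a single process. The sketch below is only an illustration under stated assumptions: the split of R into two chunks and the helper names `map_fn`, `shuffle` and `reduce_fn` are hypothetical, and a real MapReduce framework would distribute these steps over several nodes.

```python
from collections import defaultdict

# Relation R of Figure 1, split into two chunks (the split is an arbitrary assumption).
chunks = [
    [("Alice", "Computer Science", 1900), ("Mallory", "Mathematics", 1750)],
    [("Bob", "Computer Science", 1800), ("Eve", "Physics", 2000),
     ("Oscar", "Mathematics", 1600)],
]

def map_fn(chunk):
    """Emit one (grouping value, aggregation value) pair per tuple of the chunk."""
    for _name, department, salary in chunk:
        yield department, salary

def shuffle(pairs):
    """Group the emitted pairs by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_fn(key, values):
    """Apply the SUM aggregation to all values received for one key."""
    return key, sum(values)

pairs = [pair for chunk in chunks for pair in map_fn(chunk)]
output = dict(reduce_fn(key, values) for key, values in shuffle(pairs).items())
```

Note that in this standard version every intermediate pair is in the clear, which is exactly what the secure protocols of Section 3 avoid.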

Problem Statement. We assume three participants:

the data owner, the public cloud and the user (pre-

sented in Figure 3). The data owner stores a relation

R in the distributed ﬁle system of some public cloud

provider. A user (who does not know the relation R)

is authorized to perform a grouping and aggregation

operation on R.

We assume that the public cloud is honest-but-

curious, i.e., it executes dutifully the computation task

but tries to learn the maximum of information on tu-

ples of R. In order to preserve the privacy of the data

owner, the cloud should not learn any plain input data,

contrary to what happens for standard algorithms as

found in Chapter 2 from (Leskovec et al., 2014) and

exempliﬁed above.

We assume that the relation R is initially spread over a set $\mathcal{R}$ of nodes, each of them storing a chunk of R, i.e., a set of elements of R. The final result $\gamma_{A,\theta(B)}(R)$ is spread over a set of nodes $\mathcal{Q}$ before it is sent to the user's nodes $\mathcal{U}$. We expect that none of the nodes in $\mathcal{Q}$ can learn any information about relation R, or about the final result.

Notice that a straightforward solution would require the use of a fully homomorphic encryption scheme, e.g., (Gentry, 2009). Indeed, a fully homomorphic encryption scheme would allow executing directly in the encrypted domain all operations needed for computing a grouping and aggregation operation. Unfortunately, such an approach would solve our problem only from a theoretical point of view because making a fully homomorphic encryption scheme work in practice remains an open question (as noted e.g., in (Gentry, 2009)).

Contributions. We revisit the standard algorithms for MapReduce grouping and aggregation (as found in Chapter 2 from (Leskovec et al., 2014)) to guarantee the privacy of the data owner. More precisely, neither the public cloud nor the user learns information about the input data that belongs to the data owner. Our approach, denoted SP for Secure-Private, works for each of the five considered aggregation operations. In each case, the SP approach is efficient from both computational and communication points of view, in the sense that the overhead is linear for each of the two complexity measures.

Our technique is essentially based on two encryp-

tion schemes: (i) the well-known Paillier’s cryptosy-

stem (Paillier, 1999), which is partially homomorphic

i.e., it is additive homomorphic for COUNT, SUM,

and AVG operations, and (ii) the order-preserving

symmetric encryption scheme (Agrawal et al., 2004)

for MIN and MAX operations.

We summarize in Figure 4 the trade-offs between

computation cost and communication cost for our SP

approach vs the standard MapReduce approach for

grouping and aggregation for the ﬁve studied operati-

ons. In our communication cost analysis, we measure

the total size of the data that is emitted from a map or

reduce node.

Related Work. Since the seminal MapReduce pa-

per (Dean and Ghemawat, 2004), different proto-

cols have been proposed to perform operations in

a privacy-preserving manner (Derbeko et al., 2016)

such as search (Blass et al., 2012) (Mayberry et al.,

2013), count (Vo-Huu et al., 2015), matrix multiplica-

tion (Bultel et al., 2017) or joins (Dolev et al., 2016).

Chapter 2 of (Leskovec et al., 2014) presents an

introduction to the MapReduce paradigm. In particu-

lar, it includes the MapReduce algorithm for grouping

and aggregation that we enhance with privacy guaran-

tees. Very few approaches address the privacy preser-

ving execution for grouping and aggregation operati-

ons in MapReduce, and moreover they have different

assumptions than we do.

(Bonawitz et al., 2017) provides a technique to compute secure aggregation, while relying on Shamir's secret sharing (Shamir, 1979) to compute the sum of values coming from different sources. Similarly, (Alghamdi et al., 2017) provides a technique to compute secure aggregation for wireless sensor networks. Contrary to us, these two approaches do not consider the MapReduce paradigm and they cannot be easily adapted for MapReduce because values of shared attributes are encrypted in a non-deterministic way. This is not a suitable choice for MapReduce keys that need to be equal in order to aggregate the key-value pairs on the same reducer.

Alg.      Approach   Comp. cost (big-O)       Comm. cost (big-O)
COUNT     Standard   (1 + C)n                 2n
          SP         (C + 2C)n                3n
SUM       Standard   (1 + C)n                 2n
          SP         (C + 2C)n                3n
AVG       Standard   (1 + 2C)n                2n
          SP         (C + 3C + 2C)n           3n
MIN/MAX   Standard   (1 + C_comp)n            2n
          SP         (C_ope + 3C_comp)n       3n

Figure 4: Summary of results. Let n be the number of tuples in the relation R. The subscripted cost terms C denote the costs of addition, multiplication, division, pseudo-random function evaluation, order-preserving symmetric encryption (C_ope), asymmetric encryption, and asymmetric decryption, and 1 represents the cost of accessing one tuple in the relation.

(Dolev et al., 2016) proposed a technique for

executing MapReduce computations in the public

cloud while preserving data owner privacy. They use

the Shamir’s secret sharing and accumulating auto-

mata (Dolev et al., 2015). Among the ﬁve aggregati-

ons studied in this paper, they support only the count,

whose computation is done on secret-shares in the pu-

blic cloud, and at the end, the user performs the inter-

polation on the outputs. On the other hand, in our set-

ting, the user has only to decrypt the ﬁnal query result,

contrary to the need of doing interpolations in (Dolev

et al., 2015).

On the other hand, substantial work has been done on privacy-preserving functional queries on traditional relational databases. Popa et al. (Popa et al., 2011) designed CryptDB, a system allowing a user to execute queries over encrypted data. The authors consider two threats. The first threat is a curious database administrator who tries to learn private data, while the second threat is an adversary that gains complete control of the application. In (Macedo et al., 2017), the authors proposed a generic framework called SafeNoSQL to

compute in a privacy-preserving manner on NoSQL

databases. This framework has a modular and exten-

sible design that enables data processing over multi-

ple cryptographic techniques applied on the same da-

tabase schema. Contrary to us, these two approaches

do not consider the MapReduce paradigm.

To the best of our knowledge, we are the ﬁrst to

propose secure algorithms for grouping and aggre-

gation computation with the MapReduce paradigm

where the public cloud performs all the computations

and where the user has only to decrypt the result sent

by the cloud.

Outline. We introduce some preliminary notions in

Section 2. Then, we present our SP approach for these

ﬁve operations in Section 3. Finally, we outline con-

clusions and future work in Section 4.

2 PRELIMINARIES

2.1 Relational Algebra

A relation R is a set of n tuples. For a tuple $t \in R$, by $\pi_X(t)$ we denote the projection of the tuple t on the attributes X, i.e., the tuple obtained from t after removing all attribute values that are not in X.

By $\gamma_{A,\theta(B)}(R)$ we denote the grouping and aggregation operation on R, where A is the set of attributes on which we group, B is the attribute for which we apply the aggregation function, and θ is an aggregation function (SUM, COUNT, AVG, MIN, MAX).

2.2 Grouping and Aggregation with

MapReduce

We recall the MapReduce algorithms for grouping

and aggregation, as found in Chapter 2 of

(Leskovec et al., 2014): for COUNT in Figure 5(a),

for SUM in Figure 5(b), for AVG in Figure 5(c), and

for MIN in Figure 5(d). The algorithm for MAX is

very similar to the one for MIN and we omit it.

2.3 Cryptographic Tools

We present deﬁnitions of the cryptographic tools used

in our protocols: negligible function, pseudo-random

function, order-preserving encryption scheme, and

public key encryption scheme.

SECRYPT 2018 - International Conference on Security and Cryptography


Map function:
  Input: (key, value)
  // key: id of a chunk of R
  // value: collection of t ∈ R
  foreach t ∈ R do
    emit (π_A(t), 1).

Reduce function:
  Input: (key, values)
  // key: π_A(t) for t ∈ R
  // values: collection of 1
  count ← 0;
  foreach 1 ∈ values do
    count ← count + 1;
  emit (π_A(t), count).

(a) COUNT operation.

Map function:
  Input: (key, value)
  // key: id of a chunk of R
  // value: collection of t ∈ R
  foreach t ∈ R do
    emit (π_A(t), π_B(t)).

Reduce function:
  Input: (key, values)
  // key: π_A(t) with t ∈ R
  // values: collection of π_B(t) with t ∈ R
  sum ← 0;
  foreach π_B(t) ∈ values do
    sum ← sum + π_B(t);
  emit (π_A(t), sum).

(b) SUM operation.

Map function:
  Input: (key, value)
  // key: id of a chunk of R
  // value: collection of t ∈ R
  foreach t ∈ R do
    emit (π_A(t), π_B(t)).

Reduce function:
  Input: (key, values)
  // key: π_A(t) for t ∈ R
  // values: collection of π_B(t)
  cpt ← 0;
  sum ← 0;
  foreach π_B(t) ∈ values do
    cpt ← cpt + 1;
    sum ← sum + π_B(t);
  emit (π_A(t), sum/cpt).

(c) AVG operation.

Map function:
  Input: (key, value)
  // key: id of a chunk of R
  // value: collection of t ∈ R
  foreach t ∈ R do
    emit (π_A(t), π_B(t)).

Reduce function:
  Input: (key, values)
  // key: π_A(t) for t ∈ R
  // values: collection of π_B(t)
  min ← values[0];
  foreach π_B(t) ∈ values do
    if π_B(t) < min then
      min ← π_B(t);
  emit (π_A(t), min).

(d) MIN operation.

Figure 5: Grouping and aggregation with MapReduce for COUNT, SUM, AVG, MIN operations.

Definition 1 (Negligible function). A function $\varepsilon : \mathbb{N} \to \mathbb{N}$ is negligible in η if for every positive polynomial p(·) and sufficiently large η, $\varepsilon(\eta) < 1/p(\eta)$.

Definition 2 (Pseudo-random function). A function $f : \{0,1\}^{\eta} \times \{0,1\}^{n_1} \to \{0,1\}^{n_2}$ is a pseudo-random function if it is computable in polynomial time in η and if for every polynomial-size algorithm B,

$$\left| \Pr\left[ B^{f(k,\cdot)} = 1 : k \leftarrow \{0,1\}^{\eta} \right] - \Pr\left[ B^{g(\cdot)} = 1 : g \leftarrow \mathrm{Func}[n_1, n_2] \right] \right| \le \varepsilon(\eta),$$

where ε(·) is a negligible function in η, $\mathrm{Func}[n_1, n_2]$ is the space of functions from domain $\{0,1\}^{n_1}$ to domain $\{0,1\}^{n_2}$, and the probabilities are taken over the choice of k and g.

Definition 3 (Order-Preserving Symmetric Encryption (Agrawal et al., 2004)). Let η be a security parameter. An order-preserving encryption (OPE) scheme is defined by three algorithms $(G_{ope}, E_{ope}, D_{ope})$:

  $G_{ope}(\eta)$: returns a secret key K.
  $E_{ope}(m)$: returns a new key K and a ciphertext c.
  $D_{ope}(c)$: returns the plaintext m.

such that for any two ciphertexts $c_1$ and $c_2$ with corresponding messages $m_1$ and $m_2$ we have $c_1 < c_2$ if and only if $m_1 < m_2$.

Definition 4 (Public Key Encryption (PKE)). Let η be a security parameter. A public key encryption (PKE) scheme is defined by three algorithms (G, E, D):

  $G(\eta)$: returns a public/private key pair (pk, sk).
  $E_{pk}(m)$: returns the ciphertext c.
  $D_{sk}(c)$: returns the plaintext m.

In the following, we require an additive homomorphic encryption scheme to secure the grouping and aggregation with MapReduce. There exist several schemes that have this property (Okamoto and Uchiyama, 1998; Paillier, 1999; Naccache and Stern, 1998). We choose Paillier's cryptosystem (Paillier, 1999) to illustrate the specific homomorphic properties required. Our results and proofs are generic, since any other encryption scheme having such properties can be used instead of Paillier's scheme.

We recall the key generation, encryption, and decryption algorithms.

Key Generation. We denote by $\mathbb{Z}_n$ the ring of integers modulo n and by $\mathbb{Z}_n^*$ the set of invertible elements of $\mathbb{Z}_n$. The public key pk of Paillier's encryption scheme is (n, g), where $g \in \mathbb{Z}_{n^2}^*$ and $n = p \times q$ is the product of two prime numbers such that gcd(p, q) = 1. The corresponding private key sk is (λ, μ), where λ is the least common multiple of p − 1 and q − 1 and $\mu = \left(L(g^{\lambda} \bmod n^2)\right)^{-1} \bmod n$, where $L(x) = \frac{x-1}{n}$.

Encryption Algorithm. Let m be a message such that $m \in \mathbb{Z}_n$. Let g be an element of $\mathbb{Z}_{n^2}^*$ and r be a random element of $\mathbb{Z}_n^*$. We denote by $E_{pk}$ the encryption function that produces the ciphertext c from a given plaintext m with the public key pk = (n, g) as follows:

$$c = g^{m} \times r^{n} \bmod n^2 .$$

Decryption Algorithm. Let c be the ciphertext such that $c \in \mathbb{Z}_{n^2}^*$. We denote by $D_{sk}$ the decryption function of the ciphertext c with the secret key sk = (λ, μ) defined as follows:

$$m = L(c^{\lambda} \bmod n^2) \times \mu \bmod n .$$

Homomorphic Addition of Plaintexts. Paillier's cryptosystem is a partially homomorphic encryption scheme. Let $m_1$ and $m_2$ be two plaintexts in $\mathbb{Z}_n$. The product of the two associated ciphertexts with the public key pk = (n, g), denoted $c_1 = E_{pk}(m_1) = g^{m_1} \times r_1^{n} \bmod n^2$ and $c_2 = E_{pk}(m_2) = g^{m_2} \times r_2^{n} \bmod n^2$, is the encryption of the sum of $m_1$ and $m_2$:

$$\begin{aligned}
E_{pk}(m_1) \times E_{pk}(m_2) &= c_1 \times c_2 \bmod n^2 \\
&= (g^{m_1} \times r_1^{n}) \times (g^{m_2} \times r_2^{n}) \bmod n^2 \\
&= g^{m_1 + m_2} \times (r_1 \times r_2)^{n} \bmod n^2 \\
&= E_{pk}(m_1 + m_2 \bmod n) .
\end{aligned}$$
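The additive homomorphism can be checked numerically with a toy implementation of the three algorithms recalled above. This is a minimal sketch under stated assumptions: the primes 293 and 433 are far too small for any real use, the choice g = n + 1 is one common valid instantiation (not necessarily the one used by the authors), and the function names are illustrative.

```python
import math
import random

def paillier_keygen(p, q):
    """Toy key generation from two distinct primes p and q (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    g = n + 1                                           # assumed choice of g
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)      # (L(g^lam mod n^2))^-1 mod n
    return (n, g), (lam, mu)

def encrypt(pk, m):
    """c = g^m * r^n mod n^2 for a random r invertible modulo n."""
    n, g = pk
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(pk, sk, c):
    """m = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) / n."""
    n, _g = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

pk, sk = paillier_keygen(293, 433)
n_squared = pk[0] ** 2

c1 = encrypt(pk, 1900)
c2 = encrypt(pk, 1800)
# Multiplying ciphertexts modulo n^2 adds the underlying plaintexts modulo n.
total = decrypt(pk, sk, c1 * c2 % n_squared)
```

Here `total` is the sum of the two salaries of the Computer Science department from Example 1, obtained without ever decrypting `c1` or `c2` individually.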

3 SECURE PRIVATE APPROACH

We present our SP approach for the COUNT, SUM,

AVG, MIN, and MAX aggregation functions with

MapReduce. We denote respectively these ﬁve pro-

tocols: SP-COUNT, SP-SUM, SP-AVG, SP-MIN, and

SP-MAX. The algorithm for SP-MAX is very similar

to SP-MIN and we omit it to avoid redundancy.

3.1 SP Protocols

To prevent the cloud from learning the content of the relation R, the data owner protects it before outsourcing. We denote the protected relation by R̄.

The data owner protects the relation using a pseudo-random function f with her secret key k, applying it to the values of the grouping attributes of each tuple of the relation R. These deterministic pseudo-random function evaluations allow the cloud to perform equality tests between values of grouping attributes. Moreover, the data owner encrypts each value of the aggregation attribute either with Paillier's scheme (using the user's public key pk_u) or with the OPE scheme (using the secret key K shared between the data owner and the user), depending on the aggregation function. We present the preprocessing phase in Algorithm 1, where E represents either the Paillier encryption (in the case of COUNT, SUM, AVG operations) or the OPE encryption (in the case of MIN and MAX operations). We stress that A_f and A_E are just notations making explicit the correspondences between initial and outsourced data and that schema(R̄) is the schema of R̄. For instance, if a relation R has two attributes such that “Name” is the grouping attribute and “Age” is the aggregation attribute, then R̄ has attributes “Name_f”, “Name_E” and “Age”.

Algorithm: PreProc(R)
  R̄ ← ∅;
  A_f ← {A_f | A ∈ A};
  A_E ← {A_E | A ∈ A};
  schema(R̄) ← A_f ∪ A_E ∪ B;
  for t ∈ R do
    t_f ← f_k(π_A(t));
    t_E ← E_{pk_u}(π_A(t));
    R̄ ← R̄ ∪ {t_f × t_E × E(π_B(t))};

Algorithm 1: Preprocessing of relations.
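The role of the deterministic pseudo-random function in the preprocessing can be illustrated in isolation. The sketch below assumes HMAC-SHA256 as the pseudo-random function, which is a common instantiation but is not specified by the paper; the key values and the helper name `prf` are hypothetical.

```python
import hashlib
import hmac

def prf(k: bytes, value: str) -> str:
    """Deterministic keyed tag of a grouping value (HMAC-SHA256 assumed as the PRF)."""
    return hmac.new(k, value.encode("utf-8"), hashlib.sha256).hexdigest()

k = b"data-owner-secret-key"          # hypothetical secret key of the data owner
departments = ["Computer Science", "Mathematics", "Computer Science", "Physics"]

# Tags stored in the protected relation in place of the plain grouping values.
tags = [prf(k, department) for department in departments]

# The cloud can test equality of tags without seeing the department names.
same_group = tags[0] == tags[2]
different_group = tags[0] != tags[1]

# Under another key the same value yields an unrelated tag.
other_tag = prf(b"another-key", "Computer Science")
```

Because the tags are deterministic, tuples of the same group reach the same reducer, while the aggregation values themselves are protected by the encryption scheme E of Algorithm 1.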

SP-COUNT (Figure 6(a)). The value of each pair sent by the map function contains the Paillier encryption of the grouping attribute value and the Paillier encryption of 1. Using the homomorphic property of Paillier's scheme, each reducer multiplies the encryptions of 1 to obtain the encrypted count of tuples sharing the same value of the grouping attribute.

SP-SUM (Figure 6(b)). The value of each pair sent by the map function contains the Paillier encryption of the grouping attribute value and the Paillier encryption of the aggregation attribute value. Similarly to the SP-COUNT protocol, we use the homomorphic property of Paillier's scheme, allowing each reducer to multiply the encrypted aggregation values to obtain the encryption of the sum of the values of tuples sharing the same grouping attribute value.
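The SP-SUM data flow described above can be simulated end to end in one process. This is a sketch under loudly stated assumptions, not the authors' implementation: HMAC-SHA256 stands in for the pseudo-random function, the Paillier code is a toy version with g = n + 1 and the two Mersenne primes 2^61 − 1 and 2^89 − 1 (far too small for real security), and the helpers `encode` and `decode`, which map a department name to an integer plaintext, are hypothetical.

```python
import hashlib
import hmac
import math
import random
from collections import defaultdict

def paillier_keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    g = n + 1                                           # assumed choice of g
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(pk, sk, c):
    n, _g = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def prf(k, value):
    return hmac.new(k, value.encode("utf-8"), hashlib.sha256).hexdigest()

def encode(text):   # hypothetical helper: short string -> integer plaintext below n
    return int.from_bytes(text.encode("utf-8"), "big")

def decode(m):      # inverse of encode
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")

pk, sk = paillier_keygen(2**61 - 1, 2**89 - 1)   # user's key pair (toy size)
n_squared = pk[0] ** 2
k = b"data-owner-secret-key"                     # data owner's PRF key

R = [("Alice", "Computer Science", 1900), ("Mallory", "Mathematics", 1750),
     ("Bob", "Computer Science", 1800), ("Eve", "Physics", 2000),
     ("Oscar", "Mathematics", 1600)]

# Data owner: preprocessing into (PRF tag, E(department), E(salary)).
protected = [(prf(k, dept), encrypt(pk, encode(dept)), encrypt(pk, salary))
             for _name, dept, salary in R]

# Cloud, map and shuffle: pairs are grouped by the deterministic tag.
groups = defaultdict(list)
for tag, enc_dept, enc_salary in protected:
    groups[tag].append((enc_dept, enc_salary))

# Cloud, reduce: multiplying ciphertexts adds the hidden salaries.
reduced = []
for values in groups.values():
    acc = 1
    for _enc_dept, enc_salary in values:
        acc = acc * enc_salary % n_squared
    reduced.append((values[0][0], acc))

# User: decrypts the encrypted department and the encrypted sum.
result = {decode(decrypt(pk, sk, d)): decrypt(pk, sk, s) for d, s in reduced}
```

The cloud only ever handles tags and ciphertexts; the user recovers the relation of Figure 2 by decrypting each emitted pair.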

SP-AVG (Figure 6(c)). The protocol combines the SP-COUNT protocol and the SP-SUM protocol. This allows the MapReduce user to compute the average from the decrypted sum and count.

SP-MIN (Figure 6(d)). We stress that before applying the map function, the data owner must encrypt all values of the aggregation attribute using an OPE scheme with the secret key K shared between the data owner and the MapReduce user.


Map function
  Input: (key, value)
  // key: id of a chunk of R̄
  // value: collection of t ∈ R̄
  foreach t ∈ R̄ do
    emit (π_{A_f}(t), (π_{A_E}(t), E_{pk_u}(1)))

Reduce function
  Input: (key, values)
  // key: π_{A_f}(t) for t ∈ R̄
  // values: collection of (π_{A_E}(t), E_{pk_u}(1))
  count ← 1
  foreach (π_{A_E}(t), E_{pk_u}(1)) ∈ values do
    count ← count · E_{pk_u}(1)
  emit (π_{A_E}(t), count)

(a) SP-COUNT protocol.

Map function
  Input: (key, value)
  // key: id of a chunk of R̄
  // value: collection of t ∈ R̄
  foreach t ∈ R̄ do
    emit (π_{A_f}(t), (π_{A_E}(t), π_B(t)))

Reduce function
  Input: (key, values)
  // key: π_{A_f}(t) for t ∈ R̄
  // values: collection of (π_{A_E}(t), π_B(t))
  sum ← 1
  foreach (π_{A_E}(t), π_B(t)) ∈ values do
    sum ← sum · π_B(t)
  emit (π_{A_E}(t), sum)

(b) SP-SUM protocol.

Map function
  Input: (key, value)
  // key: id of a chunk of R̄
  // value: collection of t ∈ R̄
  foreach t ∈ R̄ do
    emit (π_{A_f}(t), (π_{A_E}(t), π_B(t), E_{pk_u}(1)))

Reduce function
  Input: (key, values)
  // key: π_{A_f}(t) for t ∈ R̄
  // values: collection of (π_{A_E}(t), π_B(t), E_{pk_u}(1))
  cpt ← 1
  sum ← 1
  foreach (π_{A_E}(t), π_B(t), E_{pk_u}(1)) ∈ values do
    cpt ← cpt · E_{pk_u}(1)
    sum ← sum · π_B(t)
  emit (π_{A_E}(t), cpt, sum)

(c) SP-AVG protocol.

Map function
  Input: (key, value)
  // key: id of a chunk of R̄
  // value: collection of t ∈ R̄
  foreach t ∈ R̄ do
    emit (π_{A_f}(t), (π_{A_E}(t), π_B(t)))

Reduce function
  Input: (key, values)
  // key: π_{A_f}(t) for t ∈ R̄
  // values: collection of (π_{A_E}(t), π_B(t))
  (u_0, v_0) ← values[0]
  min ← D_{sk_c}(v_0)
  foreach (π_{A_E}(t), π_B(t)) ∈ values do
    x ← D_{sk_c}(π_B(t))
    if x < min then
      min ← x
  emit (π_{A_E}(t), E_{pk_u}(min))

(d) SP-MIN protocol.

Figure 6: Secure grouping and aggregation with MapReduce for COUNT, SUM, AVG, and MIN operations. The highlighting emphasizes differences w.r.t. the standard non-secured approach, cf. Figure 5.

The value of each pair sent by the map function contains the encryption of the pre-computed OPE ciphertext using an IND-CPA public key encryption scheme with the public key pk_c of the public cloud. Since the OPE encryption is deterministic, the additional public key encryption prevents an eavesdropper between the data owner and the public cloud from obtaining any information on repetitions of values sent by the data owner.

After receiving the key-value pairs, the public cloud uses its secret key sk_c to obtain the OPE ciphertexts. Using the order-preserving property of the OPE scheme, each reducer of the public cloud computes the minimum value associated with the considered value of the grouping attribute. Finally, the public cloud uses the public key pk_u of the user to encrypt each resulting OPE ciphertext and sends the result to the user.
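The reason the cloud can compute a minimum without decrypting can be seen on a toy order-preserving mapping. The sketch below is an illustration only and rests on several assumptions: the OPE scheme is replaced by a secret strictly increasing random table (this is not the scheme of Agrawal et al., and Python's `random` module is not cryptographically secure), HMAC-SHA256 stands in for the pseudo-random function, and the additional public key encryption layers of SP-MIN are omitted.

```python
import bisect
import hashlib
import hmac
import random
from collections import defaultdict

def ope_keygen(domain_size, seed):
    """Toy OPE key: a strictly increasing table indexed by the plaintext."""
    rng = random.Random(seed)
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 1000)   # strictly positive gaps keep the order
        table.append(current)
    return table

def ope_encrypt(K, m):
    return K[m]

def ope_decrypt(K, c):
    return bisect.bisect_left(K, c)       # index of c in the sorted table

def prf(k, value):
    return hmac.new(k, value.encode("utf-8"), hashlib.sha256).hexdigest()

K = ope_keygen(4096, seed=2018)           # shared by the data owner and the user
k = b"data-owner-secret-key"

R = [("Alice", "Computer Science", 1900), ("Mallory", "Mathematics", 1750),
     ("Bob", "Computer Science", 1800), ("Eve", "Physics", 2000),
     ("Oscar", "Mathematics", 1600)]

# Data owner: PRF tag for the department, OPE ciphertext for the salary.
protected = [(prf(k, dept), ope_encrypt(K, salary)) for _name, dept, salary in R]

# Cloud: group by tag and compare ciphertexts directly.
groups = defaultdict(list)
for tag, enc_salary in protected:
    groups[tag].append(enc_salary)
encrypted_mins = {tag: min(values) for tag, values in groups.items()}

# User: decrypts each minimum with the shared key K.
mins = {tag: ope_decrypt(K, c) for tag, c in encrypted_mins.items()}
```

Since ciphertext order equals plaintext order, the smallest ciphertext of each group decrypts to the smallest salary of that group; SP-MAX would use `max` in the same place.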

Remark: As we can see in the SP-COUNT protocol (Figure 6(a)), a public cloud knowing that it performs the count operation can deduce the value of the count even if it cannot decrypt the encryptions of 1. In fact, the public cloud can count the tuples that each reducer receives. Hence, it deduces the count result for the corresponding key. We stress that the plain value of the key stays unknown to the public cloud since it does not have the secret key sk_u of the user to decrypt it. In the following, we present the SP_comb-COUNT and the SP_comb-AVG protocols in Figures 7 and 8 using combiners (Leskovec et al., 2014) to avoid this leakage of information.

3.2 Reﬁnement: Combiners

Combiners allow pushing some of the reducers' work to the map function. In the case of the COUNT operation, the map function counts the tuples of the chunk that share the same value for the grouping attribute. Hence, each reducer receives key-value pairs, where key is the grouping attribute value, and value is the count of tuples sharing this key in the chunk.

We use the homomorphic property of Paillier's scheme to count in the map function the number of tuples in the chunk that share the same grouping attribute value. Then, each reducer multiplies all encrypted counts for the considered grouping attribute value to obtain the final encrypted count sent to the user. We present this refinement, called the SP_comb-COUNT protocol, in Figure 7.

Map function:
  Input: (key, value)
  // key: id of a chunk of R̄
  // value: collection of (t_f, t_E) ∈ R̄
  L ← [ ]; // Let L be a dictionary
  foreach (t_f, t_E) ∈ R̄ do
    if (t_f, t_E) ∈ L then L[(t_f, t_E)] ← L[(t_f, t_E)] · E_{pk_u}(1);
    else L[(t_f, t_E)] ← E_{pk_u}(1);
  foreach (t_f, t_E) ∈ L do
    emit (t_f, (t_E, L[(t_f, t_E)])).

Reduce function:
  Input: (key, values)
  // key: π_{A_f}(t) for t ∈ R̄
  // values: collection of (E_{pk_u}(a), E_{pk_u}(b))
  count ← 1;
  foreach (E_{pk_u}(a), E_{pk_u}(b)) ∈ values do
    count ← count · E_{pk_u}(b);
  emit (E_{pk_u}(a), count).

Figure 7: SP_comb-COUNT protocol.

Similarly, we can use combiners for the AVG operation. Even if the sum is encrypted, combiners hide the count used for each grouping attribute value, i.e., for each computed average. We present this refinement, called the SP_comb-AVG protocol, in Figure 8.

Map function:
  Input: (key, value)
  // key: id of a chunk of R̄
  // value: collection of (t_f, t_E) ∈ R̄
  L ← [ ]; // Let L be a dictionary
  M ← [ ]; // Let M be a dictionary
  foreach (t_f, t_E) ∈ R̄ do
    if (t_f, t_E) ∈ L then L[(t_f, t_E)] ← L[(t_f, t_E)] · E_{pk_u}(1);
    else L[(t_f, t_E)] ← E_{pk_u}(1);
    if (t_f, t_E) ∈ M then M[(t_f, t_E)] ← M[(t_f, t_E)] · t_B;
    else M[(t_f, t_E)] ← t_B;
  foreach (t_f, t_E) ∈ L do
    emit (t_f, (t_E, L[(t_f, t_E)], M[(t_f, t_E)])).

Reduce function:
  Input: (key, values)
  // key: π_{A_f}(t) for t ∈ R̄
  // values: collection of (E_{pk_u}(a), E_{pk_u}(b), E_{pk_u}(c))
  cpt ← 1;
  sum ← 1;
  foreach (E_{pk_u}(a), E_{pk_u}(b), E_{pk_u}(c)) ∈ values do
    cpt ← cpt · E_{pk_u}(b);
    sum ← sum · E_{pk_u}(c);
  emit (E_{pk_u}(a), cpt, sum).

Figure 8: SP_comb-AVG protocol.

We stress that we can also use combiners with the SUM and MIN/MAX operations, but they do not add privacy as they do for the previous operations.
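The effect of combiners on the count leakage can be observed in a small simulation. This is a sketch under stated assumptions, not the authors' code: HMAC-SHA256 stands in for the pseudo-random function, the Paillier code is a toy version with g = n + 1 and insecure primes, the split into two chunks is arbitrary, and the encrypted grouping value carried alongside each pair in Figure 7 is omitted for brevity.

```python
import hashlib
import hmac
import math
import random
from collections import defaultdict

def paillier_keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    g = n + 1                                           # assumed choice of g
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(pk, sk, c):
    n, _g = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def prf(k, value):
    return hmac.new(k, value.encode("utf-8"), hashlib.sha256).hexdigest()

pk, sk = paillier_keygen(293, 433)        # user's key pair (toy size)
n_squared = pk[0] ** 2
k = b"data-owner-secret-key"

# Two chunks of the protected relation, reduced here to the PRF tags.
chunks = [
    [prf(k, "Computer Science"), prf(k, "Mathematics"), prf(k, "Computer Science")],
    [prf(k, "Physics"), prf(k, "Mathematics")],
]

def map_with_combiner(chunk):
    """Accumulate one encrypted partial count per tag inside the map function."""
    partial = {}
    for tag in chunk:
        one = encrypt(pk, 1)
        partial[tag] = partial[tag] * one % n_squared if tag in partial else one
    return partial.items()

# Shuffle: each reducer receives at most one ciphertext per chunk and tag.
received = defaultdict(list)
for chunk in chunks:
    for tag, enc_count in map_with_combiner(chunk):
        received[tag].append(enc_count)

def reduce_count(values):
    count = 1
    for enc_partial in values:
        count = count * enc_partial % n_squared
    return count

# User side: decrypt the final encrypted count of every group.
counts = {tag: decrypt(pk, sk, reduce_count(values)) for tag, values in received.items()}
```

For the Computer Science tag the reducer receives a single ciphertext although the group contains two tuples, so the number of received values no longer reveals the count.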

3.3 Security Proofs

The security proofs of the SP-SUM protocol (Theorem 1) and of the SP-MIN protocol (Theorem 2) are presented in (Ciucanu et al., 2018). We emphasize that the security proofs for the SP-COUNT and SP-AVG protocols are similar to that of the SP-SUM protocol, so we do not present them. Moreover, the security proofs for the SP-MIN and SP-MAX protocols are identical, so we do not present the latter either.

Theorem 1. The SP-SUM protocol securely computes the grouping and aggregation for the SUM operation in the random oracle model (ROM) in the presence of semi-honest adversaries even if cloud nodes collude.

Theorem 2. The SP-MIN protocol securely computes the grouping and aggregation for the MIN operation in the random oracle model (ROM) in the presence of semi-honest adversaries even if cloud nodes collude.

4 CONCLUSION

We have presented efficient algorithms for grouping and aggregation operations with MapReduce that enjoy privacy guarantees such that none of the nodes of the public cloud can learn the input or the output relation. To achieve our goal, we relied on Paillier's cryptosystem and on order-preserving encryption. We developed an approach (SP) that is efficient in terms of both computation cost and communication cost. We have compared this approach to the standard algorithm with respect to three fundamental criteria: computation cost, communication cost, and privacy guarantees.

Looking forward to future work, we plan to study the practical performance of our algorithms in an open-source system that implements the MapReduce paradigm, such as Hadoop. Additionally, we aim to investigate the grouping and aggregation computation with privacy guarantees in different big data systems (such as Spark or Flink) whose users also tend to outsource data and computations similarly to MapReduce.

ACKNOWLEDGEMENTS

This research was conducted with the support of the FEDER program of 2014-2020, the region council of Auvergne-Rhône-Alpes, the support of the “Digital Trust” Chair from the University of Auvergne Foundation, the Indo-French Centre for the Promotion of Advanced Research (IFCPAR) and the Centre Franco-Indien Pour La Promotion De La Recherche Avancée (CEFIPRA) through the project DST/CNRS 2015-03 under DST-INRIA-CNRS Targeted Programme.

Apache Hadoop: https://hadoop.apache.org/

REFERENCES

Agrawal, R., Kiernan, J., Srikant, R., and Xu, Y. (2004).

Order-Preserving Encryption for Numeric Data. In

Proceedings of the ACM SIGMOD International Con-

ference on Management of Data, pages 563–574.

Alghamdi, W. Y., Wu, H., and Kanhere, S. S. (2017). Reli-

able and Secure End-to-End Data Aggregation Using

Secret Sharing in WSNs. In 2017 IEEE Wireless Com-

munications and Networking Conference, WCNC, pa-

ges 1–6.

Blass, E., Pietro, R. D., Molva, R., and Önen, M. (2012).

PRISM - Privacy-Preserving Search in MapReduce.

In Privacy Enhancing Technologies - 12th Internatio-

nal Symposium, PETS, pages 180–200.

Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A.,

McMahan, H. B., Patel, S., Ramage, D., Segal, A.,

and Seth, K. (2017). Practical Secure Aggregation

for Privacy-Preserving Machine Learning. In Pro-

ceedings of the 2017 ACM SIGSAC Conference on

Computer and Communications Security, CCS, pages

1175–1191.

Bultel, X., Ciucanu, R., Giraud, M., and Lafourcade, P.

(2017). Secure Matrix Multiplication with MapRe-

duce. In Proceedings of the 12th International Confe-

rence on Availability, Reliability and Security, pages

11:1–11:10.

Ciucanu, R., Giraud, M., Lafourcade, P., and Ye, L.

(2018). Secure grouping and aggregation with mapre-

duce. Cryptology ePrint Archive, Report 2018/501.

https://eprint.iacr.org/2018/501.

Dean, J. and Ghemawat, S. (2004). MapReduce: Simpli-

ﬁed Data Processing on Large Clusters. In 6th Sym-

posium on Operating System Design and Implementa-

tion OSDI, pages 137–150.

Derbeko, P., Dolev, S., Gudes, E., and Sharma, S. (2016).

Security and privacy aspects in MapReduce on clouds:

A survey. Computer Science Review, 20:1–28.

Dolev, S., Gilboa, N., and Li, X. (2015). Accumula-

ting Automata and Cascaded Equations Automata for

Communicationless Information Theoretically Secure

Multi-Party Computation: Extended Abstract. In Pro-

ceedings of the 3rd International Workshop on Secu-

rity in Cloud Computing, SCC@ASIACCS ’15, pages

21–29.

Dolev, S., Li, Y., and Sharma, S. (2016). Private and Se-

cure Secret Shared MapReduce. In Data and Appli-

cations Security and Privacy XXX - 30th Annual IFIP

WG 11.3 Conference, DBSec, pages 151–160.

Gentry, C. (2009). Fully Homomorphic Encryption Using

Ideal Lattices. In Proceedings of the Forty-ﬁrst Annual

ACM Symposium on Theory of Computing, STOC ’09,

pages 169–178. ACM.

Leskovec, J., Rajaraman, A., and Ullman, J. D. (2014).

Mining of Massive Datasets. Cambridge University

Press.

Macedo, R., Paulo, J., Pontes, R., Portela, B., Oliveira, T.,

Matos, M., and Oliveira, R. (2017). A practical fra-

mework for privacy-preserving nosql databases. In

36th IEEE Symposium on Reliable Distributed Sys-

tems, SRDS 2017, Hong Kong, Hong Kong, September

26-29, 2017, pages 11–20.

Mayberry, T., Blass, E., and Chan, A. H. (2013). PIRMAP:

Efﬁcient Private Information Retrieval for MapRe-

duce. In Financial Cryptography and Data Security

- 17th International Conference, FC, pages 371–385.

Naccache, D. and Stern, J. (1998). A New Public Key Cryp-

tosystem Based on Higher Residues. In Proceedings

of the 5th ACM Conference on Computer and Commu-

nications Security, CCS ’98, pages 59–66, New York,

NY, USA. ACM.

Okamoto, T. and Uchiyama, S. (1998). A New Public-key

Cryptosystem as Secure as Factoring, pages 308–318.

Springer Berlin Heidelberg.

Paillier, P. (1999). Public-Key Cryptosystems Based on

Composite Degree Residuosity Classes. In Advan-

ces in Cryptology - EUROCRYPT ’99, International

Conference on the Theory and Application of Crypto-

graphic Techniques, pages 223–238.

Popa, R. A., Redﬁeld, C. M. S., Zeldovich, N., and Bala-

krishnan, H. (2011). Cryptdb: protecting conﬁdentia-

lity with encrypted query processing. In Proceedings

of the 23rd ACM Symposium on Operating Systems

Principles 2011, SOSP 2011, Cascais, Portugal, Oc-

tober 23-26, 2011, pages 85–100.

Shamir, A. (1979). How to Share a Secret. Commun. ACM,

22(11):612–613.

Vo-Huu, T. D., Blass, E., and Noubir, G. (2015). EPiC: Efﬁ-

cient Privacy-Preserving Counting for MapReduce. In

Networked Systems - Third International Conference,

NETYS, pages 426–443.
