privacy-preserving property thanks to an interactive
protocol executed between the data owner and any trusted
user who wants to send query vectors: this protocol
generates a key that is used to encrypt the query
vectors, and this key cannot be used to decrypt
the encrypted instances stored in the cloud. The data
owner must participate in the processing, even if
only to generate keys, so this protocol cannot
be classified as non-interactive. Moreover,
the protocol only finds the nearest neighbours; the
classification step is not performed.
Elmehdwi et al. (2014) consider a different
scenario: the data owner encrypts the data and
submits them to a first server, while sending the secret
key to a second server. Any authorized party can then
send a query vector to the first server, which
runs a distributed interactive protocol with the second
server (this server may decrypt some data in the
process), and finally the first server returns the k nearest
neighbours. Although the client does not have to
process the data, the method requires a trusted server to
store the private key, and this trusted server acts as the
client in the distributed processing scenario. Relying
on a trusted third party naturally introduces substantial
additional risk. Later, the same authors extended the
idea to the classification problem (Samanthula et al.,
2015), but the same risk of collusion remains.
Another approach is proposed in (Wong et al.,
2009), which introduces a new cryptographic scheme
called asymmetric scalar-product-preserving encryption
(ASPE). The scheme preserves a
special type of scalar product, allowing the k nearest
vectors to be found without requiring an interactive
process. The server compares dataset vectors against
the query vector by computing inner products of
their encrypted forms, thereby determining
which dataset vectors are closer to the query vector.
However, the authors were again only concerned with the
task of finding the nearest neighbours, not with the
classification problem. Moreover, a cryptographic scheme
created ad hoc for this task lacks the extensive security
analysis that more general and well-established
cryptographic schemes have already received. In comparison,
the building blocks in our proposal have well-known
properties and limitations.
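To make the scalar-product-preserving idea concrete, the following Python sketch illustrates a simplified variant in the spirit of ASPE. It is an illustration under stated assumptions, not the exact construction of (Wong et al., 2009): the key is assumed to be a random invertible matrix M, each data point p is extended with -0.5*||p||^2 and multiplied by M^T, and each query q is scaled by a random positive r, extended with r, and multiplied by M^-1. All function names are hypothetical, and exact rational arithmetic is used only to keep the example deterministic.

```python
from fractions import Fraction
import random


def invert(m):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(m)
    a = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        inv_pivot = 1 / a[col][col]
        a[col] = [x * inv_pivot for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def keygen(dim, rng):
    """Assumed key: a random invertible (dim x dim) matrix and its inverse."""
    while True:
        m = [[Fraction(rng.randint(-9, 9)) for _ in range(dim)]
             for _ in range(dim)]
        try:
            return m, invert(m)
        except ValueError:
            continue  # singular draw, try again


def encrypt_point(m, p):
    # Extend p with -0.5 * ||p||^2, then multiply by M^T.
    ext = [Fraction(x) for x in p] + [Fraction(-1, 2) * sum(x * x for x in p)]
    m_t = [list(col) for col in zip(*m)]
    return matvec(m_t, ext)


def encrypt_query(m_inv, q, rng):
    # Scale q by a random r > 0, append r, then multiply by M^-1.
    r = Fraction(rng.randint(1, 100))
    ext = [r * x for x in q] + [r]
    return matvec(m_inv, ext)


def nearest(enc_points, enc_query, k):
    # Server side: E(p) . E(q) = r * (p . q - 0.5 * ||p||^2),
    # so a larger score means a smaller distance to the query.
    scores = [sum(a * b for a, b in zip(ep, enc_query)) for ep in enc_points]
    order = sorted(range(len(enc_points)), key=lambda i: scores[i], reverse=True)
    return order[:k]


rng = random.Random(0)
points = [(0, 0), (5, 5), (1, 2), (9, 1)]
query = (1, 1)
M, M_inv = keygen(len(query) + 1, rng)
enc_points = [encrypt_point(M, p) for p in points]
enc_query = encrypt_query(M_inv, query, rng)
neighbours = nearest(enc_points, enc_query, k=2)
print(neighbours)
```

The server ranks the points using encrypted vectors only and returns indices, which matches the observation above that such a scheme finds neighbours but does not assign a class.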
8 CONCLUSIONS
We presented non-interactive privacy-preserving
variants of the k-NN classifier for both the unweighted
and the weighted versions, and established through
extensive experiments that they are sufficiently
efficient and accurate to be viable in practice. The
proposed protocol combines homomorphic encryption
and order-preserving encryption and is applicable to
running queries against private databases stored in
the cloud. To the best of our knowledge, this is the
first proposal for performing k-NN classification over
encrypted data in a non-interactive way.
If a client and a cloud already employ a joint
protocol to find nearest neighbours (for instance, by
using cryptographic primitives other than OPE,
or by running some interactive algorithm), then they
can use an HE scheme and the techniques presented
here to derive the class of the query from the classes
of its nearest neighbours.
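As an illustration of this decoupling, the sketch below assumes that the neighbours have already been found by some other protocol and that their class labels are stored as encrypted one-hot vectors. A toy Paillier cryptosystem is swapped in here as a stand-in additively homomorphic scheme; it is not necessarily the HE scheme used in this work, the parameters are far too small to be secure, and all function names are hypothetical.

```python
import math
import random

# Toy Paillier parameters (insecure, for illustration only).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse, requires Python 3.8+


def encrypt(m, rng):
    """Encrypt an integer 0 <= m < N with generator g = N + 1."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def add(c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds plaintexts."""
    return (c1 * c2) % N2


def one_hot_encrypted(label, num_classes, rng):
    # Assumed storage format: one ciphertext per class, 1 for the true class.
    return [encrypt(int(c == label), rng) for c in range(num_classes)]


def tally(encrypted_votes):
    # Cloud side: component-wise sum of the neighbours' encrypted labels.
    totals = list(encrypted_votes[0])
    for vote in encrypted_votes[1:]:
        totals = [add(t, v) for t, v in zip(totals, vote)]
    return totals


rng = random.Random(0)
num_classes = 3
neighbour_labels = [1, 0, 1]  # labels of the k = 3 neighbours already found
enc_votes = [one_hot_encrypted(l, num_classes, rng) for l in neighbour_labels]
enc_totals = tally(enc_votes)

# Key holder side: decrypt the per-class counts and take the majority.
counts = [decrypt(c) for c in enc_totals]
predicted = max(range(num_classes), key=lambda c: counts[c])
print(counts, predicted)
```

In this sketch the cloud only multiplies ciphertexts and never sees a label; a weighted vote could be obtained in the same spirit by encrypting an integer weight instead of 1 in the one-hot vector.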
As future work, possible improvements to the
k-NN variants presented here might involve data obfuscation
and perturbation techniques to achieve stronger
security properties against inference attacks, while
preserving accuracy and efficiency.
ACKNOWLEDGMENTS
We thank Google Inc. for the financial support
through the Latin America Research Awards
“Machine learning over encrypted data using
Homomorphic Encryption” and “Efficient homomorphic
encryption for private computation in the cloud”.
REFERENCES
Alpaydin, E. (2004). Introduction to Machine Learning.
The MIT Press.
Altman, N. S. (1992). An introduction to kernel and
nearest-neighbor nonparametric regression. The American
Statistician, 46(3):175–185.
Boldyreva, A., Chenette, N., and O’Neill, A. (2011).
Order-preserving encryption revisited: Improved security
analysis and alternative solutions. In CRYPTO,
volume 6841 of Lecture Notes in Computer Science,
pages 578–595. Springer.
Bost, R., Popa, R. A., Tu, S., and Goldwasser, S. (2015).
Machine learning classification over encrypted data.
In NDSS. The Internet Society.
Choi, S., Ghinita, G., Lim, H., and Bertino, E. (2014).
Secure kNN query processing in untrusted cloud
environments. IEEE Trans. Knowl. Data Eng.,
26(11):2818–2831.
Elmehdwi, Y., Samanthula, B. K., and Jiang, W. (2014).
Secure k-nearest neighbor query over encrypted data
in outsourced environments. In ICDE, pages 664–675.
IEEE Computer Society.
Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K. E.,
Naehrig, M., and Wernsing, J. (2016). CryptoNets:
Applying neural networks to encrypted data with high
throughput and accuracy. In ICML, volume 48 of
JMLR Workshop and Conference Proceedings, pages
201–210. JMLR.org.
ICISSP 2017 - 3rd International Conference on Information Systems Security and Privacy