for XOR. It must be stressed that there is nothing inherently different about p_L that renders it immune to the phantom gradient attack; it is most probably a question of finding the correct workaround for this "one-half" challenge. These first results hold promise, as they show that gradual learning with neural networks can also be applied to key recovery in cryptology.
6 FUTURE WORK
There is much room for future work on the phantom gradient attack, in particular research into good replacement functions. Ideally, a replacement function should preserve as many of the properties of traditional XOR as possible: for example, (x ⊕ y) ⊕ x should ideally equal y for the replacement function as well, as the sketch below illustrates.
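As a minimal sketch (an assumption on our part, not a replacement function proposed in this paper), the common smooth relaxation x + y − 2xy agrees with XOR on {0, 1} and is differentiable everywhere; the check below confirms that (x ⊕ y) ⊕ x = y holds exactly on binary inputs but only approximately once the inputs are relaxed to [0, 1]:

```python
import numpy as np

def soft_xor(x, y):
    """Smooth XOR relaxation: agrees with XOR on {0, 1}, differentiable."""
    return x + y - 2.0 * x * y

# On binary inputs the (x XOR y) XOR x = y property holds exactly.
for x in (0.0, 1.0):
    for y in (0.0, 1.0):
        assert soft_xor(soft_xor(x, y), x) == y

# On relaxed inputs it holds only approximately; algebraically,
# soft_xor(soft_xor(x, y), x) = y + 2x(1 - x)(1 - 2y).
x = np.random.uniform(0.0, 1.0, size=1000)
y = np.random.uniform(0.0, 1.0, size=1000)
err = np.abs(soft_xor(soft_xor(x, y), x) - y)
print(f"max deviation from y on continuous inputs: {err.max():.3f}")  # up to 0.5
```

The residual term 2x(1 − x)(1 − 2y) vanishes on binary inputs; a good replacement function should also keep it small in between.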
More generally, there is much room for attempting to attack other cryptosystems. For example, when the phantom gradient attack is used against a public-key cryptography scheme, the public key can be used to generate as many training samples as needed. The attack then targets the decryption function f_c(k_private) = p, where the subscript c is the generated ciphertext, k_private is the secret private key, and p is the chosen plaintext. We subscript c since, for each iteration, we assume it is constant, as we did with the plaintext in this work. The attack may even be extended to work when the plaintext is unknown; however, this will likely require many training samples. A sketch of such sample generation is given below.
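As an illustration (using textbook RSA with toy parameters as a stand-in; this is our assumption, not a scheme attacked in this paper), the sketch below generates (ciphertext, plaintext) training pairs from the public key alone:

```python
import random

# Toy textbook-RSA stand-in; purely illustrative, not cryptographically secure.
p_, q_ = 61, 53
n, e = p_ * q_, 17                       # public key (n, e)
d = pow(e, -1, (p_ - 1) * (q_ - 1))      # private key d: the attack's target

def encrypt(plaintext, n, e):
    """Encryption uses only the public key, so an attacker can run it freely."""
    return pow(plaintext, e, n)

# Generate as many (ciphertext, plaintext) pairs as needed; each pair
# fixes c for one iteration, as the plaintext was fixed in this work.
training_set = []
for _ in range(10_000):
    p_text = random.randrange(2, n)      # chosen plaintext p
    c = encrypt(p_text, n, e)            # generated ciphertext c
    training_set.append((c, p_text))     # train f_c(k_private) = p on these
```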
As the phantom gradient attack is a new cryptanalytic attack, there is also room for studying how to protect against it. Since the attack draws its foundation from neural networks, one could draw on cases where neural networks struggle. For example, learning works better on deep networks than on wide networks, so a cryptosystem that has to be represented as a wide network may be less vulnerable to a phantom gradient attack. For training the network, we tried gradient descent and gradient descent with momentum in this paper; however, other optimizers remain untested. Two natural candidates are the neural network optimizers Adam and RMSProp. Moreover, it is not obvious that squared error is the best-suited loss function. Testing different optimizers and loss functions is low-hanging fruit for future research; a sketch of such a setup follows.
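A minimal sketch, assuming a PyTorch setup (the model and data below are placeholders rather than the network used in this paper), of how these optimizers and alternative loss functions could be swapped in:

```python
import torch

# Placeholder: any differentiable model of the target function goes here.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.Sigmoid())
inputs, targets = torch.rand(128, 64), torch.rand(128, 64)  # stand-in batch

# Candidates beyond plain gradient descent (with momentum):
optimizers = {
    "sgd_momentum": torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9),
    "adam": torch.optim.Adam(model.parameters(), lr=1e-3),
    "rmsprop": torch.optim.RMSprop(model.parameters(), lr=1e-3),
}
# Candidate losses besides squared error:
losses = {
    "squared_error": torch.nn.MSELoss(),
    "cross_entropy": torch.nn.BCELoss(),  # natural for bit-valued outputs
    "absolute_error": torch.nn.L1Loss(),
}

opt, loss_fn = optimizers["adam"], losses["squared_error"]
for _ in range(100):                      # standard training loop
    opt.zero_grad()
    loss_fn(model(inputs), targets).backward()
    opt.step()
```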
ACKNOWLEDGMENTS
The author wishes to give special thanks to Audun Jøsang and Thomas Gregersen for valuable discussions and words of encouragement.