performance improvement in embedded, lightweight
environments is therefore critical.
Choice of Atmel Boards and Importance of De-
tailed Technical Specifications: We considered
two Atmel boards, both equipped with a crypto-
graphic co-processor, the (old) SAMA5D3 board (AT-
MEL, 2017b) and the (new) SAMA5D2 board (AT-
MEL, 2017a). We chose these because of their low
price and wide acceptance in industrial systems, and
because the cryptographic co-processor documenta-
tion is publicly available, a requirement for advanced
developments. This is not always the case, as we
discovered after buying another, more powerful board:
the information provided turned out to be too
limited for our needs, and our academic status did not
enable us to obtain the technical documentation from
the manufacturer, even after contacting their support.
A major difference exists between these two At-
mel boards, which justifies that we consider both
of them: the cryptographic co-processor of the first
board supports common AES modes but not XTS-
AES, while the second one also supports XTS-AES.
Those constraints led us to consider different imple-
mentation options that are the subject of this work.
Scientific Approach Followed in this Work: The
first step of our work was the experimental analysis
of three XTS-AES implementations: a pure software
implementation (the legacy baseline), an implemen-
tation that leverages the dedicated cryptographic co-
processor with XTS-AES support of the SAMA5D2
board (the most favourable case), and in between an
implementation that leverages the cryptographic co-
processor with ECB-AES support only of the old
SAMA5D3 board. Our benchmarks demonstrated
that, in all three cases, performance fell short of
expectations and did not meet our objective of efficient
on-the-fly encryption/decryption of large amounts of
data within the Atmel boards.
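To illustrate the intermediate case, the following is a minimal Python sketch of how XTS-AES can be layered on an engine that only offers ECB: the initial tweak is obtained by encrypting the sector number under the second key, each 16-byte block is XORed with the current tweak before and after the ECB pass, and the tweak is multiplied by alpha in GF(2^128) between blocks. The function names, the callback signature and the toy stand-in cipher are assumptions made for illustration only; they are not the atmel-aes driver interface, the stand-in is not AES, and ciphertext stealing is left out.

```python
from typing import Callable

BLOCK = 16  # AES block size in bytes

# Hypothetical callback type: ecb(key, data) -> data, len(data) a multiple of 16.
EcbFn = Callable[[bytes, bytes], bytes]


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def gf_mul_alpha(tweak: bytes) -> bytes:
    """Multiply a 16-byte tweak by alpha in GF(2^128), little-endian as in XTS."""
    t = int.from_bytes(tweak, "little") << 1
    if t >> 128:  # carry out of bit 127: reduce by x^128 + x^7 + x^2 + x + 1
        t = (t & ((1 << 128) - 1)) ^ 0x87
    return t.to_bytes(BLOCK, "little")


def xts_sector(ecb_tweak: EcbFn, ecb_data: EcbFn, key1: bytes, key2: bytes,
               sector_no: int, data: bytes) -> bytes:
    """XOR-ECB-XOR over one sector (whole blocks only).

    ecb_tweak is always an ECB *encryption*; ecb_data is ECB encryption
    when encrypting the sector and ECB decryption when decrypting it.
    """
    if len(data) % BLOCK:
        raise ValueError("sketch handles whole 16-byte blocks only")
    tweak = ecb_tweak(key2, sector_no.to_bytes(BLOCK, "little"))
    tweaks = []
    for _ in range(len(data) // BLOCK):  # software side: tweak sequence
        tweaks.append(tweak)
        tweak = gf_mul_alpha(tweak)
    mask = b"".join(tweaks)
    # A single bulk ECB call (the part an ECB-only engine could take over).
    return xor(ecb_data(key1, xor(data, mask)), mask)


def toy_ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """Stand-in for an ECB engine: NOT AES, just XOR with key + byte rotation."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        b = xor(data[i:i + BLOCK], key)
        out += b[1:] + b[:1]
    return bytes(out)


def toy_ecb_decrypt(key: bytes, data: bytes) -> bytes:
    """Inverse of toy_ecb_encrypt (undo the rotation, then the XOR)."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        b = data[i:i + BLOCK]
        out += xor(b[-1:] + b[:-1], key)
    return bytes(out)


if __name__ == "__main__":
    k1, k2 = bytes(range(16)), bytes(range(16, 32))
    sector = bytes(range(256)) * 2  # one 512-byte sector
    ct = xts_sector(toy_ecb_encrypt, toy_ecb_encrypt, k1, k2, 7, sector)
    pt = xts_sector(toy_ecb_encrypt, toy_ecb_decrypt, k1, k2, 7, ct)
    print(pt == sector)
```

In this layout the per-block tweak handling stays in software while the block cipher work is issued as one ECB call over the whole buffer, which is the kind of split a mixed software/hardware implementation relies on.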
An analysis of in-kernel data paths highlighted that
plaintext sizes are limited to a hard-coded value of
512 bytes, in particular because this is the common
sector size on most devices, and also because test vectors
are limited to a maximum of 512 bytes in the official
XTS-AES standard (IEEE Computer Society, 2008).
We therefore explored the possibility of having 4 KB
long requests (i.e., a page size), a rational choice and
a fairly natural idea for kernel operations. We called
this optimization “extended request mode”, or extReq.
To that end, we modified dm-crypt as well as the
underlying atmel-aes driver, two highly complex tasks,
in order to support extended encryption/decryption
requests. We then analyzed the performance impacts.
With this optimization, a mixed implementation with
the (old) SAMA5D3 ECB-AES co-processor features
roughly the same performance as that of the (new)
SAMA5D2 XTS-AES co-processor.
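As a back-of-the-envelope illustration of why the request size matters, the sketch below counts how many requests reach the cryptographic layer for a given amount of data, with 512-byte and with 4 KB requests. It only models the request count; the constant and helper names are hypothetical, and the actual gains depend on per-request overheads that this sketch does not measure.

```python
SECTOR_SIZE = 512   # hard-coded request size found in the in-kernel data path
PAGE_SIZE = 4096    # extended request size (one page), as explored with extReq


def request_count(total_bytes: int, request_size: int) -> int:
    """Number of requests needed to cover total_bytes (the last may be partial)."""
    if total_bytes < 0 or request_size <= 0:
        raise ValueError("sizes must be non-negative and request_size positive")
    return -(-total_bytes // request_size)  # ceiling division


def sectors_per_request(request_size: int, sector_size: int = SECTOR_SIZE) -> int:
    """How many sectors of sector_size bytes one request spans."""
    return request_size // sector_size


if __name__ == "__main__":
    total = 64 * 1024 * 1024  # 64 MiB of data, an arbitrary example size
    print(request_count(total, SECTOR_SIZE))  # requests with 512-byte requests
    print(request_count(total, PAGE_SIZE))    # requests with 4 KB requests
    print(sectors_per_request(PAGE_SIZE))     # sectors covered by one 4 KB request
```

With these sizes, one extended request covers eight 512-byte sectors, so any fixed per-request cost is paid eight times less often for the same amount of data.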
Finally we analyzed the existing XTS-AES cryp-
tographic co-processor of the SAMA5D2 board in or-
der to apply the extReq optimization to it directly. Un-
fortunately, because of bad design choices by Atmel,
this new cryptographic co-processor is not compati-
ble with this optimization, therefore limiting the op-
portunities for major performance improvements. We
explain why it is so and conclude this work with rec-
ommendations for future co-processor designs (the
interested reader is invited to refer to the full paper:
https://hal.archives-ouvertes.fr/hal-02555457).
Note that this work only considers cryptographic
operations over large data chunks, which is
common in FDE use cases. It does not consider
the opposite case, i.e., large numbers of small data
chunks, which is not the target of our optimization.
Contributions of this Work:
• this work explores the implementation of
cryptographic primitives in Linux systems, detailing the
complex interactions between software and
hardware components, and the dm-crypt kernel
module internals. Note that this work involved
major in-kernel low-level software developments and
complex performance evaluation campaigns.
• this work shows that significant performance
gains are possible thanks to the “extended request
mode” (extReq) optimization, even with boards
that do not feature cryptographic co-processors
supporting XTS-AES. Although the idea behind
this optimization is fairly natural, we describe
its architectural implications, apply it to
several XTS-AES implementations, depending on
the available hardware, and provide performance
evaluation results. Note that even if this work only
considers embedded boards, it should also be useful
in other execution environments.
• when we tried to apply the extReq optimization
to the XTS-AES facility of the new cryptographic
co-processor, we discovered an incompatible
design. We explain why, provide likely
explanations for this situation, and give
recommendations for future co-processor designs. This
is an important outcome of this work if we want
to boost FDE cryptographic performance.
SECRYPT 2020 - 17th International Conference on Security and Cryptography