used memory elements (Flip Flops) and boolean logic implementation resources (Lookup Tables), and confirms that this coefficient reduction approach substantially decreases the consumed FPGA area. In the tables, we report the Area-Time (AT) product as an efficiency indicator to compare the designs, computed as the number of occupied CLBs times the execution time in milliseconds. The gathered data suggest that the x-net architecture is one order of magnitude more efficient when employed to compute polynomial multiplications during encapsulation in NTRU rings. During decapsulation this no longer holds: one of the three multiplications specified in the round 3 submission of NTRU does not have an operand with small coefficients, thus requiring an additional cost (indicated with a ? marker in the table).
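As a minimal sketch of the AT product defined above, the snippet below multiplies the number of occupied CLBs by the execution time in milliseconds. The function name, the derivation of the execution time from a cycle count and a clock frequency, and the sample figures are illustrative assumptions, not values taken from the tables.

```python
def area_time_product(clbs: int, cycles: int, freq_mhz: float) -> float:
    """Return the AT product: occupied CLBs times execution time in ms.

    Assumption: the execution time is obtained as clock cycles divided by
    the working frequency. All parameter names are hypothetical.
    """
    if clbs < 0 or cycles < 0 or freq_mhz <= 0:
        raise ValueError("clbs and cycles must be >= 0, freq_mhz must be > 0")
    # freq_mhz * 1_000 is the frequency in kHz; cycles / kHz gives milliseconds.
    exec_time_ms = cycles / (freq_mhz * 1_000)
    return clbs * exec_time_ms


# Hypothetical design points (made-up numbers, for illustration only).
design_a = area_time_product(clbs=1_000, cycles=500_000, freq_mhz=250.0)
design_b = area_time_product(clbs=1_500, cycles=500_000, freq_mhz=125.0)
print(design_a, design_b)  # a lower AT product means a more efficient design
```

With this metric a design can come out ahead either by occupying fewer CLBs or by finishing sooner, which is why area and frequency improvements both contribute to it.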
Table 4 reports the comparison of our cryptosystem-specialized designs with the existing state of the art on NTRU and NTRU Prime linear-time multipliers. Our design achieves a 30% to 40% reduction in the required CLBs for both cryptosystems when comparing our solution that loads a single large coefficient (x-net) with the one in (Farahmand et al., 2019). Furthermore, we obtain a 28% to 96% gain in working frequency with respect to the same design, and therefore also a higher area-time efficiency. We compare our solution loading two large coefficients at once with the only currently available datapoint, found in the public technical report (Carter et al., 2022). The solution reported in the technical report, where it is denoted as x²-net, is 10% larger in area and 2.2× slower in working frequency for the NTRU design. These results show that the x-net design is a remarkable fit for the R_p × R_q multiplications in NTRU and NTRU Prime.
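To illustrate what an R_p × R_q multiplication involves, the following is a minimal software sketch of a schoolbook product between a polynomial with coefficients in {-1, 0, 1} and one with coefficients modulo q. It assumes reduction modulo x^n - 1 (a cyclic convolution); the function name and the toy parameters are hypothetical. It is a behavioural model only and does not claim to mirror the x-net datapath: the outer loop consumes one large coefficient per step, and the inner updates are independent of each other, so they could in principle run in parallel.

```python
def mul_small_by_large(small, large, q):
    """Multiply a ternary polynomial by a mod-q polynomial modulo x^n - 1.

    small: coefficients in {-1, 0, 1}, lowest degree first.
    large: coefficients in [0, q), lowest degree first.
    Assumption: cyclic reduction (x^n = 1); other rings need a different wrap.
    """
    n = len(large)
    if len(small) != n:
        raise ValueError("operands must have the same length")
    if any(s not in (-1, 0, 1) for s in small):
        raise ValueError("small operand must be ternary")
    acc = [0] * n
    for j, b in enumerate(large):        # one large coefficient per step
        for i, s in enumerate(small):    # n independent accumulator updates
            k = (i + j) % n              # x^i * x^j wraps around modulo x^n - 1
            if s == 1:
                acc[k] = (acc[k] + b) % q    # small coefficient +1: add
            elif s == -1:
                acc[k] = (acc[k] - b) % q    # small coefficient -1: subtract
            # small coefficient 0: the accumulator is left untouched
    return acc


# Toy example: (1 - x^2) * (2 + 3x + 4x^2) mod (x^3 - 1, 7)
print(mul_small_by_large([1, 0, -1], [2, 3, 4], 7))  # [6, 6, 2]
```

Because the small operand is ternary, each update is an addition, a subtraction, or a no-op, so no general modular multiplier is needed in this model.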
5 CONCLUSION
In this work, we analyzed a flexible design for linear-time polynomial multiplications, applicable to the acceleration of four post-quantum cryptographic primitives: Kyber, Saber, NTRU and NTRU Prime. We reported quantitative results on the efficiency of primitive-tailored designs, obtaining area savings (10%–40%) and significant frequency gains (96%–120%) with respect to the state of the art of NTRU and NTRU Prime multipliers. Our unified design provides the first hardware implementation of a polynomial multiplier able to accelerate the computation of Kyber, Saber, NTRU and NTRU Prime at all security levels in a single component, at the cost of a 15% frequency reduction and an area increase of only a third of a dedicated multiplier.
REFERENCES
Alagic, G., Apon, D., Cooper, D., Dang, Q., Dang, T., Kelsey, J., Lichtinger, J., Miller, C., Moody, D., Peralta, R., Perlner, R., Robinson, A., Smith-Tone, D., and Liu, Y.-K. (2022). https://doi.org/10.6028/NIST.IR.8413-upd1.
Basso, A. and Roy, S. S. (2021). Optimized polynomial multiplier architectures for post-quantum KEM Saber. In 58th ACM/IEEE Design Automation Conference, DAC 2021, San Francisco, CA, USA, December 5-9, 2021, pages 1285–1290. IEEE.
Carter, E., He, P., and Xie, J. (2022). High-performance polynomial multiplication hardware accelerators for KEM Saber and NTRU. IACR Cryptol. ePrint Arch., page 628.
Dang, V. B., Mohajerani, K., and Gaj, K. (2021). High-
Speed Hardware Architectures and FPGA Bench-
marking of CRYSTALS-Kyber, NTRU, and Saber.
IACR Cryptol. ePrint Arch., page 1508.
Farahmand, F., Dang, V. B., Nguyen, D. T., and Gaj, K. (2019). Evaluating the potential for hardware acceleration of four NTRU-based key encapsulation mechanisms using software/hardware codesign. In Ding, J. and Steinwandt, R., editors, Post-Quantum Cryptography - 10th International Conference, PQCrypto 2019, Chongqing, China, May 8-10, 2019 Revised Selected Papers, volume 11505 of Lecture Notes in Computer Science, pages 23–43. Springer.
Karatsuba, A. (1963). Multiplication of multidigit numbers
on automata. In Soviet physics doklady, volume 7,
pages 595–596.
Liu, B. and Wu, H. (2015). Efficient architecture and implementation for NTRUEncrypt system. In IEEE 58th International Midwest Symposium on Circuits and Systems, MWSCAS 2015, Fort Collins, CO, USA, August 2-5, 2015, pages 1–4. IEEE.
Marotzke, A. (2020). A constant time full hardware implementation of Streamlined NTRU Prime. In Liardet, P. and Mentens, N., editors, Smart Card Research and Advanced Applications - 19th International Conference, CARDIS 2020, Virtual Event, November 18-19, 2020, Revised Selected Papers, volume 12609 of Lecture Notes in Computer Science, pages 3–17. Springer.
NIST PQC Team (2022). PQC Standardization Process: Announcing Four Candidates to be Standardized, Plus Fourth Round Candidates. https://csrc.nist.gov/news/2022/pqc-candidates-to-be-standardized-and-round-4.
Peng, B., Marotzke, A., Tsai, M., Yang, B., and Chen, H. (2021). Streamlined NTRU Prime on FPGA. IACR Cryptol. ePrint Arch., page 1444.
Sklavos, N., Chaves, R., di Natale, G., and Regazzoni, F.
(2017). Hardware Security and Trust: Design and De-
ployment of Integrated Circuits in a Threatened En-
vironment. Springer Publishing Company, Incorpo-
rated, 1st edition.
ICISSP 2023 - 9th International Conference on Information Systems Security and Privacy