value of $\mathrm{BUC_{MAX}}$ is determined according to the size
of the cache memory so that during the bucket-pop
operation, cache-miss rates are minimized. During
each insertion, we check whether any bucket exceeds its capacity. If it does, we pop all the elements from that bucket and update the sieving array at the stored locations. This strategy also eliminates the need for malloc and free operations on bucket entries after individual insert and pop operations. These memory operations are atomic, so avoiding them inside the loop boosts parallelism in multi-threaded implementations.
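The flush-on-full strategy described above can be sketched as follows. This is a minimal illustration, not the paper's actual code: the names (`Bucket`, `bucket_insert`, `flush_bucket`), the capacity value, and the sieving-array size are all assumptions.

```c
#define BUC_MAX 4096              /* assumed capacity, tuned to cache size */
#define SIEVE_LEN (1 << 20)       /* assumed sieving-array length */

typedef struct {
    unsigned idx[BUC_MAX];        /* stored sieving-array locations */
    unsigned char logp[BUC_MAX];  /* log p to subtract at each location */
    int count;
} Bucket;

static unsigned char sieve[SIEVE_LEN];

/* Pop every stored entry and apply its log subtraction
 * to the sieving array. */
static void flush_bucket(Bucket *b) {
    for (int i = 0; i < b->count; i++)
        sieve[b->idx[i]] -= b->logp[i];
    b->count = 0;
}

/* Insert one entry; if the bucket is full, flush it first.
 * The fixed-size arrays mean no malloc/free ever happens
 * for individual insert and pop operations. */
static void bucket_insert(Bucket *b, unsigned idx, unsigned char logp) {
    if (b->count == BUC_MAX)
        flush_bucket(b);
    b->idx[b->count] = idx;
    b->logp[b->count] = logp;
    b->count++;
}
```

Because each bucket is a preallocated array, the hot loop touches only memory it owns, which is what makes the approach friendly to multi-threading.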
Algorithm 2 summarizes these implementation
ideas. The workings of the pseudo-functions used in
this algorithm are explained in Table 2.
4 EXPERIMENTAL RESULTS
4.1 Hardware and Software Setup
We use Intel’s Xeon Gold Series (Model No. 6130)
processor clocked at 2.10 GHz with an L3 Cache of
size 22 MB. The gcc compiler (version 9.2.0), the GMP library (version 6.1.2), and the OpenMP API (version 4.5) are used. For calculating the prime ideals, we use Victor Shoup's NTL library (version 11.3.2) (Shoup et al., 2020). The optimization flag -O3 and the intrinsic flag -mavx=native are used. In the multi-core
implementations, we use all of the 16 cores of a sin-
gle processor. The operating system is CentOS Linux
release 7.4.1708 (Core).
4.2 Data Setup
As a test bench, we consider the two numbers RSA-512 and RSA-768, which were factored as reported in (Cavallar et al., 2000) and (Kleinjung et al., 2010). In each case, we consider the same polynomials that were used in the actual factorization attempts. A suitable partitioning of the factor base between block- and bucket-sieving primes has a major impact on the overall running time. We vary the small-versus-large demarcation boundary $\mathrm{FB}_S$ based on the sieving range $\mathrm{MAX}_A$ across various test cases.
For our multi-threaded implementation, we use
the OpenMP directive #pragma omp parallel for
to launch 16 threads expected to map to the individ-
ual cores. We allocate different segments of b values
to the different threads in order to avoid concurrent
writes. The read-only p and log p arrays are shared
by all the threads, so that they can stay loaded in the
cache. We have chosen the same upper limit for both the factor bases: $B_r = B_a = \mathrm{MAX_{FB}}$.
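The thread layout described above can be sketched as follows. The range bound and the stand-in `sieve_line` routine are illustrative assumptions; the real per-line work updates the sieving array using the shared p and log p tables.

```c
#define B_RANGE 1024  /* assumed number of b values to sieve */

/* Illustrative stand-in for sieving one line b; returns a
 * checkable value instead of updating a sieving array. */
static long sieve_line(int b) { return (long)b; }

/* Static scheduling hands each of the 16 threads a contiguous,
 * disjoint segment of b values, so no two threads write to the
 * same sieving segment; read-only tables would be shared. */
static long parallel_sieve(void) {
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total) num_threads(16)
    for (int b = 0; b < B_RANGE; b++)
        total += sieve_line(b);
    return total;
}
```

Without OpenMP enabled, the pragma is ignored and the loop runs sequentially with the same result, which makes the partitioning easy to validate.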
4.3 Timing Results
Table 3 reports the timings $T^{\pm v}_{\pm b}$ of our implementations of sieving. The subscript indicates whether cache-friendly (block/bucket) sieving is used ($+b$) or not ($-b$), whereas the superscript indicates whether vectorization is used ($+v$) or not ($-v$). For example, $T^{-v}_{+b}$ indicates the timing of our non-vectorized implementation with block and bucket sieving. All
the times are in seconds, and stand for the com-
bined times of rational sieving and algebraic siev-
ing. Each sieving includes the time taken by the pre-
computation of initial indices, index increments and
log subtractions, and locating potential sieving loca-
tions. The time for final trial divisions (relation gen-
eration) is excluded here. The number of threads uti-
lized is denoted as N
θ
. Each of the reported times is
the average over 100 test cases.
Based on these four sets of timings, we calculate four relevant sets of speedup figures. The speedup of $[T+]$ over $[T-]$ is calculated as $\frac{[T-] - [T+]}{[T-]} \times 100\%$, where both the signs $\pm$ appear either in the subscript or in the superscript, with the other kept unchanged. For example, $\psi_{+b} = \left(\frac{T^{-v}_{+b} - T^{+v}_{+b}}{T^{-v}_{+b}}\right) \times 100\%$ indicates the speedup obtained by vectorization on cache-friendly sieving, and $\psi^{-v} = \left(\frac{T^{-v}_{-b} - T^{-v}_{+b}}{T^{-v}_{-b}}\right) \times 100\%$ indicates the speedup obtained by cache-friendly sieving without vectorization.
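As a sanity check on the definition above, the speedup computation reduces to a one-line helper; the sample timings in the usage note are made up for illustration.

```c
/* Percentage speedup of timing t_plus over t_minus, following
 * the definition (t_minus - t_plus) / t_minus * 100%. */
static double speedup(double t_minus, double t_plus) {
    return (t_minus - t_plus) / t_minus * 100.0;
}
```

For instance, if a non-vectorized run takes 200 s and the vectorized run 100 s, the speedup is 50%.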
The experimental data establishes two facts. First,
AVX-512-based vectorization achieves a speedup of
up to 56% in non-cache-friendly sieving and up to
25% in cache-friendly block and bucket sieving over
non-vectorized implementations. Second, the effec-
tiveness of cache-friendly sieving is manifested by a
speedup of up to 63%, both with and without vectorization. In particular, the best running times are obtained with both cache-friendly sieving and vectorization (the column headed $T^{+v}_{+b}$).
5 CONCLUSION
In this paper, we report the practical effectiveness
of block and bucket sieving and AVX-512-based
vectorization. This study establishes the usefulness of exploiting the latest hardware features for implementing time-consuming algorithms like the GNFSM for factoring integers. There are several ways in which our study can be extended. Both cache-friendly sieving
SECRYPT 2021 - 18th International Conference on Security and Cryptography