GPU-Based Brute Force Cryptanalysis of KLEIN

Cihangir Tezcan

Department of Cyber Security, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey

Keywords:

Cryptanalysis, Lightweight Cryptography, GPU, KLEIN.

Abstract:

KLEIN is a family of lightweight block ciphers that supports 64-bit, 80-bit, and 96-bit secret keys. In this work, we provide a CUDA-optimized, table-based implementation of the KLEIN family that is free of shared memory bank conflicts. Our best optimization reaches more than 45 billion KLEIN-64 key searches per second on an RTX 4090. Our results show that the KLEIN block cipher is susceptible to brute force attacks via GPUs. Namely, breaking KLEIN in a year via brute force requires around 13, 1.34 million, and 111 billion RTX 4090 GPUs for 64-bit, 80-bit, and 96-bit secret keys, respectively. We recommend that lightweight designs avoid short keys.

1 INTRODUCTION

The Advanced Encryption Standard (AES) (Daemen and Rijmen, 2002) is arguably responsible for most of the encrypted data, and after more than 20 years of cryptanalysis efforts it is still secure against all known cryptanalysis techniques. Although AES is suitable and optimized for many platforms and use cases, resource-constrained devices might benefit from different encryption algorithms in terms of hardware size, latency, throughput, or battery consumption. Hence, many lightweight block ciphers have been proposed for many different devices and platforms.

https://orcid.org/0000-0002-9041-1932

The security of modern ciphers does not rely on security through obscurity. Instead, cipher designs are public, and a well-designed cipher is secure as long as its secret key is generated randomly and kept secret. In other words, a well-designed encryption algorithm resists all non-generic attacks. Generic attacks, on the other hand, provide an upper bound on security. For instance, regardless of the design of a cipher, an attacker who captures a plaintext and its corresponding ciphertext under a secret key can encrypt the plaintext with every possible key and check whether the expected ciphertext is observed. Such an attack is called a brute force or exhaustive key search attack. For a k-bit secret key, such an exhaustive search requires at most $2^k$ encryptions. Thus, the key size k must be selected according to current and foreseeable technology to prevent generic attacks. For instance, NIST (Barker and Roginsky, 2019) assumes 112-bit secret keys to be secure until 2030 and maybe later. However, some ISO/IEC standard encryption algorithms support 80-bit keys. Although the AES key size is at least 128 bits, some lightweight designs use shorter keys for better performance. Yet short keys might make them susceptible to brute force attacks.
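
To make the generic attack concrete, the following is a minimal sketch of exhaustive key search on one known plaintext-ciphertext pair, written in plain C++ (which nvcc also compiles as host code). It uses a hypothetical 16-bit toy cipher, toy_encrypt, purely for illustration; it is not KLEIN, and all names are our own.

```cpp
#include <cassert>
#include <cstdint>

// Toy 16-bit "cipher" used only to illustrate the search; it is NOT KLEIN.
// For a fixed plaintext, key -> ciphertext is a bijection: XOR with the key,
// then multiply by an odd constant modulo 2^16.
uint16_t toy_encrypt(uint16_t key, uint16_t plaintext) {
    return static_cast<uint16_t>(static_cast<uint32_t>(plaintext ^ key) * 0x9E37u);
}

using EncryptFn = uint16_t (*)(uint16_t key, uint16_t plaintext);

// Generic exhaustive key search over a k-bit key space (k <= 16 for this toy).
// Encrypts the known plaintext under every candidate key and returns the first
// key that reproduces the known ciphertext, or -1 if none does.
// At most 2^k trial encryptions are performed.
int32_t exhaustive_search(EncryptFn encrypt, uint16_t plaintext,
                          uint16_t ciphertext, unsigned k) {
    const uint32_t key_space = 1u << k;
    for (uint32_t key = 0; key < key_space; ++key) {
        if (encrypt(static_cast<uint16_t>(key), plaintext) == ciphertext) {
            return static_cast<int32_t>(key);
        }
    }
    return -1;
}
```

With a real 64-bit block cipher, roughly $2^{k-64}$ wrong keys are also expected to match a single pair (about one for KLEIN-64, $2^{16}$ for KLEIN-80, and $2^{32}$ for KLEIN-96), so in practice the surviving candidates would be checked against a second pair.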

KLEIN (Gong et al., 2011) is an example of such a lightweight block cipher. It is software-oriented and has an AES-like design. However, unlike AES, KLEIN supports three short key sizes: 64, 80, and 96 bits. Thus, KLEIN can provide only short-term security. The length of this short-term security depends on current technology, and it should be calculated so that users have an idea of how long their encrypted data will remain secret.

Tezcan, C.

GPU-Based Brute Force Cryptanalysis of KLEIN.

DOI: 10.5220/0012461900003648

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 10th International Conference on Information Systems Security and Privacy (ICISSP 2024), pages 884-889

ISBN: 978-989-758-683-5; ISSN: 2184-4356

An exhaustive search attack is easily parallelizable, since the same encryption operation is performed with a different candidate key each time. To perform the computations, an attacker can use central processing units (CPUs), graphics processing units (GPUs), FPGAs, or ASICs. GPUs outperform CPUs in parallelizable operations since they have thousands of cores and a single-instruction, multiple-data style of execution. FPGAs can outperform GPUs, especially when the operations are not memory intensive. Moreover, FPGAs might be more energy efficient than GPUs, but they require expertise and are not as easily accessible as GPUs. Since ASICs are dedicated devices, they outperform FPGAs and GPUs, but manufacturing costs must be considered since these devices can only perform a specific function.
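
Because every candidate key is tested independently, parallelizing the search only requires giving each worker (for instance, each GPU thread) its own slice of the key space. The C++ sketch below shows one possible even split; the names KeyRange and partition_keys are hypothetical, and it assumes at most 63 key bits so that the key-space size fits in a uint64_t (a full 64-, 80-, or 96-bit search would additionally be spread over many kernel launches or GPUs).

```cpp
#include <cassert>
#include <cstdint>

// A contiguous slice of the key space assigned to one worker.
struct KeyRange {
    uint64_t first;  // first candidate key of this worker
    uint64_t count;  // number of candidate keys this worker tries
};

// Split a key space of 2^key_bits keys (key_bits <= 63) as evenly as possible
// among num_workers independent workers. Worker w gets a contiguous range; the
// first (2^key_bits mod num_workers) workers get one extra key.
// In a CUDA kernel, w would typically be blockIdx.x * blockDim.x + threadIdx.x.
KeyRange partition_keys(uint64_t w, uint64_t num_workers, unsigned key_bits) {
    const uint64_t total = uint64_t{1} << key_bits;
    const uint64_t base = total / num_workers;
    const uint64_t extra = total % num_workers;
    KeyRange r;
    r.first = w * base + (w < extra ? w : extra);
    r.count = base + (w < extra ? 1 : 0);
    return r;
}
```

For example, with $2^4$ keys and 3 workers, the workers receive the ranges [0, 6), [6, 11), and [11, 16); no communication is needed between them except reporting a match.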

Exhaustive search implementations of a symmetric-key encryption algorithm on a GPU can be categorized into three methods: naive, table-based, and bitsliced. In a naive implementation, every operation of the encryption algorithm is implemented as specified. Table-based implementations precompute and store the outputs of layers of the cipher for every possible input. Thus, they can be regarded as a time-memory trade-off for a naive implementation.

Since the input space is large, table-based implementations partition the input space so that these partitions can be computed and stored independently and their results can be combined at the end. Such precomputed tables are called T-tables. In GPU implementations, these tables are generally stored in shared memory for better performance, instead of other memory types like global or constant memory. The bottlenecks in this approach are the bank conflicts in shared memory and the inability to use large T-tables due to the limited shared memory size of GPUs.
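
As a small illustration of partitioning the input space, the sketch below replaces a map on 32-bit inputs by four 256-entry tables, one per input byte, and combines the four lookups with XOR. It uses the AES MixColumns matrix as the example layer because that layer is linear over XOR; the names mix_column, build_tables, and mix_column_tables are our own, and this is a generic illustration rather than the implementation described later.

```cpp
#include <cassert>
#include <cstdint>

// Multiply by x (that is, by 02) in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
static uint8_t xtime(uint8_t a) {
    return static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

// AES MixColumns on one column packed big-endian (column byte 0 in bits 31..24).
static uint32_t mix_column(uint32_t col) {
    uint8_t a[4], r[4];
    for (int i = 0; i < 4; ++i) a[i] = static_cast<uint8_t>(col >> (24 - 8 * i));
    for (int i = 0; i < 4; ++i) {
        const uint8_t b1 = a[(i + 1) % 4], b2 = a[(i + 2) % 4], b3 = a[(i + 3) % 4];
        // Row i of the circulant matrix: 02*a[i] ^ 03*a[i+1] ^ a[i+2] ^ a[i+3].
        r[i] = static_cast<uint8_t>(xtime(a[i]) ^ xtime(b1) ^ b1 ^ b2 ^ b3);
    }
    return (uint32_t{r[0]} << 24) | (uint32_t{r[1]} << 16) | (uint32_t{r[2]} << 8) | r[3];
}

// One 256-entry table per input byte position: 4 x 256 words instead of 2^32 entries.
static uint32_t T[4][256];

static void build_tables() {
    for (int j = 0; j < 4; ++j)
        for (uint32_t b = 0; b < 256; ++b)
            T[j][b] = mix_column(b << (24 - 8 * j));
}

// Because MixColumns is linear over XOR, the per-byte results can simply be XORed.
static uint32_t mix_column_tables(uint32_t col) {
    return T[0][col >> 24] ^ T[1][(col >> 16) & 0xFF] ^
           T[2][(col >> 8) & 0xFF] ^ T[3][col & 0xFF];
}
```

A nonlinear layer that acts on one byte at a time, such as an S-box, can be folded into each table as well, which is what makes the combined S-box and linear-layer T-tables possible.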

In the bitslicing technique, every bit of the state is kept in a different variable, and each variable holds the corresponding bit of many independent instances (for example, 32 instances in a 32-bit word). This approach removes the operations that are needed to access a single bit in a byte or a larger data type. Bitsliced implementations are favorable when the state of the cipher is small and the cipher design contains bit-level operations, as in the case of CRYPTO1 (Tezcan, 2017). However, an efficient bitsliced GPU implementation of AES is also provided in (Nishikawa et al., 2017).
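
The data layout behind bitslicing can be sketched as follows: 32 independent 8-bit values are stored as 8 bit-plane words, so that one 32-bit operation processes the same bit of all 32 instances. This is a minimal host-side C++ illustration with our own function names, not the bitsliced AES of (Nishikawa et al., 2017).

```cpp
#include <cassert>
#include <cstdint>

// Bitsliced layout for 32 independent 8-bit values (one per instance):
// slices[b] holds bit b of every instance, with instance i in bit position i.
void to_bitsliced(const uint8_t in[32], uint32_t slices[8]) {
    for (int b = 0; b < 8; ++b) {
        slices[b] = 0;
        for (int i = 0; i < 32; ++i)
            slices[b] |= static_cast<uint32_t>((in[i] >> b) & 1u) << i;
    }
}

// Inverse transformation back to one byte per instance.
void from_bitsliced(const uint32_t slices[8], uint8_t out[32]) {
    for (int i = 0; i < 32; ++i) {
        out[i] = 0;
        for (int b = 0; b < 8; ++b)
            out[i] |= static_cast<uint8_t>(((slices[b] >> i) & 1u) << b);
    }
}

// XOR the same constant byte into all 32 instances at once: complementing a
// whole bit-plane word flips that bit in every instance, so no per-instance
// shifting or masking is needed.
void bitsliced_xor_const(uint32_t slices[8], uint8_t k) {
    for (int b = 0; b < 8; ++b)
        if ((k >> b) & 1u) slices[b] = ~slices[b];
}
```

The transposition itself costs time, which is why bitslicing pays off mainly when the data can stay in bitsliced form for the whole computation.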

A fast implementation of a cipher can be used for many purposes other than fast encryption. For instance, the current best GPU implementation of AES (Tezcan, 2021) is used in (Belorgey et al., 2023) as an AES-CTR-based masking function in their aggregation protocol, based on the concept of counter-based cryptographically secure pseudorandom number generators (csPRNGs), a concept that is used by Facebook in their torchcsprng csPRNG. They improved upon torchcsprng using the optimizations of (Tezcan, 2021) and obtained a 100x speedup in the masking function compared to a single CPU core.

A fast implementation can also be used to experimentally verify theoretically obtained results. Moreover, it allows us to assess the strength of brute force attacks on short keys. For instance, the key lengths of block ciphers were revisited in (Tezcan, 2022), where it was shown that 56-bit DES and 80-bit PRESENT secret keys are well within the reach of current GPU technology.

In this work, we use the ideas of (Tezcan, 2021) that were used to remove shared memory bank conflicts in GPU optimizations of AES. Since KLEIN has an AES-like structure, we obtained a KLEIN optimization free of shared memory bank conflicts that can try $2^{35.40}$ KLEIN-64 keys per second on an RTX 4090 GPU.

The security of KLEIN has been analyzed against known cryptanalytic techniques, and a full-round truncated differential attack on KLEIN with 64-bit keys was provided by (Lallemand and Naya-Plasencia, 2014). This attack requires $2^{57.07}$ encryptions and $2^{54.5}$ data. The time and data complexities of this attack were improved in (Rasoolzadeh et al., 2017), which requires $2^{54.9}$ encryptions and $2^{48.6}$ data. Note that these attacks still require a huge number of encryptions, and the authors of those attacks could not verify them in practice. Thus, fast and optimized implementations are crucial for verifying theoretically obtained results. Verifying the attack of (Rasoolzadeh et al., 2017) takes $2^{54.9-35.40} = 2^{19.5}$ seconds, that is, less than 9 days, using our optimized code on a single RTX 4090. With multiple GPUs, the verification can be done in hours.

Attacks slightly better than exhaustive search on all three versions of KLEIN have also been obtained in the literature. A biclique attack on the 64-bit version of KLEIN was provided in (Ahmadian et al., 2015), which requires $2^{62.8}$ encryptions. Similarly, biclique attacks on all versions of KLEIN were provided in (Abed et al., 2012), which require $2^{79}$ and $2^{95.18}$ encryptions for the key sizes of 80 bits and 96 bits, respectively. Our GPU optimizations can be used to verify full or reduced versions of these attacks.

2 KLEIN

KLEIN is a software-oriented lightweight block cipher family that was proposed at RFIDSec 2011 (Gong et al., 2011). It has a compact implementation design and requires little memory in both hardware and software. This makes KLEIN suitable for resource-constrained devices like wireless sensors or RFID tags.

KLEIN is a Substitution-Permutation Network that works on 64-bit blocks. It supports three key lengths k, namely 64, 80, and 96 bits, and we denote these versions by KLEIN-k. The numbers of rounds for these key lengths are 12, 16, and 20, respectively. The 64-bit state of KLEIN is represented by 16 nibbles. Each round consists of 4 layers:

1. The round key is XORed with the state.

2. A 4 × 4 S-box is applied to the 16 nibbles of the state in parallel.

3. The state is rotated two bytes to the left.

4. The state is divided into two 32-bit halves, and each half is multiplied by the MDS matrix of AES (as in AES MixColumns).
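
For reference, one KLEIN round can be written naively on a 64-bit state as below. This C++ sketch rests on two assumptions: byte 0 of the state is the most significant byte, and the 4-bit S-box values are those we read from (Gong et al., 2011), which should be checked against the specification. The final key addition and the key schedule are omitted, and the function names are ours.

```cpp
#include <cassert>
#include <cstdint>

// 4-bit KLEIN S-box as we read it from (Gong et al., 2011); it is an involution.
static const uint8_t S4[16] = {0x7, 0x4, 0xA, 0x9, 0x1, 0xF, 0xB, 0x0,
                               0xC, 0x3, 0x2, 0x6, 0x8, 0xE, 0xD, 0x5};

static uint8_t xtime(uint8_t a) {  // multiply by 02 in the AES field
    return static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

// AES MixColumns on a 32-bit column, column byte 0 in the most significant byte.
static uint32_t mix_column(uint32_t col) {
    uint8_t a[4];
    for (int i = 0; i < 4; ++i) a[i] = static_cast<uint8_t>(col >> (24 - 8 * i));
    uint32_t out = 0;
    for (int i = 0; i < 4; ++i) {
        const uint8_t b1 = a[(i + 1) % 4];
        const uint8_t r = static_cast<uint8_t>(xtime(a[i]) ^ xtime(b1) ^ b1 ^
                                               a[(i + 2) % 4] ^ a[(i + 3) % 4]);
        out |= static_cast<uint32_t>(r) << (24 - 8 * i);
    }
    return out;
}

// Layer 2: apply the 4-bit S-box to all 16 nibbles.
uint64_t sub_nibbles(uint64_t s) {
    uint64_t r = 0;
    for (int i = 0; i < 16; ++i)
        r |= static_cast<uint64_t>(S4[(s >> (4 * i)) & 0xF]) << (4 * i);
    return r;
}

// Layer 3: rotate the 64-bit state two bytes (16 bits) to the left.
uint64_t rotate_nibbles(uint64_t s) { return (s << 16) | (s >> 48); }

// Layer 4: AES MixColumns on each 32-bit half.
uint64_t mix_nibbles(uint64_t s) {
    const uint64_t hi = mix_column(static_cast<uint32_t>(s >> 32));
    const uint64_t lo = mix_column(static_cast<uint32_t>(s));
    return (hi << 32) | lo;
}

// One round: AddRoundKey, SubNibbles, RotateNibbles, MixNibbles.
uint64_t klein_round(uint64_t state, uint64_t round_key) {
    return mix_nibbles(rotate_nibbles(sub_nibbles(state ^ round_key)));
}
```

Such a naive version is useful mainly as a reference against which a table-based implementation can be checked.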


Table 1: The specifications of the GPUs that are used in this work. CC denotes CUDA compute capability.

GPU | Cores | Clock Rate | CC | Architecture
MX 250 | 384 | 1582 MHz | 6.1 | Pascal
GTX 970 | 1664 | 1253 MHz | 5.2 | Maxwell
RTX 2070 Super | 2560 | 1770 MHz | 7.5 | Turing
RTX 4090 | 16384 | 2550 MHz | 8.9 | Ada Lovelace

Round keys are generated from the master key by a key schedule that consists of XOR, swap, four S-box, and round constant XOR operations. More detailed information on KLEIN can be found in (Gong et al., 2011).
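
The key schedule steps listed above can be sketched for KLEIN-64 as follows. The byte positions (round counter XORed into the third byte of the left half, S-boxes on the second and third bytes of the right half) follow our reading of (Gong et al., 2011) and should be checked against the specification; klein64_key_update is a hypothetical name. In this reading, the round key of each KLEIN-64 round is the whole 64-bit key state.

```cpp
#include <cassert>
#include <cstdint>

// 4-bit KLEIN S-box as we read it from (Gong et al., 2011).
static const uint8_t S4[16] = {0x7, 0x4, 0xA, 0x9, 0x1, 0xF, 0xB, 0x0,
                               0xC, 0x3, 0x2, 0x6, 0x8, 0xE, 0xD, 0x5};

// Two 4-bit S-boxes applied to the two nibbles of a byte.
static uint8_t sbox8(uint8_t x) {
    return static_cast<uint8_t>((S4[x >> 4] << 4) | S4[x & 0x0F]);
}

// Hypothetical helper: one KLEIN-64 key-schedule step with round counter i.
// The key state k[0..7] is split into a = k[0..3] and b = k[4..7].
void klein64_key_update(uint8_t k[8], uint8_t i) {
    uint8_t a[4], b[4];
    for (int j = 0; j < 4; ++j) {      // 1. rotate each half one byte to the left
        a[j] = k[(j + 1) % 4];
        b[j] = k[4 + (j + 1) % 4];
    }
    for (int j = 0; j < 4; ++j) {      // 2. swap: new a = b, new b = a XOR b
        k[j] = b[j];
        k[4 + j] = static_cast<uint8_t>(a[j] ^ b[j]);
    }
    k[2] ^= i;                          // 3. round counter into the third byte of a
    k[5] = sbox8(k[5]);                 // 4. four 4-bit S-boxes = two 8-bit lookups
    k[6] = sbox8(k[6]);                 //    on the second and third bytes of b
}
```

The two sbox8 calls correspond to turning the four 4 × 4 S-boxes of the key schedule into two 8 × 8 lookups.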

In this work, our main aim is to optimize KLEIN on GPUs. We used several GPUs from different architectures to show that our optimizations are not specific to a single GPU. The specifications of the GPUs that are used in this work are provided in Table 1.

3 CUDA OPTIMIZATION OF KLEIN

To the best of our knowledge, the best known GPU optimization of AES was provided in (Tezcan, 2021). It is a table-based implementation in which the tables are kept in the shared memory of the GPU and, thanks to a careful arrangement of the tables, no shared memory bank conflicts occur when different threads in a warp read different table entries. Since KLEIN has an AES-like structure, it is desirable to use the same approach.

Although KLEIN also operates on bytes, its S-box works on nibbles instead of bytes. If we created our tables according to nibbles, the resulting table-based implementation would require more operations than AES and would be slower than AES. Thus, we combined every two consecutive 4 × 4 S-boxes of KLEIN into an 8 × 8 S-box. Then we created the tables by combining the three layers of the round function that follow the round key addition. Namely, for each input of the 8 × 8 S-box, we calculated the result of the S-box operation, the rotation by two bytes to the left, and the matrix multiplication. The result can be stored in an array of 256 32-bit elements. We need to create four such tables, one for each of the four bytes that are multiplied by the matrix. However, due to the choice of the AES matrix, these tables turn out to be one-byte rotations of each other. Thus, it is possible to keep a single table and obtain the others by rotations.
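
A minimal host-side sketch of this table construction follows. It merges two 4-bit S-boxes into sbox8, computes T0[x] as the AES MixColumns contribution (02·s, 01·s, 01·s, 03·s) of s = sbox8(x) placed in the first position of a column, and derives the other three tables by byte rotations, which works because the AES matrix is circulant. The packing order and all names are our assumptions, the S-box values are as we read them from (Gong et al., 2011), and in this sketch the two-byte rotation is handled by choosing which state bytes feed which output word rather than inside the table.

```cpp
#include <cassert>
#include <cstdint>

// 4-bit KLEIN S-box as we read it from (Gong et al., 2011).
static const uint8_t S4[16] = {0x7, 0x4, 0xA, 0x9, 0x1, 0xF, 0xB, 0x0,
                               0xC, 0x3, 0x2, 0x6, 0x8, 0xE, 0xD, 0x5};

static uint8_t xtime(uint8_t a) {  // multiply by 02 in the AES field
    return static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

// Two consecutive 4-bit S-boxes merged into one 8-bit S-box.
static uint8_t sbox8(uint8_t x) {
    return static_cast<uint8_t>((S4[x >> 4] << 4) | S4[x & 0x0F]);
}

// Rotate right by n bits, 0 < n < 32.
static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

// T0[x] = MixColumns contribution of s = sbox8(x) in the first position of a column,
// packed big-endian as (02*s, 01*s, 01*s, 03*s).
static uint32_t T0[256];

static void build_t0() {
    for (uint32_t x = 0; x < 256; ++x) {
        const uint8_t s = sbox8(static_cast<uint8_t>(x));
        const uint8_t s2 = xtime(s);                      // 02*s
        const uint8_t s3 = static_cast<uint8_t>(s2 ^ s);  // 03*s
        T0[x] = (uint32_t{s2} << 24) | (uint32_t{s} << 16) | (uint32_t{s} << 8) | s3;
    }
}

// Table for byte position j (0..3): because the AES matrix is circulant,
// it equals T0 rotated right by 8*j bits, so only T0 has to be stored.
static uint32_t table_entry(int j, uint8_t x) {
    return j == 0 ? T0[x] : rotr32(T0[x], 8u * static_cast<unsigned>(j));
}
```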

In current GPU architectures, threads are executed in warps of 32 threads, and shared memory is divided into 32 banks that the threads of a warp can access in parallel. If two threads of a warp read different 32-bit words that lie in the same shared memory bank, these accesses are serialized. In order to avoid shared memory bank conflicts, 32 copies of the AES table are stored in (Tezcan, 2021), which allows every thread in a warp to use its own bank. Similarly, we calculated the table for KLEIN and stored it in the global memory of the GPU as T0G. The following CUDA code writes that table to the shared memory 32 times to avoid shared memory bank conflicts:

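// bit32: 32-bit unsigned integer type (assumed, e.g., unsigned int)
bit32 threadIndex = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
int warpThreadIndex = threadIdx.x & 31;                      // lane index within the warp
__shared__ bit32 T0S[256][32];  // 32 copies of the table; lane i always reads column i
// Assumes at least 256 threads per block: thread t fills row t with 32 copies of T0G[t].
if (threadIdx.x < 256)
    for (int i = 0; i < 32; i++) T0S[threadIdx.x][i] = T0G[threadIdx.x];
__syncthreads();  // all copies must be written before any thread reads T0S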
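
Why the [256][32] layout avoids conflicts can be checked with a simplified model of shared memory in which word w lives in bank w mod 32 and lanes reading the same word are served by a single broadcast. In the host-side C++ sketch below (our own names and model), lane l of the replicated table always reads a word whose address is congruent to l modulo 32, so every lane hits its own bank regardless of the looked-up index.

```cpp
#include <cassert>
#include <cstdint>

// Simplified shared-memory model: 32 banks of 4-byte words; word w lives in bank w % 32.
// Lanes that read the same word are served by one broadcast, so the cost of a warp-wide
// read is the largest number of *distinct* words requested from any single bank.
int conflict_degree(const uint32_t word_addr[32]) {
    int worst = 0;
    for (int bank = 0; bank < 32; ++bank) {
        int distinct = 0;
        for (int l = 0; l < 32; ++l) {
            if (word_addr[l] % 32 != static_cast<uint32_t>(bank)) continue;
            bool seen = false;  // was this exact word already counted for an earlier lane?
            for (int m = 0; m < l; ++m)
                if (word_addr[m] == word_addr[l]) { seen = true; break; }
            if (!seen) ++distinct;
        }
        if (distinct > worst) worst = distinct;
    }
    return worst;
}

// Word addresses when lane l looks up index idx[l] in a single table T[256].
void single_copy_addrs(const uint8_t idx[32], uint32_t out[32]) {
    for (int l = 0; l < 32; ++l) out[l] = idx[l];
}

// Word addresses when lane l looks up index idx[l] in T0S[256][32] at column l.
void per_lane_copy_addrs(const uint8_t idx[32], uint32_t out[32]) {
    for (int l = 0; l < 32; ++l) out[l] = static_cast<uint32_t>(idx[l]) * 32 + l;
}
```

For example, if the 32 lanes look up the indices 0, 32, 64, ..., 224 (each used four times), a single 256-entry table places all 8 distinct words in bank 0 and the read is serialized 8 ways, whereas the replicated layout is served in one pass.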

32 copies of this table require 32KB of shared memory (256 entries × 4 bytes × 32 copies). Since a thread block can statically allocate at most 48KB of shared memory by default, we cannot do this for the other three tables. Thus, we use only one table and obtain the rest by byte rotations. When we store the key in two variables key1 and key0 and the state in plaintext1 and plaintext0, one round of encryption turns into the following CUDA code for KLEIN-64:

// AddRoundKey: XOR the round key into the two 32-bit state words.
temp1 = plaintext1 ^ (key1 >> 16);
temp0 = plaintext0 ^ (key1 << 16) ^ (key0 >> 32);
// SubNibbles, RotateNibbles, and MixNibbles through the single table T0S;
// arithmeticRightShift() rotates a 32-bit word (see the text below).
plaintext0 = arithmeticRightShift(T0S[(temp1 & 0x00FF0000) >> 16][warpThreadIndex], 24) ^
             arithmeticRightShift(T0S[(temp1 & 0xFF000000) >> 24][warpThreadIndex], 16) ^
             arithmeticRightShift(T0S[temp0 & 0x000000FF][warpThreadIndex], 8) ^
             T0S[(temp0 & 0x0000FF00) >> 8][warpThreadIndex];
plaintext1 = arithmeticRightShift(T0S[(temp0 & 0x00FF0000) >> 16][warpThreadIndex], 24) ^
             arithmeticRightShift(T0S[(temp0 & 0xFF000000) >> 24][warpThreadIndex], 16) ^
             arithmeticRightShift(T0S[temp1 & 0x000000FF][warpThreadIndex], 8) ^
             T0S[(temp1 & 0x0000FF00) >> 8][warpThreadIndex];

Since CUDA C++ has no rotation operator, we perform the rotation with two shifts and one XOR, denoted arithmeticRightShift() in our code (despite its name, it performs a rotation rather than an arithmetic shift). It was observed in (Tezcan, 2021) that CUDA's byte permutation intrinsic __byte_perm can be used instead in our calculations, since all of our rotation amounts are multiples of 8.

ICISSP 2024 - 10th International Conference on Information Systems Security and Privacy


Table 2: Number of key searches per second for the exhaustive key search attack on KLEIN.

GPU | KLEIN-64 (keys/s) | KLEIN-80 (keys/s) | KLEIN-96 (keys/s)
MX 250 | $2^{29.70}$ | $2^{00.00}$ | $2^{28.42}$
GTX 970 | $2^{31.75}$ | $2^{30.74}$ | $2^{30.48}$
RTX 2070 Super | $2^{33.19}$ | $2^{32.46}$ | $2^{32.17}$
RTX 4090 | $2^{35.40}$ | $2^{34.74}$ | $2^{34.39}$

Although this intrinsic replaces three instructions with one, the change in performance was negligible on newer-generation GPUs like the RTX 2070 Super and RTX 4090 in our experiments, which suggests that they execute both versions equally efficiently. However, using the __byte_perm instruction provided a 5% speedup on the GTX 970.
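
The equivalence between the shift-based rotation and the byte permutation can be checked on the host with the sketch below. byte_perm_emulated is our own emulation of the documented basic behavior of CUDA's __byte_perm(x, y, s): byte k of the result is byte ((s >> 4k) & 7) of the eight bytes of y:x, with x supplying bytes 0 to 3. The selector constants are ours; on the device, __byte_perm(x, x, 0x0321) would rotate x right by 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// Rotate right by n bits (0 < n < 32) with two shifts and one XOR; the two shifted
// parts do not overlap, so XOR gives the same result as OR.
uint32_t rotr_shift(uint32_t x, unsigned n) { return (x >> n) ^ (x << (32 - n)); }

// Host-side emulation of __byte_perm(x, y, s) in its basic mode.
uint32_t byte_perm_emulated(uint32_t x, uint32_t y, uint32_t s) {
    const uint64_t src = (static_cast<uint64_t>(y) << 32) | x;  // bytes 0-3: x, 4-7: y
    uint32_t r = 0;
    for (int k = 0; k < 4; ++k) {
        const unsigned sel = (s >> (4 * k)) & 7u;  // which source byte goes to result byte k
        r |= static_cast<uint32_t>((src >> (8 * sel)) & 0xFFu) << (8 * k);
    }
    return r;
}

// Selectors that make a byte permutation of x equal to a right rotation by 8, 16, 24 bits.
constexpr uint32_t ROTR8 = 0x0321, ROTR16 = 0x1032, ROTR24 = 0x2103;
```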

The key schedule requires the calculation of four 4 × 4 S-boxes, which we turned into two 8 × 8 S-box calculations. Since this S-box can be stored as an 8-bit unsigned char array instead of a 32-bit unsigned int array, we have enough shared memory to store 32 copies of it and avoid shared memory bank conflicts. However, we observed that shared memory bank conflicts when reading these 8-bit S-box values do not cause the delays we observed for the bank conflicts on T0S. Thus, we obtained better occupancy on the GPU when we kept a single copy of this table in shared memory.

Using our best optimizations, we performed the exhaustive key search attack on every version of KLEIN using several GPUs. The numbers of keys that we can try per second are provided in Table 2.

The main difference between the performance of the three versions of KLEIN comes from their numbers of rounds, namely 12, 16, and 20 rounds for 64-bit, 80-bit, and 96-bit secret keys. Moreover, our KLEIN-64 implementation is also faster than the other variants because the 64-bit secret key can be stored in two 32-bit unsigned integers, whereas we had to use two 64-bit integers in our KLEIN-80 and KLEIN-96 implementations. Since GPU architectures are designed for 32-bit operations, 64-bit operations are slower.

Since the design of KLEIN is similar to AES and we used similar optimization techniques, Table 3 compares the performance of the exhaustive search attack on these two block ciphers on the same GPU. Although KLEIN has more rounds than AES, our KLEIN optimization is faster than AES because it requires fewer operations.

Our table-based optimized KLEIN CUDA code is publicly available on GitHub so that it can be used to verify our experiments, to analyze KLEIN, or to compare future optimizations: https://www.github.com/cihangirtezcan/CUDA_KLEIN

Table 3: Number of key searches per second for the exhaustive key search attack on KLEIN and AES performed for different key sizes on a single RTX 2070 Super GPU.

Cipher | Keys/s
AES-128 / 192 / 256 | $2^{32.43}$ / $2^{32.01}$ / $2^{31.66}$
KLEIN-64 / 80 / 96 | $2^{33.19}$ / $2^{32.46}$ / $2^{32.17}$

4 CRYPTANALYSIS OF KLEIN

Our key search results in Table 2 can be used to estimate how long it takes to perform brute force attacks on the three versions of KLEIN. A year consists of around $2^{24.91}$ seconds. Thus, we can try $2^{35.40+24.91} = 2^{60.31}$ KLEIN-64 keys per year on an RTX 4090 and capture the key in less than 13 years ($2^{64-60.31} = 2^{3.69}$ years). Performing the same attack with a thousand RTX 4090 GPUs reduces the attack time to less than 5 days.
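
The arithmetic of this section can be reproduced with a few lines of C++. The sketch assumes a 365.25-day year, which gives the $2^{24.91}$ seconds used above; years_single_device is a hypothetical helper, and its result also equals the number of GPUs needed to finish the search within one year.

```cpp
#include <cassert>
#include <cmath>

// log2 of the number of seconds in a 365.25-day year (about 24.91).
double log2_seconds_per_year() { return std::log2(365.25 * 24.0 * 3600.0); }

// Worst-case time, in years, for one device that tries 2^log2_rate keys per second
// to exhaust a key space of 2^key_bits keys. The same value is the number of such
// devices needed to finish the search within one year.
double years_single_device(double key_bits, double log2_rate) {
    return std::exp2(key_bits - log2_rate - log2_seconds_per_year());
}
```

For instance, years_single_device(64, 35.40) is about 12.9 years on one RTX 4090, that is, about 4.7 days on a thousand of them.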

A biclique attack on KLEIN-64 was provided in (Ahmadian et al., 2015), which requires $2^{62.8}$ encryptions and 2 data. Thus, we can perform this attack on a single RTX 4090 in $2^{62.8-35.40} = 2^{27.4}$ seconds, which is less than 6 years.

A truncated differential attack on the full 12 rounds of KLEIN-64 was proposed in (Lallemand and Naya-Plasencia, 2014). That attack requires $2^{57.07}$ encryptions, and the authors tried to experimentally verify their attack using a C implementation on an Intel(R) Xeon(R) CPU W3670 at 3.20GHz (12MB cache) with 8GB of RAM. However, it would take hundreds of years to complete the experiment on the 12-round attack. Instead, they performed their experiments on reduced versions of their attack. When the attack is reduced to 10 rounds, the time complexity drops to $2^{44.4}$ encryptions, and they performed it in 15 days. Similarly, they performed their attack on 9 rounds, which requires 2 encryptions, and it took around 2 days.

The attacks of (Lallemand and Naya-Plasencia, 2014) perform partial encryptions and decryptions. A small modification to our optimized CUDA code can be used to perform these attacks, although such a modification would introduce a small performance overhead. Since we can perform $2^{35.40}$ KLEIN-64 encryptions per second on a single RTX 4090, the 9-round experiment that requires 2 encryptions would take just a few seconds with our GPU implementation. Similarly, the 10-round experiment that took 15 days on a CPU would take $2^{44.4-35.40} = 2^{9}$ seconds, that is, less than 10 minutes, with our proposed GPU optimizations. Moreover, the full 12-round attack that requires $2^{57.07}$ encryptions would take $2^{21.67}$ seconds, which is less than 39 days. Note that it would take more than 300 years to verify this attack on the CPU setup and the C implementation of (Lallemand and Naya-Plasencia, 2014).

The attack of (Lallemand and Naya-Plasencia, 2014) was improved in (Rasoolzadeh et al., 2017), which requires $2^{54.9}$ encryptions. Performing $2^{54.9}$ encryptions would take less than 9 days with our CUDA code on an RTX 4090.

Our optimization results show that we can try $2^{34.74+24.91} = 2^{59.65}$ KLEIN-80 keys in a year. This means that it would take $2^{20.35}$ years for an RTX 4090 to recover a KLEIN-80 key, or it would require $2^{20.35} \approx 1.34$ million RTX 4090 GPUs to recover the key in a year.

A biclique attack in (Abed et al., 2012) has a time complexity of $2^{79}$ encryptions, which is two times faster than the exhaustive search attack. However, this attack requires 2 memory, and implementing it using our GPU optimizations might result in an attack that is slower than exhaustive search, because storing and processing 2 data would introduce a significant overhead.

Our optimization results show that we can try $2^{34.39+24.91} = 2^{59.3}$ KLEIN-96 keys in a year. This means that it would take $2^{36.7}$ years for an RTX 4090 to recover a KLEIN-96 key, or it would require $2^{36.7} \approx 111$ billion RTX 4090 GPUs to recover the key in a year.

A biclique attack in (Abed et al., 2012) has a time complexity of $2^{95.18}$ encryptions, which is $2^{0.82} \approx 1.77$ times faster than the exhaustive search attack. However, this attack requires 2 memory, and implementing it using our GPU optimizations might result in an attack that is slower than exhaustive search, because storing and processing 2 data would introduce a significant overhead.

Although an exhaustive key search attack on KLEIN-96 with GPUs does not look realistic with these numbers, this attack can become practical in the future, since new GPUs are always built with more cores and faster clock speeds. Moreover, GPUs are general-purpose devices, and if an attack on 96-bit KLEIN becomes profitable, one can build ASICs for which this attack becomes practical and requires less electricity than GPUs.

5 CONCLUSIONS

In this work, we provided a CUDA-optimized, table-based implementation of the KLEIN family of block ciphers that is free of shared memory bank conflicts. Our best optimization reaches $2^{35.40} \approx 45$ billion KLEIN-64 key trials per second on an RTX 4090. Our results show that the KLEIN block cipher, which supports 64-bit, 80-bit, and 96-bit secret keys, is susceptible to brute force attacks via GPUs. Thus, lightweight designs should not support short keys.

ACKNOWLEDGEMENTS

This work has been supported by The Scientific and Technological Research Council of Türkiye (TÜBİTAK) and the German Academic Exchange Service (DAAD) Bilateral Research Cooperation Project (TÜBİTAK 2531 Project) under the grant number 123N546, titled "Cryptanalysis of Symmetric Key Encryption Algorithms: Theory vs. Practice".

This project has also been supported by the Middle East Technical University Scientific Research Projects Coordination Unit under grant number AGEP-704-2023-11294.

REFERENCES

Abed, F., Forler, C., List, E., Lucks, S., and Wenzel, J. (2012). Biclique cryptanalysis of PRESENT, LED, and KLEIN. Cryptology ePrint Archive, Paper 2012/591. https://eprint.iacr.org/2012/591.

Ahmadian, Z., Salmasizadeh, M., and Aref, M. R. (2015). Biclique cryptanalysis of the full-round KLEIN block cipher. IET Inf. Secur., 9(5):294–301.

Barker, E. and Roginsky, A. (2019). Transitioning the use of cryptographic algorithms and key lengths. NIST SP 800-131A Rev. 2.

Belorgey, M. G., Dandjee, S., Gama, N., Jetchev, D., and Mikushin, D. (2023). Falkor: Federated learning secure aggregation powered by AES-CTR GPU implementation. In Brenner, M., Costache, A., and Rohloff, K., editors, Proceedings of the 11th Workshop on Encrypted Computing & Applied Homomorphic Cryptography, Copenhagen, Denmark, 26 November 2023, pages 11–22. ACM.

Daemen, J. and Rijmen, V. (2002). The Design of Rijndael: AES - The Advanced Encryption Standard. Information Security and Cryptography. Springer.

Gong, Z., Nikova, S., and Law, Y. W. (2011). KLEIN: A new family of lightweight block ciphers. In Juels, A. and Paar, C., editors, RFID. Security and Privacy - 7th International Workshop, RFIDSec 2011, Amherst, USA, June 26-28, 2011, Revised Selected Papers, volume 7055 of Lecture Notes in Computer Science, pages 1–18. Springer.

Lallemand, V. and Naya-Plasencia, M. (2014). Cryptanalysis of KLEIN. In Cid, C. and Rechberger, C., editors, Fast Software Encryption - 21st International Workshop, FSE 2014, London, UK, March 3-5, 2014. Revised Selected Papers, volume 8540 of Lecture Notes in Computer Science, pages 451–470. Springer.

Nishikawa, N., Amano, H., and Iwai, K. (2017). Implementation of bitsliced AES encryption on CUDA-enabled GPU. In Yan, Z., Molva, R., Mazurczyk, W., and Kantola, R., editors, Network and System Security - 11th International Conference, NSS 2017, Helsinki, Finland, August 21-23, 2017, Proceedings, volume 10394 of Lecture Notes in Computer Science, pages 273–287. Springer.

Rasoolzadeh, S., Ahmadian, Z., Salmasizadeh, M., and Aref, M. R. (2017). An improved truncated differential cryptanalysis of KLEIN. Tatra Mountains Mathematical Publications, 67(1):135–147.

Tezcan, C. (2017). Brute force cryptanalysis of MIFARE classic cards on GPU. In Mori, P., Furnell, S., and Camp, O., editors, Proceedings of the 3rd International Conference on Information Systems Security and Privacy, ICISSP 2017, Porto, Portugal, February 19-21, 2017, pages 524–528. SciTePress.

Tezcan, C. (2021). Optimization of advanced encryption standard on graphics processing units. IEEE Access, 9:67315–67326.

Tezcan, C. (2022). Key lengths revisited: GPU-based brute force cryptanalysis of DES, 3DES, and PRESENT. J. Syst. Archit., 124:102402.
