gether with the 48-bit key belonging to the transpon-
der, in order to encrypt the value 0xFFFFFFFF. If the transponder validates the reader by recovering the 0xFFFFFFFF value, it sends the reader, in encrypted form, some configuration bytes known only to both of them (Verdult et al., 2012; Verdult, 2015).
This protocol enables a straightforward attack, as any eavesdropper can obtain both the plaintext and the ciphertext from the protocol's operation. Since the key space is larger than the ciphertext space (48 bits vs 32 bits), an attacker will find many keys that map the same plaintext to the same ciphertext. Thus, a brute-force attack such as the one described in this contribution needs an additional step to correlate the keys obtained from several plaintext-ciphertext pairs.
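The expected number of candidate keys per pair follows directly from the sizes involved: a single plaintext-ciphertext pair leaves, on average, 2^48 / 2^32 = 2^16 = 65536 keys that map the known plaintext to the observed ciphertext, and each additional pair cuts the surviving candidates by a further factor of about 2^32, so heuristically very few pairs should suffice to isolate the correct key.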
In this phase of our study, we have focused on
the implementations that are able to compute those
potential keys. In the next phase, we will focus on
improving the retrieval step by including Field Pro-
grammable Gate Array (FPGA) devices in the com-
parison of technologies, and on determining the aver-
age number of pairs needed to isolate the correct key.
3 IMPLEMENTATION PLATFORMS
3.1 C++ and OpenMP
C++ is a programming language designed by Bjarne Stroustrup in 1983 and standardized since 1998 by the International Organization for Standardization (ISO). The latest version is known as C++14 (ISO/IEC, 2014).
OpenMP (Open Multi-Processing) is an Appli-
cation Programming Interface (API) that supports
shared-memory parallel programming in C, C++, and
Fortran on several platforms, including GNU/Linux,
OS X, and Windows. The latest stable version is 4.5, released in November 2015 (OpenMP, 2016). When
using OpenMP, the section of code that is intended to
run in parallel is marked with a preprocessor directive
that will cause the threads to form before the section
is executed. By default, each thread executes the par-
allelized section of code independently. The runtime
environment allocates threads to processors depend-
ing on usage, machine load, and other factors.
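As a minimal sketch of this mechanism (hypothetical names throughout: encrypt is a placeholder standing in for the real cipher, and the observed ciphertext and search bound are example values, not those of our experiments), a brute-force key search can be parallelized with a single directive, compiling with, e.g., g++ -fopenmp:

#include <cstdint>
#include <cstdio>

// Placeholder standing in for the real 48-bit-key cipher: maps a
// candidate key to the 32-bit ciphertext of the fixed plaintext.
static uint32_t encrypt(uint64_t key) {
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return static_cast<uint32_t>(key);
}

int main() {
    const uint32_t observed = 0xCAFEBABE;   // eavesdropped ciphertext (example value)
    const int64_t space = INT64_C(1) << 24; // demo bound; the full key space is 2^48
    // The loop iterations are split among the available threads,
    // each of which tests its chunk of the key space independently.
    #pragma omp parallel for schedule(static)
    for (int64_t key = 0; key < space; ++key) {
        if (encrypt(static_cast<uint64_t>(key)) == observed)
            std::printf("candidate key: %012llx\n", (unsigned long long)key);
    }
    return 0;
}

With schedule(static) the iterations are divided into equal chunks up front, which matches the default behaviour described above; the runtime then maps the threads onto the available processors.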
3.2 Java
The Java programming language originated in 1990, when a team at Sun Microsystems was working first on the design and development of software for small electronic devices, and later on the emerging market of Internet browsing. Once the first official version of Java was launched in 1996, its popularity grew rapidly.
Currently there are more than 10 million Java developers and, according to (Oracle Corp., 2016), the number of Java-enabled devices (mainly personal computers, mobile phones, and smart cards) is in the billions. In January 2010, Oracle Corporation completed the acquisition of Sun Microsystems (Oracle Corp., 2010), so Java technology is now managed by Oracle. The latest version, known as Java 8, was launched in 2014.
3.3 CUDA
GPGPU (General-Purpose computing on Graphics Processing Units) refers to the use of a Graphics Processing Unit (GPU) card to perform computations in applications traditionally managed by a Central Processing Unit (CPU). Due to their particular
hardware architecture, GPUs are able to compute cer-
tain types of parallel tasks quicker than multi-core
CPUs, which has motivated their usage in scientific
and engineering applications (NVIDIA Corp., 2016).
The disadvantage of using GPUs in those scenarios is
their higher power consumption compared to that of
traditional CPUs (Mittal and Vetter, 2014).
CUDA is the best known GPU-based parallel
computing platform and programming model, created
by NVIDIA. CUDA is designed to work with C, C++
and Fortran, and with programming frameworks such
as OpenACC or OpenCL, though with some limita-
tions. CUDA organizes applications as a sequential
host program that may execute parallel programs, re-
ferred to as kernels, on a CUDA-capable device.
When developing a CUDA application, the programmer needs to copy data from host memory to device memory, invoke kernels, and then copy data back from device memory to host memory.
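The sketch below illustrates that workflow under stated assumptions: add_index is a toy kernel invented for the example, the launch sizes are arbitrary, and error checking is omitted; it is not the code of our application.

#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel, used only to illustrate the workflow: each thread adds
// its global index to a base value read from device memory.
__global__ void add_index(const uint32_t *base, uint32_t *out) {
    uint32_t i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = *base + i;
}

int main() {
    const int blocks = 4, threads = 256, n = blocks * threads;
    uint32_t h_base = 1000, h_out[4 * 256];
    uint32_t *d_base, *d_out;
    cudaMalloc(&d_base, sizeof(uint32_t));           // allocate device memory
    cudaMalloc(&d_out, n * sizeof(uint32_t));
    cudaMemcpy(d_base, &h_base, sizeof(uint32_t),
               cudaMemcpyHostToDevice);              // host -> device
    add_index<<<blocks, threads>>>(d_base, d_out);   // kernel launch
    cudaMemcpy(h_out, d_out, n * sizeof(uint32_t),
               cudaMemcpyDeviceToHost);              // device -> host
    cudaFree(d_base);
    cudaFree(d_out);
    std::printf("%u\n", h_out[n - 1]);               // prints 2023
    return 0;
}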
The code displayed in Listing 1 contains the de-
tails of the CUDA kernel, where only one key is tested
by each thread.
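In outline, such a one-key-per-thread kernel follows the pattern below (a hedged sketch with a placeholder __device__ cipher and hypothetical names; not the literal contents of Listing 1):

#include <cstdint>

// Placeholder __device__ cipher standing in for the real encryption.
__device__ uint32_t encrypt(uint64_t key) {
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return (uint32_t)key;
}

// One key per thread: the candidate is derived from the thread's
// global index and tested against the observed ciphertext.
__global__ void test_keys(uint64_t base_key, uint32_t observed,
                          unsigned long long *found) {
    uint64_t key = base_key
                 + (uint64_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (encrypt(key) == observed)
        *found = (unsigned long long)key; // at most a handful of threads
                                          // match, so a plain store suffices
}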
As one of the goals of our study was to determine whether the time spent copying data back and forth between host and device memories was to some extent comparable to the running time of the kernel, we developed a second version of the CUDA application in which each thread tests a specified number of keys before finishing its execution.
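In outline, that second variant extends the previous sketch (same placeholder encrypt and the same hedges; keys_per_thread is a hypothetical parameter): each thread derives a contiguous batch of candidates from its global index, so launch and memory-transfer overhead is amortized over more work per thread.

// Reuses the __device__ encrypt placeholder defined in the previous
// sketch. Each thread tests keys_per_thread consecutive candidates.
__global__ void test_keys_batched(uint64_t base_key, uint32_t observed,
                                  uint32_t keys_per_thread,
                                  unsigned long long *found) {
    uint64_t tid = (uint64_t)blockIdx.x * blockDim.x + threadIdx.x;
    uint64_t first = base_key + tid * keys_per_thread;
    for (uint32_t k = 0; k < keys_per_thread; ++k) {
        uint64_t key = first + k;
        if (encrypt(key) == observed)
            *found = (unsigned long long)key; // record the matching key
    }
}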