gether with the 48-bit key belonging to the transpon-
der, in order to encrypt the value 0xFFFFFFFF. If the transponder validates the reader by recovering the 0xFFFFFFFF value, it sends the reader, in encrypted form, some configuration bytes known only to both of them (Verdult et al., 2012; Verdult, 2015).
This protocol enables a straightforward attack, as any eavesdropper can obtain both the plaintext and the ciphertext from the protocol's operation. Since the key space is larger than the ciphertext space (48 bits vs 32 bits), an attacker will find many keys that map the same plaintext to the same ciphertext. Thus, a brute-force attack such as the one described in this contribution needs an additional step to correlate the keys obtained from several plaintext-ciphertext pairs.
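The expected number of candidate keys per pair follows directly from the sizes involved: a single plaintext-ciphertext pair leaves, on average, 2^48 / 2^32 = 2^16 = 65536 keys that map the known plaintext to the observed ciphertext, and each additional pair cuts the surviving candidates by a further factor of about 2^32, so heuristically very few pairs should suffice to isolate the correct key.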
In this phase of our study, we have focused on
the implementations that are able to compute those
potential keys. In the next phase, we will focus on
improving the retrieval step by including Field Pro-
grammable Gate Array (FPGA) devices in the com-
parison of technologies, and on determining the aver-
age number of pairs needed to isolate the correct key.
3 IMPLEMENTATION PLATFORMS
3.1 C++ and OpenMP
C++ is a programming language designed by Bjarne Stroustrup in 1983 and standardized since 1998 by the International Organization for Standardization (ISO). The latest version is known as C++14 (ISO/IEC, 2014).
OpenMP (Open Multi-Processing) is an Appli-
cation Programming Interface (API) that supports
shared-memory parallel programming in C, C++, and
Fortran on several platforms, including GNU/Linux,
OS X, and Windows. The latest stable version is 4.5, released in November 2015 (OpenMP, 2016). When
using OpenMP, the section of code that is intended to
run in parallel is marked with a preprocessor directive
that will cause the threads to form before the section
is executed. By default, each thread executes the par-
allelized section of code independently. The runtime
environment allocates threads to processors depend-
ing on usage, machine load, and other factors.
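As a minimal sketch of this mechanism (hypothetical names throughout: encrypt is a placeholder standing in for the real cipher, and the observed ciphertext and search bound are example values, not those of our experiments), a brute-force key search can be parallelized with a single directive, compiling with, e.g., g++ -fopenmp:

#include <cstdint>
#include <cstdio>

// Placeholder standing in for the real 48-bit-key cipher: maps a
// candidate key to the 32-bit ciphertext of the fixed plaintext.
static uint32_t encrypt(uint64_t key) {
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return static_cast<uint32_t>(key);
}

int main() {
    const uint32_t observed = 0xCAFEBABE;   // eavesdropped ciphertext (example value)
    const int64_t space = INT64_C(1) << 24; // demo bound; the full key space is 2^48
    // The loop iterations are split among the available threads,
    // each of which tests its chunk of the key space independently.
    #pragma omp parallel for schedule(static)
    for (int64_t key = 0; key < space; ++key) {
        if (encrypt(static_cast<uint64_t>(key)) == observed)
            std::printf("candidate key: %012llx\n", (unsigned long long)key);
    }
    return 0;
}

With schedule(static) the iterations are divided into equal chunks up front, which matches the default behaviour described above; the runtime then maps the threads onto the available processors.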
3.2 Java
The Java programming language originated in 1990, when a team at Sun Microsystems was working first on the design and development of software for small electronic devices, and later on the emerging market of Internet browsing. Once the first official version of Java was launched in 1996, its popularity grew rapidly.
Currently there are more than 10 million Java developers and, according to (Oracle Corp., 2016), the number of Java-enabled devices (mainly personal computers, mobile phones, and smart cards) is in the billions. In January 2010, Oracle Corporation completed the acquisition of Sun Microsystems (Oracle Corp., 2010), so Java technology is now managed by Oracle. The latest version, known as Java 8, was launched in 2014.
3.3 CUDA
GPGPU (General-Purpose computing on Graphics Processing Units) refers to the use of a Graphics Processing Unit (GPU) card to perform computations in applications traditionally managed by a Central Processing Unit (CPU). Due to their particular
hardware architecture, GPUs are able to compute cer-
tain types of parallel tasks quicker than multi-core
CPUs, which has motivated their usage in scientific
and engineering applications (NVIDIA Corp., 2016).
The disadvantage of using GPUs in those scenarios is
their higher power consumption compared to that of
traditional CPUs (Mittal and Vetter, 2014).
CUDA is the best known GPU-based parallel
computing platform and programming model, created
by NVIDIA. CUDA is designed to work with C, C++
and Fortran, and with programming frameworks such
as OpenACC or OpenCL, though with some limita-
tions. CUDA organizes applications as a sequential
host program that may execute parallel programs, re-
ferred to as kernels, on a CUDA-capable device.
When developing a CUDA application, the programmer needs to copy data from host memory to device memory, invoke kernels, and then copy data back from device memory to host memory.
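The sketch below illustrates that workflow under stated assumptions: add_index is a toy kernel invented for the example, the launch sizes are arbitrary, and error checking is omitted; it is not the code of our application.

#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel, used only to illustrate the workflow: each thread adds
// its global index to a base value read from device memory.
__global__ void add_index(const uint32_t *base, uint32_t *out) {
    uint32_t i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = *base + i;
}

int main() {
    const int blocks = 4, threads = 256, n = blocks * threads;
    uint32_t h_base = 1000, h_out[4 * 256];
    uint32_t *d_base, *d_out;
    cudaMalloc(&d_base, sizeof(uint32_t));           // allocate device memory
    cudaMalloc(&d_out, n * sizeof(uint32_t));
    cudaMemcpy(d_base, &h_base, sizeof(uint32_t),
               cudaMemcpyHostToDevice);              // host -> device
    add_index<<<blocks, threads>>>(d_base, d_out);   // kernel launch
    cudaMemcpy(h_out, d_out, n * sizeof(uint32_t),
               cudaMemcpyDeviceToHost);              // device -> host
    cudaFree(d_base);
    cudaFree(d_out);
    std::printf("%u\n", h_out[n - 1]);               // prints 2023
    return 0;
}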
The code displayed in Listing 1 contains the de-
tails of the CUDA kernel, where only one key is tested
by each thread.
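In outline, such a one-key-per-thread kernel follows the pattern below (a hedged sketch with a placeholder __device__ cipher and hypothetical names; not the literal contents of Listing 1):

#include <cstdint>

// Placeholder __device__ cipher standing in for the real encryption.
__device__ uint32_t encrypt(uint64_t key) {
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return (uint32_t)key;
}

// One key per thread: the candidate is derived from the thread's
// global index and tested against the observed ciphertext.
__global__ void test_keys(uint64_t base_key, uint32_t observed,
                          unsigned long long *found) {
    uint64_t key = base_key
                 + (uint64_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (encrypt(key) == observed)
        *found = (unsigned long long)key; // at most a handful of threads
                                          // match, so a plain store suffices
}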
As one of the goals of our study was to determine whether the time spent copying data back and forth between host and device memories was to some extent comparable to the running time of the kernel, we developed a second version of the CUDA application in which each thread tests a specified number of keys before finishing its execution.
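In outline, that second variant extends the previous sketch (same placeholder encrypt and the same hedges; keys_per_thread is a hypothetical parameter): each thread derives a contiguous batch of candidates from its global index, so launch and memory-transfer overhead is amortized over more work per thread.

// Reuses the __device__ encrypt placeholder defined in the previous
// sketch. Each thread tests keys_per_thread consecutive candidates.
__global__ void test_keys_batched(uint64_t base_key, uint32_t observed,
                                  uint32_t keys_per_thread,
                                  unsigned long long *found) {
    uint64_t tid = (uint64_t)blockIdx.x * blockDim.x + threadIdx.x;
    uint64_t first = base_key + tid * keys_per_thread;
    for (uint32_t k = 0; k < keys_per_thread; ++k) {
        uint64_t key = first + k;
        if (encrypt(key) == observed)
            *found = (unsigned long long)key; // record the matching key
    }
}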