In this paper, two high-throughput hardware architectures of the JH algorithm are proposed and analytically described. The first one incorporates no pipeline stages, while the second one corresponds to a design with three pipeline stages. Beyond that, certain design choices were made targeting high throughput with reasonable area consumption. Both architectures can operate as any of the four versions of JH (JH-224/256/384/512) and were successfully implemented in Xilinx Virtex-4, Virtex-5, and Virtex-6 FPGAs. The gathered performance metrics, including Frequency, Area, and Throughput, show that the proposed architectures outperform the existing ones in terms of the Throughput/Area cost factor.
The rest of the paper is organized as follows. Section 2 surveys the previously published works and Section 3 presents the JH algorithm, as submitted to NIST. In Section 4 the proposed architectures are described in detail. The implementation results and the corresponding comparisons are shown in Section 5, while Section 6 concludes the paper.
2 RELATED WORK
Regarding hardware implementations of the JH algorithm, to the best of the authors' knowledge, there are no previously published works dealing with the JH algorithm itself. However, there are several works performing comparative analyses among either the round-two candidates (Baldwin et al., 2010; Henzen et al., 2010; Tillich et al., 2009; Matsuo et al., 2010; Homsirikamol et al., 2010; Gaj et al., 2010; Guo et al., 2010a; Guo et al., 2010b; Kobayashi et al., 2010) or the round-three candidates (Jungk et al., 2011; Kerckhof et al., 2011; Guo et al., 2011; Guo et al., 2012; Jungk, 2011; Homsirikamol et al., 2011; Tillich et al., 2010; Provelengios et al., 2011). The above studies include both FPGA and ASIC CMOS implementations. Specifically, FPGA implementations and results are reported in ten papers (Baldwin et al., 2010; Matsuo et al., 2010; Homsirikamol et al., 2010; Gaj et al., 2010; Guo et al., 2010a; Kobayashi et al., 2010; Jungk et al., 2011; Jungk, 2011; Homsirikamol et al., 2011; Provelengios et al., 2011).
Apart from (Homsirikamol et al., 2011) and (Provelengios et al., 2011), all the other works deal with straightforward implementations without any form of optimization. On the other hand, (Homsirikamol et al., 2011) investigates both pipelining and unrolling; however, it shows that both techniques yield only limited benefits. In (Provelengios et al., 2011), the pipeline technique is applied targeting low-power designs; thus, the reported performance results are low.
Finally, it has to be stressed that, in the competition's third round, the JH algorithm was tweaked (the tweaked version is denoted as JH42). The difference between the two versions is the number of round iterations: the original performs 36 (plus those potentially needed for initialization or finalization), while JH42 performs 42. This work deals with the round-three JH42, which is considered more efficient for hardware implementation and offers larger security margins compared to the original (Wu, 2008).
3 THE JH ALGORITHM
The hash function family JH, proposed by Hongjun Wu (2008), includes two main special features: a new compression structure and a generalized AES (NIST, 2001a) design methodology. The latter offers the possibility of easily constructing large block ciphers from smaller components. Specifically, the compression structure is a bijective function implemented as a block cipher with a constant key. The family itself consists of four versions, namely JH-224, JH-256, JH-384, and JH-512, which are based on the same compression function but produce hash values of different widths (via truncation of the output's bits).
A general diagram of the compression function, F_d, is shown in Figure 1. It uses an internal state, H(i), the size of which is 2^(d+2) bits, where the factor i denotes the i-th iteration and d the dimension of a block of bits. A d-dimensional block consists of 2^d 4-bit elements. The starting state, H(0), is version-dependent. In other words, there is a vector, IV, which represents the message digest size and is appropriately loaded into the state.
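As a quick sanity check on these sizes (a sketch assuming JH's actual dimension d = 8, i.e. the E_8 function of the submitted algorithm):

```python
# State sizing for a d-dimensional JH block; JH as submitted uses d = 8.
d = 8
state_bits = 2 ** (d + 2)   # size of the internal state H(i) in bits
elements = 2 ** d           # number of 4-bit elements in a d-dimensional block
assert state_bits == 1024
assert elements == 256
# Consistency: 2^d elements of 4 bits each fill the 2^(d+2)-bit state.
assert elements * 4 == state_bits
```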
The input message is partitioned into n m-bit blocks, M, through a padding procedure. The compression function operates on one message block, M(i), at a time. Initially, the block is XORed with the lower half of the 2^(d+2)-bit state value. Then, the result is fed into the E_d function. The output of E_d is then XORed once more with the message block and loaded into the state. If this is the last block of the message, or the message consists of a single block, the procedure is over and the hash value resides in the final state. Otherwise, the procedure is repeated for the next message block.
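The compression step above can be sketched in Python. This is a minimal illustration, not the authors' implementation: `jh_compress` and the `e_d` callable are hypothetical names, E_d is abstracted as an arbitrary byte permutation, and the assumption is made that the two XORs of M(i) target opposite halves of the state, with the pre-E_d XOR hitting the lower half as stated in the text:

```python
def jh_compress(state: bytes, block: bytes, e_d) -> bytes:
    """One JH compression step: H(i) = F_d(H(i-1), M(i)).

    state -- the 2^(d+2)-bit internal state H(i-1), as bytes
    block -- the m-bit message block M(i); m is half the state size
    e_d   -- the E_d bijection, modeled here as a bytes -> bytes callable
    """
    half = len(state) // 2
    hi, lo = state[:half], state[half:]
    # XOR the message block into the lower half of the state (per the text).
    lo = bytes(a ^ b for a, b in zip(lo, block))
    # Apply the E_d permutation to the whole state.
    out = e_d(hi + lo)
    # XOR the message block once more, into the opposite (upper) half of
    # E_d's output (assumed half ordering), and load the result as H(i).
    hi2 = bytes(a ^ b for a, b in zip(out[:half], block))
    return hi2 + out[half:]
```

With `e_d` taken as the identity, applying the step twice with the same block returns the original state, since each half is XORed with M(i) exactly twice.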
The E_d function is based on the d-dimensional generalized AES methodology and applies