A POLYNOMIAL BASED HASHING ALGORITHM

V. Kumar Murty

Department of Mathematics, University of Toronto, 40 St. George Street, Toronto, M5S 3G3, Canada

Nikolajs Volkovs

GANITA Lab, Department of Mathematical and Computational Sciences

University of Toronto - Mississauga, 3359 Mississauga Road, Mississauga, Canada

Keywords:

Hash function, data integrity, polynomials over ﬁnite ﬁelds.

Abstract:

The aim of this article is to describe a new hash algorithm using polynomials over ﬁnite ﬁelds. It runs at

speeds comparable to SHA-3. Hardware implementations seem to run at signiﬁcantly faster speeds, namely

at 1.8 Gb/sec on an FPGA. Unlike most other existing hash algorithms, our construction does not follow the

Damgaard-Merkle philosophy. The hash has several attractive features in terms of its ﬂexibility. In particular,

the length of the hash is a parameter that can be set at the outset. Moreover, the estimated degree of collision

resistance is measured in terms of another parameter whose value can be varied.

1 INTRODUCTION

There is much discussion now about how to construct

a good hash algorithm. The main difﬁculty stems

from the fact that the design principles of a hash func-

tion are not completely understood. However, some

desirable features of a future hash function have been

enumerated (NIST,2006). In particular, the algorithm

should provide for

1. a changeable length of hash

2. collision resistance measured in terms of a tunable

parameter

3. different functions for online and ofﬂine usage.

There are some methods that have been studied in

the literature to produce new hash functions from old

functions. For example, one might consider the con-

catenation of two existing hash functions. Or, one

might increase the number of rounds in an existing

function. It is not clear what effect these methods

have on collision resistance. In general, it seems that

the design principles of hash functions are not well

understood.

The aim of this article is to describe a new hash

algorithm, which incorporates some of the features

speciﬁed in (NIST,2006). In particular, the output

length of the function is a parameter that can be set

at the beginning. Moreover, the degree of collision

resistance is expected to depend on another parame-

ter that can also be speciﬁed at the outset.

The performance of the algorithm is comparable

to that of the SHA-family. In particular, for a 384 bit

hash, the speed is comparable to SHA-384. For a 512

bit hash, the speed is 5% faster than SHA-512.

The hardware implementation of the algorithm is

especially effective. Modelling shows that the speed

of the function may reach 1.8 Gbits/sec (on an FPGA

V running at 299 MHz). Analysis also suggests that

on an ASIC, we may obtain speeds of approximately

4 Gbits/sec (600 MHz). We note that the structure of

the algorithm is such that the performance improves

for longer ﬁles and larger hash sizes. The algorithm

has two phases, the second of which does not depend

on the length of the message being hashed.

The collision resistance of the function is depen-

dent on the difﬁculty of solving a family of systems

of iterated exponential equations in a ﬁnite ﬁeld. This

is not a problem that has been studied in the literature.

However, it does not seem to be tractable by standard

methods of analytic number theory.

Summarizing, the hash function that we construct

has the following important attributes. Firstly, the

length of the output can be changed simply by chang-

ing a few steps of the calculation. Secondly, an aspect

of the construction, namely the use of “bit strings”

allows us to choose the degree of collision resistance

103

Kumar Murty V. and Volkovs N. (2008).

A POLYNOMIAL BASED HASHING ALGORITHM.

In Proceedings of the International Conference on Security and Cryptography, pages 103-106

DOI: 10.5220/0001929501030106

 SciTePress

for a ﬁxed output length. Thirdly, the computation is a

bit-stream procedure as opposed to a block procedure.

Finally, we note that creating a collision requires the

solution of a system of non-linear iterated exponential

equations. As far as we are aware, such equations can-

not be solved by standard methods of analytic number

theory. Moreover, we are not aware of any other hash

function in the literature whose collision resistance in-

volves such iterated exponential equations.

For its performance characteristics, design fea-

tures and dependence on what appears to be an in-

tractible mathematical problem, we believe this hash

function is worthy of further attention, and so we

present this preliminary report.

Our construction uses polynomials over ﬁnite

ﬁelds. We note that earlier works have used polyno-

mials over ﬁnite ﬁelds in the construction of hash al-

gorithms. See for example, the work of Krovetz and

Rogoway (Krovetz and Rogoway, 2000). However,

our use of polynomials is very different. Most other

existing hash algorithms are based on the Damgaard-

Merkle (Damgaard,1989), (Merkle,1989) approach.

The reader can ﬁnd a description of such algorithms

in the book of Menezes, van Oorschot and Vanstone

(Menezes et al,1997).

Following the recent ground-breaking work of

Wang (Wang et al, 2005), the Damgaard-Merkle de-

sign methodology has come under close scrutiny. Our

approach, however, is not based on the Damgaard-

Merkle methodology.

2 BRIEF DESCRIPTION OF THE

STEPS

The main steps are stretching, masking, forming a

collection of tables with bit strings, forming a knap-

sack and performing one exponentiation in a group.

We brieﬂy describe each of these steps.

2.1 Padding, Splitting and Masking

The message is stretched by appending 4096 bits con-

sisting of a ﬁxed string. This stretching operation

is different from the one described in, for example,

Aiello, Haber and Venkatesan (Aiello et al, 1998)

in which a randomized function is used to perform

stretching. In our algorithm, the purpose of stretch-

ing is to populate certain auxiliary bit strings. Let us

denote by k the length in bits of the padded message

The message is then split into overlapping seg-

ments which are interpreted as polynomials over F

of degree < n where n is chosen such that 3 < n < 11.

More precisely, denote by M(i, j) the substring of

M beginning with the i-th bit and ending with the

j-th bit. Also, denote by M[i] the i-th bit of M.

Let us deﬁne S(M,n) to be the set M(1, n),M(2,n +

1),··· ,M(k − n + 1,k),M(k − n + 2)M[1],M(k − n+

3)M(1,2),···M[k]M(1,n − 1). Each M(i,i + n − 1)

may be thought of as a polynomial of degree < n over

. Thus, S(M,n) consists of k polynomials of degree

< n. Note that the construction of the S(M, n) is a

stream procedure. We choose c values of n, where c

is a variable parameter.

Next, we perform a complicated iterative mathe-

matical procedure which we call masking. It is one-

to-one and length preserving. Though one-to-one, the

procedure is difﬁcult to invert and it involves ﬁnite

ﬁeld arithmetic. Let us denote by CUR

,· ·· ,CUR

the effect of this procedure. We view them as k poly-

nomials of degree < n. At any given time, we need to

store 2

of these polynomials.

For any bit string B, we deﬁne int(B) to be the

integer whose base 2 expansion is B. The registers

CUR

are constructed as follows. Set

= d

(i) = i− 2− int(M

i−1

), (1)

= d

(i) = i− 2− int(CUR

i−1

Let f(x) ∈ F

[x] be irreducible of degree n. Thus,

there is an isomorphism of ﬁelds

[x]/( f(x)) ≃ F

Denote by φ

the isomorphism of F

-vector spaces

[x]/( f(x)) −→ F

Let δ and β be generators of (F

[x]/( f(x)))

(resp.

[x]/(g(x)))

) corresponding to polynomials f(x)

and g(x) say. We set

CUR

= M

⊕ φ

(δ) ⊕ φ

(β), (2)

CUR

= M

⊕

(δ

(int(M

i−1

)+int(CUR

i−2

)mod2

)⊕

⊕φ

(β

(int(CUR

i−1

)+int(CUR

i−2

))mod2

)

for i = 2,...,2

+ 1, and

CUR

= M

⊕ (3)

(δ

int(M

i−1

)+int(CUR

)mod2

)⊕

⊕φ

(β

(int(CUR

i−1

)+int(CUR

)mod2

)

for i = 2

+2,...,k with d

and d

deﬁned by (1). Once

again, we stress that the procedure just described for

calculating the values CUR

is a stream procedure.

Moreover, as the result below indicates, the values of

theCUR

uniquely determine the original message M.

SECRYPT 2008 - International Conference on Security and Cryptography

104

Proposition 2.1. Let M and M

′

be messages of length

k with CUR

(M) = CUR

′

) for i = 1,· ·· ,k. Then

M = M

′

The proof is given in (Murty and Volkovs,2008).

If we choose c values n

,· ·· ,n

of n, then we cal-

culate the values of registers CUR

for all the ele-

ments of the sequences

S(M, n

),S(M, n

),...,S(M,n

Of course, for different n

we use different ﬁelds F

with the corresponding generators. Thus, we have

constructed the following set of collections of poly-

nomials

CUR

,CUR

,...,CUR

, (4)

2.2 Construction of Tables

We construct a number (we considered 200) of tables

containing 2

entries each. These tables are initially

set to zero and then individual entries of the table are

incremented according to a simple rule that depends

on the bits of the stretched message and the values of

the CUR

. At the end of this operation, we produce

tables the average entry of which is k/(2

∗ 200). In

particular, for a message of 1MB and using n = 4, the

average table entry will be an integer of size 2500. By

construction, the sum of the entries in the tables is the

same as the message length, namely k.

Choose an integer r > 2

. We will construct r ta-

bles Tb

,· ·· , Tb

as follows. Let q > 0 and g > 0

be chosen so that q + g < n. For example, we may

take q = g = 1. For any integer m, denote by i =

int(m (mod r)) the integer with 0 ≤ i < r and m = i

(mod r). Deﬁne the function

h = h(M) : { 1,2,· ·· , k} −→ {1,2,· ·· ,r}

as follows:

h(1) =

(

int(q (mod r)) + 1 if M[1] = 1

int(q+ g (mod r)) + 1 if M[1] = 0.

For i > 1,

h(i) =

(

h(i− 1) + int(q (mod r)) + 1 if M[i] = 1

h(i− 1) + int(q+ g (mod r)) + 1 if M[i] = 0.

If on the i-th step of the distribution, M[i] is 1, we

assign polynomial CUR

to the table h(i). By “as-

signing” an element CUR

to a table with index h(i)

we mean that we increment the counter int(CUR

) -

th element of the h(i)-th table by 1. We stress that

r, q and g are chosen so that it is impossible to send

any two neighboring polynomials from any sequence

CUR to one and the same table.

In practice, we have more than one value of n.

Thus, the above construction requires the choice of

,· ·· ,r

corresponding to n

,· ··n

. The construction

produces tables Tb

( j)

for 1 ≤ j ≤ c and 1 ≤ i ≤ r

2.3 Bit Strings

This is a technical construction which involves asso-

ciating a number of strings to selected entries of each

table. The length of these strings is a parameter that

can be chosen. The number of strings per table is an-

other parameter that can be chosen, with fewer strings

corresponding to greater loss of information. In our

tests, we associated up to 5 bit string to each table.

These strings are constructed so as to capture infor-

mation about the order in which the entries of the ta-

ble are updated. These strings require an additional

memory of about 5KB, assuming one string of length

200 bits per table. Details of this construction will be

presented in (Murty and Volkovs,2008).

2.4 The Spectrum

From these tables, we apply a certain linear trans-

formation to construct a “spectrum”. The maximum

value of a spectrum entry is approximately k/10 and

the average value is k/(2

n+1

∗ 10). In particular, for a

message of length 1MB, the spectrum from each ta-

ble contains about 25 bits. It is not necessary to store

more than one spectrum at a time.

2.5 Knapsack

Using the spectrum, we perform a Cantor enumera-

tion which computes one single number for each ta-

ble. This number is of the order (k/10)

. In particu-

lar, for a 1MB ﬁle, this is about 40 bytes. To the inte-

ger obtained by enumeration, we add the integer cor-

responding to the bit string. This produces, for each

table t, an integer I

. Now we form the sum

I =

∑

which is an integer of length 320 bits.

2.6 Exponentiation and the Hash Value

Now ﬁx a group G and P an element of this group.

We compute the point multiple I.P. For the group, we

can choose, for example, the group of points on an

elliptic curve over a ﬁnite ﬁeld of 2

elements, where

τ is a parameter that can be chosen. Finally, the hash

value if the x-coordinate of the point I.P. It belongs

to a ﬁnite ﬁeld of 2

elements and can be interpreted

as a bit string of length τ.

A POLYNOMIAL BASED HASHING ALGORITHM

105

2.7 Outline of the Algorithm

Our algorithm, then, can be described in brief as fol-

lows.

PARAMETERS:

,· ·· ,n

}

INPUT: Message

of length

OUTPUT: Hash value

bits

1. Compute the stretching and splitting

S(M, n

)

(

1 ≤ j ≤ c

)

2. Compute the masking

CUR

for

1 ≤ j ≤ c

and

1 ≤ i ≤ k

3. Compute the tables

)

for

1 ≤ j ≤ c

and

1 ≤ i ≤ r

. Each table has

entries

and each entry has

bits.

4. Compute bit strings and their

associated integers

5. From the tables, compute the spectra

and their associated integers

6. Use both sets of integers to compute

an integer

(for

1 ≤ j ≤ c

7. Compute

in the group.

8. Put the

together to form final

hash value

3 PRELIMINARY ANALYSIS

We tested the algorithm on the randomness tests pro-

vided by NIST. The results were as follows. For all

ﬁve output lengths, namely 160, 224, 256, 384 and

512 bits, all the tests were successfully passed. More-

over, all of the results we obtained during running

the NIST tests (NIST2,2006) show that we passed the

tests with certain reserve. For instance, for the Fre-

quency Test, in accordance with which “the number

of 1’s must be between 9,654 and 10,346”, we got

that the interval of number of 1’s for the 256-th bit of

the output was between 9817 and 10225.

The collision resistance of the algorithm is depen-

dent on the difﬁculty of solving several problems in-

cluding the problem of iterated exponential equations

over a ﬁnite ﬁeld. Details will be given in the ex-

tended paper (Murty and Volkovs,2008) in prepara-

tion.

4 SUMMARY

The hash function that has been brieﬂy described in

this announcement offers several attractive features

such as tunable output length, a tunable measure of

collision resistance. Moreover, in hardware it is run-

ning at 1.8 Gb/sec on an FPGA and it is expected to

run even faster on an ASIC. The design methodology

is not based on the Damgaard-Merkle approach and is

a bit-stream procedure.

REFERENCES

Aiello, W., Haber, S., and Venkatesan, R. (1998). New

constructions for secure hash functions (Extended ab-

stract). In Fast Software Encryption, LNCS vol.1372,

pages 150-167. Springer Verlag, Berlin.

Damgaard, I. (1989). A design principle for hash functions.

In Advances in Cryptology, LNCS 435, pages 416-427.

Springer Verlag, Berlin.

Hankerson, D., Menezes, A., and Vanstone, S. (2004).

Guide to Elliptic curve cryptography. Springer-

Verlag, New York.

Krovetz,T., and Rogoway,P. (2000). Fast universal hash-

ing with small keys and no preprocessing: the PolyR

construction. In Information Security and Cryptology

ICICS 2000, LNCS vol. 2015, pages73-89. Springer-

Verlag, Berlin.

Mal’cev, A.I. (1970) Algorithms and recursive functions.

Wolters-Noordhoff Pub.Co.

Menezes, A., van Oorschot, P.C., and Vanstone, S. (1997)

Handbook of Applied Cryptography. CRC Press.

Merkle, R. (1989). A Certiﬁed Digital Signature. In

Advances in Cryptology, LNCS 435, pages 218-238.

Springer Verlag, Berlin.

Murty, V. Kumar, and Volkovs, N. (2008). ERINDALE: A

polynomial based hashing algorithm. In preparation.

National Institute of Standards and Technology (2006).

Second NIST Workshop on Hash functions.

http://csrc.nist.gov/groups/ST/hash/second work-

shop.html, August 24-25, 2006.

National Institute of Standards and Technol-

ogy (2006). Computer Security Divi-

sion, Computer Security Resource Center.

http://csrc.nist.gov/groups/STM/cavp/standards.html

Wang, X., Yin, Y., and Yu, H. (2005). Finding collisions

in the full SHA-1. In Advances in Cryptology, LNCS

3621, pages 17-36. Springer Verlag, Berlin.

SECRYPT 2008 - International Conference on Security and Cryptography

106