INFINITE ALPHABET PASSWORDS

A Uniﬁed Model for a Class of Authentication Systems

Marcia Gibson, Marc Conrad and Carsten Maple

Institute for Research in Applicable Computing, University of Bedfordshire, Park Square, Luton, U.K.

Keywords:

User authentication, Password, Inﬁnite alphabet, Formal model.

Abstract:

In this paper we propose a formal model for a class of authentication systems termed “Infinite Alphabet Password Systems” (IAPs). We define such systems as those that use a character set for the construction of the authentication token that is theoretically infinite, bound only by practical implementation restrictions. We find

that the IAP architecture can feasibly be adapted for use in many real world situations, and may be imple-

mented using a number of system architectures and cryptographic protocols. A security analysis is conducted

on an implementation of the model that utilizes images for its underlying alphabet. As a result of the analysis

we ﬁnd that IAPs can offer security beneﬁts over traditional alphanumeric password schemes. In particular

some of the signiﬁcant problems concerning phishing, pharming, replay, dictionary and ofﬂine brute force

attacks are mitigated.

1 INTRODUCTION

It has been said that the user is often the weakest

link in security (Sasse et al., 2001), and as a result

when designing systems that we want to be effective

in practice, we must consider the needs and tenden-

cies of users.

A well documented example is the traditional

password, which is a sequence of characters. The

Universal Character Set deﬁned by ISO/IEC stan-

dard 10646 (ISO, 2003) is used as the basis for

many character encoding systems. For example, Uni-

code (The Unicode Consortium, 2009) corresponds

with ISO/IEC 10646:2003 plus amendments 1-6.

This contains “codepoints” comprising unique names

and integer references for nearly 100,000 characters.

When choosing passwords, individuals tend to select

from a substantially smaller subset of this: those di-

rectly accessible via their input device. For instance, a

standard United Kingdom keyboard can generate 103

printable characters by pressing a key or common key

combinations.

In addition, users will often ﬁnd recalling lengthy

strings of high entropy (Shannon, 1948) difﬁcult (Yan

et al., 2004). When faced with the predicament of

being unable to authenticate, they will often attempt

to reduce the effort involved. Well-known methods for this include creating passwords based upon pre-existing semantic associations such as a pet's name, writing passwords down (Klein, 1990), sharing one or a handful of passwords between numerous accounts (Gaw and Felten, 2006), or selecting passwords that are shorter and less widely scattered across the overall password space (Morris and Thompson, 1979).

Actions such as these increase ease of use, but

also negate some of the intrinsic security beneﬁts that

may otherwise be offered. In the security research

community there has been a concerted effort to de-

velop novel authentication schemes that directly ad-

dress this problem. One such category of schemes

concerns the recognition or cued-recall of password

symbols from a presented visual (Dhamija and Perrig,

2000), auditory (Gibson et al., 2009), or haptic (Kuber

and Yu, 2006) alphabet. In systems that support software-generated alphabets, the potential alphabet size is larger than that feasibly available from hardware peripherals. In particular, the use of optimally designed image and sound based alphabets has been found to aid memorability when compared to a traditional text-based counterpart (Dhamija

and Perrig, 2000; Gibson et al., 2009). In theory, enhanced memorability should alleviate the need to devise workarounds that undermine security. In this paper we devise a formal model for an authentication system architecture termed “Infinite Alphabet Passwords” (IAP), which utilizes software based alphabets to enhance security. We envisage an IAP system as a system where there is no limitation in the number of available symbols that can be combined to form password sequences.

Gibson M., Conrad M. and Maple C. (2010).
INFINITE ALPHABET PASSWORDS - A Unified Model for a Class of Authentication Systems.
In Proceedings of the International Conference on Security and Cryptography, pages 94-99
DOI: 10.5220/0002986200940099
SciTePress

An example of a true infinite alphabet is the set of all images containing a blue n-gon (triangle, square, pentagon, ...). It should be clarified that although the alphabet can be modeled as being infinite, in practice alphabets are only virtually infinite due to the limitations of time and space. A feasible bound on alphabet size is the data width (the number of bits that can be used to represent a distinct symbol), together with the capability of the system upon which the alphabet is generated to handle strings of a particular bit length. Hence a more realistic example of a virtually infinite alphabet might be the set of images indexed in the Google search engine, or the set of all top selling music singles in a given location and time-frame. Although only virtually infinite, such an alphabet would be large enough for any practical purpose.

2 MODELING AN IAP

An infinite alphabet password system is a model where we allow the set of symbols that can be combined to formulate passwords, $A = \{a_1, a_2, \ldots\} \sim \mathbb{N}$, to be infinite.

In the following we assume that a user authenti-

cates him or herself to a server S via a client node

over an open network with the password data itself

stored as a database entry on the server. However we

note that in a more general setting depending on en-

visaged architecture the authentication may take place

on a grid, client node, removable media device or as

part of a cloud architecture.

As a consequence of the infinity of $A$, we let different servers carry different alphabets. That means if $\Sigma$ denotes a system of servers with $S \in \Sigma$, and $A_S \subseteq A$ denotes the alphabet used for authentication for $S \in \Sigma$, then $A_S \cap A_T = \emptyset$ for any $T \in \Sigma$ with $S \neq T$. Note that because $\mathbb{N} \times \mathbb{N} \sim \mathbb{N}$, this is possible even if we model $\Sigma \sim \mathbb{N}$ to be an infinite set of servers.
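The countability argument can be made concrete with a pairing function. The following is a minimal sketch, assuming that symbols of the global alphabet and servers are both identified by natural numbers; the names `cantor_pair` and `server_alphabet_prefix` are illustrative and not part of the model.

```python
def cantor_pair(s: int, k: int) -> int:
    """Cantor pairing: a bijection from N x N onto N."""
    return (s + k) * (s + k + 1) // 2 + k


def server_alphabet_prefix(server: int, size: int) -> set[int]:
    """Indices (in the global alphabet A) of the first `size` symbols of A_S.

    Symbol k of server s is mapped to global index cantor_pair(s, k), so
    alphabets of different servers can never share an index.
    """
    return {cantor_pair(server, k) for k in range(size)}


a_s = server_alphabet_prefix(3, 100)
a_t = server_alphabet_prefix(7, 100)
print(a_s.isdisjoint(a_t))
```

Because the pairing is injective, every server receives its own infinite slice of the global alphabet, and the slices are pairwise disjoint regardless of how many servers exist.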

The enrolment process is accomplished by the

user supplying to the system a unique identiﬁer u. Ex-

ample identiﬁer formats include biometric data, hard-

ware token, PIN or username. u is then stored via a

one-way hash function as a record on S ∈ Σ.

The server $S$, on receipt of the user identifier $u$, generates (randomly) a finite subset $A_{S,u} \subseteq A_S$, which becomes that user's password alphabet. This is sent to the client device for presentation, where the user selects from $A_{S,u}$ a predefined number of elements to form a password. This sequence is transmitted to $S$, where it is also associated with $u$, concluding the enrolment process.
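The enrolment steps might be sketched as follows. This is an illustration under stated assumptions rather than a prescribed implementation: symbols are represented by string identifiers, subsets handed to different users are kept disjoint, and a salted SHA-256 digest stands in for whichever one-way function a deployment would choose. The names `enrol_alphabet` and `hash_record` are hypothetical.

```python
import hashlib
import secrets


def enrol_alphabet(server_alphabet: list[str], taken: set[str], size: int = 50) -> list[str]:
    """Draw a random finite subset A_{S,u} from the unused part of A_S."""
    free = [symbol for symbol in server_alphabet if symbol not in taken]
    if len(free) < size:
        raise ValueError("server alphabet exhausted")
    subset = secrets.SystemRandom().sample(free, size)
    taken.update(subset)  # assumption: no symbol is handed to two users
    return subset


def hash_record(value: str, salt: bytes) -> str:
    """One-way digest used for the identifier u and the password sequence."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


alphabet = [f"img{i}" for i in range(200)]
taken: set[str] = set()
alice_set = enrol_alphabet(alphabet, taken)
salt = secrets.token_bytes(16)
# The user picks five symbols; only digests are stored on the server.
record = (hash_record("alice", salt), hash_record("|".join(alice_set[:5]), salt))
```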

At authentication to $S$, the user presents their unique identifier $u$. The server returns the finite set of symbols from the infinite alphabet, namely $A_{S,u}$, which is presented to the user. No explicit confirmation is given as to whether the identifier data was correct. If an incorrect identifier $v \neq u$ is entered, a decoy set $A_{S,v}$ with $A_{S,v} \cap A_{S,u} = \emptyset$ is presented.
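One way to realize the decoy behavior is to derive the decoy set deterministically from the unknown identifier, so that repeated probes with the same identifier look the same as a real account would. The sketch below assumes a reserved decoy pool that is disjoint from every enrolled alphabet and a server-side secret key; both, along with the function name `challenge_set` and the plain dictionary lookup by identifier, are assumptions made for illustration.

```python
import hashlib
import hmac
import random


def challenge_set(identifier: str,
                  accounts: dict[str, list[str]],
                  decoy_pool: list[str],
                  server_secret: bytes,
                  size: int = 50) -> list[str]:
    """Return A_{S,u} for a known identifier, otherwise a stable decoy set."""
    if identifier in accounts:
        return list(accounts[identifier])
    # Seed a PRNG from a keyed digest of the identifier, so the same unknown
    # identifier always maps to the same decoy set.
    seed = hmac.new(server_secret, identifier.encode("utf-8"), hashlib.sha256).digest()
    return random.Random(seed).sample(decoy_pool, size)


accounts = {"alice": ["a1", "a2", "a3"]}
pool = [f"decoy{i}" for i in range(100)]
print(challenge_set("mallory", accounts, pool, b"server-key", size=3))
```

Keying the derivation with a server secret is a design choice meant to stop an attacker from computing decoy sets offline and thereby telling real accounts from unknown identifiers.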

During key replacement, a new finite subset that does not contain any of the user's previous password symbols is created and associated with the account (not problematic due to the infinity of $A$). From this, the user may select a new password sequence as described for enrolment.

In the remainder, we assume that $A_{S,u}$ is presented to the user in authentication stages $A_{S,u}(1), A_{S,u}(2), \ldots$ with $\bigcup_i A_{S,u}(i) = A_{S,u}$. In stage $m$ the user is presented with the set $A_{S,u}(m)$ and subsequently selects a number of elements from this before being presented with $A_{S,u}(m+1)$. We also assume that the placement of the symbols of $A_{S,u}(m)$ is randomized at each presentation.
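The staged presentation can be sketched as below, assuming for simplicity that the stages are equal-sized, non-overlapping slices of $A_{S,u}$ (the model itself only requires that their union is $A_{S,u}$). The helper names `split_into_stages` and `present` are illustrative.

```python
import secrets


def split_into_stages(alphabet: list[str], stages: int) -> list[list[str]]:
    """Cut A_{S,u} into `stages` consecutive slices whose union is A_{S,u}."""
    if stages <= 0 or len(alphabet) % stages:
        raise ValueError("alphabet size must be a positive multiple of the stage count")
    per_stage = len(alphabet) // stages
    return [alphabet[i * per_stage:(i + 1) * per_stage] for i in range(stages)]


def present(stage_set: list[str]) -> list[str]:
    """Return the stage's symbols in a fresh random order for this presentation."""
    shuffled = list(stage_set)
    secrets.SystemRandom().shuffle(shuffled)
    return shuffled


user_alphabet = [f"img{i}" for i in range(50)]
stage_sets = split_into_stages(user_alphabet, 5)
screen = present(stage_sets[0])  # same ten symbols, new layout on every call
```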

2.1 Optional Enhancement: Injective Password Sequence

We want the user to be able to authenticate via recognition of their password symbols. If a symbol appears in more than one finite presentation set $A_{S,u}(i)$, the user also has to memorize the sequence in which they must select their password symbols. This needlessly increases cognitive load (as well as the probability that insecure coping strategies will be adopted). For this reason, we can optionally require that $A_{S,u}(i) \cap A_{S,u}(j) = \emptyset$ for any $i \neq j$. In a conventional finite alphabet this would reduce security, because the average search space for a successful brute force attack is considerably decreased. In an IAP system, the search space is infinite, so there is no loss of security.

2.2 Optional Enhancement: Error Feedback and Recovery

In traditional password systems, users are given min-

imal help in recovering from input errors. This is be-

cause in this case, an error is not simply an error, but

may be an indication of an attempt to break into the

system. In an IAP system, we have the option of pro-

viding feedback to the authentic user without inadver-

tently providing clues to an attacker.

We illustrate this with the following example. Assume that the client's password is $(a_{\iota(1)}, \ldots, a_{\iota(n)})$ with $n$ even and $a_{\iota(2m-1)}, a_{\iota(2m)} \in A_{S,u}(m)$ for $1 \leq m \leq n/2$, i.e. in every authentication stage the user selects two password letters. When the selection $(b,c)$ in stage $m$ is correct, i.e. $(b,c) = (a_{\iota(2m-1)}, a_{\iota(2m)})$, the client is exposed to the set $A_{S,u}(m+1)$, unless $2m = n$, in which case access is granted. When the selection is incorrect, the user is exposed to a different, unrelated set $B_{S,u}(m,b,c) \subseteq A$. In the model we may assume that all sets $B_{S,u}(m,b,c)$ and $A_{S,u}(m)$ are pairwise disjoint to prevent an intersection attack. On presentation of $B_{S,u}(m,b,c)$, the authentic user may notice that the presented set differs from the expected set $A_{S,u}(m)$. At this point, they may signal to the system that they cannot find their password element, for instance via a “reset” option. This action will return them to the first stage $A_{S,u}(1)$, hence allowing them to retry their authentication attempt from the beginning. A user who does not recognize that $A_{S,u}(m)$ and $B_{S,u}(m,b,c)$ are disjoint may be identified as a potential intruder and be implicitly excluded from the system.
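The stage logic of this example can be written as a small state machine. The sketch below is one possible reading: once a wrong pair has been selected the session stays on decoy screens until the user resets, and a session that reaches the final stage in that state is denied. The class name `Session` and the outcome labels are hypothetical.

```python
class Session:
    """Tracks one authentication attempt with two selections per stage."""

    def __init__(self, password: list[str]):
        if len(password) % 2:
            raise ValueError("n must be even")
        self.password = list(password)
        self.stage = 1          # current stage m, 1-based
        self.derailed = False   # True once a decoy set B(m, b, c) has been shown

    def select(self, b: str, c: str) -> str:
        m, n = self.stage, len(self.password)
        correct = (b, c) == (self.password[2 * m - 2], self.password[2 * m - 1])
        if self.derailed or not correct:
            self.derailed = True
            if 2 * m == n:
                return "denied"     # end of a decoy path: implicit exclusion
            self.stage += 1
            return "decoy"          # show B(m, b, c) instead of the real set
        if 2 * m == n:
            return "granted"
        self.stage += 1
        return "next"               # show A(m + 1)

    def reset(self) -> None:
        """The user signals that the expected symbols are missing."""
        self.stage, self.derailed = 1, False


session = Session(["a", "b", "c", "d"])
print(session.select("x", "b"), session.select("c", "d"))
```

Note that the server never reports an error explicitly; the only signal is the unfamiliar symbol set, which carries meaning for the authentic user alone.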

2.3 Optional Enhancement: Time Localization

We may also design the alphabet $A$ such that it consists of an infinite number of pairwise disjoint equivalence classes $A(1), A(2), A(3), \ldots$ of infinite size such that $a \sim_{A(k)} a'$ when $a$ and $a'$ have the same, or an obviously related, semantic meaning. The user, when requested to enter the password, must be able to identify elements of the same equivalence class and distinguish elements of different equivalence classes. Examples of such an equivalence class may be images where the colors have been distorted, for example in a similar way to the images described in (Hayashi et al., 2008), or those that relate to clear concepts such as “tree” or “house”.

The alphabet is then the set of equivalence classes

and in the authentication process the client is exposed

to a member of each equivalence class. The mem-

ber chosen from each equivalence class changes from

session to session.

An illustrative example for such a mechanism us-

ing equivalence classes might be dogs shown in dif-

ferent positions (sitting, standing, jumping). Large

numbers of different images of a particular dog can

be generated easily by making a movie of a dog and

using the stills. The password alphabet $A_{S,u}$ would

then be a set of different equivalence classes, each of

which would feature a different dog. A possible in-

truder who wants to guess the password must then, for

successful intrusion, know that the key is given in the

dog itself, independent of the behavior that is shown

for each dog. In addition, a nonce can be added, for

example in the form of a watermark in order to en-

hance security further.
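A session-by-session choice of class members might look as follows. This is a sketch under the assumption that each equivalence class is stored as a labelled list of member images and that the enrolled password is a sequence of class labels; `session_challenge` and `verify` are illustrative names, and the optional watermark nonce is omitted.

```python
import secrets


def session_challenge(classes: dict[str, list[str]]) -> dict[str, str]:
    """Choose one member per equivalence class; map shown member -> class label."""
    rng = secrets.SystemRandom()
    return {rng.choice(members): label for label, members in classes.items()}


def verify(selected: list[str], shown: dict[str, str], password: list[str]) -> bool:
    """Compare the classes of the selected members with the enrolled sequence."""
    return [shown.get(member) for member in selected] == password


classes = {
    "rex": ["rex_sitting", "rex_jumping"],
    "fido": ["fido_sitting", "fido_standing"],
    "spot": ["spot_running"],
}
shown = session_challenge(classes)  # a different still may appear in each session
```

Since verification is carried out on class labels, the concrete image that represents a password element can change between sessions without the user having to learn anything new.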

3 SECURITY ANALYSIS

In this section we envisage a feasibly implementable

adaptation of the IAP model, where the alphabet A

is no longer inﬁnite, but instead only very large. A

and all related subsets are hence also finite, as is the number of servers $S$ contained within $\Sigma$.

In order to keep things as simple as possible, we use a conservative example and imagine that $A$ is composed of bitmaps (i.e. no compression). Each image is 150 × 150 pixels with 24 bit color (RGB 0-255). The total number of unique images is hence: $150^2 \times 256^3 = 377{,}487{,}360{,}000$. Not all of these would be suitable for use as alphabet elements, as many would not be perceivably different for users. For this example, we will specify that 0.01% of the elements are easy to distinguish, and hence suitable for inclusion in $A$. Therefore, $A$ consists of 37,748,736 images.

We distribute $A$ equally over 10,000 Web servers, providing on average around 3,770 images to each. Each user is given a personal finite subset of 50 images. In this conservative example, we therefore support around 75 users per server. Given the dimensions and bit depth, each image would require approximately 66 KB of storage space, and for each one we also use 4 additional semantic equivalence class images. This would require a total of around 18,850 unique images to be stored and roughly 1.2 GB of storage space per server.
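The sizing arithmetic of this example can be checked with a few lines of code. The sketch assumes raw pixel data without any file header, and takes the usable share of images to be 1 in 10,000, which is the value that reproduces the 37,748,736 images of the example; all variable names are illustrative. The image count used here is deliberately cautious: the number of all distinct 150 × 150 bitmaps at 24 bits per pixel is $2^{540000}$, which is vastly larger.

```python
WIDTH, HEIGHT = 150, 150
LEVELS_PER_CHANNEL, CHANNELS = 256, 3

counted_images = WIDTH * HEIGHT * LEVELS_PER_CHANNEL ** CHANNELS  # 150^2 * 256^3
usable_images = counted_images // 10_000      # assumed usable share: 1 in 10,000
servers = 10_000
images_per_server = usable_images / servers   # slightly above the rounded 3,770

SYMBOLS_PER_SERVER = 3_770                    # rounded figure used from here on
users_per_server = SYMBOLS_PER_SERVER // 50   # 50 symbols per user
bytes_per_image = WIDTH * HEIGHT * CHANNELS   # raw pixels, no file header
stored_images = SYMBOLS_PER_SERVER * (1 + 4)  # each symbol plus 4 class variants
storage_bytes = stored_images * bytes_per_image

print(counted_images, usable_images, users_per_server, stored_images)
print(f"{bytes_per_image / 1024:.1f} KiB per image, "
      f"{storage_bytes / 1024 ** 3:.2f} GiB per server")
```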

We assume that each user account is protected by

a username, coupled with a sequence of 5 IAP pass-

word elements chosen from a distracter set of 50 and

that presentation takes place over 5 screens. Finally,

although we wish our model to be independent of any

particular cryptographic scheme, this directly affects

security against some attacks. For this reason we con-

sider that passwords undergo a one-way hash function

before being stored on the server, and that the set A is

stored in unencrypted form, but that encrypted point-

ers are used to specify a mapping between a user ac-

count and a corresponding ﬁnite subset.

We consider the following typical scenarios of at-

tacks upon this adapted IAP system, where all three

optional enhancements namely, injective password

sequences, error feedback and recovery, and time lo-

calization have been incorporated. Here, Attacker

Mallory attempts to gain access to user Alice’s IAP

protected account.

SECRYPT 2010 - International Conference on Security and Cryptography

3.1 Tricking the User into Verbally Communicating the Key

Mallory knows Alice’s username and fools her into

revealing her password sequence. The chance that this attack would be successful is a property of the communicability of Alice's individual alphabet elements. If we imagine that they are images with concrete semantic associations, such as “child” or “dog”, then the system is neither stronger nor weaker than a traditional password in this sense.

3.2 Finding a Written Record of the Password

Attacker Mallory gains access to Alice's workstation, where the username and a description of the password symbols have been sketched or written down.

Again, the chance that this will happen is directly related to the intrinsic memorability and communicability of the password elements. In our example, the user may be less likely to write their password down. If they do, however, then the system is again neither stronger nor weaker than a text-based counterpart.

3.3 Password Prediction

Attacker Mallory comes to know Alice's interests and preferences along with her username. He proceeds to enter the username at the interface and the system responds with the first 10 images from her password alphabet. Mallory then tries to predict Alice's chosen images given his knowledge of her taste, in an attempt to gain access.

A system offering an alphabet with pre-existing

semantic associations, or with some images being

more visually appealing than others would be prone

to entry using this method.

This is conﬁrmed in (Davis et al., 2004), where a

visual authentication system utilizing photographs of

faces as the alphabet was implemented. Here, users

tended to select attractive faces over less attractive

ones, or faces of people belonging to the same racial

group as themselves. By way of countermeasure, it might be possible to learn the user's preferences and interests at enrolment, in order to minimize the predictability of choices. This should not be problematic given the large corpus of images available. However, some may see this as an infringement of privacy.

3.4 Phishing and Pharming

Here Alice is either redirected to, or receives an email

enticing her to view an intruder Web site masquerad-

ing as the genuine log in page for the IAP system.

On loading the imposter site, Alice is prompted for

her username and on entry is presented with a set of

challenge images. Here, the challenge set presented

would be incorrect, and Alice would be unable to pro-

ceed with authentication, thereby securing her pass-

word information.

3.5 Hardware or Software Keylogger

In this attack Alice’s inputs to the access device are

recorded and sent to Mallory. Since Alice will se-

lect her password symbols using a pointing device,

Mallory would receive information that Alice has

clicked, and possibly the co-ordinates of where she

has clicked, but not information about what she has

clicked. Element placements are shufﬂed between log

in sessions, therefore her password information is se-

cured.

3.6 Malicious Screen Capture

In this attack a sequence of images detailing Alice’s

interactions with the IAP interface is viewed remotely

by Mallory who later attempts to reenact the process.

Image based alphabets are particularly vulnerable to

this type of attack. It would therefore be necessary

to incorporate a mechanism to delineate or disguise

interactions with password elements during selection.

An example of such a scheme is the random cursor matrix system described in (Boit et al., 2009).

3.7 Timing Attack

In this attack, Mallory gains physical access to Alice’s

access device where he enters Alice’s username and

is presented with the ﬁrst presentation screen of her

challenge set. Mallory proceeds to guess at random

the ﬁrst element and measures the length of time it

takes for the images in the subsequent challenge set

to load in full. Mallory then uses the recovery option

to return to the ﬁrst screen where he selects a new

symbol and repeats the process. On happening upon

a second screen that loads the symbols more quickly

than others, Mallory assumes that the symbol set has

been accessed by Alice before because it is cached

locally in her browser. The symbol that was clicked in

order to load the screen was hence a correct password

symbol.


There are two options available as countermeasures. Firstly, at enrolment the system may download every image in the user's finite set to the client machine, where it is cached. Secondly, the system may ensure that no symbol is cached locally. The second option seems most sensible, as caching all symbols on the local machine would be ineffective for those users who happen to clear their browser cache regularly, and may create new vulnerabilities to attack via the client.
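For a Web-based deployment, the second option is commonly approached through HTTP response headers on every symbol image. The following is a minimal sketch, assuming the symbols are served over HTTP by an application that lets the developer set response headers; the function name `symbol_response_headers` is hypothetical, and the header values are standard HTTP caching directives.

```python
def symbol_response_headers() -> dict[str, str]:
    """Headers asking browsers and intermediaries not to keep a copy of a symbol."""
    return {
        "Cache-Control": "no-store",  # do not store the response in any cache
        "Pragma": "no-cache",         # legacy HTTP/1.0 equivalent
        "Expires": "0",               # treat the response as already stale
    }


for name, value in symbol_response_headers().items():
    print(f"{name}: {value}")
```

Headers of this kind only request the behavior; a client that ignores them could still retain copies, so this should be read as one layer of the countermeasure rather than a guarantee.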

3.8 Brute Force Attack (Online)

Mallory knows Alice’s username and enters it at the

interface. The system responds by returning the ﬁrst

subset of her individual alphabet. From here Mallory

selects symbols at random until the correct password

sequence is obtained.

The number of possible password sequences over Alice's alphabet is $q^r$, where $q$ is the number of symbols offered at each selection and $r$ is the password length.

In our example, $q = 10$ and $r = 5$, as users choose symbols over five screens, each offering a choice of 10 images. This gives a total of $10^5 = 100{,}000$ sequences, and means the average brute force attack would elicit the correct sequence in 100,000/2 = 50,000 attempts. For practical purposes we can mitigate this risk by blocking authentication after a given number of attempts is exceeded.

This approach does not rule out a “low and slow” attack, where Mallory circumvents the lock out policy by distributing his guesses over a number of user accounts. We must for this reason ensure that the number of accounts the system supports, and upon which Mallory can make a password guess, is substantially lower than the average number of guesses required. In the example system, we support 75 accounts. We imagine that each account allows up to 3 authentication attempts before detection and that Mallory does not wish to reach this limit. In this scenario, he is able to make 150 guesses, far lower than the average of 50,000 required.

3.9 Brute Force Attack (Ofﬂine)

Attacker Mallory gains access to Alice’s password

hash and the symbols stored in A (which are in the

clear). Let us assume that in this instance, the key

that is required to reveal the pointers to Alice’s ﬁnite

subset symbols is derived from the password symbols

themselves (which Mallory does not yet know). As

a result, in order to ﬁnd the password, Mallory must

compute all permutations of $A$ and pass these to the same hash function that was applied to the password.

$A$ itself contains 18,850 unique images, although they each belong to an equivalence class. We hence compute permutations based upon the number of equivalence classes, of which we have 3,770. We prevent the user from selecting the same image more than once, therefore the maximum possible number of permutations Mallory can create based upon the number of equivalence classes is $q!/(q-r)! \simeq 7.595 \times 10^{17}$, where $q = 3{,}770$ and $r = 5$. This is roughly equivalent to the number of combinations possible from a 9 character traditional password over an alphabet of 103 letters (i.e. $103^9 \simeq 1.305 \times 10^{18}$).
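Both search-space sizes can be recomputed directly. The sketch below uses the standard library function `math.perm` for the falling factorial $q!/(q-r)!$ and expresses each result in bits for easier comparison; the variable names are illustrative.

```python
import math

iap_space = math.perm(3_770, 5)   # ordered choices of 5 distinct classes from 3,770
text_space = 103 ** 9             # 9 characters over a 103-symbol keyboard alphabet

print(f"IAP:  {iap_space:.3e}  ({math.log2(iap_space):.1f} bits)")
print(f"Text: {text_space:.3e}  ({math.log2(text_space):.1f} bits)")
```

The two spaces differ by less than one bit, which supports reading the five-symbol IAP password of this example as comparable to a nine-character conventional password against offline search.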

3.10 Dictionary Attack

Mallory uses a list of historically common pass-

word sequences and submits these to the system in

an attempt to elicit access to Alice’s account. The

principal requisite for Mallory’s success is that non-

standard frequencies must exist in the passwords se-

lected (i.e. there needs to exist common passwords).

The strength of an IAP against an attack of this type

is a result of the relative popularity of the images used

as alphabet elements.

The risk may be mitigated by splitting up the chal-

lenge set into subsets. During sequential presentation

the user must be shown at least one popular image

in each presentation set in order to create a password

made up of completely popular elements. In a tradi-

tional scheme, all of the popular passwords are avail-

able to the user for selection all of the time. Further-

more, an important aspect of our model is that each

server can be populated with an individual symbol al-

phabet A. The result is that we could do away with

the practice of attackers employing standard password

cracking dictionaries to gain access, as any dictionary

created would only be of use against the system upon

which it was originally generated, because the letters

used to create a password sequence therein, would not

be available as password elements elsewhere.

3.11 Replay Attack

In this attack, Mallory eavesdrops on the connection between Alice's access device and the server and gathers message digests containing the identity of Alice's

password symbols. Here, a nonce value is used as a

wrapper to the time-localized password, which is en-

coded using a one-way hash. If Mallory attempts to

replay the session back to the server at a later time, the

nonce is incorrect and the request denied. If Mallory

learns to predict the nonce and then creates a counter-

feit, he would still be unable to replay the data until


the server expects a message using the same equiva-

lence class image.
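The nonce mechanism can be illustrated with a simple challenge and response exchange. This is a sketch under assumptions that the model leaves open: the server issues a fresh random nonce per attempt, the client answers with a SHA-256 digest over the nonce and the password digest, and each nonce is accepted once only. The names `respond` and `ReplayGuard` are hypothetical, and the time-localized choice of class members is left out.

```python
import hashlib
import hmac
import secrets


def respond(nonce: bytes, password_digest: bytes) -> bytes:
    """Client side: bind the password digest to the server's nonce."""
    return hashlib.sha256(nonce + password_digest).digest()


class ReplayGuard:
    """Server side: issue single-use nonces and check responses against them."""

    def __init__(self, stored_digest: bytes):
        self.stored_digest = stored_digest
        self.outstanding: set[bytes] = set()

    def issue_nonce(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self.outstanding.add(nonce)
        return nonce

    def verify(self, nonce: bytes, response: bytes) -> bool:
        if nonce not in self.outstanding:
            return False                    # unknown or already used nonce
        self.outstanding.discard(nonce)     # a nonce is valid for one attempt
        expected = respond(nonce, self.stored_digest)
        return hmac.compare_digest(expected, response)


digest = hashlib.sha256(b"rex|spot|fido").digest()
guard = ReplayGuard(digest)
nonce = guard.issue_nonce()
captured = respond(nonce, digest)
print(guard.verify(nonce, captured), guard.verify(nonce, captured))
```

The first verification succeeds and the replay of the same captured message fails, because the nonce has been consumed.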

4 DISADVANTAGES OF IAPS

The drawbacks are the storage and bandwidth require-

ment. As a solution, we could populate a system with

compressed images or sounds in order to reduce stor-

age consumption. Another option in the case of im-

ages might be to use vectors or fractals. This would

allow for better upwards scalability in systems that

support a large number of accounts.

A third option is that we could weaken the condition that the personalized alphabets of any two users are disjoint. It should be possible to implement the IAP architecture and only require a low probability that two users share a given symbol in their respective alphabets. This might have ramifications elsewhere, and hence requires further research.

5 CONCLUSIONS

We modeled passwords as utilizing an infinite alphabet, allowing us to devise an optimized architecture upon which image and sound based authentication schemes can be based. We gave an example of a feasible implementation of the IAP model, using images as a password alphabet. As a result we find that, although modeled on infinity, the architecture can be feasibly adapted for use in many real world scenarios. The envisaged system underwent a security analysis, wherein it was found that, depending upon the nature of the alphabet used, the system is at least as strong as a traditional alphanumeric counterpart against social engineering and online brute force attacks, and more secure against replay, keylogging, phishing, pharming, offline brute force and dictionary attacks. However, when image based alphabets are implemented, the model is weaker than traditional passwords against the threat of remote screen capture. It is therefore essential that any image based IAP system also incorporate countermeasures to mitigate this risk.

The IAP model was developed with ﬂexibility in

mind. For this reason, it should be implementable

over a number of preferred architectures and cryp-

tographic protocols. It is hoped that the model may

prove useful to those considering future implementa-

tions of alternative authentication schemes.

REFERENCES

Boit, A., Geimer, T., and Loviscach, J. (2009). A random

cursor matrix to hide graphical password input. In

SIGGRAPH ’09: Posters, pages 1–1, New York, NY, USA. ACM.

Davis, D., Monrose, F., and Reiter, M. K. (2004). On user

choice in graphical password schemes. In SSYM’04:

Proceedings of the 13th conference on USENIX Se-

curity Symposium, pages 11–11, Berkeley, CA, USA.

USENIX Association.

Dhamija, R. and Perrig, A. (2000). Déjà vu: A user study

using images for authentication. In Proceedings of

USENIX Security Symposium, pages 45–58, Denver,

Colorado.

Gaw, S. and Felten, E. W. (2006). Password manage-

ment strategies for online accounts. In SOUPS ’06:

Proceedings of the second symposium on Usable pri-

vacy and security, pages 44–55, New York, NY, USA.

ACM Press.

Gibson, M., Renaud, K., Conrad, M., and Maple, C. (2009).

Musipass: authenticating me softly with “my” song.

In NSPW ’09: Proceedings of the 2009 workshop on

New security paradigms, pages 85–100, New York,

NY, USA. ACM.

Hayashi, E., Dhamija, R., Christin, N., and Perrig, A.

(2008). Use your illusion: secure authentication us-

able anywhere. In SOUPS ’08: Proceedings of the

4th symposium on Usable privacy and security, pages

35–45, New York, NY, USA. ACM.

ISO (2003). ISO/IEC 10646:2003 Information technology –

Universal Multiple-Octet Coded Character Set (UCS).

Klein, D. V. (1990). “foiling the cracker” – A survey of, and

improvements to, password security. In Proceedings

of the second USENIX Workshop on Security, pages

5–14.

Kuber, R. and Yu, W. (2006). Authentication using tactile

feedback. In HCI Engage 2006, Interactive experi-

ences.

Morris, R. and Thompson, K. (1979). Password secu-

rity: A case history. Communications of the ACM,

22(11):594–597.

Sasse, M. A., Brostoff, S., and Weirich, D. (2001). Trans-

forming the ‘weakest link’ — a human/computer in-

teraction approach to usable and effective security. BT

Technology Journal, 19(3):122–131.

Shannon, C. (1948). A mathematical theory of communica-

tion. The Bell System Technical Journal, 27:379–423.

The Unicode Consortium (2009). The Uni-

code Standard, version 5.2.0. Moun-

tain View, CA. ISBN 978-1-936213-00-9.

http://www.unicode.org/versions/Unicode5.2.0/.

Yan, J., Blackwell, A., Anderson, R., and Grant, A. (2004).

Password memorability and security: Empirical re-

sults. IEEE Security and Privacy, 2(5):25–31.
