Lightweight security for Internet polls

Alessandro Basso, Francesco Bergadano, Ilaria Coradazzi, Paolo Dal Checco

Dipartimento di Informatica, Università degli Studi di Torino, Corso Svizzera 185 Torino,

Italy

Abstract. Is it possible to implement practical Internet Polls that fulfill even

the weakest security requirements? The technology available today would lead

to a negative answer, because of the following practical constraints: standard,

unmodified browsers are used, it is not economically possible to distribute cer-

tificates or even just user names and passwords, users connect from different

workstations, possibly behind firewalls, proxies and address translation nodes.

In this paper, we define an innovative notion of Internet Poll security, namely

“Security against Massive Falsification”, and we present a method that we con-

sider to be secure with respect to this definition. We discuss the security prop-

erties of the method with respect to existing techniques, and then propose a

public challenge for testing the strength of our claim.

1 Introduction

Internet polls could be an important means for collecting information on people’s

preferences and opinions “faster and cheaper”. The quality of such information, how-

ever, is indeed very low today, due to the complete lack of security – existing Internet

poll systems may be attacked by automatic programs, and the poll results may be

overturned in minutes. In this paper we define minimum security requirements
for providing poll data that can be meaningful, even in the weakest sense. We also propose
an Internet poll system that does not require user registration, accepts votes

originating from the same IP address, and yet satisfies said security requirements.

2 Related work

In this section we describe the present state of the art in the area of Internet polls. We
first define what an Internet poll is and the functionality it provides. We then
consider the structure of a typical program used to perform automatic and multiple
voting, to better understand the protection techniques used to prevent such software
from being used. We conclude this section by comparing a selection of existing polling
services.

Basso A., Bergadano F., Coradazzi I. and Dal Checco P. (2004).

Lightweight security for Internet polls.

In Proceedings of the 1st International Workshop on Electronic Government and Commerce: Design, Modeling, Analysis and Security, pages 46-55

DOI: 10.5220/0001402600460055

 SciTePress

2.1 Poll systems

An Internet poll is a Web application that gives users an opportunity
to express their opinions on the topic of the poll. Many web
sites offer their users the possibility of inserting polls in their personal web pages,
so that one can easily create and manage several polls at the same time. To do so,
an Internet user who needs such a service must sign up at the poll
system site and create a poll by filling in the fields of a dedicated form.
The user then copies the code provided by the system and pastes it into his or her web page.

Normally, a poll system is a dynamic application, which therefore takes advantage

of the client/server paradigm. We can identify two main components, the client side

and the server side.

The client side consists of a graphical interface, which shows the voter the title,
message and possible choices of a poll. This interface is a web
page displayed in a generic Internet browser. An Internet user can select a voting option
and express his or her preference by clicking on the “vote” button, or view the poll’s
results.

The server side (a script, such as a CGI or PHP script) receives the
data from the client in order to complete the operation. These data consist of
the user’s preference and must be uniquely identified to avoid mistakes in the voting
procedure.

Let us consider an example, to better understand this concept. The example poll

has three different choices:

• Choice one

• Choice two

• Choice three

When a user selects one of these choices and clicks on the “vote” button, an identifier
of the choice is sent to the script responsible for the voting procedure. Suppose
we select “Choice two”: the id sent to the server is “B”, and the script
can immediately determine our preference.

2.2 Software for multiple and automatic voting

Writing software that performs multiple and automatic voting is fairly
simple (see, e.g., the discussion on cookie poisoning in [12]). The basic idea is to
analyze the HTML page which contains the poll, looking for the text of the possible
choices. Such a page is normally stored by the browser in a temporary cache
file; therefore the search can be easily performed on that file.

The second step is to determine the id which has to be sent to the voting script together
with the other necessary parameters, such as cookies and the poll’s name.

The third and last step is to write a program that simulates the voting procedure
any given number of times, by providing the voting script with the id and the other
needed information. Such a program must take into account that the poll application
might use a protection system, like the ones described in the next paragraph. In
that case, each time it sends a vote, the program must also bypass
the protection mechanism of the poll.
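The first two steps can be illustrated with a minimal Python sketch. It is not taken from any real poll system: the HTML form, the field name `choice` and the ids A, B and C are assumptions that follow the three-choice example above. The sketch only shows that, when the text of the choices and their ids both appear in the page, a program can recover the binding between them with a few lines of parsing.

```python
from html.parser import HTMLParser

# Hypothetical poll page: the markup, the field name "choice" and the ids
# A/B/C are assumptions made for illustration only.
POLL_HTML = """
<form action="/vote" method="post">
  <label><input type="radio" name="choice" value="A"> Choice one</label>
  <label><input type="radio" name="choice" value="B"> Choice two</label>
  <label><input type="radio" name="choice" value="C"> Choice three</label>
</form>
"""


class ChoiceExtractor(HTMLParser):
    """Collect the binding between each choice's text and its id."""

    def __init__(self):
        super().__init__()
        self.bindings = {}
        self._pending_id = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "radio":
            # Remember the id until the label text that follows is seen.
            self._pending_id = attrs.get("value")

    def handle_data(self, data):
        text = data.strip()
        if self._pending_id is not None and text:
            self.bindings[text] = self._pending_id
            self._pending_id = None


def extract_bindings(html):
    parser = ChoiceExtractor()
    parser.feed(html)
    return parser.bindings


bindings = extract_bindings(POLL_HTML)
```

Once `bindings` is known, the id of the desired choice can be read directly from it, which is exactly the information that the method of this paper removes from the page.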


2.3 Protection techniques against automatic voting

Because of the stateless nature of the HTTP protocol [7], there is no way of securely

storing any operations performed by voters in different sessions. Therefore, creating a

polling system may turn into a difficult task, since it can be a problem to understand

whether a voter has already expressed his opinion. There are however different tech-

niques to check and prevent multiple voting. Unfortunately, at the moment the secu-

rity level offered by such techniques is very low because it is still possible to exploit

the weaknesses of these methods to automate the voting process.

At the present time, the techniques used by existing poll systems to avoid multiple

voting are:

• IP locking

• Cookie-based methods

In the IP locking method, the check is performed by verifying the IP address of
voters: an IP address that has already voted cannot send another vote for a
fixed time period. Unfortunately, this also means that some categories of users might
not be allowed to vote in a poll. In particular:

• People connected to the Internet through a dynamic IP address assigned by
the ISP. In this case, the same IP address can be assigned in succession to
two different computers, and the second one will be excluded from the vote.

• People connected to the Internet from multi-user workstations.

• People connected to the Internet from a LAN (Local Area Network) which uses
a NAT translator. The purpose of the NAT mechanism is to hide the internal addresses
of the LAN while allowing all computers on the network
to surf the web using only one public IP address [6, 3].

• People connected to the Internet behind a proxy. As in the NAT situation, different
users share the same IP address, so all of them but the first are excluded from
voting.
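The exclusion effect described in the list above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the code of any existing poll service: the class name `IPLock`, the one-hour lock period and the example address are all hypothetical.

```python
import time


class IPLock:
    """Refuse a vote from an IP address that voted less than lock_seconds ago."""

    def __init__(self, lock_seconds):
        self.lock_seconds = lock_seconds
        self._last_vote = {}  # IP address -> time of the last accepted vote

    def try_vote(self, ip, now=None):
        now = time.time() if now is None else now
        last = self._last_vote.get(ip)
        if last is not None and now - last < self.lock_seconds:
            return False  # the address is still locked
        self._last_vote[ip] = now
        return True


lock = IPLock(lock_seconds=3600)  # assumed lock period of one hour
nat_address = "203.0.113.7"       # public address shared by a whole LAN

first_user = lock.try_vote(nat_address, now=0)    # accepted
second_user = lock.try_vote(nat_address, now=60)  # a different person, refused
```

The server cannot tell the second user from the first, so a legitimate voter behind the same NAT or proxy is rejected until the lock period expires.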

Sometimes, the IP locking technique is combined with an improvement called
Browser header, which inspects the HTTP messages exchanged between
client and server. In particular, it checks the request headers issued by the browser
and verifies whether consecutive requests carry identical headers. If so, the vote is not considered
valid. In other words, consecutive votes from the same IP address must not have the same HTTP
headers.

This protection allows multiple users connected from a LAN behind a NAT or
proxy to vote, but its security is still weak. Indeed, an attacker might use the same IP
address and alter the content of the HTTP headers each time he wants to vote. Since the
number of possible header combinations is very large, it is easy to produce a
practically unlimited supply of different headers.
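A small Python sketch, written under our own assumptions about how such a check might be implemented, makes the weakness concrete. The class `HeaderCheck`, the header values and the address are hypothetical; the only rule modeled is the one stated above, namely that consecutive votes from the same IP address must not carry identical headers.

```python
class HeaderCheck:
    """Refuse a vote whose headers equal those of the previous vote from the same IP."""

    def __init__(self):
        self._last_fingerprint = {}  # IP address -> headers of the last accepted vote

    def try_vote(self, ip, headers):
        fingerprint = tuple(sorted(headers.items()))
        if self._last_fingerprint.get(ip) == fingerprint:
            return False
        self._last_fingerprint[ip] = fingerprint
        return True


check = HeaderCheck()
nat_ip = "203.0.113.7"

# Two legitimate users behind the same NAT, with different browsers: both accepted.
alice_ok = check.try_vote(nat_ip, {"User-Agent": "BrowserA/1.0", "Accept-Language": "it"})
bob_ok = check.try_vote(nat_ip, {"User-Agent": "BrowserB/2.0", "Accept-Language": "en"})

# An automated client only has to change one header value per request.
scripted = [check.try_vote(nat_ip, {"User-Agent": f"Bot/{i}"}) for i in range(100)]
```

Every scripted vote is accepted, because each request differs from the previous one in a single header value.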

Another improvement of the IP locking technique is based on time intervals:
two subsequent votes are considered invalid if the time interval between
them is less than a fixed value. This scheme avoids the main problem of the IP locking
method, allowing users in
an environment protected by NAT or a proxy to express their preferences. It
does not guarantee, however, that every user will be able to participate in the poll,
since it is difficult to set the time interval to a value appropriate for every
situation. Moreover, the scheme can be easily bypassed by an automatic
voting program that waits an appropriate time between consecutive votes.

The second technique used to protect web polls from automatic and multiple voting
is known as the Cookies method. A cookie is a small text file that contains state information
exchanged between client and server [8]. This exchange allows
the server to verify whether a client has already voted. The data stored in a
cookie may include:

• the cookie name and associated value;

• the URI for which the cookie is valid;

• the validity domain;

• the validity time, expressed in seconds.

A cookie used to manage multiple votes also contains a flag indicating
whether the client holding the cookie has already voted. The main problem with
cookies is that users can disable or delete them, which makes it
possible to write a program designed to alter the poll’s results.
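The following Python sketch illustrates why a plain "already voted" flag offers little protection. It is a simplified model, not a real server: the function `handle_vote`, the cookie name `voted` and the in-memory counters are assumptions introduced for the example.

```python
def handle_vote(cookies, counters, choice_id):
    """Count a vote unless the request carries the 'voted' flag.

    Returns (accepted, cookies_to_send_back).
    """
    if cookies.get("voted") == "1":
        return False, cookies
    counters[choice_id] = counters.get(choice_id, 0) + 1
    return True, {**cookies, "voted": "1"}


counters = {}

# A regular browser stores the cookie and sends it back: the second vote is refused.
accepted, jar = handle_vote({}, counters, "B")
repeat_accepted, jar = handle_vote(jar, counters, "B")

# A program that simply discards the cookie is never recognized.
for _ in range(1000):
    handle_vote({}, counters, "B")
```

After the loop, choice "B" has received 1001 votes, of which only one came from the well-behaved browser.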

3 The Method

In this section we present the method at the core of our solution to the problem
of multiple automatic votes. First we introduce some background needed
to understand it. Then we describe the characteristics
of the basic idea.

3.1 Human-Machine discrimination

Proving to another human that one is a human is fairly simple. This problem can be
addressed by means of the “Turing Test”, proposed by Alan Turing in 1950 and regarded as a
foundation of the philosophy of artificial intelligence. Basically, a human judge asks
a set of questions to both a machine and a human being and discriminates between
them on the basis of their answers.

On the other hand, proving to a computer that one is a human is much more difficult.
This requires what M. Blum, L. A. von Ahn, and J. Langford call a “Completely
Automatic Public Turing Test to tell Computers and Humans Apart”, or
CAPTCHA [1]. A CAPTCHA is a test which can be graded by computers
and is designed in such a way that humans can pass it but computers fail.

A generic CAPTCHA is characterized by the following properties:

• The test can be automatically created.

• It can be easily and quickly passed by a human.

• A test must be suitable for a very large majority of humans, with few excep-

tions.

• Virtually no machine can pass it.

• It must be able to resist automatic attacks even considering future technol-

ogy improvements and even if the test’s algorithm is published.


In [5], Allison L. Coates, Richard J. Fateman and Henry S. Baird present an application
of CAPTCHA which exploits the extensively studied gap in image pattern
recognition ability between human and machine vision systems. As their
studies clearly show, the gap in ability between human and machine vision is currently wide
and is only slowly narrowing. This fact, which could be a serious problem in some
sectors of Computer Science, is instead well suited to satisfying the
growing need for automatic methods to distinguish between human and machine users
on the Web.

It has been experimentally shown that, by suitably choosing a few well-defined
parameters governing the quality of an image, it is possible to generate
human-legible images of text which are illegible to several of the best
present-day optical character recognition (OCR) machines. These images of printed
text are of low quality and quite strongly degraded, but are still readable,
with little or no conscious effort, by almost every person literate in the Latin alphabet
and with some years of reading experience. However, such schemes may have vulnerabilities,
as shown in [10].

Coates, Fateman and Baird [5], in agreement with the discussion in [13], identify
a series of image characteristics that are usually problematic and cause errors
in the optical character recognition process:

• Thickened images, so that characters merge together.

• Thinned images, so that characters fragment into unconnected components.

• Noisy images, causing rough edges and salt–and–pepper noise.

• Condensed fonts, with narrower aspect ratios than usual; and Italic fonts,

whose rectilinear bounding boxes overlap their neighbors’.

Our solution to the problem of automatic voting has been developed by applying
these concepts to the context of polls, as we show in the next paragraph.

3.2 A solution for the problem of automatic voting

As the reader can easily see, there is apparently no way to prevent the falsification
of poll results if software like that described in paragraph 2.2 is used to attack an
Internet poll system. Indeed, in such a system the following statement can be assumed
to be true:

as long as the information used as voting parameters is kept in a machine-readable
form on the client, an attacker can use it to generate arbitrary automatic
votes.

It should be clear now that the only way to avoid an attack that falsifies the poll's results
is to prevent the client from storing sensitive information in a machine-readable
form. Therefore, we have to consider which techniques can represent information
in a way that is easily understood by humans but difficult for a
machine to comprehend. Our idea for achieving a higher security level in the area of
Internet polls derives
directly from the concepts stated in the previous paragraphs. We thought that a
way of preventing automatic programs from voting is to have our system
generate a test which is easily solved by human voters but impossible for machine
voters.


The test that we consider is slightly different from a typical CAPTCHA. Indeed,
while the latter is mainly meant to distinguish between humans and computers, ours
is designed to make the voting procedure impossible for a non-human user.

The main idea is to include the poll’s choices in a runtime-generated image in
order to remove them from the web page sent to the client’s browser. In this way, we
prevent computers from analyzing the HTML page containing the poll to find the correct
bindings between the choices’ text and their ids.

The image which contains the choices is created on the server each time a user requests
a poll page, and the order in which the choices are displayed on the screen is
randomly chosen. The choices’ order must be stored somewhere in order to allow the
server to reconstruct the mapping between the id of the preference chosen by a voter
and the correct choice. We call this order the “mapping scheme”.
It merely consists of the number of the choice which is first in the image. Therefore,
when a voter asks for a generic poll page, the server determines the mapping
scheme for that request and generates the image in the following way:

1. It gets the n choices of the poll in their standard order.

2. It randomly generates a number x, between 1 and n.

3. It creates the image by starting with the choice pointed to by x. The second
choice is the one that follows it in the standard order, and so on, wrapping around
to the first choice after the last.

4. It stores the mapping scheme into a cookie, after encrypting it with a

symmetric algorithm.

Basically, the standard choices’ order is rotated using one randomly chosen choice

as pivot element.
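Steps 1 to 3 amount to a rotation of the list of choices, as the following Python sketch shows. The function names are ours and purely illustrative; rendering the rotated list into an image and encrypting the mapping scheme (step 4) are deliberately left out.

```python
import secrets


def new_mapping_scheme(n):
    """Step 2: draw a random number x between 1 and n (inclusive)."""
    return secrets.randbelow(n) + 1


def rotate_choices(choices, x):
    """Step 3: return the choices in display order, starting from choice number x (1-based)."""
    if not 1 <= x <= len(choices):
        raise ValueError("x must be between 1 and the number of choices")
    return choices[x - 1:] + choices[:x - 1]


standard_order = ["Choice one", "Choice two", "Choice three"]  # step 1

# With x = 2 the image lists "Choice two" first and "Choice one" last.
display_order = rotate_choices(standard_order, 2)

# In a real request x would be drawn at random for every page view.
x = new_mapping_scheme(len(standard_order))
random_display_order = rotate_choices(standard_order, x)
```

Because a rotation is fully described by its starting element, the single number x is enough for the server to recover the whole display order later.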

This procedure is executed each time one asks for a poll page; therefore the

choices are always presented in an unpredictable order. When one votes, the id of the

selected preference is sent to the server along with the cookie containing the mapping

scheme. The server is then able to reconstruct the binding between the chosen id and

the proper choice and therefore it can correctly increment the counter of the selected

preference.
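The reconstruction performed by the server can be sketched as follows. We assume, for illustration only, that the id sent by the client is the position (1 to n) of the selected entry in the image, and that x has already been decrypted from the cookie; the function name `resolve_vote` is hypothetical.

```python
def resolve_vote(position, x, n):
    """Map a position in the rotated image back to the choice number in the standard order.

    position: 1-based position selected by the voter in the image
    x:        mapping scheme, i.e. the number of the choice shown first
    n:        number of choices in the poll
    """
    if not (1 <= position <= n and 1 <= x <= n):
        raise ValueError("position and x must be between 1 and n")
    return (x - 1 + position - 1) % n + 1


choices = ["Choice one", "Choice two", "Choice three"]
counters = [0] * len(choices)

# With x = 2 the image shows: Choice two, Choice three, Choice one.
x = 2
chosen = resolve_vote(position=1, x=x, n=len(choices))
counters[chosen - 1] += 1
```

Here the voter clicked the first entry of the image, which the server resolves to "Choice two" before incrementing its counter.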

The image containing the poll’s preferences has some important
features that prevent OCR programs from reading the text inside the picture
and thereby breaking the protection scheme.

In particular:

• Its font must be carefully chosen, preferring those which are known to be
difficult for OCR software to recognize.

• Its quality cannot be too poor, otherwise the image becomes unreadable
even for humans.

• It should contain some pixels whose color shade differs from the
others, in order to prevent the identification of the image itself by means of
hash functions. These pixels, as well as their shades, are randomly chosen inside the picture.
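The last property can be illustrated with a short Python sketch. It models the image as a flat sequence of 8-bit grayscale values, which is a simplification of ours, and the function `perturb` and its parameters are hypothetical. The point is only that changing a few randomly chosen pixels by a small amount yields a different hash, so an attacker cannot recognize a previously seen image by hashing it.

```python
import hashlib
import random


def perturb(pixels, count, max_delta, rng):
    """Return a copy of `pixels` with `count` randomly chosen values slightly shifted."""
    if not 1 <= max_delta <= 127:
        raise ValueError("max_delta must be between 1 and 127")
    out = bytearray(pixels)
    for index in rng.sample(range(len(out)), count):
        delta = rng.randint(1, max_delta) * rng.choice((-1, 1))
        shifted = out[index] + delta
        if not 0 <= shifted <= 255:
            shifted = out[index] - delta  # flip the direction to stay within 0..255
        out[index] = shifted
    return bytes(out)


# Assumed stand-in for a rendered 64x16 grayscale image.
base = bytes([200]) * (64 * 16)
image = perturb(base, count=20, max_delta=8, rng=random.SystemRandom())

changed = sum(1 for a, b in zip(base, image) if a != b)
same_hash = hashlib.sha256(base).digest() == hashlib.sha256(image).digest()
```

A shift of a few gray levels on 20 pixels is barely visible to a human reader, yet it is enough to change the digest of the image.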

Our solution also includes the classic cookie-based protection technique, so that it
is even more difficult to bypass. The voting flag is stored inside the cookie, along
with the mapping scheme. This makes modifying it more complicated, because
the flag value cannot simply be altered, since it is encrypted (see also [11]).
Therefore, one has to delete the entire cookie to be allowed to vote again. Clearly,
just deleting the cookie is not sufficient to permit automatic voting. Indeed,
in such a case the attacker is only able to cast random votes, since the mapping
scheme is unknown. Thus, the only effect is to increase the number of votes of
each choice uniformly, because each preference has an equal probability of
being chosen.

In particular, if the number of choices is n, each of them has a probability p = 1/n
of being the first in the image. In this situation, the only automatic strategy left is to
pick an id and vote for it each time. Every choice then has the same probability of being chosen,
which turns the automatic voting attack into a mere increment of each choice's result,
without favoring any particular choice.
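This argument can be checked with a small simulation, sketched here in Python under the assumption that the id sent by the attacker is a position in the image and that the server draws a fresh mapping scheme for every request. The function name, the number of votes and the fixed seed are choices made for the example.

```python
import random


def simulate_blind_votes(n, votes, rng):
    """Simulate an attacker who always sends the same position without seeing the image."""
    counters = [0] * n
    for _ in range(votes):
        x = rng.randint(1, n)  # fresh mapping scheme, unknown to the attacker
        position = 1           # the attacker always sends the same id
        chosen = (x - 1 + position - 1) % n + 1
        counters[chosen - 1] += 1
    return counters


# A fixed seed keeps the run repeatable; a real server would use a secure source.
result = simulate_blind_votes(n=3, votes=30000, rng=random.Random(0))
```

Each counter ends up close to votes/n, so the blind attack adds roughly the same number of votes to every choice.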

However, there is still a way to alter the results of the poll by means of automatic
voting. Indeed, if one copies the cookie content before voting, one can then reuse
it for the following votes.
This reused cookie always contains the same mapping scheme; therefore it is easy
to vote for the chosen preference an unlimited number of times.
To prevent such an attack, we store the content of each cookie on the server and check it
at every voting request. Since each cookie is created to be unique, we refuse multiple votes
which use the same cookie content.
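A minimal Python sketch of this check follows. It rests on assumptions of ours: uniqueness is obtained by prepending a random token to the mapping scheme, the class `CookieRegistry` keeps everything in memory, and the encryption of the cookie is omitted. Rejecting values that were never issued is an extra precaution added in the sketch, not something stated above.

```python
import secrets


class CookieRegistry:
    """Track issued cookie contents and refuse any content that is reused."""

    def __init__(self):
        self._issued = set()
        self._used = set()

    def issue(self, x):
        # A random token makes every cookie content unique, even for equal x.
        # In the scheme described above this value would also be encrypted.
        value = f"{secrets.token_hex(16)}:{x}"
        self._issued.add(value)
        return value

    def redeem(self, value):
        if value not in self._issued or value in self._used:
            return False
        self._used.add(value)
        return True


registry = CookieRegistry()
cookie = registry.issue(x=2)

first_use = registry.redeem(cookie)   # accepted
replayed = registry.redeem(cookie)    # same content again, refused
forged = registry.redeem("forged:1")  # never issued, refused
```

With this check in place, an attacker who wants to cast a second vote must request a new page, and therefore faces a new, unknown mapping scheme.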

4 Security properties

In this section we discuss the security properties of the presented Internet poll system.

First, however, we must discuss which level of security is practically achievable for

polls in an open Internet scenario, where clients and remote networks are not known

and cannot be modified. Then, we will define a notion of security that could apply for

polls in such a scenario, namely "security against massive falsification". Finally we

will argue that our proposed poll scheme satisfies this notion of security and propose

an open challenge to test our claim.

4.1 Practical security for open Internet polls

Practical Internet polls depend on the following facts:

• Clients (Browsers) cannot be customized, nor initialized or modified in any

way.

• It is practically impossible to distribute passwords or secrets of any kind for

the authentication of voters.

• Biometric approaches are either too expensive or too imprecise, or both.

Techniques based on keystroke dynamics [4] may be applicable, but re-

quire too much typed text.

• Remote network architecture is unknown and may not be modified; it may
include proxies and network address translation (NAT).

• Clients may change their IP address due to user mobility and local address

assignment (e.g., DHCP).


• Users may connect from more than one client machine.

Given the above constraints, it is immediately clear that a high level of security,

such as that required for voting, is not a practical goal. Even under extremely restric-

tive poll schemes, such as the ones based on IP locking, users can simply change their

workstations, connect from a different network, and vote twice. Since they do not

own cryptographic tokens and are not given passwords, it is impossible to authenti-

cate them and avoid multiple voting. One actually wonders whether even very weak

security properties apply. We try to propose one such notion below.

4.2 Security against massive falsification

We must consider the following important fact:

multiple voting may be programmed - one may write an ad-hoc client program

explicitly designed to kill the target Internet poll scheme.

This is easily understood with cookie-based schemes, as shown in paragraph 2.2.

IP-locking schemes are more robust with respect to automated voting and even

manually repeated voting in general, because the number of IP addresses available for

an individual is limited. Since Internet poll schemes are generally implemented over
HTTP and over the connection-oriented TCP transport [2], IP address spoofing is
not possible in general.

However, IP-locking schemes are highly restrictive and prevent many kinds of legal
voting, as explained in paragraph 2.3.

All techniques that do allow legal votes, meaning actually all known approaches

except IP-locking, are vulnerable to automated attacks. This means that not only false

and repeated votes are possible, but also that such votes may be generated by a pro-

gram in very large numbers. The result of the poll may then be changed completely in

minutes. We call this attack “massive falsification”.

Based on the above discussion, we are now able to define “security against mas-

sive falsification” for Internet polls: an Internet poll scheme is secure against mas-

sive falsification if (1) Internet users are always allowed to participate in the poll

and (2) there is a significant cost in the programming of massive multiple voting.

The first requirement means that users may vote when they have the right to do so.

This basically eliminates IP-locking, where users may be unable to vote just because

somebody else had voted with the IP address they are now using. The second re-

quirement states that it is impossible or difficult to write a program that generates an

arbitrary number of repeated votes, or that it will be possible to eliminate such repeated
votes in a post-processing phase. 'Massive falsification' is then avoided, but it
will still be possible to send a limited number of false votes, e.g., by manually voting
more than once from different client machines.

Footnote (on IP address spoofing): If an IP address is spoofed, it will be impossible, in
principle, to establish a connection from
that client to any server, since the second message in a TCP three-way handshake will not
reach the originator of the first message. However, it should be noted that in some particular
situations it is still possible to realize a form of 'blind' TCP/IP spoofing that would be
critical for poll schemes based on IP locking. For further information about this matter, see
[VKI99, BG00, Lud03].

4.3 Security properties of order-based polls

We claim that our proposed order-based poll scheme satisfies the requirements of

security against massive falsification:

1. Users may vote if they have not voted already. Because there is no IP-locking,
one may vote any number of times from the same IP address. Hence, users
behind a proxy or NAT may vote normally, and users with dynamic IP assignments
are guaranteed to be able to vote.

2. The order of the options as displayed by the poll window is random and un-

predictable by the client. One must therefore look at the window to know the

order and therefore the option to be selected. If one knew the option order

every time, one could generate the vote automatically and achieve massive

falsification. However, the only way to know option order is to interpret the

image sent to the poll window. The image is explicitly designed to be difficult

to interpret automatically, and yet be readable by human eyes.

4.4 Challenge

We now propose an open challenge, in which readers and the Internet security community
are invited to try to break our scheme. At the address
www.certimeter.com/pollchallenge/thepoll one finds a poll, where it is possible to
vote for one of two possible choices: (A) Italian cuisine and (B) British cuisine.

The poll is initialized with 500,000 votes in favor of A and 500,000 votes in favor
of B. We therefore have 50% of people in favor of Italian cuisine and 50% in favor of
British cuisine. At the beginning of every month the poll is reinitialized to those values.
Thus, challengers have one month to break the scheme, but they can try
again. The goal of the challenge is to cause the poll result to be at least 90% in favor
of British cuisine before the end of the month.
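To give an idea of the scale of the challenge, the number of additional votes for B required to reach the 90% threshold can be computed directly. The Python sketch below assumes that no further votes for A arrive during the month; the function name is ours. It solves (b + x) / (a + b + x) ≥ t for the smallest integer x.

```python
import math
from fractions import Fraction


def extra_votes_needed(votes_a, votes_b, target_share):
    """Smallest number of extra votes for B so that B's share reaches target_share.

    target_share must be a Fraction strictly between 0 and 1.
    """
    total = votes_a + votes_b
    needed = (target_share * total - votes_b) / (1 - target_share)
    return max(0, math.ceil(needed))


needed = extra_votes_needed(500_000, 500_000, Fraction(9, 10))
```

With the initial values of the challenge, `needed` is 4,000,000: the final tally would be 4,500,000 votes for B out of 5,000,000, far more than can be cast by hand within a month.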

5 Conclusions

In this paper we presented an Internet Poll system that has been implemented and is

based on an unpredictable ordering scheme for poll choices, so that the user is forced

to look at the Web page before sending her vote. This is, to our knowledge, the first

proposed Internet Poll system that (1) is secure against programmed attacks, (2) is not

based on IP locking and (3) does not require user registration.


Acknowledgments

The authors thank Regione Piemonte, through grant “Sinapsi” for supporting this

work. The prototype and the challenge setting have been developed with the coopera-

tion of the Department of Computer Science of the University of Turin and the par-

ticipating company, Certimeter S.r.l.

References

1. M. Blum, L. A. von Ahn, and J. Langford, The CAPTCHA Project, Completely Automatic

Public Turing Test to tell Computers and Humans Apart, www.captcha.net, Dept. of Com-

puter Science, Carnegie–Mellon University, November 2000.

2. S.M.Bellovin, Security Problems in the TCP/IP protocol suite, Computer Communication

Review, AT&T Bell Laboratories, 1989.

3. Steven M. Bellovin, A Technique for Counting NATted Hosts, AT&T Labs Research, 2000.

4. F. Bergadano, D. Gunetti and C. Picardi, User Authentication through Keystroke Dynamics,

ACM Transactions on Information and System Security (ACM TISSEC), 5(4), 2002.

5. Allison L. Coates, Richard J. Fateman and Henry S. Baird, Pessimal Print: A Reverse Tur-

ing Test, Sixth International Conference on Document Analysis and Recognition (ICDAR

2001), Seattle, Washington, September 10-13 2001.

6. K.Egevang, P.Francis, The IP Network Address Translator (NAT), RFC-1631, May 1994.

7. R.Fielding, J.Mogul, H.Frystyk, L.Masinter, P.Leach, T.Berners-Lee, Hypertext Transfer

Protocol HTTP 1.1, RFC-2616, June 1999.

8. D.Kristol, L.Montulli, HTTP State Management Mechanism, Request for Comments RFC-

2965, October 2000.

9. Albert Ludwing, Ip Address Spoofing, Univ. Freiburg,

www.ks.uni.freiburg.de/inetwork/papers/ipspoofingPaper.pdf

10. G. Mori and J. Malik, Breaking a Visual CAPTCHA, UC Berkeley,

http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html

11. Joon S. Park, Ravi Sandhu, AreeLatha Ghanta, RBAC on the web by secure cookies,
security XIII: Status and prospects, Kluwer, 2000.

12. Eran Reshef, Izhar Bar-Gad, Web Application Security, Sanctum Inc., September 2000.

13. S. V. Rice, G. Nagy, and T. A. Nartker, OCR: An Illustrated Guide to the Frontier, Kluwer

Academic Publishers, 1999.

14. Marco de Vivo, Gabriela O. de Vivo, Roberto Koeneke, Germinal Isern, Internet Vulner-

abilities Related to TCP/IP and T/TCP, ACM SIGCOMM Computer Communication Re-

view, January 1999.
