PHISHPIN: AN INTEGRATED, IDENTITY-BASED
ANTI-PHISHING APPROACH
Hicham Tout
School of Computer and Information Sciences, Nova Southeastern University
3301 College Avenue, Fort Lauderdale, Florida, U.S.A.
Keywords: Phishing, Spam, Information security, Identity theft, Social engineering, Encryption, Hash algorithms, One
time password, Digital certificates, Online scams, Web, Pharming.
Abstract: Phishing is a social engineering technique used to fraudulently acquire sensitive information from users by
masquerading as a legitimate entity. One of the primary goals of phishing is to illegally carry fraudulent
financial transactions on behalf of users. The two primary vulnerabilities exploited by phishers are:
Inability of non-technical/unsophisticated users to always identify spoofed emails or Web sites; and the
relative ease with which phishers masquerade as legitimate Web sites. This paper presents Phishpin, an
approach that leverages the concepts of mutual authentication to require online entities to prove their
identities. To this end, Phishpin builds on One-Time-Password, DNS, partial credentials sharing, & client
filtering to prevent phishers from masquerading as legitimate online entities.
1 INTRODUCTION
Phishing is a social engineering technique used to
fraudulently acquire sensitive information from
users by masquerading as a legitimate entity.
Phishing is typically carried over electronic
communications such as email or instant messaging
(Kirda and Kruegel, 2006). One of the primary goals
of phishing is to illegally acquire sensitive
information, such as passwords or social security
numbers, in order to carry fraudulent transactions on
behalf of the victim. Using a forged email that
contains a URL pointing to a fake Web site
masquerading as an online bank or a government
entity, a phisher may lure a victim into giving
his/her Social Security Number, full name, &
address, which can then be used to apply for a credit
card on the victim’s behalf. According to McCall
(2007), phishing attacks escalated in the 12 month
period ending August 2007 to impact 3.6 million
adults and cause losses worth approximately $3.2
billion.
The first publicized phishing attacks occurred at
AOL in the early 90s where phishers posed as AOL
staff members to lure victims into giving their
sensitive account information (Wikipedia, 2008).
Since then the number of phishing attacks have
substantially increased. Attacks have also evolved to
become more sophisticated and malicious targeting
large number of users dealing with financial
institutions. Much effort has gone into the
development of anti-phishing techniques. These
techniques fall into 4 categories: content filtering,
blacklisting, symptom-based prevention, & domain
binding. Many of these techniques focus on enabling
clients to recognize & filter various types of
phishing attacks. While many of these techniques
have proven effective in a number of scenarios, they
also have put much of the burden of proof on either
the online user or client filter or both. Yet online
entities have had few techniques that enable them to
prove their identities without forcing online users to
deploy complex bi-directional authentication
mechanisms.
This paper presents Phishpin, an anti-phishing
technique that integrates One-Time-Password
(OTP), DNS, partial credentials sharing, & client
filtering techniques to prevent phishers from easily
masquerading as legitimate online entities. One of
the primary goals of the proposed approach is to
enable both parties—online users and entities—to
prove their identities—mutual authentication—
without having to divulge sensitive information.
Another goal of the proposed solution is to build an
effective, yet simple mutual authentication
mechanism that runs seamlessly within the client
369
Tout H. (2009).
PHISHPIN: AN INTEGRATED, IDENTITY-BASED ANTI-PHISHING APPROACH.
In Proceedings of the International Conference on Security and Cryptography, pages 369-374
DOI: 10.5220/0002222503690374
Copyright
c
SciTePress
browser. The proposed approach is made up of 3
components: DNS TXT record to store the
legitimate entity’s certificate; one-way hash
algorithm; and client/server plug-in to verify the
authenticity of both online entities and users.
This paper is organized as follows. Section 2
introduces related work and contrasts it with the
approach in this paper. Section 3 discusses the
proposed approach. Finally, conclusions and future
work are summarized in section 4.
2 RELATED WORK
This section enumerates some of the known anti-
phishing techniques. It’s meant to provide a brief
overview of some of the best known efforts in this
area of research. Anti-phishing techniques fall into 4
major categories: content filtering, blacklisting,
symptom-based prevention, & domain binding.
Content/email filtering relies on machine learning
methods, such as Bayesian Additive Regression
Trees (BART) or Support Vector Machines (SVM),
to predict and filter phishing emails (Abu-Nimeh et
al., 2007; Fette, Sadeh, and Tomasic, 2006). Since
email is normally the first step in a phishing attack,
the advantage of this technique is that it intercepts &
eliminates suspected phishing emails before they
reach the user. Contents of the email, the
sender/source, and other attributes are analyzed by
this technique. The main disadvantage of this
technique is that it cannot guarantee that all phishing
emails are filtered (Wu, Miller, and Little, 2006).
Phishers have come up with alternative semantics
that are capable of bypassing these filters. Phishers
have also in certain cases resorted to the use of
images instead of text, which makes the filtering
process more challenging. It’s important to note that
while the majority of phishing attacks are initiated
by email, there has been a surge of new types of
attacks that are initiated by instant messaging or by
hacked Web pages. These types of attacks cannot be
intercepted by email-based solutions.
Blacklisting depends on public lists of known
phishing Web sites/addresses published by trusted
entities such as (Phishtank, 2008). It requires both a
client & a server component. The client component
is implemented as either an email or browser plug-
in. that interacts with a server component, which in
this case is a public Web site that provides a list of
known phishing sites. In the case of an anti-
phishing email plug-in, the client component
compares URLs embedded in every incoming email
to one or more publicly provided lists of suspected
phishing sites. Should it find a match, the email is
either discarded or flagged as a phishing/spam
email. In the case of an anti-phishing browser plug-
in, the client component compares every URL
loaded into the address field of the browser to one or
more publicly provided lists of suspected phishing
sites. Should it find a match, a warning message, in
the form of a popup window is displayed. The
advantage of this technique is the ability of the plug-
in to reference a frequently updated, reliable public
list of known phishing Web sites. This technique
however, suffers from many of the same problems as
signature-based prevention methods—almost always
outdated as phishers continuously use new Web sites
and addresses. In fact most phishing Web sites are
only available online for few hours (Zhang, Hong,
and Cranor, 2007). It’s important to note that
blacklisting have, in certain cases also been used as
a component/step in email filtering solutions since it
runs as an email plug-in in most cases.
Symptom-based prevention analyses the content
of each Web page the user visits and generates
phishing alerts according to the type and number of
symptoms detected (Chou et al., 2004). Symptoms
generated are the result of parsing the contents—text
of the Web page and the URL/address. Symptom-
based prevention uses learning and identification
techniques similar to email filtering such as SVM
and BART. The difference between the two
techniques is that one operates on the contents of the
email, while the other operates on the content of the
Web page being visited. It’s important to note that
unlike email-based filtering, both symptom-based &
blacklisting techniques are not invoked until after
the user presses Web link contained in the email.
The advantage of this technique is that it parses the
content of the visited Web site using machine
learning techniques to conclude whether it’s a
phishing site. This technique may provide a higher
level of detection rate since it parses the content of
the actual visited site and not just the text in the
phishing email. Disadvantages of this technique
include its inability to detect phishing attacks that
use client-side JavaScript and its reliance on warning
messages, which have proven ineffective with most
users (Wu, 2006). It’s important to note that this
technique should be viewed as complementary to
rather than competing with email-based phishing
techniques. The combination of both methods may
enable a defence in depth strategy.
Trusted domain binding is a browser-based
technique that binds sensitive information—mostly
credentials—to a specific domain (Raffetseder,
Kirda, and Kruegel, 2007). Should the user enter
SECRYPT 2009 - International Conference on Security and Cryptography
370
sensitive information in a Web form that belongs to
a different/un-trusted Web site, the browser plug-in
either blocks the transmission of data or warns the
user of the consequences. This technique runs as part
of the Web browser workflow, disables all Web
form fields identified as sensitive, and presents the
user with a specialized form where credentials are
entered. This technique establishes a one-to-one
relationship between a set of credentials and a
trusted domain. Should those credentials be used
with a different domain that is not yet considered
trusted, submission of information is blocked and an
alert is sent to the user. The design principles behind
this technique rely on the assumption that preventing
the user from directly submitting sensitive
information can eliminate most if not all phishing
attacks. The approach builds on top of a survey
conducted by (Wu, 2006), which concluded that the
use of only warning messages did little to sway users
from proceeding forward with what they perceived
to be a trusted site. This approach provides a high
detection rate and does not suffer from many of the
disadvantages associated with email filtering or
blacklisting. However this technique requires a
manual process to identify & bind sensitive
information and to identify trusted domains for
initial binding. It also uses a domain-based binding
process (Wu, Miller, and Little, 2006) and requires a
one-to-one relationship between credentials and the
intended domain. In addition, the methods used by
this technique to distinguish between trusted and
non-trusted domains/Web sites is similar to the one
used by blacklisting.
In contrast, Phishpin combines client/server-
based filtering and domain-based identity techniques
to prevent phishers from masquerading as legitimate
online entities. It integrates PKI, DNS, OTP, &
filtering to enforce the authenticity of online entities
based on primary attributes associated with both
legitimate online entities and online users. One such
attribute is the legitimate entity logo or an imitation
of it, which is used by phishers as a visual deception
tactic intended to trick end-users into believing
they’re connected to a legitimate online entity
(Downs, Holbrook, and Cranor, 2007). Other
attributes may include online user’s name, address,
phone, password, social security number, or pin
number.
3 PHISHPIN
While users are required to use one or more
authentication methods to prove their identities,
online entities have done little to prove they are who
they claim they are. With little Web development or
design experience, a phisher can masquerade as
almost any legitimate online entity. The ease with
which online entities can be spoofed may be
considered one of the primary vulnerabilities in the
fight against phishing attacks. One of the primary
goals of the proposed approach is to address this
vulnerability by enabling both parties—online users
and entities—to prove their identities without having
to divulge sensitive information. Another goal of the
proposed solution is to build an effective, yet simple
mutual authentication mechanism that runs
seamlessly within the client browser.
Phishpin is divided into three major components:
DNS, client plug-in, and server plug-in. The DNS
TXT record is used to store the online entity’s
certificate. The primary purpose of storing the cert is
to enable the client plug-in to validate the certificate
chain and match the distinguished name in the
certificate with the domain name being visited. This
step ensures that not only the certificate chains to a
known and trusted certificate authority but also that
it’s associate with the domain being visited. This
step is considered as the first line of defence against
phishing attacks. It mostly focuses on the legitimacy
of the credential being presented by the online
entity. However, well known DNS attacks that may
spoof legitimate certificates, have been documented.
Therefore it’s important to note that this must not be
the only line of defence used. It’s also worthwhile
noting that such validation must be executed
seamlessly by the browser plug-in without must
input from the online user. Should certificate chain
validation or distinguished name matching fail, the
client plug-in would lock the Web form/page and
inform the online user that authentication of the site
has failed. Displaying warnings without disabling
Web forms has shown to be ineffective as most
online users tend to ignore warning messages.
The role of the client plug-in is to block phishing
attacks by validating the online entity’s certificate
stored in the DNS TXT record; applying content
filtering rules; validating 2nd half of the OTP
received from the server; and generating the 1st half
of OTP. On the other hand, the role of the server
plug-in is to validate the 1st half of the OTP
received from the client and generating the 2nd half
of the OTP. This is in addition to using the selected
hash algorithm to build the initial hash based on user
attributes stored during the account setup process.
The DNS component includes the DNS record to
store the online entity’s certificate. The certificate is
stored in the TXT record.
PHISHPIN: AN INTEGRATED, IDENTITY-BASED ANTI-PHISHING APPROACH
371
The server plug-in includes the following:
An algorithm to generate OTP.
A method to validate 2nd half of OTP received
by the client.
A method to generate 1st half of OTP.
The client filter includes the following:
An algorithm to generate OTP.
A method to validate 1st half of OTP received
by server.
A method to generate 2nd half of OTP.
A method to disable Web form fields.
A Method to validate online entity’s certificate.
The initial account setup process typically
requires online users to enter general and personal
information. This information may include user id,
account number, user name, address, phone,
password, and potentially social security number,
which is mostly used for financial accounts setup. In
addition to the typical account setup, the proposed
approach would require both online users & entities
to determine the hash algorithm and the sequence of
attributes to be used in the OTP hash. While MD5
has been amongst the most commonly used hash
algorithms, successful attacks against MD5 have
been documented. Therefore it’s recommended that
stronger hash algorithms such as SHA-256 be
considered. As illustrated in figure 1 below, besides
enabling an online user to input account or personal
information, the initial account setup process would
also include the following steps:
Register selected hash algorithm with both
client filter and server plug-in.
Figure 1: Initial Account Setup.
Define online user attributes that will be used as
part of the OTP. It is recommended that at least one
or more private attributes are included in order to
ensure that that the resulting hash is built based on
multiple unknown strings that contains half of the
password hash & challenge phrase.
Define the order in which attributes are
concatenated into the source string for the
hash function.
Compute the hash of the online user’s
password; divide it in half; store the second
half of the hash in a secure repository—
entitlement store—at the online entity site;
store the first half of the hash on the online
user’s device. The second half of the hash
must be accessible by the server plug-in
while the first half of the hash must be
accessible by the browser plug-in. Since
this step is performed during account setup,
which also includes setting up the
password, access to the user password
would be made available by the setup script
or application being used to create the
initial account.
As illustrated in figure 2 below, the client plug-
in, which in this case is a browser plug-in, performs
the following functions:
Figure 2: Browser Plug-in.
SECRYPT 2009 - International Conference on Security and Cryptography
372
Parse Web forms embedded in the loaded
Web page for fields that require the user to
enter sensitive information such as user id,
password, credit card number, or social
security number. Should one of these form
fields be detected, a certificate chain
validation in performed on the certificate
stored in the DNS TXT record. In addition
the DN in the certificate is compared to the
domain being connected to. Should
validation fail, all form fields are locked by
the browser plug-in using Java scripting
and a warning is displayed to alert the
online user of a possible phishing attack.
Should the Web form include a Java Applet
or a ActiveX control, the browser plug-in
would intercept the data/stream submitted
by the Applet/ActiveX control to check for
the potential submission of sensitive data.
Apply the selected hash algorithm to the
original password then split the resulting
hash into half—PH1 & PH2. This step may
not be necessary since the initial account
setup process would have hashed the
original password and split it in half. In fact
it’s preferred that this step be performed by
the initial setup since it has access to the
original password. On the other hand, if
performed by either the browser or the
server plug-in then each would have to
have access to the original password.
Concatenate the selected user attributes, a
challenge phrase, and PH1 into string S1.
Generate HS1 by applying the selected
hash algorithm to string S1.
Append HS1 + challenge phrase to the
cookie field in the HTTP request header.
Once received by the server plug-in, the
following steps are taken:
Validate HS1 by applying the selected hash
algorithm to the same set of user attributes
in the required sequence.
If valid, then apply the selected hash
algorithm to the original password then
split the resulting hash into half—PH1 &
PH2. Again this step may not be required
since the initial setup process would have
already created PH1 & PH2.
Concatenate the selected user attributes, a
challenge phrase, and PH2 into string S2.
Generate HS2 by applying the selected
hash algorithm to string S2.
Append HS2 + challenge phrase to the
cookie field in the HTTP response header.
Once the browser plug-in receives the HTTP
response, it authenticates the identity of the online
entity by validating HS2. As a final confirmation,
the browser plug-in will also perform a number of
heuristic checks such as well-formed links/URL and
the use of IP addresses in hyperlinks. The URL
heuristic checks for symbols such as @ and the
number of “Dots” in the URL [13]. Should both the
client & server plug-ins successfully perform mutual
authenticate, the browser plug-in would exit with
success status and allow the user to input data into
the Web form. It’s important to note that once
mutual authentication is performed, there would be
not need for users to re-enter their credentials—
original password. Therefore passwords are never
exchanged between users and online entities except
during the initial account setup process.
One of the advantages of Phishpin is that it
enables both online users and entities to authenticate
each other without revealing sensitive information.
The use of OTP in combination with partial
credentials and certificate chain validation makes it
fairly challenging for phishers to obtain the user’s
credentials. Even in the unlikely scenario where a
phisher is able to reverse the one-way hash, he/she
would not be able to obtain the user’s password
since only half of the hash of the password was
shared.
4 CONCLUSIONS
This paper presented the design of Phishpin, an
integrated anti-phishing approach that combines
client-based filtering and domain-based identity
techniques to prevent phishers from masquerading
as legitimate online entities. Phishpin integrates
OTP, DNS, partial credentials, & filtering to enforce
bi-directional authentication without revealing
sensitive information.
One drawback of the proposed solution is the
level of complexity associated with the original
account setup, which requires online users to synch
up attributes, selected hash method, sequence, &
password with each online entity. That said, the
initial setup process may be made easier by building
one or more automated synch up tools. Another
drawback is the added level of effort required by
online entities to implement the server plug-in.
It’s important to note that the focus of the
proposed solution is on phishing attacks. Pharming,
PHISHPIN: AN INTEGRATED, IDENTITY-BASED ANTI-PHISHING APPROACH
373
DNS poisoning, & malicious code attacks are not
addressed by this solution. Should a hacker gain
access to the client machine where the browser plug-
in is running, she/he may be able to disable or even
uninstall the browser plug-in.
ACKNOWLEDGEMENTS
I would like to thank my Ph.D. advisor, Professor
William Hafner for his support, time, ideas, and
judgements that helped shape this paper.
REFERENCES
Abu-Nimeh, S., Nappa, D., Wang, X., and Nair, S., 2007.
A Comparison of Machine Learning Techniques for
Phishing Detection. In Proceedings of the anti-
phishing working groups 2nd annual eCrime
researchers summit, Pittsburgh, Pennsylvania, USA.
Chou, N., Ledesma, R., Teraguchi, Y., Boneh, D., and
Mitchell, J., 2004. Client-side defense against Web-
based identity theft. In Proceedings of the 11th
Network and Distributed System Security Symposium
(NDSS’04), San Diego, California, USA.
Downs, S. J., Holbrook, M., and Cranor, L. F., 2007.
Behavioral Response to Phishing Risk. In proceedings
of the anti-phishing working groups 2nd annual
eCrime researchers summit (eCrime’07), Pittsburgh,
Pennsylvania, USA.
FDIC, 2004. Putting an end to account-hijacking identity
Theft. http://www.fdic.gov/consumers/consumer/
idtheftstudy/identity_theft.pdf.
Fette, I., Sadeh, N., and Tomasic, A. 2006. Learning to
detect phishing emails. Technical Report CMU-ISRI-
06-112, Institute for Software Research, Carnegie
Mellon University.
http://reportsarchive.adm.cs.cmu.edu/anon/isri2006/ab
stracts/06-112.html.
Kirda, E., and Kruegel, C. 2005. Protecting Users against
Phishing Attacks. In proceedings of the 29th Annual
International Computer Software and Applications
Conference (COMPSAC'05), Edinburgh, UK.
McCall, T., 2007. Gartner Survey Shows Phishing Attacks
Escalated in 2007; More than $3 Billion Lost to These
Attacks, Gartner, 2007. http://www.gartner.com/it/
page.jsp?id=565125.
Phishtank. Phishing, 2008. http://www.phishtank.org.
Raffetseder, T., Kirda, E., and Kruegel,C., 2007. Building
Anti-Phishing Browser Plug-Ins: An Experience
Report. In proceedings of the 3rd International
Workshop on Software Engineering for Secure
Systems (SESS'07), Minneapolis, Minnesota, USA,
2007.
Wikipedia. Phishing, 2008. http://en.wikipedia.org/wiki/
Phishing.
Wu, M. 2006. Fighting Phishing at the User Interface.
http://groups.csail.mit.edu/uid/projects/phishing/minw
u-thesis.pdf.
Wu, M., Miller, R. C., Little, G. 2006. Web Wallet:
Preventing Phishing Attacks by Revealing User
Intentions. In proceedings of the Symposium On
Usable Privacy and Security (SOUP'06), Pittsburgh,
Pennsylvania, USA, 2006.
Zhang, Y., Hong, J. I., and Cranor, L. F. 2007. Cantina: a
Content-based Approach to Detecting Phishing Web
Sites. In proceedings of the 16th International
Conference on World Wide Web (WWW'07), Banff,
Alberta, CA, 2007.
SECRYPT 2009 - International Conference on Security and Cryptography
374