A Machine-learning based Unbiased Phishing Detection Approach

Hossein Shirazi, Landon Zweigle, Indrakshi Ray

2020

Abstract

Phishing websites mimic legitimate websites to capture users' sensitive information. Machine learning is often used to detect them. Current machine-learning based approaches classify phishing and genuine sites into two groups based on a set of features. We argue that this models the problem inadequately, because the characteristics of different phishing websites may vary widely. Moreover, current approaches are biased towards groups of over-represented samples. Most importantly, as new features are exploited, the training set must be updated to detect new phishing sites, and attackers can exploit the time lag between the emergence of new phishing sites and the retraining of the model. We provide an alternative approach that aims to solve these problems. Instead of finding commonalities among unrelated genuine websites, we measure the similarity of a suspicious website to a legitimate target and use machine learning to decide whether the suspicious site is impersonating that target. We define the fingerprint of a legitimate website using visual and textual characteristics, and compare each sample against this fingerprint to ascertain whether it is fake. We implemented our approach on 14 legitimate websites and tested it against 1446 unique samples. Our model reported an accuracy of at least 98% and is not biased towards any website. In contrast, current machine-learning models may be biased towards groups of over-represented samples, which leads to more false-negative errors for less popular websites.
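To make the idea of comparing a suspicious page against a target's fingerprint concrete, here is a minimal sketch in Python. It is not the authors' implementation: the fingerprint layout (`domain`, `terms`, `colors`), the Jaccard similarity measure, the function names, and the fixed `threshold` are all assumptions made for illustration. In particular, the paper uses a trained machine-learning model to make the final decision, whereas this sketch substitutes a simple averaged score with a cutoff.

```python
from urllib.parse import urlparse


def tokens(text):
    """Lower-cased set of whitespace-separated words (hypothetical textual feature)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_features(sample_text, sample_colors, fingerprint):
    """Compare a sample with a target fingerprint on text and on a visual proxy."""
    return {
        "text_sim": jaccard(tokens(sample_text), fingerprint["terms"]),
        "color_sim": jaccard(set(sample_colors), fingerprint["colors"]),
    }


def impersonates(sample_url, sample_text, sample_colors, fingerprint, threshold=0.5):
    """Flag a sample that resembles the target but is not hosted on its domain.

    Assumption: a fixed threshold stands in for the learned classifier.
    """
    host = urlparse(sample_url).hostname or ""
    domain = fingerprint["domain"]
    if host == domain or host.endswith("." + domain):
        return False  # the target's own site is never an impersonation
    feats = similarity_features(sample_text, sample_colors, fingerprint)
    score = (feats["text_sim"] + feats["color_sim"]) / 2
    return score >= threshold


# Hypothetical fingerprint for a made-up target site.
fingerprint = {
    "domain": "example-bank.com",
    "terms": {"sign", "in", "to", "example", "bank"},
    "colors": {"#003366", "#ffffff"},
}

suspicious = impersonates(
    "http://evil.test/login",
    "Sign in to Example Bank",
    ["#003366", "#ffffff"],
    fingerprint,
)
```

Because each sample is scored against one specific target rather than against a pooled class of "genuine" sites, adding a new target only requires a new fingerprint; how well this avoids bias in practice depends on the real features and classifier, which this sketch does not reproduce.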

Paper Citation


in Harvard Style

Shirazi H., Zweigle L. and Ray I. (2020). A Machine-learning based Unbiased Phishing Detection Approach. In Proceedings of the 17th International Joint Conference on e-Business and Telecommunications - Volume 3: SECRYPT, ISBN 978-989-758-446-6, pages 423-430. DOI: 10.5220/0009834204230430


in Bibtex Style

@conference{secrypt20,
author={Hossein Shirazi and Landon Zweigle and Indrakshi Ray},
title={A Machine-learning based Unbiased Phishing Detection Approach},
booktitle={Proceedings of the 17th International Joint Conference on e-Business and Telecommunications - Volume 3: SECRYPT},
year={2020},
pages={423-430},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009834204230430},
isbn={978-989-758-446-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Joint Conference on e-Business and Telecommunications - Volume 3: SECRYPT
TI - A Machine-learning based Unbiased Phishing Detection Approach
SN - 978-989-758-446-6
AU - Shirazi H.
AU - Zweigle L.
AU - Ray I.
PY - 2020
SP - 423
EP - 430
DO - 10.5220/0009834204230430