two/three and multi-factor authentication. Sometimes,
however, we reach a website that appears trustworthy
but is in fact an exact copy created by a fraudster.
We try to log in to our account, the login fails, and
our password and email address have already been sent
to the scammer. This method of creating a fraudulent
site is called spoofing. DDoS attacks have also become
a real problem: as many as 9.75 million DDoS attacks
were recorded in 2021 (Vermer, 2021). Although DDoS
attacks are more frequent, modern servers can handle
them more easily than in the past.
In 2021, hackers managed to obtain a single
password to the system of the US oil pipeline
company Colonial Pipeline (Turton, 2021). The
hackers gained access to the system after an employee
entered the password on a fraudulent website posing
as the company's VPN. The hackers then locked down
the entire system using ransomware and demanded a
ransom of 75 bitcoins ($4.4 million at the time).
In July 2020, Twitter employees were the target
of a phishing attack, and hackers managed to gain
access to the accounts of many celebrities, such as
Elon Musk and Bill Gates, and posted a message
promising that any Bitcoin sent to a certain Bitcoin
address would be doubled (Leswing, 2021). The scam was
also shared by hacked accounts of well-known financiers
such as Mike Bloomberg and Warren Buffett. This
was an example of a scam called a Ponzi scheme.
Factors such as page load time, the SSL protocol and
contact details play an important role in identifying a
fraudulent site (Fedorko, 2020). A page load time
longer than 5 seconds decreases the credibility of
the page. According to current rules, all websites
should use SSL, which is recognizable by the "https:"
at the beginning of the URL. The presence of SSL
increases the credibility of the website, as does
visibly accessible contact information such as a phone
number, e-mail address and brief information about
the company (for example, physical address, ID
number, etc.).
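
The three factors above can be turned into simple
checks. The following is a minimal sketch, not the
method of the cited study: the PageObservation
container, its field names and the equal weighting of
the factors are all assumptions made for illustration;
only the 5-second threshold and the "https:" check come
from the text.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class PageObservation:
    """Hypothetical container for facts already gathered about a page."""
    url: str
    load_time_seconds: float
    has_phone: bool = False
    has_email: bool = False
    has_physical_address: bool = False


def credibility_flags(page: PageObservation, max_load_seconds: float = 5.0) -> dict:
    """Return one boolean per credibility factor named in the text."""
    scheme = urlparse(page.url).scheme.lower()
    return {
        # The URL starts with "https:", i.e. the site is served over SSL/TLS.
        "uses_https": scheme == "https",
        # Pages slower than the threshold lose credibility.
        "loads_fast_enough": page.load_time_seconds <= max_load_seconds,
        # Any visible contact detail counts (assumption: one is enough).
        "shows_contact_details": (
            page.has_phone or page.has_email or page.has_physical_address
        ),
    }


def credibility_score(page: PageObservation) -> int:
    """Count satisfied factors (0 to 3); equal weights are an assumption."""
    return sum(credibility_flags(page).values())
```

A page served over plain http that takes 6 seconds to
load and shows no contact details would score 0 under
this sketch, while a fast https page with an e-mail
address would score 3.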
2.1 Related Works
Several studies have focused on what fraudulent sites
have in common and how large the differences are
between phishing sites, fraudulent payment gateways,
fraudulent online stores and sites that pretend to be
legitimate news organizations. For example, features
that fraudulent sites commonly share are invalid
certificates and many buttons with broken
links (Fedorko, 2020). That study used descriptive
statistics, multiple linear regression and structural
equation modelling.
Other research, which worked with a dataset of
phishing websites (Hannousse, 2021), discusses the
importance of URL syntax, i.e., how many special
characters a link contains, how long the link is,
how many times the www subdomain appears in the link,
whether the link contains the name of a globally
known brand, and also whether the domain is
registered at all and, if so, how old it is. These are
features that can be extracted and preprocessed
relatively easily for machine learning models. In this
study, the following machine learning methods were
used to train detection models: logistic regression,
random forest and support vector machines. The best
performing model was trained using the random forest
method.
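
The lexical URL features listed above can be sketched
as follows. This is an illustrative approximation, not
the feature extractor of the cited study: the function
name, the feature names and the short KNOWN_BRANDS list
are assumptions, and the domain registration and age
features are left out because they require an external
WHOIS lookup.

```python
import re
from urllib.parse import urlparse

# Hypothetical brand list; a real one would be much longer.
KNOWN_BRANDS = ("paypal", "google", "amazon", "apple", "microsoft")


def url_features(url: str) -> dict:
    """Extract simple lexical features from a URL string."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    lowered = url.lower()
    return {
        # Total number of characters in the link.
        "length": len(url),
        # Every character that is not an ASCII letter or digit.
        "special_chars": len(re.findall(r"[^A-Za-z0-9]", url)),
        # How many times "www" occurs anywhere in the link.
        "www_count": lowered.count("www"),
        # Whether any brand name from the list appears in the link.
        "has_brand": any(brand in lowered for brand in KNOWN_BRANDS),
        # Dots in the host name, a rough proxy for subdomain depth.
        "host_dots": host.count("."),
    }
```

The resulting dictionaries can be converted into
numeric rows and passed to any of the classifiers named
above.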
In 2013, research was conducted in which 2046
participants rated the trustworthiness of a displayed
website on a scale of 1-5. Participants who very
frequently rated websites with a 5 or a 1 were often
the most wrong in their decisions (Rafalak, 2014). In
the study, descriptive statistics were used for the
estimated levels of psychological traits. The results
of this research are helpful in designing a method to
detect fraudulent websites.
Nowadays, more and more user-generated
content is hosted on web servers that belong to a small
group of giant technology companies. This trend is
leading to a centralized web with many problems.
These could be addressed by decentralizing the web,
which has the potential to ensure that the end-user
always knows whether the website they are currently
on is from a legitimate source. A study (Kim,
2021) proposes a blockchain-based way of operating
such a decentralized web.
Another way to prevent phishing and password
leaks is to use blockchain-based encryption of the
messages exchanged between company servers when
logging into a system. If an employee sends a login
key or password to a corporate system, the blockchain
ensures through a stored hash that only the target
corporate server can read the content of the message,
i.e. the password (Cai, 2017). In this case, the
password cannot be read by a hacker who sent a
phishing website to the employee.
The study (Rutherford, 2022) demonstrates that
the machine learning approach is viable, with
validation accuracy ranging from 49 to 86%. The
support vector machine was able to predict whether a
cadet would be compromised upon receipt of a
phishing attack with 55% accuracy and a recall
score of 71%. The logistic regression model, on the
other hand, had the highest accuracy, 86%, but a
recall score of only 16%.
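
The gap between accuracy and recall reported above is
typical when the positive class is rare. The sketch
below computes both metrics from confusion-matrix
counts. The counts are hypothetical and chosen only for
illustration; they are not the data of the cited study.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the model finds."""
    positives = tp + fn
    return tp / positives if positives else 0.0


# Hypothetical counts: 100 users, 20 of whom were compromised.
# The model flags only 6 users and is right about 4 of them.
tp, fp, tn, fn = 4, 2, 78, 16

model_accuracy = accuracy(tp, fp, tn, fn)  # (4 + 78) / 100
model_recall = recall(tp, fn)              # 4 / (4 + 16)
```

With these counts the model is correct in 82 of 100
cases but finds only 4 of the 20 compromised users, so
a high accuracy can coexist with a low recall.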
Modelling of an Untrustworthiness of Fraudulent Websites Using Machine Learning Algorithms