ments they used: LR, KNN, DT, RF and SVM (linear,
sigmoid and RBF kernels). For our experiments we
used a larger number of algorithms: LR, LDA, NB
(Bernoulli and Gaussian), ADA, XGB, MLP and SVM
(with linear, sigmoid, polynomial and RBF kernels),
while DT, KNN and RF were taken from (Coste, 2023).
Thus, we had 120 different hybrid models, while they
used 10 configurations. In their case, the best
individual algorithm is not included in the best
ensemble model. This is in contrast with our results,
which confirm that the best individual model is also
part of the best ensemble.
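The figure of 120 hybrid models is consistent with forming every unordered group of three out of 10 configurations, since C(10, 3) = 120. A minimal sketch of that enumeration follows; it is an illustration, not the code used in the experiments, and the labels C1 to C10 are hypothetical placeholders because the ten selected configurations are not listed in this section.

```python
from itertools import combinations
from math import comb

# Hypothetical labels: the ten configurations actually selected are not
# listed in this section, so generic placeholders stand in for them.
configurations = [f"C{i}" for i in range(1, 11)]

# Each unordered group of three distinct configurations is one hybrid model,
# named in the same dash-separated style as e.g. ADA-XGB-SVM-rbf.
hybrids = ["-".join(trio) for trio in combinations(configurations, 3)]

print(len(hybrids))   # 120
print(comb(10, 3))    # 120
print(hybrids[0])     # C1-C2-C3
```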
5 CONCLUSIONS AND FUTURE WORK
Experiments in the malicious web links detection
domain are challenging and complex to develop.
We employed seven ML algorithms: LR, LDA, NB
(Bernoulli and Gaussian), ADA, XGB, MLP and SVM
(with linear, sigmoid, polynomial and RBF kernels),
which we calibrated and compared. Moreover,
we chose 10 different configurations and continued
the experiments with hybrid models, each formed
from three of them. The best ensembles improved
the metric scores compared to the single models.
We also observed that the best single model, XGB,
was included in the best performing ensembles:
ADA-BNB-XGB, ADA-XGB-SVM-rbf and
LDA-XGB-SVM-rbf. Our best solution, the
ADA-XGB-SVM-rbf hybrid model, reaches 96.31%
precision. In addition, some of the proposed models,
such as MLP, LR and GNB, improve on previous
results from the literature.
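To make the comparison between single models and three-member hybrids concrete, the sketch below combines three label vectors and computes precision as TP / (TP + FP). The combination rule is an assumption: this section does not restate how the hybrid models merge their members, so simple majority (hard) voting is used here for illustration. The function names and the toy labels are hypothetical and are not taken from the experiments.

```python
from collections import Counter

def majority_vote(*model_predictions):
    """Return, per sample, the label predicted by most of the models.

    Assumption: hard voting. With three voters and binary labels
    there can be no ties.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]

def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Toy data (hypothetical): 1 = malicious link, 0 = benign link.
y_true  = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 1, 0, 1, 0]
model_b = [1, 1, 1, 1, 0, 0]
model_c = [0, 0, 1, 1, 0, 1]

ensemble = majority_vote(model_a, model_b, model_c)

for name, pred in [("a", model_a), ("b", model_b), ("c", model_c),
                   ("ensemble", ensemble)]:
    print(name, round(precision(y_true, pred), 4))
```

In this toy case each member makes a different mistake, so the vote cancels the individual false positives; real classifiers with correlated errors will not benefit as cleanly.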
Concerning future work, we plan to further develop
the detection models and create stacked models
constructed from five classifiers and a
meta-classifier. Our methodology can also work with
multiple datasets, so experimenting in a
cross-dataset environment is a natural next step.
Finally, we could extend the data collection process
by establishing a centralized dataset with real
samples retrieved during a specific period.
Additional information regarding links, such as DNS
information, WHOIS information and properties of
the web server, should be collected in real time.
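One possible shape for the planned stacked models, sketched with scikit-learn (Pedregosa et al., 2011), is shown below. Everything specific in it is an assumption made for illustration: the choice of the five base classifiers, logistic regression as the meta-classifier, the 5-fold internal cross-validation, and the synthetic data that stands in for a real malicious-link dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): not a real URL dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Five base classifiers (hypothetical selection).
base_learners = [
    ("ada", AdaBoostClassifier(random_state=0)),
    ("gnb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svm_rbf", SVC(kernel="rbf", random_state=0)),
]

# The meta-classifier is trained on cross-validated outputs of the base learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(),
                           cv=5)
stack.fit(X_train, y_train)

precision_value = precision_score(y_test, stack.predict(X_test))
print(f"precision: {precision_value:.4f}")
```

A cross-dataset experiment could reuse the same object by fitting on one dataset and scoring on another, provided both expose the same feature columns.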
REFERENCES
Alsaedi, M., Ghaleb, F. A., Saeed, F., Ahmad, J., and Alasli,
M. (2022). Cyber threat intelligence-based malicious
url detection model using ensemble learning. Sensors,
22(9):3373.
Coste, C.-I. (2023). Malicious web links detection -
a comparative analysis of machine learning
algorithms. Studia Universitatis Babeș-Bolyai
Informatica, 68(1):21–36.
Fortra (2022). The 2021 gone phishing tournament results:
Everything you need to know.
Islam, M., Poudyal, S., Gupta, K. D., et al. (2019).
Mapreduce implementation for malicious websites
classification. International Journal of Network
Security & Its Applications (IJNSA), 11.
Janet, B., Kumar, R. J. A., et al. (2021). Malicious url
detection: A comparative study. In 2021 International
Conference on Artificial Intelligence and Smart
Systems (ICAIS), pages 1147–1151, Tamil Nadu, India.
IEEE.
Johnson, C., Khadka, B., Basnet, R. B., and Doleck, T.
(2020). Towards detecting and classifying malicious
urls using deep learning. J. Wirel. Mob. Networks
Ubiquitous Comput. Dependable Appl., 11(4):31–48.
Naveen, I. N. V. D., Manamohana, K., and Verma, R.
(2019). Detection of malicious urls using machine
learning techniques. International Journal of
Innovative Technology and Exploring Engineering,
8(4S2):389–393.
Pakhare, P. S., Krishnan, S., and Charniya, N. N. (2021).
Malicious url detection using machine learning and
ensemble modeling. In Computer Networks, Big Data
and IoT, pages 839–850. Springer, Singapore.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer,
P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,
A., Cournapeau, D., Brucher, M., Perrot, M., and
Duchesnay, E. (2011). Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research,
12:2825–2830.
Sajedi, H. (2019). An ensemble algorithm for discovery of
malicious web pages. International Journal of
Information and Computer Security, 11(3):203–213.
Song, X., Chen, C., Cui, B., and Fu, J. (2020). Malicious
javascript detection based on bidirectional lstm model.
Applied Sciences, 10(10):3440.
Subasi, A., Balfaqih, M., Balfagih, Z., and Alfawwaz, K.
(2021). A comparative evaluation of ensemble
classifiers for malicious webpage detection. Procedia
Computer Science, 194:272–279.
Tung, S. P., Wong, K. Y., Kuzminykh, I., Bakhshi, T., and
Ghita, B. (2022). Using a machine learning model
for malicious url type detection. In Koucheryavy,
Y., Balandin, S., and Andreev, S., editors, Internet of
Things, Smart Spaces, and Next Generation Networks
and Systems, pages 493–505, Cham. Springer
International Publishing.
Urcuqui, C., Navarro, A., Osorio, J., and García, M. (2017).
Machine learning classifiers to detect malicious
websites. SSN, 1950:14–17.
Vundavalli, V., Barsha, F., Masum, M., Shahriar, H., and
Haddad, H. (2020). Malicious url detection using
supervised machine learning techniques. In 13th
International Conference on Security of Information
and Networks, pages 1–6.