email. As a countermeasure, we proposed an end-user
email support system based on the analysis of a
person's writing style. We presented two possible
approaches to the problem: (i) sender email
verification, which exploits a characterization of
the overall writing style of a sender, and (ii)
end-to-end email verification, which considers the
writing style specific to a given sender-receiver
communication. As a verification system, we proposed
email authorship verification based on a binary text
classifier. We compared two text classification
approaches: (i) feature engineering based and (ii)
word embedding based. In both scenarios, we tested
two training techniques based on different splittings
of the dataset: (i) independent of the email length
and (ii) dependent on the email length. The analysis
of the results shows: (i) the higher accuracy of the
word-embedding-based classifiers with respect to the
feature-engineering-based ones in both scenarios;
(ii) the effectiveness of the training technique
based on length-dependent dataset splitting; and
(iii) the better accuracy obtained by end-to-end
email verification with respect to traditional sender
verification. The high accuracy reached in email
author verification indicates that the authorship
mechanism is a promising support approach for
countering spear-phishing scam emails.
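As an illustration of the length-dependent training technique, the following minimal sketch groups labelled emails into length buckets so that a separate binary classifier could be trained on each bucket. It is one plausible reading of the splitting, not the procedure used in the experiments: the function names, the word-count length measure, the bucket boundaries (20 and 100 words), and the label convention (1 for an email written by the claimed sender, 0 otherwise) are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def length_bucket(text, boundaries):
    """Return the index of the length bucket for `text`.

    Length is measured in whitespace-separated words (an assumption).
    With boundaries (20, 100): fewer than 20 words -> 0,
    20 to 99 words -> 1, 100 words or more -> 2.
    """
    return bisect_right(boundaries, len(text.split()))


def split_by_length(emails, boundaries=(20, 100)):
    """Group (text, label) pairs by length bucket.

    Each bucket can then be used to train its own binary verifier.
    """
    buckets = defaultdict(list)
    for text, label in emails:
        buckets[length_bucket(text, boundaries)].append((text, label))
    return dict(buckets)


# Hypothetical toy data: label 1 = claimed sender, 0 = someone else.
emails = [
    ("See you at noon.", 1),
    ("Please wire the funds today, it is urgent.", 0),
    (" ".join(["word"] * 50), 1),
    (" ".join(["word"] * 150), 0),
]
buckets = split_by_length(emails)
```

At verification time, an incoming email would be routed with the same `length_bucket` function to the classifier trained on emails of comparable length.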
ACKNOWLEDGEMENTS
This work has been partially supported by the H2020
EU-funded projects SPARTA (GA 830892) and C3ISP
(GA 700294), the EIT-Digital Project HII, and PRIN
Governing Adaptive.
Email Spoofing Attack Detection through an End to End Authorship Attribution System