6 CONCLUSIONS AND FUTURE WORK
In this paper, we presented a comprehensive, automated email feature engineering framework developed for spam detection and classification. The framework combines a scalable mechanism for automated feature engineering with classification algorithms for spam classification. Currently, the proposed framework achieves high spam classification accuracy using 148 manual email features. The automated feature engineering scheme further improves the classification accuracy of some of the classifiers by up to 12% as more features are added to the analysis.
For future work, we plan to optimize the processes of the developed framework to enable more efficient feature engineering and classification.
ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy