Evolutionary Symbiotic Feature Selection for Email Spam Detection

Paulo Cortez, Rui Vaz, Miguel Rocha, Miguel Rio, Pedro Sousa

Abstract

This work presents a symbiotic filtering approach in which different users exchange relevant word features in order to improve their local anti-spam filters. Local spam filtering follows a Content-Based Filtering strategy, where word frequencies are fed into a Naive Bayes learner. Several Evolutionary Algorithms are explored for feature selection, including the proposed symbiotic exchange of the most relevant features among different users. The experiments were conducted using a novel corpus based on the well-known Enron datasets mixed with recent spam. The results show that the symbiotic approach is competitive.
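
The abstract does not specify the algorithms, so the following is only a toy Python sketch of the general idea, not the authors' method. It assumes a multinomial Naive Bayes over word counts, a simple elitist (mu + lambda) evolutionary search over word subsets, and a "symbiotic" step modelled as seeding one user's search with another user's selected words. All function names, parameters, and the example messages are hypothetical.

```python
import math
import random
from collections import Counter

def tokens(text):
    return text.lower().split()

def vocabulary(messages):
    return {w for text, _ in messages for w in tokens(text)}

def train_nb(messages, features):
    """Count words per class, keeping only the selected features."""
    counts = {"ham": Counter(), "spam": Counter()}
    docs = Counter()
    for text, label in messages:
        docs[label] += 1
        counts[label].update(w for w in tokens(text) if w in features)
    return counts, docs

def predict_nb(model, features, text):
    counts, docs = model
    total_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label in ("ham", "spam"):
        # Laplace-smoothed log-probabilities over the selected features only.
        denom = sum(counts[label].values()) + len(features)
        score = math.log((docs[label] + 1) / (total_docs + 2))
        for w in tokens(text):
            if w in features:
                score += math.log((counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(features, train, valid):
    model = train_nb(train, features)
    hits = sum(predict_nb(model, features, t) == y for t, y in valid)
    return hits / len(valid)

def evolve(vocab, train, valid, seeds=(), generations=30, pop_size=8, rng=None):
    """Elitist (mu + lambda) search over word subsets (an assumed scheme)."""
    if rng is None:
        rng = random.Random(0)
    words = sorted(vocab)

    def fitness(subset):
        return accuracy(subset, train, valid)

    def mutate(subset):
        # Flip the membership of one randomly chosen word.
        return subset ^ {rng.choice(words)}

    # The first individual holds the features received from another user.
    population = [frozenset(seeds) & frozenset(words)]
    population += [frozenset(w for w in words if rng.random() < 0.5)
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [mutate(rng.choice(population)) for _ in range(pop_size)]
        # Keep the best individuals among parents and children.
        population = sorted(population + children, key=fitness,
                            reverse=True)[:pop_size]
    return population[0]

# Toy messages invented for illustration; a real study would use a corpus
# and a validation set separate from the training set.
USER_A = [("cheap pills buy now", "spam"), ("win cash prize now", "spam"),
          ("meeting agenda for monday", "ham"), ("project report attached", "ham")]
USER_B = [("buy cheap watches now", "spam"), ("cash prize claim now", "spam"),
          ("lunch on monday", "ham"), ("report draft attached", "ham")]

rng = random.Random(42)
best_a = evolve(vocabulary(USER_A), USER_A, USER_A, rng=rng)
# Symbiotic step: user B starts its search from user A's selected words.
best_b = evolve(vocabulary(USER_B), USER_B, USER_B, seeds=best_a, rng=rng)
```

Because survivor selection is elitist, the subset returned for user B is never less accurate on its validation data than the seed received from user A, which is one simple way to make an exchange of features harmless in the worst case.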


Paper Citation


in Harvard Style

Cortez P., Vaz R., Rocha M., Rio M. and Sousa P. (2012). Evolutionary Symbiotic Feature Selection for Email Spam Detection. In Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-8565-21-1, pages 159-164. DOI: 10.5220/0004010201590164


in Bibtex Style

@conference{icinco12,
author={Paulo Cortez and Rui Vaz and Miguel Rocha and Miguel Rio and Pedro Sousa},
title={Evolutionary Symbiotic Feature Selection for Email Spam Detection},
booktitle={Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2012},
pages={159-164},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004010201590164},
isbn={978-989-8565-21-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - Evolutionary Symbiotic Feature Selection for Email Spam Detection
SN - 978-989-8565-21-1
AU - Cortez P.
AU - Vaz R.
AU - Rocha M.
AU - Rio M.
AU - Sousa P.
PY - 2012
SP - 159
EP - 164
DO - 10.5220/0004010201590164