sages and more immunity to viruses, worms or phishing
attacks. EAS requires more communication and
computation when compared with the simpler CBF
method. However, the increase in computation is still
affordable for a common user and the communication
costs are low, around the size of one email message
for every batch (e.g. 100 messages). Moreover, the
execution of a batch for the EA is not computationally
expensive. For example, on the test computer,
the average execution times for 100 generations of the
EAS were 11 s for user pla and 41 s for user mar.
5 CONCLUSIONS
This paper proposes a novel distributed feature selection
approach for spam detection that uses an EA engine to
search for the best features and adopts an SF strategy to
share features among distinct users. The goal is to reuse
features that were considered relevant for other users in
order to improve spam detection at a personalized level.
The NB classifier was adopted as the local CBF and tested
on a new corpus that provides a realistic mixture of ham
messages from five Enron users with recent spam. The
performance of EAS was compared with two local EA
algorithms (EAR and EAM), as well as the simpler CBF
method based on the information gain criterion. The
results show that, even for a small symbiotic group
(i.e. 5 users), EAS achieves the best spam detection
performance, as measured by the AUC metric.
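As an illustration of the AUC metric referred to above, the following is a minimal sketch that computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen spam message receives a higher classifier score than a randomly chosen ham message, with ties counted as one half. The function name `auc`, the label encoding (1 = spam, 0 = ham) and the toy scores are assumptions made for this example; they are not taken from the implementation evaluated in the paper.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 = spam (positive class), 0 = ham (negative class).
    Returns the fraction of (spam, ham) pairs in which the spam message
    has the higher score, counting tied scores as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one spam and one ham message")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for three spam and two ham messages.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2]
example_labels = [1, 1, 1, 0, 0]
example_auc = auc(example_scores, example_labels)
```

In this toy case five of the six (spam, ham) pairs are ranked correctly, so the value is 5/6. The pairwise loop is quadratic in the number of messages, which is acceptable for a sketch; a sort-based rank computation would be preferable for large corpora.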
ACKNOWLEDGEMENTS
The work of P. Cortez and P. Sousa was funded
by FEDER, through the COMPETE program, and by the
Portuguese Foundation for Science and Technology
(FCT), under project FCOMP-01-0124-FEDER-022674.
ICINCO 2012 - 9th International Conference on Informatics in Control, Automation and Robotics