sages and more immunity to viruses, worms, or phishing
attacks. EAS requires more communication and
computation when compared with the simpler CBF
method. However, the increase in computation is still
affordable for a common user and the communication
costs are low, around the size of one email message
for every batch (e.g. 100 messages). Moreover, the
execution of a batch for the EA is not computationally
expensive. For example, on the test computer,
the average execution times for 100 generations of the
EAS were 11 s for user pla and 41 s for user mar.

This paper proposes a novel distributed feature selection
approach for spam detection that uses an EA
engine to search for the best features and adopts
an SF strategy to share features among distinct users.
The goal is to reuse features that were considered rel-
evant for other users in order to improve spam detec-
tion at a personalized level. The NB classifier was
adopted as the local CBF and tested on a new corpus
that realistically mixes ham messages
from five Enron users with recent spam. The perfor-
mance of EAS was compared with two local EA algo-
rithms (EAR and EAM), as well as the simpler CBF
method based on the information gain criterion. The
results show that, even with a small symbiotic
group (i.e. 5 users), EAS achieves the best spam de-
tection performance, as measured by the AUC metric.
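
For readers unfamiliar with the AUC metric mentioned above, it can be read as the probability that a randomly chosen spam message receives a higher score than a randomly chosen ham message. The following is a minimal sketch of that pairwise definition, not the evaluation code used in this work; the function name `auc` and the example scores are hypothetical and chosen only for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    scores: predicted spam scores (higher means more spam-like).
    labels: true classes, 1 for spam and 0 for ham.
    """
    spam = [s for s, y in zip(scores, labels) if y == 1]
    ham = [s for s, y in zip(scores, labels) if y == 0]
    if not spam or not ham:
        raise ValueError("need at least one spam and one ham message")

    wins = 0.0
    for s in spam:
        for h in ham:
            if s > h:
                wins += 1.0   # spam correctly ranked above ham
            elif s == h:
                wins += 0.5   # ties count as half a win
    return wins / (len(spam) * len(ham))


# Illustrative (made-up) scores for five messages.
example_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
example_labels = [1, 1, 0, 1, 0]
print(auc(example_scores, example_labels))
```

In this example, five of the six spam/ham pairs are ranked correctly, so the AUC is 5/6. A value of 1.0 corresponds to a perfect ranking, and 0.5 to a random one.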

The work of P. Cortez and P. Sousa was funded
by FEDER, through the COMPETE program and the
Portuguese Foundation for Science and Technology
(FCT), under project FCOMP-01-0124-FEDER-022674.
