Phishing data set. For both data sets, the use of feature penalization outperforms the model obtained with $C_2 = 0$, which can be considered a variant of the ARD model presented in (Chapelle et al., 2002), since without the feature penalty no attribute is driven out of the model. This result underscores the importance of feature selection in relatively high-dimensional data sets, such as the ones studied in this work.
6 CONCLUSIONS
In this work we present an embedded approach for feature selection with SVMs, applied to phishing and spam classification. A comparison with other feature selection methods shows its advantages:
• It outperforms other techniques in terms of classi-
fication accuracy.
• It does not require setting the number of features to be selected a priori, unlike other feature selection approaches; the algorithm determines the optimal number of features according to the regularization parameter $C_2$ (see the sketch after this list).
• It can be used with other kernel functions, such as
linear and polynomial kernels.
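The role of $C_2$ can be illustrated with a small numerical sketch. The Python snippet below is not the method proposed in this paper: it uses an l1-penalized linear SVM from scikit-learn as a stand-in embedded method, and synthetic data in place of the spam and phishing corpora. It only illustrates the mechanism claimed above, namely that a single regularization parameter implicitly determines how many features end up with non-zero weight, so no feature count has to be fixed beforehand.

```python
# Minimal sketch (an assumed stand-in, not this paper's algorithm): an
# l1-penalized linear SVM shows how one regularization parameter implicitly
# selects the number of features, instead of fixing that number a priori.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in for a high-dimensional spam/phishing data set:
# 100 features, of which only 10 are informative.
X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=0)

# C controls the strength of the l1 penalty here (analogous in spirit to
# the paper's C_2, although the parameterization differs).
for C in [0.01, 0.1, 1.0]:
    clf = LinearSVC(C=C, penalty="l1", dual=False, max_iter=5000).fit(X, y)
    n_selected = int(np.sum(np.abs(clf.coef_) > 1e-6))
    print(f"C = {C:5.2f} -> {n_selected} features with non-zero weight")
```

In the same way, sweeping $C_2$ in the proposed method trades sparsity off against fit during training itself, so the subset size falls out of the optimization rather than out of a separate validation loop over candidate feature counts.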
Although several parameters still have to be tuned, the overall computational effort is reduced: the feature subset is obtained automatically during training, which avoids a separate validation step to determine an adequate number of features.
Future work can proceed in several directions. First, we consider the extension to highly imbalanced data sets, a very relevant topic in phishing and spam classification, and in pattern recognition in general. Furthermore, the current landscape of spam and phishing classification suggests extending the proposed embedded feature selection technique to very large databases as an important research opportunity.
ACKNOWLEDGEMENTS
Support from the Chilean “Instituto Sistemas Complejos de Ingeniería” (ICM: P-05-004-F) is gratefully acknowledged.
REFERENCES
Asuncion, A. and Newman, D. (2007). UCI machine learn-
ing repository.
Bergholz, A., Beer, J. D., Glahn, S., Moens, M.-F., Paass,
G., and Strobel, S. (2010). New filtering approaches
for phishing email. Journal of Computer Security,
18(1):7–35.
Bradley, P. and Mangasarian, O. (1998). Feature selection via concave minimization and support vector machines. In International Conference on Machine Learning, pages 82–90.
Canu, S. and Grandvalet, Y. (2002). Adaptive scaling for
feature selection in SVMs. Advances in Neural Infor-
mation Processing Systems, 15:553–560.
Chapelle, O., Vapnik, V., Bousquet, O., and Mukherjee,
S. (2002). Choosing multiple parameters for support
vector machines. Machine Learning, 46:131–159.
Goodman, J., Cormack, G. V., and Heckerman, D. (2007). Spam and the ongoing battle for the inbox. Communications of the ACM, 50(2):24–33.
Guyon, I., Gunn, S., Nikravesh, M., and Zadeh, L. A. (2006). Feature Extraction: Foundations and Applications. Springer, Berlin.
Guyon, I., Saffari, A., Dror, G., and Cawley, G. (2009). Model selection: Beyond the Bayesian/frequentist divide. Journal of Machine Learning Research, 11:61–87.
L’Huillier, G., Hevia, A., Weber, R., and Rios, S. (2010).
Latent semantic analysis and keyword extraction for
phishing classification. In ISI’10: Proceedings of the
IEEE International Conference on Intelligence and
Security Informatics, pages 129–131, Vancouver, BC,
Canada. IEEE.
Maldonado, S. and Weber, R. (2009). A wrapper method
for feature selection using support vector machines.
Information Sciences, 179:2208–2217.
Maldonado, S., Weber, R., and Basak, J. (2011). Kernel-
penalized SVM for feature selection. Information Sci-
ences, 181(1):115–128.
Neumann, J., Schnörr, C., and Steidl, G. (2005). Combined SVM-based feature selection and classification. Machine Learning, 61:129–150.
Rakotomamonjy, A. (2003). Variable selection using SVM-based criteria. Journal of Machine Learning Research, 3:1357–1370.
Tang, Y., Krasser, S., Alperovitch, D., and Judge, P. (2008).
Spam sender detection with classification modeling on
highly imbalanced mail server behavior data. In Pro-
ceedings of the International Conference on Artificial
Intelligence and Pattern Recognition, AIPR’08, pages
174–180. ISRST.
Taylor, B., Fingal, D., and Aberdeen, D. (2007). The war against spam: A report from the front line. In NIPS 2007 Workshop on Machine Learning in Adversarial Environments for Computer Security.
Vapnik, V. (1998). Statistical Learning Theory. John Wiley
and Sons.
Weston, J., Elisseeff, A., Schölkopf, B., and Tipping, M. (2003). The use of zero-norm with linear models and kernel methods. Journal of Machine Learning Research, 3:1439–1461.
Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., and Vapnik, V. (2001). Feature selection for SVMs. In Advances in Neural Information Processing Systems, volume 13.