The results here indicate that even an unsupervised
technique such as EM clustering could be quite strong
against the current crop of image spam. However, if
spammers use somewhat more advanced techniques, it is
highly unlikely that the resulting image spam can be
effectively detected using any combination of the 38
image-processing-based features we have considered in
this paper.
7 CONCLUSION
Using samples of real-world ham and spam images,
we showed that machine learning algorithms based on
features extracted by image processing techniques can
be used to construct strong classifiers. Our results on
these real-world datasets improve slightly over the
related work in (Annadatha and Stamp, 2016).
We also showed that it is not difficult to generate
much stronger image spam, in the sense that the
detection problem is significantly more challenging. In
addition, we showed that such improved image spam
cannot be reliably detected using the image-processing-based
features considered here. Our challenge dataset
improves on the one presented in (Annadatha and Stamp,
2016), in the sense that it is significantly more
difficult to distinguish from ham, even when using a
richer and more informative feature set. These results
indicate that we will likely need new approaches to
detect image spam in the future.
More research is needed to develop and analyze
improved methods for image spam detection. To this
end, we have developed a large image spam challenge
dataset that we will provide to any researchers in this
field. By experimenting on this challenge dataset,
it will be possible to directly compare results from
different proposed detection techniques. Additional
experiments on this dataset using neural networks and
deep learning would be timely, and a direct comparison
between the SVM analysis in this paper and deep
learning techniques would be of particular interest.
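One way to make such a direct comparison concrete is to score each
detector on the same held-out images and compare a common metric. The
sketch below assumes the area under the ROC curve (Bradley, 1997) as
that metric. The function name `auc`, the labels, and both score lists
are hypothetical placeholders for illustration; they are not results
from this paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    labels: 1 for spam, 0 for ham. scores: higher means more spam-like.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one spam and one ham sample")
    # Count spam/ham pairs that are ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and detector scores (illustration only).
labels = [1, 1, 1, 0, 0, 0]
svm_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
dnn_scores = [0.7, 0.6, 0.9, 0.3, 0.2, 0.1]

# Both detectors are evaluated against the same labels, so the two
# AUC values are directly comparable.
results = {"svm": auc(labels, svm_scores), "dnn": auc(labels, dnn_scores)}
```

Because the metric depends only on how each detector ranks spam
relative to ham, it can be applied to SVM decision values and neural
network outputs alike without calibrating either to a common scale.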
REFERENCES
Annadatha, A. and Stamp, M. (2016). Image spam
analysis and detection. Journal of Computer Virology and
Hacking Techniques, 23:1–14.
Bradley, A. P. (1997). The use of the area under the
ROC curve in the evaluation of machine learning
algorithms. Pattern Recognition, 30(7):1145–1159.
Chowdhury, M., Gao, J., and Chowdhury, M. (2015). Image
Spam Classification Using Neural Network, pages
622–632. Springer International Publishing, Australia.
Dhanaraj, S. and Karthikeyani, V. (2013). A study on e-mail
image spam filtering techniques. In 2013 International
Conference on Pattern Recognition, Informatics
and Mobile Engineering, pages 49–55.
Dredze, M., Gevaryahu, R., and Elias-Bachrach, A. (2007).
Learning fast classifiers for image spam. In CEAS.
Fumera, G., Pillai, I., and Roli, F. (2006). Spam filtering
based on the analysis of text information embedded
into images. Journal of Machine Learning Research,
7:2699–2720.
Gao, Y., Choudhary, A., and Hua, G. (2010). A
comprehensive approach to image spam detection: From server
to client solution. IEEE Transactions on Information
Forensics and Security, 5(4):826–836.
Gao, Y., Yang, M., Zhao, X., Pardo, B., Wu, Y., Pappas,
T. N., and Choudhary, A. (2008). Image spam hunter.
In 2008 IEEE International Conference on Acoustics,
Speech and Signal Processing, pages 1765–1768. IEEE.
Kumaresan, T., Sanjushree, S., Suhasini, K., and
Palanisamy, C. (2015). Image spam filtering using
support vector machine and particle swarm optimization.
IJCA Proceedings on National Conference on
Information Processing and Remote Computing,
NCIPRC 2015(1):17–21.
Lai, C.-C. and Tsai, M.-C. (2004). An empirical
performance comparison of machine learning methods for
spam e-mail categorization. In Fourth International
Conference on Hybrid Intelligent Systems, 2004
(HIS’04), pages 44–48. IEEE.
Nixon, M. (2008). Feature extraction and image
processing. Academic Press.
Soranamageswari, M. and Meena, C. (2010). Statistical
feature extraction for classification of image spam using
artificial neural networks. In 2010 Second International
Conference on Machine Learning and Computing,
pages 101–105.
SpamAssassin (2005). SpamAssassin: The Apache
SpamAssassin project.
Stamp, M. (2017). Machine Learning with Applications in
Information Security. Chapman and Hall/CRC.
Whitney, L. (2009). Report: Spam now 90 percent of all
e-mail. CNET News, 26.
Yajamanam, S., Selvin, V. R. S., Troia, F. D., and Stamp,
M. (2018). Deep learning versus gist descriptors for
image-based malware classification. In Proceedings
of 2nd International Workshop on Formal Methods for
Security Engineering, ForSE ’18.
Support Vector Machines for Image Spam Analysis