majority classifier according to the accuracy of the individual classifiers on the validation set fails to deliver the expected improvement. On the other hand, the study also reveals the importance of exploring the subcategorization present in the original dataset: doing so yields a substantial improvement, bringing the achieved accuracy marginally above many state-of-the-art results obtained on the same dataset. This also opens an interesting direction for future work on the theoretical foundation of this mechanism.
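The passage above concerns combining base classifiers by majority vote, with the classifiers chosen according to their accuracy on a validation set. The preceding text is truncated, so the exact selection rule is not recoverable here. What follows is a minimal Python sketch under one plausible reading, not the authors' implementation: rank the base classifiers by validation accuracy, keep the top k, and combine them by an optionally accuracy-weighted majority vote. The function names (validation_accuracy, select_top_k, majority_vote), the parameter k, and the toy data are hypothetical and illustrative only.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a base classifier is any callable that maps a
# feature vector to a class label (e.g. "spam" or "ham").
Classifier = Callable[[Sequence[float]], str]


def validation_accuracy(clf: Classifier,
                        X_val: List[Sequence[float]],
                        y_val: List[str]) -> float:
    """Fraction of validation examples the classifier labels correctly."""
    correct = sum(1 for x, y in zip(X_val, y_val) if clf(x) == y)
    return correct / len(y_val)


def select_top_k(classifiers: List[Classifier],
                 X_val: List[Sequence[float]],
                 y_val: List[str],
                 k: int) -> List[Tuple[Classifier, float]]:
    """Rank base classifiers by validation accuracy and keep the best k."""
    scored = [(clf, validation_accuracy(clf, X_val, y_val)) for clf in classifiers]
    # Python's sort is stable even with reverse=True, so equally accurate
    # classifiers keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def majority_vote(selected: List[Tuple[Classifier, float]],
                  x: Sequence[float],
                  weighted: bool = False) -> str:
    """Combine the selected classifiers' predictions for one example.

    With weighted=False every selected classifier casts one vote; with
    weighted=True each vote counts as much as that classifier's
    validation accuracy (an assumed weighting scheme).
    """
    votes: Counter = Counter()
    for clf, acc in selected:
        votes[clf(x)] += acc if weighted else 1
    # most_common(1) returns the label with the largest total; labels with
    # equal totals are ordered by first insertion.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy one-feature "spam score" data, made up purely for illustration.
    X_val = [[0.9], [0.1], [0.8], [0.2]]
    y_val = ["spam", "ham", "spam", "ham"]
    clf_a = lambda x: "spam" if x[0] > 0.5 else "ham"   # accuracy 1.00
    clf_b = lambda x: "spam" if x[0] > 0.85 else "ham"  # accuracy 0.75
    clf_c = lambda x: "ham"                              # accuracy 0.50

    chosen = select_top_k([clf_a, clf_b, clf_c], X_val, y_val, k=2)
    print([acc for _, acc in chosen])                 # [1.0, 0.75]
    print(majority_vote(chosen, [0.3]))               # ham
    print(majority_vote(chosen, [0.7], weighted=True))  # spam
```

On the input [0.7] the two selected classifiers disagree. Unweighted voting produces a 1-1 tie that Counter.most_common resolves by insertion order, whereas weighted voting resolves it explicitly in favour of the more accurate classifier.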