Authors:
Pablo Bermejo; José A. Gámez and José M. Puerta
Affiliation:
University of Castilla-La Mancha, Spain
Keyword(s):
Text mining, e-mail foldering, attribute construction, X-of-N attribute, wrapper approach, forward search.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Bayesian Networks; Biomedical Engineering; Business Analytics; Data Engineering; Data Mining; Databases and Information Systems Integration; Datamining; Enterprise Information Systems; Health Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
E-mail classification is one of the outstanding tasks in text mining; however, most of the effort on this topic has been devoted to detecting spam or junk e-mail, that is, a classification problem with only two possible classes: spam and not-spam. In this paper we deal with a different e-mail classification problem known as e-mail foldering, which consists of classifying incoming mail into the different folders previously created by the user. This task has received less attention and is quite complex due to the (usually large) cardinality of the class variable (the number of folders). We try to improve classification accuracy by looking for new attributes derived from the existing ones using a data-driven approach. The attribute is constructed by taking into account the type of classifier to be used later and following a wrapper approach guided by a forward greedy search. The experiments carried out show that in all cases the accuracy of the classifier improves when the new attribute is added to the original ones.
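The wrapper-guided forward greedy search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common definition of an X-of-N attribute (the number of chosen attribute-value pairs that hold for an instance), uses a toy categorical Naive Bayes as a stand-in for whichever classifier the paper targets, and scores candidates by leave-one-out accuracy. All function names and the tiny data set are hypothetical.

```python
from collections import Counter
from math import log

def x_of_n(row, pairs):
    """Value of the X-of-N attribute: how many (index, value) pairs hold for row."""
    return sum(1 for j, v in pairs if row[j] == v)

def augment(X, pairs):
    """Append the constructed X-of-N attribute to every instance."""
    return [row + [x_of_n(row, pairs)] for row in X]

def nb_predict(train_X, train_y, row):
    """Toy categorical Naive Bayes with add-one smoothing (assumed stand-in)."""
    best_class, best_score = None, float("-inf")
    for c, n_c in Counter(train_y).items():
        score = log(n_c / len(train_y))
        for j, v in enumerate(row):
            n_values = len({r[j] for r in train_X})
            match = sum(1 for r, y in zip(train_X, train_y) if y == c and r[j] == v)
            score += log((match + 1) / (n_c + n_values + 1))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def loo_accuracy(X, y):
    """Wrapper evaluation: leave-one-out accuracy of the classifier on (X, y)."""
    hits = sum(
        nb_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]) == y[i]
        for i in range(len(X))
    )
    return hits / len(X)

def build_x_of_n(X, y):
    """Forward greedy search: add the pair that most improves accuracy, stop otherwise."""
    candidates = sorted({(j, row[j]) for row in X for j in range(len(row))})
    pairs, best = [], loo_accuracy(X, y)  # baseline: original attributes only
    while True:
        scored = [
            (loo_accuracy(augment(X, pairs + [p]), y), p)
            for p in candidates if p not in pairs
        ]
        if not scored:
            break
        acc, pair = max(scored, key=lambda t: t[0])
        if acc <= best:  # no strict improvement: stop the search
            break
        pairs.append(pair)
        best = acc
    return pairs, best

# Hypothetical data: columns are word-presence flags, labels are folders.
X = [[1, 0, 0], [1, 0, 1], [0, 1, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
y = ["work", "work", "finance", "finance", "personal", "work"]
pairs, accuracy = build_x_of_n(X, y)
print("selected pairs:", pairs, "accuracy:", accuracy)
```

Because the search accepts a pair only when the estimated accuracy strictly increases, the returned accuracy is never below the baseline obtained with the original attributes alone. On real e-mail data the candidate set is far larger, so the exhaustive scoring in each step would need pruning or a cheaper evaluation.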