Table 3: Axioms derived by C4.5 decision tree mining and their performance.

S. No. | Rules | Accuracy (%)
1 | Email Label = ‘sports’ & sports = ‘False’ & gender = ‘Male’ & computers = ‘True’ then Interested | 100
2 | Email Label = ‘religion’ & religion = ‘False’ & medicine = ‘True’ & age = ‘Junior’ then NotInterested | 83.3
3 | Email Label = ‘politics’ & age = ‘Junior’ & religion = ‘False’ & computers = ‘True’ & gender = ‘Male’ then Interested | 83.33
4 | Email Label = ‘sports’ & sports = ‘False’ & gender = ‘Female’ then NotInterested | 77.78
5 | Email Label = ‘religion’ & religion = ‘False’ & medicine = ‘True’ & age = ‘Senior’ & gender = ‘Female’ then NotInterested | 66.67

ferences in implementation (as outlined in Table 2),
the major improvement being the automatic labeling
of emails by a categorizer.
As proposed by Kim et al. (2007), Rule Accuracy or Rule Confidence
can be used to measure the correctness of each rule in the ontology.
For our classification scheme, this score can serve as a
confidence-based metric: the higher the confidence of a rule, the
higher the probability that the predicted Response is correct. Some
of the rules, along with their accuracy, are listed in Table 3. The
overall accuracy of our system was found to be 67.6%.
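
As an illustration of how such a score could be computed, the sketch below uses the standard association-rule definition of confidence: among the records that satisfy a rule's conditions, the fraction whose recorded response matches the rule's prediction. The paper does not spell out its exact formula, so this definition is an assumption; the function name, field names, and toy records are hypothetical and are not the data behind Table 3.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def rule_confidence(
    records: List[Record],
    conditions: Dict[str, str],
    predicted_response: str,
    response_key: str = "response",
) -> Optional[float]:
    """Fraction of records covered by the rule whose response matches its prediction.

    Returns None when no record satisfies the rule's conditions.
    """
    # A record is covered when every condition attribute has the required value.
    covered = [
        r for r in records
        if all(r.get(attr) == value for attr, value in conditions.items())
    ]
    if not covered:
        return None
    correct = sum(1 for r in covered if r.get(response_key) == predicted_response)
    return correct / len(covered)


# Toy records (hypothetical, for illustration only).
toy_records = [
    {"email_label": "sports", "sports": "False", "gender": "Female", "response": "NotInterested"},
    {"email_label": "sports", "sports": "False", "gender": "Female", "response": "NotInterested"},
    {"email_label": "sports", "sports": "False", "gender": "Female", "response": "Interested"},
    {"email_label": "sports", "sports": "True", "gender": "Female", "response": "Interested"},
]

# Conditions shaped like rule 4 in Table 3.
rule_conditions = {"email_label": "sports", "sports": "False", "gender": "Female"}
confidence = rule_confidence(toy_records, rule_conditions, "NotInterested")
```

On the toy records, three records satisfy the conditions and two of them carry the predicted response, so the confidence is 2/3.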
5 CONCLUSIONS
We have extended and implemented a method for email classification
originally proposed by Kim et al. (2007). The salient feature of the
approach is that it incorporates user preferences into the final
classification decision. Another important feature is the use of a
user preference ontology to classify emails. We have removed the
need for humans to label each email manually by integrating the
ontology-based classification system with an automated email
categorization system. The limited amount of data collected may have
been a constraint on finding a clear correlation between user
preferences and their responses, since the collected data may not
have captured all correlations.
This work also establishes that the proposed term weighting method
performs better than the conventional TF and TF-IDF methods when
tested on emails. Various feature selection methods were also
compared using the naive Bayesian classifier, among which the
Chi-square method gave the best results. However, since emails
behave a little differently from ordinary documents during
categorization, incorporating more heuristics and parameters might
improve the accuracy of the categorizer. How the effect of any
heuristic varies across categorization algorithms would be
interesting to observe.
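
For readers unfamiliar with Chi-square feature selection, the sketch below shows one common formulation for text categorization, computed from a 2x2 term/category contingency table. It is an assumption that the categorizer used this variant; the function names and the example counts are hypothetical.

```python
from typing import Dict, List


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


def select_top_terms(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k terms with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical counts for two terms with respect to one category.
scores = {
    "goal": chi_square(10, 0, 0, 10),  # term occurs only inside the category
    "the": chi_square(5, 5, 5, 5),     # term is independent of the category
}
top_terms = select_top_terms(scores, 1)
```

A term that occurs only inside the category scores high, while a term spread evenly across categories scores zero, so ranking by this score keeps the terms that best separate categories.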
REFERENCES
Brewer, D., Thirumalai, S., Gomadam, K., Li, K., 2006. Towards an Ontology Driven Spam Filter. In Proceedings of the 22nd International Conference on Data Engineering Workshops.
Itskevitch, J., 2001. Automatic hierarchical e-mail classification using association rules. MS Thesis, Simon Fraser University.
Kim, J., Dou, D., Liu, H., Kwak, D., 2007. Constructing a User Preference Ontology for Anti-spam Mail Systems. In Proceedings of the 20th Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence. Montreal, Canada.
Rennie, J., Lang, K., 2010. 20newsgroup dataset. http://people.csail.mit.edu/jrennie/20Newsgroups
Youn, S., McLeod, D., 2006. Ontology Development Tools for Ontology-Based Knowledge Management. Encyclopedia of E-Commerce, E-Government and Mobile Commerce, Idea Group Inc.
Youn, S., McLeod, D., 2007. Spam Email Classification using an Adaptive Ontology. In Proceedings of the 4th International Conference on Information Technology: New Generations (ITNG). Las Vegas, NV.
Youn, S., McLeod, D., 2009. Spam Decisions on Gray E-mail using Personalized Ontologies. In Proceedings of the 2009 ACM Symposium on Applied Computing, SAC09. Honolulu, Hawaii, U.S.A.
Zhang, L., Yao, T., 2003. Filtering Junk Email with a Maximum Entropy Model. In ICCPOL03. Shenyang, China.
Zhang, L., Zhu, J., Yao, T., 2004. An Evaluation of Spam Filtering Techniques. In ACM Transactions on Asian Language Information Processing, Vol. 3, No. 4.
KEOD 2010 - International Conference on Knowledge Engineering and Ontology Development