emails. Consequently, zoning of emails and automated
ordering of the resulting pictures may help to
identify spam emails.
7 FUTURE WORK AND CONCLUSION
This project and the implementation of CoZo are not
yet finished. With the current state of the
implementation, we tested the approach of Content
Zoning. One of our next challenges is to adapt the
application so that a semi-automatic recognition of
the zones becomes possible. Semi-automatic means in
this case that the most obvious zones, e.g. price,
currency symbol, etc., are detected automatically.
This would simplify the zoning for the user, but the
more complex zoning should still be carried out by the
user in order to obtain more precise results. The
difficulty of this function lies in detecting the
zones correctly even when their content is not in a
valid format. An example is the “price” zone, where
digits may be represented by letters, e.g. “0”
replaced by “o”.
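The detection of an obfuscated “price” zone can be illustrated with a minimal sketch. It is not part of CoZo: the function name detect_price_zones, the look-alike table and the regular expression are assumptions made for illustration only. The sketch proposes candidate zones that a user would still confirm, in line with the semi-automatic approach described above.

```python
import re

# Hypothetical table of letters that are commonly used in place of digits.
LOOKALIKES = str.maketrans({"o": "0", "O": "0", "l": "1", "I": "1"})

# A currency symbol followed by digits, look-alike letters and separators.
PRICE_PATTERN = re.compile(
    r"(?P<currency>[$€£])\s?(?P<amount>[0-9oOlI][0-9oOlI.,]*)"
)


def detect_price_zones(text):
    """Return (start, end, currency, normalised_amount) per candidate zone."""
    zones = []
    for match in PRICE_PATTERN.finditer(text):
        # Drop trailing punctuation that belongs to the sentence, not the price.
        raw = match.group("amount").rstrip(".,")
        # Require at least one real digit so that words such as "$lots"
        # are not reported as prices.
        if not any(ch.isdigit() for ch in raw):
            continue
        end = match.start("amount") + len(raw)
        zones.append(
            (match.start(), end, match.group("currency"), raw.translate(LOOKALIKES))
        )
    return zones


if __name__ == "__main__":
    for zone in detect_price_zones("Only $1oo,5o today, normally €2oo."):
        print(zone)
```

A rule of this kind only covers simple substitutions; prices written without a currency symbol or entirely in letters would still have to be zoned by the user.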
The next step is to implement an automated zoning so
that the system calculates the zones and important
variables without any input from users. Thereafter,
the evaluation of the zoned emails must be automated
so that the system can be added to an existing spam
filter.
There are different possibilities for classifying an
email into the various types of spam emails using
Content Zoning. One approach would be to define a
similarity metric based on the calculated zone
statistics (e.g. zone ordering, zone sizes, etc.).
Another approach is to use picture matching to compare
two emails. To do this, the emails are, for example,
coloured in order to denote the different zones. The
actual comparison is of course not limited to basic
(exact) picture matching but can also include more
sophisticated image analysis techniques. In this way,
we can compare a new email to our existing spam email
types (financial and pharmaceutical here) to decide
whether or not the email is spam at all.
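A similarity metric over zone statistics could, for instance, combine zone ordering and zone sizes as in the following sketch. It is one possible formulation, not the metric used by CoZo: the zone labels, the equal weighting and the function name zone_similarity are hypothetical, and a zoned email is assumed to be a non-empty list of (label, size) pairs in reading order.

```python
from difflib import SequenceMatcher


def _size_shares(email):
    """Map each zone label to its share of the total zoned size."""
    total = sum(size for _, size in email)
    shares = {}
    for label, size in email:
        shares[label] = shares.get(label, 0.0) + size / total
    return shares


def zone_similarity(email_a, email_b, order_weight=0.5):
    """Return a similarity in [0, 1] between two zoned emails.

    Each email is a list of (zone_label, zone_size) pairs in reading order.
    The weighting between ordering and sizes is an assumption.
    """
    # Ordering: similarity of the two label sequences.
    labels_a = [label for label, _ in email_a]
    labels_b = [label for label, _ in email_b]
    order_sim = SequenceMatcher(None, labels_a, labels_b).ratio()

    # Sizes: one minus half the L1 distance between the size distributions.
    shares_a = _size_shares(email_a)
    shares_b = _size_shares(email_b)
    labels = set(shares_a) | set(shares_b)
    l1 = sum(abs(shares_a.get(l, 0.0) - shares_b.get(l, 0.0)) for l in labels)
    size_sim = 1.0 - l1 / 2.0

    return order_weight * order_sim + (1.0 - order_weight) * size_sim


if __name__ == "__main__":
    # Hypothetical zone labels and sizes, for illustration only.
    known_spam = [("greeting", 20), ("product", 50), ("price", 10), ("link", 20)]
    new_email = [("greeting", 25), ("product", 45), ("price", 10), ("link", 20)]
    print(zone_similarity(known_spam, new_email))
```

A new email would then be compared against representatives of each known spam type, and the highest score, together with a threshold that would have to be determined empirically, would decide the classification.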
To conclude, Content Zoning is a promising technique
for spam detection that can support existing spam
filters with additional information. The results show
that the picture of the zoned email and the calculated
variables contain indicators of whether an email is
spam or not.
ACKNOWLEDGEMENTS
This project is part of the TRIAS (TRIAS, 2005)
project of the Computer Science and Communication
research unit of the University of Luxembourg. We
thank the staff of the MINE team for their support
and advice.
REFERENCES
Androutsopoulos, I., Koutsias, J., Chandrinos, K.,
Paliouras, G., and Spyropoulos, C. (2000). An
evaluation of naive Bayesian anti-spam filtering.
Brucks, C. and Wagner, C. (2006). Spam analysis for net-
work protection. TFE Thesis - University of Luxem-
bourg.
Feltrim, V., Teufel, S., Nunes, G. G., and Alusio, S. (2005).
Argumentative Zoning applied to Critiquing Novices’
Scientific Abstracts. In Computing Attitude and Af-
fect in Text: Theory and Applications, pages 233–245.
Springer, Dordrecht, The Netherlands.
Hilker, M. and Schommer, C. (2006). SANA security anal-
ysis in internet traffic through artificial immune sys-
tems. Proceedings of the Trustworthy Software Work-
shop Saarbruecken, Germany.
Rigoutsos, I. and Huynh, T. (2004). Chung-kwei: a pattern-
discovery-based system for the automatic identifica-
tion of unsolicited e-mail messages (spam). In Proc.
of the Conference on Email and Anti-Spam (CEAS).
Roesch, M. (1999). SNORT - lightweight intrusion detec-
tion for networks. LISA, 13:229–238.
Teufel, S. (1999). Argumentative zoning: Information
extraction from scientific text. PhD Thesis,
University of Edinburgh, Scotland.
TRIAS (2005). Logic of trust and reliability of
information agents in science. CSC, Univer-
sity of Luxembourg, Project description, Link:
http://wiki.uni.lu/mine/TRIAS.html.
Wittel, G. and Wu, S. (2004). On attacking statistical spam
filters. In Proc. of the Conference on Email and Anti-
Spam (CEAS).