Phish-IDetector: Message-Id Based Automatic Phishing Detection

Rakesh M. Verma, Nirmala Rai


Phishing attacks are a well-known problem in our age of electronic communication. Phishers target sensitive information such as credit card details and account login credentials. Email is the most common channel for launching phishing attacks: phishing emails are crafted to resemble genuine ones as closely as possible in order to fool recipients into divulging private and sensitive data, causing huge monetary losses every year. This paper presents a novel approach to detecting phishing emails that is both simple and effective. It leverages the unique characteristics of the Message-ID field of the email header to differentiate phishing emails from legitimate ones. Using machine learning classifiers on n-gram features extracted from Message-IDs, we obtain a detection rate of over 99% with low false positives.
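
As an illustration of the kind of feature the abstract refers to, the following minimal sketch counts character n-grams in the Message-ID header of a raw email. It is not the authors' implementation: the function name `message_id_ngrams`, the choice of n = 3, the stripping of angle brackets, and the sample message are all assumptions made for this example, and the abstract does not specify which n-gram sizes or classifiers were actually used.

```python
from collections import Counter
from email import message_from_string


def message_id_ngrams(raw_email: str, n: int = 3) -> Counter:
    """Count character n-grams in the Message-ID header of a raw email.

    Hypothetical helper for illustration; returns an empty Counter when
    the header is missing or shorter than n characters.
    """
    msg = message_from_string(raw_email)
    # Header lookup is case-insensitive; drop the surrounding <...>.
    message_id = (msg.get("Message-ID") or "").strip().strip("<>")
    return Counter(message_id[i:i + n] for i in range(len(message_id) - n + 1))


# Hypothetical sample message, invented for this sketch.
sample = (
    "From: alice@example.com\n"
    "Message-ID: <abc123@mail.example.com>\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

features = message_id_ngrams(sample, n=3)
```

The resulting counts could then be turned into a feature vector and passed to any standard classifier; which classifiers and n-gram sizes work best is a question for the paper itself.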


Paper Citation

in Harvard Style

Verma, R. M. and Rai, N. (2015). Phish-IDetector: Message-Id Based Automatic Phishing Detection. In Proceedings of the 12th International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2015), ISBN 978-989-758-117-5, pages 427-434. DOI: 10.5220/0005574304270434
