Going a Step Beyond the Black and White Lists for URL Accesses in the Enterprise by Means of Categorical Classifiers

A. M. Mora, P. De las Cuevas, J. J. Merelo

2014

Abstract

Corporate systems can be secured by a wide range of methods, among them the use of black or white lists. Such lists make it possible to restrict (or allow) users' execution of applications or access to certain URLs, among other actions. This paper focuses on the latter. It describes the complete processing of a data set composed of URL sessions performed by the employees of a company, from the preprocessing stage, including labelling and data balancing, to the application of several classification algorithms. The aim is to define a method for automatically deciding whether to allow or deny future URL requests, according to a set of corporate security policies. This work thus goes a step beyond the usual black and white lists, which can only control the URLs explicitly included in them and cannot make decisions based on similarity (through classification techniques) or on other variables of the session, as proposed here. The results show a set of classification methods that achieve very good correct-classification rates (95-97%) and that infer useful rules based on additional features (rather than just the URL string) related to the user's access. This leads us to consider that this kind of tool would be very useful for an enterprise.
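
The pipeline outlined in the abstract (labelled sessions, data balancing, then classification into allow or deny) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the session records, the feature names (content_type, tld, hours), the use of plain random oversampling for balancing, and the small categorical naive Bayes classifier are stand-ins, not the data, balancing technique, or algorithms evaluated in the paper.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical labelled sessions; features and values are illustrative only.
SESSIONS = [
    ({"content_type": "text/html", "tld": "com", "hours": "work"}, "allow"),
    ({"content_type": "text/html", "tld": "org", "hours": "work"}, "allow"),
    ({"content_type": "image/png", "tld": "com", "hours": "work"}, "allow"),
    ({"content_type": "text/html", "tld": "com", "hours": "off"}, "allow"),
    ({"content_type": "application/octet-stream", "tld": "ru", "hours": "off"}, "deny"),
    ({"content_type": "video/mp4", "tld": "tv", "hours": "work"}, "deny"),
]


def oversample(data, seed=0):
    """Balance classes by randomly duplicating minority-class records."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in data:
        by_label[label].append((features, label))
    target = max(len(rows) for rows in by_label.values())
    balanced = []
    for rows in by_label.values():
        balanced.extend(rows)
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


def train(data):
    """Collect the counts needed by a categorical naive Bayes classifier."""
    class_counts = Counter(label for _, label in data)
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    vocab = defaultdict(set)             # feature -> values seen in training
    for features, label in data:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return class_counts, value_counts, vocab


def predict(model, features):
    """Return the label with the highest smoothed log-probability."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            # Add-one smoothing, with one extra slot for unseen values.
            score += math.log((seen + 1) / (count + len(vocab[name]) + 1))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(oversample(SESSIONS))
print(predict(model, {"content_type": "text/html", "tld": "com", "hours": "work"}))
print(predict(model, {"content_type": "video/mp4", "tld": "tv", "hours": "work"}))
```

Unlike a black or white list, the decision here does not require the requested URL to have been seen before: a new session is scored by how closely its features resemble previously allowed or denied sessions.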


Paper Citation


in Harvard Style

Mora A., De las Cuevas P. and Merelo J. (2014). Going a Step Beyond the Black and White Lists for URL Accesses in the Enterprise by Means of Categorical Classifiers. In Proceedings of the International Conference on Evolutionary Computation Theory and Applications - Volume 1: ECTA, (IJCCI 2014) ISBN 978-989-758-052-9, pages 125-134. DOI: 10.5220/0005170601250134


in Bibtex Style

@conference{ecta14,
author={A. M. Mora and P. De las Cuevas and J. J. Merelo},
title={Going a Step Beyond the Black and White Lists for URL Accesses in the Enterprise by Means of Categorical Classifiers},
booktitle={Proceedings of the International Conference on Evolutionary Computation Theory and Applications - Volume 1: ECTA, (IJCCI 2014)},
year={2014},
pages={125-134},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005170601250134},
isbn={978-989-758-052-9},
}

