Practical Multi-pattern Matching Approach for Fast and Scalable Log Abstraction

Daniel Tovarňák

Abstract

Log abstraction, i.e., the separation of the static and dynamic parts of a log message, is becoming an indispensable task when processing logs generated by large enterprise systems and networks. In practice, log message types are described by regex matching patterns, which are then used to perform the abstraction. Although the area of multi-regex matching is well studied, suitable practical implementations for common programming languages are lacking. In this paper we present an alternative approach to multi-pattern matching for the purposes of log abstraction, based on a trie-like data structure we refer to as regex trie (REtrie). REtrie is easy to implement, and real-world experiments show its scalability and good performance even for thousands of matching patterns.
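
To make the task concrete, the following minimal Python sketch shows log abstraction done the straightforward way, by trying one regex per message type in sequence. It is not taken from the paper: the pattern names, the example messages, and the function `abstract_naive` are hypothetical and serve only to illustrate what "separating static and dynamic parts" means and why matching cost grows with the number of patterns.

```python
import re

# Hypothetical message types: (pattern name, regex). The literal text is the
# static part of the message; the named groups capture the dynamic part.
PATTERNS = [
    ("login_ok", re.compile(r"User (?P<user>\S+) logged in from (?P<ip>\S+)")),
    ("disk_full", re.compile(r"Disk (?P<device>\S+) is (?P<percent>\d+)% full")),
]


def abstract_naive(message):
    """Try each pattern in turn and return (pattern name, captured fields).

    The cost grows linearly with the number of patterns, which is the
    scalability problem that multi-pattern matching tries to avoid.
    """
    for name, regex in PATTERNS:
        match = regex.fullmatch(message)
        if match:
            return name, match.groupdict()
    return None  # no known message type matched
```

With thousands of message types, every unmatched or late-matching message pays for a full pass over the pattern list, which motivates sharing common prefixes between patterns in a trie-like structure.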

function ABSTRACTMESSAGE(message, trie)
    result ← SEARCH(message, trie, [])
    if result ≠ Ø then
        ((pat_name, tok_names), captures) ← result
        return (pat_name, mapFromList(zip(tok_names, captures)))
    else
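
The ABSTRACTMESSAGE fragment above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's REtrie implementation: the `Node` layout, the `insert` helper, and the backtracking `search` are hypothetical, literals are stored as whole chunks rather than in a compressed character-level trie, and returning `None` when nothing matches is an assumption, because the `else` branch is cut off in the fragment.

```python
import re


class Node:
    """Hypothetical trie node: literal edges, regex-token edges, optional payload."""

    def __init__(self):
        self.literals = {}   # literal text -> child Node
        self.tokens = []     # list of (compiled regex, child Node)
        self.payload = None  # (pat_name, tok_names) if a pattern ends here


def insert(trie, pat_name, parts):
    """Add a pattern; parts mixes literal strings and (token_name, regex) tuples."""
    node, tok_names = trie, []
    for part in parts:
        if isinstance(part, str):
            node = node.literals.setdefault(part, Node())
            continue
        tok_name, source = part
        tok_names.append(tok_name)
        for regex, child in node.tokens:
            if regex.pattern == source:  # reuse an existing edge for the same regex
                node = child
                break
        else:
            child = Node()
            node.tokens.append((re.compile(source), child))
            node = child
    node.payload = (pat_name, tok_names)


def search(message, node, captures):
    """Backtracking descent; returns ((pat_name, tok_names), captures) or None."""
    if not message and node.payload is not None:
        return node.payload, captures
    for literal, child in node.literals.items():
        if message.startswith(literal):
            result = search(message[len(literal):], child, captures)
            if result is not None:
                return result
    for regex, child in node.tokens:
        match = regex.match(message)
        if match and match.end() > 0:
            result = search(message[match.end():], child, captures + [match.group(0)])
            if result is not None:
                return result
    return None


def abstract_message(message, trie):
    """Mirror of ABSTRACTMESSAGE: pair token names with the captured values."""
    result = search(message, trie, [])
    if result is not None:
        (pat_name, tok_names), captures = result
        return pat_name, dict(zip(tok_names, captures))
    return None  # assumed behaviour; the else branch is truncated in the source
```

In this sketch, two patterns that begin with the same literal and token share the same path through the trie, so the common prefix of a message is matched once rather than once per pattern.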


Paper Citation


in Harvard Style

Tovarňák D. (2016). Practical Multi-pattern Matching Approach for Fast and Scalable Log Abstraction. In Proceedings of the 11th International Joint Conference on Software Technologies - Volume 1: ICSOFT-EA, (ICSOFT 2016) ISBN 978-989-758-194-6, pages 319-329. DOI: 10.5220/0006006603190329


in Bibtex Style

@conference{icsoft-ea16,
author={Daniel Tovarňák},
title={Practical Multi-pattern Matching Approach for Fast and Scalable Log Abstraction},
booktitle={Proceedings of the 11th International Joint Conference on Software Technologies - Volume 1: ICSOFT-EA, (ICSOFT 2016)},
year={2016},
pages={319-329},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006006603190329},
isbn={978-989-758-194-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Joint Conference on Software Technologies - Volume 1: ICSOFT-EA, (ICSOFT 2016)
TI - Practical Multi-pattern Matching Approach for Fast and Scalable Log Abstraction
SN - 978-989-758-194-6
AU - Tovarňák D.
PY - 2016
SP - 319
EP - 329
DO - 10.5220/0006006603190329