Parallel Finite State Machines for Very Fast Distributable Regular Expression Matching

Luis Quesada, Fernando Berzal, Francisco J. Cortijo

2012

Abstract

Regular expressions provide a flexible means for matching strings, and they are often used in data-intensive applications. They are formally equivalent to either deterministic finite automata (DFAs) or nondeterministic finite automata (NFAs). Both DFAs and NFAs are affected by two problems known as amnesia and acalculia, and DFAs are also affected by a problem known as insomnia. Existing techniques require an automata conversion and compaction step that prevents the use of existing automaton databases and hinders the maintenance of the resulting compact automata. In this paper, we propose Parallel Finite State Machines (PFSMs), which are able to run any DFA- or NFA-like state machine without a previous conversion or compaction step. PFSMs report, online, all the matches found within an input string, and they solve the three aforementioned problems. PFSMs require quadratic time and linear memory, and they are distributable. PFSMs make very fast distributed regular expression matching in data-intensive applications feasible.
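The abstract's central idea, running a state machine from every input position at once and reporting every match as soon as it ends, can be illustrated with a minimal Python sketch. It is a naive illustration of that general idea under stated assumptions, not the authors' PFSM algorithm: the machine is assumed to be an epsilon-free NFA encoded as a hypothetical dictionary `transitions[state][symbol] -> set of states`, and the function name `run_parallel` is likewise hypothetical. With n input symbols there are at most n live instances per step, which gives the quadratic worst-case time mentioned above.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

# Hypothetical encoding (an assumption, not the paper's format):
# transitions[state][symbol] -> set of next states; no epsilon moves.
NFA = Dict[int, Dict[str, Set[int]]]


def run_parallel(transitions: NFA, start: int, accepting: Set[int],
                 text: str) -> Iterator[Tuple[int, int]]:
    """Yield every non-empty (begin, end) span of `text` the machine accepts.

    A new machine instance is spawned at each input position, all live
    instances advance in lockstep over the input, and a match is yielded
    as soon as an instance reaches an accepting state (online reporting).
    Empty matches (an accepting start state) are not reported.
    """
    # Each live instance: (position where it started, current NFA states).
    active: List[Tuple[int, FrozenSet[int]]] = []
    for pos, ch in enumerate(text):
        # Spawn an instance that begins matching at this position.
        active.append((pos, frozenset({start})))
        survivors: List[Tuple[int, FrozenSet[int]]] = []
        for begin, states in active:
            # Follow every transition on `ch` from every current state.
            nxt = frozenset(t for s in states
                            for t in transitions.get(s, {}).get(ch, ()))
            if not nxt:
                continue  # no transition possible: this instance dies
            if nxt & accepting:
                yield (begin, pos + 1)  # text[begin:pos + 1] matches
            survivors.append((begin, nxt))
        active = survivors
```

For a machine recognizing `ab+` (states 0 to 2, with 2 accepting), `run_parallel({0: {"a": {1}}, 1: {"b": {2}}, 2: {"b": {2}}}, 0, {2}, "abbab")` yields `(0, 2)`, `(0, 3)` and `(3, 5)`: the overlapping matches "ab" and "abb" are both reported, each at the moment its last symbol is read.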



Paper Citation


in Harvard Style

Quesada L., Berzal F. and Cortijo F. J. (2012). Parallel Finite State Machines for Very Fast Distributable Regular Expression Matching. In Proceedings of the 7th International Conference on Software Paradigm Trends - Volume 1: ICSOFT, ISBN 978-989-8565-19-8, pages 105-110. DOI: 10.5220/0003949901050110


in Bibtex Style

@conference{icsoft12,
author={Luis Quesada and Fernando Berzal and Francisco J. Cortijo},
title={Parallel Finite State Machines for Very Fast Distributable Regular Expression Matching},
booktitle={Proceedings of the 7th International Conference on Software Paradigm Trends - Volume 1: ICSOFT},
year={2012},
pages={105-110},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003949901050110},
isbn={978-989-8565-19-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Software Paradigm Trends - Volume 1: ICSOFT
TI - Parallel Finite State Machines for Very Fast Distributable Regular Expression Matching
SN - 978-989-8565-19-8
AU - Quesada L.
AU - Berzal F.
AU - Cortijo F. J.
PY - 2012
SP - 105
EP - 110
DO - 10.5220/0003949901050110
ER - 