Data partitioning linearly reduces memory use
with respect to the number of processors. Assuming a
maximum match length of twice the segment length,
data partitioning also reduces processing time linearly
with respect to the number of processors.
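The partitioning idea can be illustrated with a minimal sketch. It is not the authors' implementation: Python's `re` module stands in for the matcher, the helper names `scan_segment` and `partitioned_scan` are hypothetical, and the bound `max_len` on match length is an assumption of the sketch. Each segment is scanned independently, reading at most `max_len - 1` symbols past its end, so the segments could be assigned to different processors.

```python
import re
from typing import List, Tuple


def scan_segment(pattern: str, text: str, lo: int, hi: int,
                 max_len: int) -> List[Tuple[int, int]]:
    """Report (start, end) for matches that start in [lo, hi).

    Assumption: no match is longer than max_len symbols, so reading
    max_len - 1 symbols past the segment end is sufficient.
    """
    regex = re.compile(pattern)
    limit = min(len(text), hi + max_len - 1)
    found = []
    for start in range(lo, hi):
        m = regex.match(text, start, limit)  # anchored at `start`
        if m and m.end() > start:
            found.append((start, m.end()))
    return found


def partitioned_scan(pattern: str, text: str, workers: int,
                     max_len: int) -> List[Tuple[int, int]]:
    """Split the input into `workers` segments and scan each one.

    The segments are scanned in a plain loop here; every call is
    independent of the others, so each could run on its own processor.
    """
    size = max(1, -(-len(text) // workers))  # ceiling division
    results = []
    for lo in range(0, len(text), size):
        hi = min(lo + size, len(text))
        results.extend(scan_segment(pattern, text, lo, hi, max_len))
    return results
```

In this sketch every worker receives the whole string for simplicity; a distributed version would ship only the segment and its lookahead to each processor, which is where the memory saving would come from.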
4 CONCLUSIONS AND FUTURE WORK
Regular expressions provide a flexible means for
matching strings and they are often used in data-
intensive applications.
The implementation of regular expression match-
ing typically relies on deterministic finite automata
(DFAs) or nondeterministic finite automata (NFAs).
Both DFAs and NFAs are affected by amnesia and
acalculia, and DFAs also suffer from insomnia.
Techniques exist that solve those problems by
converting sets of regular expressions into compact
state machines. This approach, however, presents two
major drawbacks: it prevents the use of existing automata
that may already be available in antivirus signature
databases or complex data filters; and it hinders
the maintenance of the resulting automata, since the
whole set has to be converted and compacted again
whenever a regular expression is added to or removed
from the set.
We have proposed Parallel Finite State Machines
(PFSMs), which allow multiple active states and effi-
ciently find all the matches of regular expressions in
an input string, solving amnesia and acalculia. PF-
SMs also mitigate the effect of insomnia by reducing
the number of states in the resulting automaton.
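As an illustration of matching with multiple active states, the following minimal sketch keeps a list of active threads and reports every match, including overlapping ones. It is a simplified stand-in, not the PFSM algorithm itself: the `DFA` tuple encoding, the example automata in `DFAS`, and the function `find_all_matches` are hypothetical names introduced here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DFA encoding: (start state, accepting states, transitions).
DFA = Tuple[int, Set[int], Dict[Tuple[int, str], int]]

# Two example automata, kept as separate machines (no merging step).
DFAS: Dict[str, DFA] = {
    "ab+": (0, {2}, {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2}),
    "b": (0, {1}, {(0, "b"): 1}),
}


def find_all_matches(dfas: Dict[str, DFA],
                     text: str) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for every match text[start:end]."""
    matches = []
    active = []  # threads: (automaton name, current state, start offset)
    for pos, symbol in enumerate(text):
        # Start a fresh thread for every automaton at this position.
        for name, (start, _, _) in dfas.items():
            active.append((name, start, pos))
        survivors = []
        for name, state, begin in active:
            _, accepting, delta = dfas[name]
            nxt = delta.get((state, symbol))
            if nxt is None:
                continue  # no transition: this thread dies
            if nxt in accepting:
                matches.append((name, begin, pos + 1))
            survivors.append((name, nxt, begin))
        active = survivors
    return matches
```

For the input `"abb"` the sketch reports `ab+` at offsets 0 to 2 and 0 to 3, and `b` at offsets 1 to 2 and 2 to 3. Because the automata are stored side by side, adding or removing a pattern in this sketch is a single dictionary update.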
As PFSMs do not require any conversion or com-
paction step, they are able to run on existing DFA-
or NFA-like machines, and they allow the addition or
removal of regular expressions with zero downtime,
making the maintenance of the automaton easier.
In the worst case, PFSMs perform regular expression
matching in quadratic time and require linear memory
space, apart from automata storage. In addition, PFSMs
support three different approaches for parallelization
with almost linear scalability in practice.
Therefore, PFSMs make very fast distributed reg-
ular expression matching in data-intensive applica-
tions feasible.
We plan to apply PFSMs to improve the Lamb
lexical analyzer and cure it of its partial acalculia.
We also plan to apply PFSMs to data-intensive pattern
matching applications.
ACKNOWLEDGEMENTS
Work partially supported by research project
TIN2009-08296.
REFERENCES
Aho, A. V. (1990). Algorithms for finding patterns in
strings, volume A, pages 255–300. MIT Press, Cam-
bridge, MA, USA.
Aho, A. V. and Corasick, M. J. (1975). Efficient string
matching: An aid to bibliographic search. Commu-
nications of the ACM, 18(6):333–340.
Becchi, M. and Cadambi, S. (2007). Memory-efficient reg-
ular expression search using state merging. In Proc.
of the IEEE INFOCOM’2007, pages 1064–1072.
Ficara, D., Giordano, S., Procissi, G., Vitucci, F., Antichi,
G., and Pietro, A. D. (2008). An improved DFA for
fast regular expression matching. ACM SIGCOMM
Computer Communication Review, 38(5):31–40.
Gómez, L. I. and Vaisman, A. A. (2008). RE-SPaM: Using
regular expressions for sequential pattern mining
in trajectory databases. In Proc. of the IEEE
ICDMW’2008, pages 395–398.
Han, J., Kamber, M., and Pei, J. (2006). Data Mining: Con-
cepts and Techniques. Morgan Kaufmann, 2nd edi-
tion.
Hopcroft, J. E., Motwani, R., and Ullman, J. D. (2007). In-
troduction to Automata Theory, Languages, and Com-
putation. Pearson Education, 3rd edition.
Hosoya, H. and Pierce, B. (2007). Regular expression
pattern matching for XML. In Proc. of the ACM
SIGPLAN-SIGACT POPL’2007, volume 36 of ACM
SIGPLAN Notices, pages 67–80.
Kumar, S., Chandrasekaran, B., Turner, J., and Varghese,
G. (2007). Curing regular expressions matching algorithms
from insomnia, amnesia and acalculia. In Proc.
of the ACM/IEEE ANCS’2007, pages 155–164.
Kumar, S., Turner, J., and Williams, J. (2006). Advanced al-
gorithms for fast and scalable deep packet inspection.
In Proc. of the ACM/IEEE ANCS’2006, pages 81–92.
Lee, T.-H. (2007). Generalized Aho-Corasick algorithm for
signature based anti-virus applications. In Proc. of the
ICCCN’2007, pages 792–797.
Levine, J. R., Mason, T., and Brown, D. (1992). lex & yacc.
O’Reilly, 2nd edition.
Pasetto, D., Petrini, F., and Agarwal, V. (2010). Tools
for very fast regular expression matching. Computer,
43(3):50–58.
Quesada, L., Berzal, F., and Cortijo, F. J. (2011). Lamb: A
lexical analyzer with ambiguity support. In Proc. of
the ICSOFT’2011, volume 1, pages 297–300.
Sipser, M. (2005). Introduction to the Theory of Computa-
tion. Course Technology, 2nd edition.
Sun, Y., Liu, H., Valgenti, V. C., and Kim, M. S. (2010).
Hybrid regular expression matching for deep packet
inspection on multi-core architecture. In Proc. of the
ICCCN’2010, pages 1–7.
ICSOFT 2012 - 7th International Conference on Software Paradigm Trends