pattern matching algorithms when executed in parallel on a hybrid
distributed and shared memory architecture. The algorithms were used
to locate all occurrences of any pattern from a finite pattern set in
four biological databases: the genome of Escherichia coli from the
Large Canterbury Corpus, the SWISS-PROT amino acid sequence database,
and the FASTA Amino Acid (FAA) and FASTA Nucleic Acid (FNA) sequences
of the A. thaliana genome. The pattern set consisted of 100,000
patterns, each with a length of m = 8 characters.
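The search task described above can be illustrated with a minimal sketch. It is not one of the algorithms evaluated in this work: a plain set lookup on every window of length m stands in for the Wu-Manber, Commentz-Walter and Salmela-Tarhio-Kytöjoki algorithms, and the names `find_in_chunk` and `parallel_search` are hypothetical. The sketch only shows how a text can be split into chunks so that a window starting near the end of one chunk may read up to m - 1 characters of the next, which ensures that occurrences spanning a chunk boundary are reported exactly once. Threads are used for brevity; in CPython they show the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_chunk(text, start, end, patterns, m):
    """Return (position, pattern) for every match whose window starts in [start, end)."""
    hits = []
    # A window starting at i covers text[i:i + m], so it may extend past `end`
    # into the next chunk, but never past the end of the text.
    last = min(end, len(text) - m + 1)
    for i in range(start, last):
        window = text[i:i + m]
        if window in patterns:
            hits.append((i, window))
    return hits


def parallel_search(text, patterns, m, workers=4):
    """Search `text` for all patterns of length m, one chunk per worker."""
    patterns = frozenset(patterns)
    n = len(text)
    chunk = max(1, -(-n // workers))  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so hits come back sorted by position.
        parts = pool.map(
            lambda b: find_in_chunk(text, b[0], b[1], patterns, m), bounds
        )
        return [hit for part in parts for hit in part]
```

Calling `parallel_search(text, patterns, 8)` returns the same list of matches for any number of workers, because each starting position belongs to exactly one chunk.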
It was concluded that the parallelization rate of most multiple
pattern matching algorithms depends on the type of sequence database
used. The parallel implementations of the algorithms achieved the best
speedup on the E. coli genome and the worst on the FAA sequence
database. It was also shown that the Wu-Manber algorithm was up to
19.2 times faster than its sequential implementation, the
Commentz-Walter algorithm was up to 14.5 times faster, and the
Salmela-Tarhio-Kytöjoki family of multiple pattern matching
algorithms had a speedup of up to 15.3 times.
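The speedup figures above compare the running time of a sequential implementation with that of its parallel counterpart. A minimal sketch of the usual definitions follows, assuming speedup is the ratio of sequential to parallel running time and efficiency is speedup divided by the number of processing elements. The function names and the timings in the usage note are hypothetical and are not measurements from this work.

```python
def speedup(t_sequential, t_parallel):
    """Speedup S = T_sequential / T_parallel."""
    if t_parallel <= 0:
        raise ValueError("parallel running time must be positive")
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processing_elements):
    """Efficiency E = S / p, where p is the number of processing elements."""
    if processing_elements <= 0:
        raise ValueError("number of processing elements must be positive")
    return speedup(t_sequential, t_parallel) / processing_elements
```

For example, with a hypothetical sequential time of 120 s and a parallel time of 8 s on 16 processing elements, `speedup(120.0, 8.0)` gives 15.0 and `efficiency(120.0, 8.0, 16)` gives 0.9375.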
The work presented in this chapter could be extended with a more
accurate performance prediction model, as well as with experiments
that vary additional parameters such as pattern length and pattern
set size. Since biological databases and sets of multiple patterns
are usually inherently parallel in nature, future research could
focus on the performance evaluation of the presented algorithms when
executed in parallel on modern parallel architectures such as
Graphics Processing Units (GPUs).
REFERENCES
Aho, A. and Corasick, M. (1975). Efficient string matching: an aid
to bibliographic search. Communications of the ACM, 18(6):333–340.
Ayguadé, E., Blainey, B., Duran, A., Labarta, J., Martínez, F.,
Martorell, X., and Silvera, R. (2003). Is the schedule clause really
necessary in OpenMP? In International Workshop on OpenMP
Applications and Tools, volume 2716, pages 147–159.
Boukerche, A., de Melo, A. C. M. A., Ayala-Rincón, M., and Walter,
M. E. M. T. (2007). Parallel strategies for the local biological
sequence alignment in a cluster of workstations. J. Parallel
Distrib. Comput., 67:170–185.
Chaichoompu, K., Kittitornkun, S., and Tongsima, S. (2006).
MT-clustalW: multithreading multiple sequence alignment. In IPDPS.
Commentz-Walter, B. (1979). A string matching algorithm fast on the
average. Proceedings of the 6th Colloquium on Automata, Languages
and Programming, pages 118–132.
Cuvillo, J., Tian, X., Gao, G., and Girkar, M. (2003). Performance
study of a whole genome comparison tool on a hyper-threading
multiprocessor. In ISHPC, pages 450–457.
Jacob, A. C., Sanyal, S., Paprzycki, M., Arora, R., and Ganzha, M.
(2007). Whole genome comparison on a network of workstations. In
ISPDC’07, pages 31–36.
Kouzinopoulos, C. and Margaritis, K. (2009). Parallel implementation
of exact two dimensional pattern matching algorithms using MPI and
OpenMP. In 9th Hellenic European Research on Computer Mathematics
and its Applications Conference.
Kouzinopoulos, C. and Margaritis, K. (2010). Experimental Results on
Algorithms for Multiple Keyword Matching. In IADIS International
Conference on Informatics, pages 274–277.
Kouzinopoulos, C., Michailidis, P., and Margaritis, K. (2011).
Parallel Processing of Multiple Pattern Matching Algorithms for
Biological Sequences: Methods and Performance Results. InTech.
Li, K.-B. (2003). ClustalW-MPI: ClustalW analysis using distributed
and parallel computing. Bioinformatics, 19(12):1585–1586.
Li, Y. and Chen, C.-K. (2005). Parallelization of multiple genome
alignment. In HPCC’05, pages 910–915.
Liao, C. and Chapman, B. (2007). Invited paper: A compile-time cost
model for OpenMP. In Proceedings of the 21st International Parallel
and Distributed Processing Symposium.
Navarro, G. and Raffinot, M. (2002). Flexible pattern matching in
strings: practical on-line search algorithms for texts and
biological sequences. Cambridge University Press.
Rashid, N. A., Abdullah, R., and Talib, A. Z. H. (2007). Parallel
homologous search with Hirschberg algorithm: a hybrid MPI-Pthreads
solution. In Proceedings of the 11th WSEAS International Conference
on Computers, pages 228–233, Stevens Point, Wisconsin, USA. World
Scientific and Engineering Academy and Society (WSEAS).
Salmela, L., Tarhio, J., and Kytöjoki, J. (2006). Multipattern
string matching with q-grams. Journal of Experimental Algorithmics,
11:1–19.
Watson, B. (1995). Taxonomies and toolkits of regular language
algorithms. PhD thesis, Eindhoven University of Technology.
Wu, S. and Manber, U. (1994). A fast algorithm for multi-pattern
searching. Technical report TR-94-17, pages 1–11.
Zomaya, A. (2006). Parallel Computing for Bioinformatics and
Computational Biology: Models, Enabling Technologies, and Case
Studies. Wiley.
PERFORMANCE STUDY OF PARALLEL HYBRID MULTIPLE PATTERN MATCHING ALGORITHMS FOR
BIOLOGICAL SEQUENCES