genome scans, such as our search for the conglomerations
of variable motifs, with the potential to reduce
days of computation to just a few hours. This can
be particularly important for tools implemented
as part of a web server. The earlier version of our
matcher (Singh and Stojanovic, 2009), available at
http://bioinformatics.uta.edu/toolkit/motifs/, used direct
pattern matching (i.e., not based on PWMs), and
the development of this algorithm has allowed us to
consider the matrix–based approach as well.
ACKNOWLEDGEMENTS
The author is grateful to Abanish Singh, whose ef-
fort on motif finding and the implementation of the
whole–genome motif search has made us aware of the
need for this algorithm. This work has been partially
supported by NIH grant 5R03LM009033-02.
REFERENCES
Aho, A. and Corasick, M. (1975). Efficient string matching:
an aid to bibliographic search. Comm. Assoc. Com-
put. Mach., 18:333–340.
Apostolico, A., Bock, M., Lonardi, S., and Xu, X. (2000).
Efficient detection of unusual words. J. Comput. Biol.,
7:71–94.
Bryne, J., Valen, E., Tang, M., Marstrand, T., Winther, O.,
da Piedade, I., Krogh, A., Lenhard, B., and Sandelin,
A. (2008). JASPAR, the open access database of
transcription factor–binding profiles: new content
and tools in the 2008 update. Nucleic Acids Res.,
36:D102–D106.
Gershenzon, N. I., Stormo, G. D., and Ioshikhes, I. P.
(2005). Computational technique for improvement
of the position–weight matrices for the DNA/protein
binding sites. Nucleic Acids Res., 33:2290–2301.
Hannenhalli, S. and Wang, L.-S. (2005). Enhanced position
weight matrices using mixture models. Bioinformat-
ics, 21:i204–i212.
Hughes, J., Estep, P., Tavazoie, S., and Church, G. (2000).
Computational identification of cis–regulatory ele-
ments associated with groups of functionally related
genes in Saccharomyces cerevisiae. J. Mol. Biol.,
296:1205–1214.
Kel, A. E., Gössling, E., Reuter, I., Cheremushkin, E.,
Kel-Margoulis, O. V., and Wingender, E. (2003). Match:
A tool for searching transcription factor binding sites
in DNA sequences. Nucleic Acids Res., 31(13):3576–3579.
Khambata-Ford, S., Liu, Y., Gleason, C., Dickson, M., Alt-
man, R., Batzoglou, S., and Myers, R. (2003). Iden-
tification of promoter regions in the human genome
by using a retroviral plasmid library–based functional
reporter gene assay. Genome Res., 13:1765–1774.
Knuth, D., Morris, J., and Pratt, V. (1977). Fast pattern
matching in strings. SIAM J. Computing, 6:323–350.
Liefooghe, A., Touzet, H., and Varré, J.-S. (2006). Large
Scale Matching for Position Weight Matrices. In Proceedings
of the 7th Annual Symposium on Combinatorial
Pattern Matching, CPM 2006, volume 4009 of
LNCS, pages 401–412. Springer–Verlag.
Nelson, C., Hersh, B., and Carroll, S. B. (2004). The reg-
ulatory content of intergenic DNA shapes genome ar-
chitecture. Genome Biol., 5:R25.
Qin, Z., McCue, L., Thompson, W., Mayerhofer, L.,
Lawrence, C., and Liu, J. (2003). Identification of co-
regulated genes through Bayesian clustering of pre-
dicted regulatory binding sites. Nature Biotechnology,
21:435–439.
Singh, A. and Stojanovic, N. (2006). An efficient algorithm
for the identification of repetitive variable motifs in
the regulatory sequences of co–expressed genes. In
Proceedings of the 21st International Symposium on
Computer and Information Sciences, volume 4263 of
LNCS, pages 182–191. Springer–Verlag.
Singh, A. and Stojanovic, N. (2009). Genome–wide search
for putative transcriptional modules in eukaryotic se-
quences. In Proceedings of BIOCOMP’09, pages
848–854.
Stojanovic, N. (2009). A study on the distribution of phylo-
genetically conserved blocks within clusters of mam-
malian homeobox genes. Genetics and Molecular Bi-
ology, 32:666–673.
Stormo, G. (1990). Consensus patterns in DNA. Methods
Enzym., 183:211–221.
The ENCODE Project Consortium (2007). The ENCODE
pilot project: Identification and analysis of func-
tional elements in 1% of the human genome. Nature,
447:799–816.
van Helden, J. (2004). Metrics for comparing regulatory se-
quences on the basis of pattern counts. Bioinformatics,
20:399–406.
Wingender, E. (2008). The TRANSFAC project as an exam-
ple of framework technology that supports the analy-
sis of genomic regulation. Briefings in Bioinformatics,
9:326–332.
Young, J. E., Vogt, T., Gross, K. W., and Khani, S. C.
(2003). A short, highly active photoreceptor–specific
enhancer/promoter region upstream of the human
rhodopsin kinase gene. Investigative Ophthalmology
and Visual Science, 44:4076–4085.