5 CONCLUSIONS
We have developed CORECLUST, a new algorithm based on generalized hidden Markov models that successfully predicts regulatory modules in eukaryotic genomes for a given set of PWMs, starting from a set of co-regulated and/or orthologous genes. CORECLUST exploits cross-species conservation without relying on multiple alignment, which can be useful for the analysis of poorly alignable intergenic regions.
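The alignment-free use of conservation can be illustrated with a deliberately simplified sketch. It uses a plain (not generalized) two-state HMM with made-up parameters. The names `log_forward`, `joint_log_likelihood` and `brute_force_likelihood` and the example sequences are hypothetical, and this is not CORECLUST's implementation. The point it shows is that orthologous sequences can be scored independently under one shared set of parameters, so their log-likelihoods add and no alignment is needed.

```python
import itertools
import math

# Hypothetical two-state model: a background state and an AT-rich "site" state.
# All numbers are illustrative; they are not CORECLUST parameters.
STATES = ("bg", "site")
START = {"bg": 0.9, "site": 0.1}
TRANS = {"bg": {"bg": 0.9, "site": 0.1},
         "site": {"bg": 0.2, "site": 0.8}}
EMIT = {"bg": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
        "site": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(seq, states, start, trans, emit):
    """Log P(seq) under a discrete HMM, computed with the forward algorithm.

    All probabilities must be strictly positive (math.log(0) raises).
    """
    # alpha[s] = log P(x_1..x_i, state at position i = s)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: logsumexp([alpha[s] + math.log(trans[s][t]) for s in states])
            + math.log(emit[t][symbol])
            for t in states
        }
    return logsumexp(list(alpha.values()))


def joint_log_likelihood(sequences, states, start, trans, emit):
    """Sum of per-sequence log-likelihoods under one shared model.

    Each orthologous sequence is scored on its own with the same parameters,
    so no multiple alignment is required.
    """
    return sum(log_forward(x, states, start, trans, emit) for x in sequences)


def brute_force_likelihood(seq, states, start, trans, emit):
    """P(seq) by summing over every state path; only feasible for tiny inputs."""
    total = 0.0
    for path in itertools.product(states, repeat=len(seq)):
        p = start[path[0]] * emit[path[0]][seq[0]]
        for prev, cur, symbol in zip(path, path[1:], seq[1:]):
            p *= trans[prev][cur] * emit[cur][symbol]
        total += p
    return total


if __name__ == "__main__":
    orthologs = ["ACGTATTAAC", "CATTAATAGG", "GCGTAATTAC"]  # made-up sequences
    print(joint_log_likelihood(orthologs, STATES, START, TRANS, EMIT))
    # Cross-check the forward recursion against explicit path enumeration.
    seq = "ACGTA"
    print(math.exp(log_forward(seq, STATES, START, TRANS, EMIT)),
          brute_force_likelihood(seq, STATES, START, TRANS, EMIT))
```

In a training setting, an EM procedure such as Baum-Welch would adjust the shared parameters to increase this joint likelihood; the sketch only evaluates it.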
The main disadvantage of the algorithm is the limited number of PWMs it can use: each additional PWM increases the number of HMM parameters, which can result in model overfitting. Future work aims to overcome this limitation by restricting training to the significant parameters only. Nevertheless, CORECLUST performed better than the other methods it was compared with. The main biological advantage of the method is that it reveals the structure of regulatory regions, which could help to better understand transcriptional regulation.
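The overfitting argument can be made concrete with a rough parameter count. Assume, hypothetically, a model with one background state plus one state per PWM, fixed PWM emissions, and a fully connected transition matrix. Under that assumption, the number of free transition parameters grows quadratically with the number of PWMs. Restricting training to a sparse set of transitions (here an invented "star" layout) makes it grow only linearly. The function names and the layout are illustrative assumptions, not the model used by CORECLUST.

```python
def free_transition_parameters(num_pwms: int) -> int:
    """Free parameters of a fully connected transition matrix.

    Hypothetical layout: one background state plus one state per PWM.
    Each row is a probability distribution, so it has (n - 1) free entries.
    """
    n = num_pwms + 1
    return n * (n - 1)


def star_topology(num_pwms: int) -> dict:
    """Hypothetical sparse layout: the background may enter any PWM state,
    and each PWM state may only repeat itself or return to the background."""
    pwms = [f"pwm{i}" for i in range(num_pwms)]
    allowed = {"bg": {"bg", *pwms}}
    for p in pwms:
        allowed[p] = {"bg", p}
    return allowed


def pruned_transition_parameters(allowed: dict) -> int:
    """Free parameters when only the listed transitions are trained.

    A row with k allowed targets has k - 1 free entries.
    """
    return sum(max(len(targets) - 1, 0) for targets in allowed.values())


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        print(k, free_transition_parameters(k),
              pruned_transition_parameters(star_topology(k)))
```

For 10 PWMs the fully connected layout has 110 free transition parameters, whereas the star layout has 20; emission parameters are not counted because the PWMs are assumed fixed.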
ACKNOWLEDGEMENTS
We are grateful to Mikhail Gelfand and Dmitri
Pervouchine for useful discussions and
encouragement, and to Dmitry Vinogradov for
technical assistance.
BIOINFORMATICS 2012 - International Conference on Bioinformatics Models, Methods and Algorithms