A SUBSPACE METHOD FOR THE DETECTION OF TRANSCRIPTION FACTOR BINDING SITES

Erola Pairo, Santiago Marco, Alexandre Perera

Abstract

Transcription factor binding sites are short, degenerate sequences, located mostly in gene promoters, where proteins bind to regulate transcription. Locating these sequences is an important problem, and many experimental and computational methods have been developed for it. Algorithms that search for binding sites are usually based on Position Specific Scoring Matrices (PSSM), which treat each position independently. Here, symbolic DNA is mapped to numerical sequences and a detector is built from a Principal Component Analysis (PCA) of those sequences, which takes covariances between positions into account. When a treatment of missing values is incorporated, the Q-residuals detector, based on PCA, performs better than a PSSM algorithm. The performance of the detector depends on how missing values are estimated and on the percentage of missing values considered in the model.
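The Q-residuals idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the one-hot mapping from bases to numbers, the number of principal components, the toy binding sites, and the helper names `one_hot`, `fit_pca`, `q_residual` and `scan`. The missing-value treatment mentioned in the abstract is left out entirely.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Map a DNA string to a flat 4*len(seq) indicator vector (assumed encoding)."""
    idx = [BASES.index(b) for b in seq]
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out.ravel()

def fit_pca(sites, n_components):
    """Fit PCA on aligned, equal-length sites; return the mean and loading matrix."""
    X = np.array([one_hot(s) for s in sites])
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def q_residual(seq, mean, loadings):
    """Squared distance from the encoded sequence to the PCA subspace."""
    x = one_hot(seq) - mean
    residual = x - loadings @ (loadings.T @ x)
    return float(residual @ residual)

def scan(dna, mean, loadings, width):
    """Q-residual of every window of the given width along a longer sequence."""
    return [q_residual(dna[i:i + width], mean, loadings)
            for i in range(len(dna) - width + 1)]

# Toy, made-up aligned sites (not taken from any database).
sites = ["TATAAA", "TATATA", "TATAAG", "TTTAAA"]
mean, loadings = fit_pca(sites, n_components=2)
scores = scan("GGTATAAAGG", mean, loadings, width=6)
print(int(np.argmin(scores)), scores)
```

A low Q-residual means the window is well explained by the subspace learned from known sites, so candidate sites are windows whose score falls below some threshold. Because the loadings are fitted on whole encoded sequences, the subspace reflects covariances between positions, which a PSSM does not model.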

Paper Citation


in Harvard Style

Pairo E., Marco S. and Perera A. (2010). A SUBSPACE METHOD FOR THE DETECTION OF TRANSCRIPTION FACTOR BINDING SITES. In Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010) ISBN 978-989-674-019-1, pages 102-107. DOI: 10.5220/0002697301020107


in Bibtex Style

@conference{bioinformatics10,
author={Erola Pairo and Santiago Marco and Alexandre Perera},
title={A SUBSPACE METHOD FOR THE DETECTION OF TRANSCRIPTION FACTOR BINDING SITES},
booktitle={Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)},
year={2010},
pages={102-107},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002697301020107},
isbn={978-989-674-019-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)
TI - A SUBSPACE METHOD FOR THE DETECTION OF TRANSCRIPTION FACTOR BINDING SITES
SN - 978-989-674-019-1
AU - Pairo E.
AU - Marco S.
AU - Perera A.
PY - 2010
SP - 102
EP - 107
DO - 10.5220/0002697301020107