PERFECTOS-APE - Predicting Regulatory Functional Effect of SNPs by Approximate P-value Estimation

Ilya E. Vorontsov, Ivan V. Kulakovskiy, Grigory Khimulya, Daria D. Nikolaeva, Vsevolod J. Makeev

2015

Abstract

Single nucleotide polymorphisms (SNPs) and single nucleotide variants (SNVs) are often found in regulatory regions of the human genome. Nucleotide substitutions in promoter and enhancer regions may affect transcription factor (TF) binding and alter the regulation of gene expression. Binding patterns are now known for hundreds of human TFs, so the possible functional effects of allelic variants or mutations in TF binding sites can be assessed by sequence analysis. We present PERFECTOS-APE, software to PrEdict Regulatory Functional Effect of SNPs by Approximate P-value Estimation. Using a predefined collection of position weight matrices (PWMs) representing TF binding patterns, PERFECTOS-APE identifies transcription factors whose binding sites can be significantly affected by given nucleotide substitutions. PERFECTOS-APE supports both classic PWMs, which assume that positions within a site are independent, and dinucleotide PWMs, which account for dinucleotide composition and correlations between nucleotides in adjacent positions within binding sites. PERFECTOS-APE uses dynamic programming to calculate the PWM score distribution and to convert scores to P-values; an optional binary search mode uses a precomputed P-value list to speed up the computations. The software is written in Java and is freely available as a standalone program and as an online tool: http://opera.autosome.ru/perfectosape/. We have tested the algorithm on several disease-associated SNVs as well as on a set of cancer somatic mutations occurring in intronic regions of the human genome.
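The P-value machinery described in the abstract can be illustrated with a short Python sketch. It is not the PERFECTOS-APE implementation (which is written in Java); it is a minimal illustration under stated assumptions: a mononucleotide PWM stored as a list of columns in A, C, G, T order, a uniform background, scores discretized by a hypothetical `precision` factor, and a single strand. All function names are invented for this example, and summarizing a substitution by the ratio of the best P-values of the two alleles is likewise an assumption made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict

ALPHABET = "ACGT"  # assumed column order of every PWM row


def score_distribution(pwm, background=(0.25, 0.25, 0.25, 0.25), precision=100):
    """Dynamic programming over PWM columns.

    Returns {discretized score: probability that a random word gets that score}.
    """
    dist = {0: 1.0}
    for column in pwm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for weight, freq in zip(column, background):
                nxt[score + round(weight * precision)] += prob * freq
        dist = dict(nxt)
    return dist


def pvalue_table(dist):
    """Precompute P(score >= s) for every attainable score s (ascending order)."""
    scores = sorted(dist)
    pvalues = []
    tail = 0.0
    for s in reversed(scores):
        tail += dist[s]
        pvalues.append(tail)
    pvalues.reverse()
    return scores, pvalues


def word_score(pwm, word, precision=100):
    """Discretized score of a word, rounded per column exactly as in the DP."""
    return sum(round(col[ALPHABET.index(ch)] * precision) for col, ch in zip(pwm, word))


def pvalue(score, scores, pvalues):
    """Binary search in the precomputed table for P(random word scores >= score)."""
    i = bisect_left(scores, score)
    return pvalues[i] if i < len(scores) else 0.0


def best_site_pvalue(pwm, seq, snp_index, scores, pvalues, precision=100):
    """Smallest P-value among windows that overlap the substituted position.

    Assumes len(seq) >= len(pwm); only the given strand is scanned.
    """
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    return min(
        pvalue(word_score(pwm, seq[s:s + width], precision), scores, pvalues)
        for s in range(first, last + 1)
    )


# Toy two-column PWM (weights are made up for the example).
toy_pwm = [[1.0, 0.0, 0.0, -1.0], [0.5, -0.5, 0.0, 0.0]]
toy_scores, toy_pvalues = pvalue_table(score_distribution(toy_pwm))

p_ref = best_site_pvalue(toy_pwm, "GAAG", 1, toy_scores, toy_pvalues)  # allele A
p_alt = best_site_pvalue(toy_pwm, "GTAG", 1, toy_scores, toy_pvalues)  # allele T
fold_change = p_alt / p_ref  # > 1 means the substitution weakens the best site
```

The table is built once per PWM, so each subsequent P-value query costs only a binary search. A dinucleotide PWM would need the dynamic programming state to also track the previous nucleotide, which this sketch does not attempt.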


Paper Citation


in Harvard Style

Vorontsov, I. E., Kulakovskiy, I. V., Khimulya, G., Nikolaeva, D. D. and Makeev, V. J. (2015). PERFECTOS-APE - Predicting Regulatory Functional Effect of SNPs by Approximate P-value Estimation. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015) ISBN 978-989-758-070-3, pages 102-108. DOI: 10.5220/0005189301020108


in Bibtex Style

@conference{bioinformatics15,
author={Ilya E. Vorontsov and Ivan V. Kulakovskiy and Grigory Khimulya and Daria D. Nikolaeva and Vsevolod J. Makeev},
title={PERFECTOS-APE - Predicting Regulatory Functional Effect of SNPs by Approximate P-value Estimation},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015)},
year={2015},
pages={102-108},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005189301020108},
isbn={978-989-758-070-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015)
TI - PERFECTOS-APE - Predicting Regulatory Functional Effect of SNPs by Approximate P-value Estimation
SN - 978-989-758-070-3
AU - Vorontsov, Ilya E.
AU - Kulakovskiy, Ivan V.
AU - Khimulya, Grigory
AU - Nikolaeva, Daria D.
AU - Makeev, Vsevolod J.
PY - 2015
SP - 102
EP - 108
DO - 10.5220/0005189301020108