Authors:
Ilya E. Vorontsov (1); Ivan V. Kulakovskiy (2); Grigory Khimulya (1); Daria D. Nikolaeva (3) and Vsevolod J. Makeev (4)
Affiliations:
(1) Vavilov Institute of General Genetics, Russian Federation
(2) Vavilov Institute of General Genetics and Engelhardt Institute of Molecular Biology, Russian Federation
(3) Lomonosov Moscow State University, Russian Federation
(4) Vavilov Institute of General Genetics, Engelhardt Institute of Molecular Biology and Moscow Institute of Physics and Technology, Russian Federation
Keyword(s):
Single Nucleotide Polymorphism, SNP, Single Nucleotide Variant, SNV, P-value, Transcription Factor Binding Site, TFBS, Position Weight Matrix, PWM, PSSM, Transcriptional Regulation.
Related Ontology Subjects/Areas/Topics:
Algorithms and Software Tools; Bioinformatics; Biomedical Engineering; Sequence Analysis; Web Services in Bioinformatics
Abstract:
Single nucleotide polymorphisms (SNPs) and single nucleotide variants (SNVs) are often found in regulatory regions of the human genome. Nucleotide substitutions in promoter and enhancer regions may affect transcription factor (TF) binding and alter the regulation of gene expression. Binding patterns are now known for hundreds of human TFs, so the possible functional effects of allelic variation or mutations in TF binding sites can be assessed by sequence analysis. We present PERFECTOS-APE, software to PrEdict Regulatory Functional Effect of SNPs by Approximate P-value Estimation. Using a predefined collection of position weight matrices (PWMs) representing TF binding patterns, PERFECTOS-APE identifies transcription factors whose binding sites can be significantly affected by given nucleotide substitutions. PERFECTOS-APE supports both classic PWMs, which assume that positions are independent, and dinucleotide PWMs, which account for dinucleotide composition and correlations between nucleotides in adjacent positions within binding sites. PERFECTOS-APE uses dynamic programming to calculate the PWM score distribution and convert scores to P-values; an optional mode speeds up the computation by binary search over a precomputed P-value list. The software is written in Java and is freely available as a standalone program and as an online tool: http://opera.autosome.ru/perfectosape/. We have tested our algorithm on several disease-associated SNVs as well as on a set of cancer somatic mutations occurring in intronic regions of the human genome.
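The score-to-P-value conversion described in the abstract can be illustrated with a minimal Python sketch. It is not the PERFECTOS-APE implementation (which is written in Java). It assumes a mononucleotide PWM whose scores have already been discretized to integers and a uniform i.i.d. background, and all function names here are hypothetical. The distribution of scores over random words is built column by column with dynamic programming, and a sorted lookup table then lets a P-value be found by binary search.

```python
from bisect import bisect_left
from collections import defaultdict


def score_distribution(pwm, background=(0.25, 0.25, 0.25, 0.25)):
    """Return {score: probability} for a random word under an i.i.d. background.

    pwm: list of columns, each a list of four integer scores in A, C, G, T order
    (integer scores are an assumption of this sketch).
    """
    dist = {0: 1.0}
    for column in pwm:
        nxt = defaultdict(float)
        # Extend every partial score by each possible nucleotide at this column.
        for score, prob in dist.items():
            for weight, freq in zip(column, background):
                nxt[score + weight] += prob * freq
        dist = dict(nxt)
    return dist


def p_value(dist, threshold):
    """Probability that a random word scores at least `threshold`."""
    return sum(prob for score, prob in dist.items() if score >= threshold)


def build_lookup(dist):
    """Precompute sorted scores and their tail probabilities P(S >= score)."""
    scores = sorted(dist)
    tails = []
    running = 0.0
    for score in reversed(scores):
        running += dist[score]
        tails.append(running)
    tails.reverse()
    return scores, tails


def lookup_p_value(scores, tails, score):
    """Binary search for the P-value of `score` in the precomputed table."""
    i = bisect_left(scores, score)  # first table entry with value >= score
    return tails[i] if i < len(scores) else 0.0


def word_score(pwm, word):
    """Sum of PWM weights along `word` (same length as the PWM)."""
    return sum(column["ACGT".index(base)] for column, base in zip(pwm, word))
```

Under these assumptions, the effect of a substitution on a candidate site could be summarized by scoring the word with each allele, converting both scores to P-values, and comparing the two values, for example as a ratio.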