Table 6: For each gene, the first row reports the number of pseudogenes analyzed (attested plus not reported), the second row the number of CpG islands identified, and the last row the maximum CpG island length in bp.
               RPL14  RPL19  RPL22  RPL36  RPL37  PTEN   KRAS   RAP1A  RAP1B  CX43   HDAC1  sum
analyzed       10     21     24     21     27     1      1      2      5      1      3      116
CpG islands    0      2      2      4      1      1      1      2      3      0      0      16
max len. (bp)  -      132    142    268    90     805    95     100    111    -      -      -
Table 7: Scores of the pairwise alignments among the upstream regions (bottom-left triangle) and among the downstream regions (top-right triangle) of the pseudogenes of RAP1B (qc = query coverage, pi = percent identity).
            AC113404.3                 RAP1BP1                    RAP1BP2                    RAP1BP3                    AL161670.1
AC113404.3  -                          qc=100%, pi=92.83%         qc=100%, pi=87.23%         qc=99%, pi=91.40%          no significant similarity
RAP1BP1     qc=32%, pi=83.45%          -                          qc=100%, pi=82.47%         qc=99%, pi=86.17%          no significant similarity
RAP1BP2     qc=16%, pi=73.33%          qc=31%, pi=70.34%          -                          qc=100%, pi=81.27%         qc=2%, pi=100%
RAP1BP3     qc=35%, pi=72.15%          qc=27%, pi=70.75%          qc=28%, pi=65.94%          -                          no significant similarity
AL161670.1  qc=32%, pi=89.44%          qc=33%, pi=82.68%          qc=18%, pi=85.07%          qc=2%, pi=92.26%           -
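The two scores in Table 7 can be illustrated with a short Python sketch. It assumes BLAST-style definitions, which the text does not state explicitly: percent identity (pi) as the number of identical columns divided by the number of alignment columns, and query coverage (qc) as the share of query bases that take part in the alignment. The helper `alignment_scores` and its gapped-string input format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlignmentScores:
    query_coverage: float    # qc, in percent
    percent_identity: float  # pi, in percent


def alignment_scores(aligned_query: str, aligned_subject: str,
                     query_length: int) -> AlignmentScores:
    """Compute qc and pi for one gapped pairwise alignment (hypothetical helper).

    Both aligned strings must have the same length and use '-' for gaps;
    query_length is the length of the full, ungapped query sequence.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    columns = len(aligned_query)
    # Identical columns (a gap never counts as a match).
    identical = sum(1 for q, s in zip(aligned_query, aligned_subject)
                    if q == s and q != "-")
    # Query bases that take part in the alignment (gap columns excluded).
    covered = sum(1 for q in aligned_query if q != "-")
    return AlignmentScores(
        query_coverage=100.0 * covered / query_length,
        percent_identity=100.0 * identical / columns,
    )


if __name__ == "__main__":
    # Toy example: a 12 bp query of which 10 bases are aligned.
    scores = alignment_scores("ACGT-ACGTAC", "ACGTTACCTAC", query_length=12)
    print(f"qc={scores.query_coverage:.0f}%, pi={scores.percent_identity:.2f}%")
```

This sketch handles a single local alignment; a search tool that reports several high-scoring segments for the same pair may combine them when computing coverage, which would give a different qc.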
detected (Zheng et al., 2007).
PolyA tails. Although a polyA tail has been observed downstream of a processed pseudogene in only about half of the cases (Zhang et al., 2002), the presence of a polyA tail (possibly together with a polyadenylation signal) can help to classify a sequence as a processed pseudogene.
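As a rough illustration of how this check could be automated, the Python sketch below scans the region downstream of a candidate pseudogene for an A-rich stretch and then looks for the canonical polyadenylation signal AATAAA, or its common variant ATTAAA, shortly before it (the signal usually lies some 10-30 nt upstream of the cleavage site). The minimum tail length, the mismatch tolerance, the 40-base search window and the function names are illustrative assumptions, not parameters used in this work.

```python
from typing import Optional, Tuple

# Canonical polyadenylation signal and its most common variant.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_tail(downstream: str, min_len: int = 10,
                    max_non_a: int = 1) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first A-rich stretch in `downstream`, or None.

    A stretch starts and ends on an A, is at least `min_len` bases long and
    contains at most `max_non_a` other bases (a small tolerance for point
    mutations in the tail). Both thresholds are illustrative.
    """
    seq = downstream.upper()
    for start in range(len(seq) - min_len + 1):
        if seq[start] != "A":
            continue
        end, non_a = start, 0
        while end < len(seq):
            if seq[end] != "A":
                if non_a == max_non_a:
                    break
                non_a += 1
            end += 1
        # Drop trailing non-A bases so the stretch ends on an A.
        while seq[end - 1] != "A":
            end -= 1
        if end - start >= min_len:
            return start, end
    return None


def has_polya_signal(downstream: str, tail_start: int, window: int = 40) -> bool:
    """True if a polyadenylation signal occurs in the `window` bases before the tail."""
    region = downstream.upper()[max(0, tail_start - window):tail_start]
    return any(signal in region for signal in POLYA_SIGNALS)


if __name__ == "__main__":
    # Toy downstream region: an AATAAA signal, a spacer, then a 15-base A-run.
    region = "GCAATAAAGCGCTGCCTGCG" + "A" * 15 + "CGT"
    tail = find_polya_tail(region)
    if tail is not None:
        print("tail:", tail, "signal:", has_polya_signal(region, tail[0]))
```

Because the tail is reported only when it passes a length threshold, short A-runs such as the AATAAA signal itself are not mistaken for a tail.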
CpG islands and motif discovery. The commonly accepted definition of a CpG island, proposed in 1987, is a stretch of DNA of at least 200 bp with a C+G content above 50% and an observed-to-expected CpG ratio above 0.6 (Gardiner-Garden and Frommer, 1987). However, any definition of a CpG island is ultimately arbitrary (Takai and Jones, 2002). Using a hidden Markov model (HMM) designed for this purpose, we identified 16 CpG islands of different lengths, located at different distances from the pseudogenes (Table 6). We then searched for a motif (or signal) in the upstream regions that contain a CpG island. Searching for possible promoter sequences within these orphan CpG regions is a promising direction for future work.
The Gibbs sampling experiments revealed a surprising similarity between the flanking regions of some pseudogenes of the same gene. This suggests that the mechanism by which processed pseudogenes are generated should be investigated further.
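The Gardiner-Garden and Frommer thresholds quoted above can be checked directly. The Python sketch below slides a 200 bp window along a sequence, keeps the windows whose C+G fraction exceeds 0.5 and whose observed/expected CpG ratio exceeds 0.6, computed as (number of CpG x window length) / (number of C x number of G), and merges overlapping hits into candidate islands. This is a plain criterion-based scan, not the HMM used in this work; the step size and function names are assumptions.

```python
from typing import List, Tuple


def cpg_stats(window: str) -> Tuple[float, float]:
    """Return (C+G fraction, observed/expected CpG ratio) for a DNA window."""
    seq = window.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (#CpG * length) / (#C * #G)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def ggf_windows(seq: str, size: int = 200, step: int = 1,
                min_gc: float = 0.5, min_obs_exp: float = 0.6) -> List[Tuple[int, int]]:
    """Windows of `size` bp whose C+G fraction and obs/exp CpG exceed the thresholds."""
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc, obs_exp = cpg_stats(seq[start:start + size])
        if gc > min_gc and obs_exp > min_obs_exp:
            hits.append((start, start + size))
    return hits


def merge_windows(windows: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching windows (sorted by start) into candidate islands."""
    merged: List[Tuple[int, int]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Toy sequence: 300 bp of AT repeats, 300 bp of CG repeats, 300 bp of AT repeats.
    toy = "AT" * 150 + "CG" * 150 + "AT" * 150
    print(merge_windows(ggf_windows(toy)))
```

On the toy sequence the merged island spills roughly 100 bp into each AT flank, because a window only needs a C+G majority to pass; this blurring of boundaries is one reason why model-based detectors are often preferred to fixed windows.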
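To illustrate the general idea of HMM-based CpG island detection, the Python sketch below uses a toy two-state model (background vs. island) in which each state emits the next base with a probability that depends on the previous base, so that a C followed by G is far more likely inside an island. Viterbi decoding then labels every base with its most probable state. All probabilities are made up for illustration and the model is not the HMM designed for this work; the common textbook formulation uses eight states (one per nucleotide inside and outside islands) with fitted parameters.

```python
import math
from typing import Dict, List, Tuple

STATES = ("background", "island")

# Toy parameters, made up for illustration (not fitted to any data).
START = {"background": 0.99, "island": 0.01}
TRANS = {
    "background": {"background": 0.999, "island": 0.001},
    "island": {"background": 0.01, "island": 0.99},
}
# Base composition used at the first position and after any base other than C.
COMPOSITION = {
    "background": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "island": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}
# Distribution of the base following a C: CpG is rare in background DNA.
AFTER_C = {
    "background": {"A": 0.35, "C": 0.25, "G": 0.05, "T": 0.35},
    "island": {"A": 0.15, "C": 0.30, "G": 0.40, "T": 0.15},
}


def log_emit(state: str, prev: str, cur: str) -> float:
    """Log-probability of base `cur` in `state`, given the previous base."""
    table = AFTER_C[state] if prev == "C" else COMPOSITION[state]
    return math.log(table[cur])


def viterbi_islands(seq: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals decoded as 'island'.

    Expects a sequence over A, C, G, T only.
    """
    seq = seq.upper()
    if not seq:
        return []
    # score[s]: best log-probability of a path ending in state s at position i.
    score = {s: math.log(START[s]) + log_emit(s, "", seq[0]) for s in STATES}
    back: List[Dict[str, str]] = []
    for i in range(1, len(seq)):
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[best] + math.log(TRANS[best][s])
                            + log_emit(s, seq[i - 1], seq[i]))
            pointers[s] = best
        score = new_score
        back.append(pointers)
    # Trace the most probable state path backwards.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    # Collapse runs of "island" labels into intervals.
    islands, start = [], None
    for i, s in enumerate(path):
        if s == "island" and start is None:
            start = i
        elif s != "island" and start is not None:
            islands.append((start, i))
            start = None
    if start is not None:
        islands.append((start, len(path)))
    return islands


if __name__ == "__main__":
    toy = "AT" * 100 + "CG" * 100 + "AT" * 100
    print(viterbi_islands(toy))
```

Unlike a fixed-window scan, the decoded boundaries here follow the base-level evidence, while the low switching probabilities in `TRANS` discourage very short islands.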
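The motif search relies on Gibbs sampling. The Python sketch below is a minimal site sampler under simplifying assumptions: exactly one motif occurrence of fixed width per sequence, a uniform background, pseudocounts of 1 and a fixed number of iterations. At each step it removes one sequence, builds a profile from the current sites in the others, and resamples that sequence's site in proportion to the profile-to-background likelihood ratio. The function names and parameters are hypothetical, and this is not the specific tool used in the experiments.

```python
import random
from typing import Dict, List

BASES = "ACGT"


def profile(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Position frequency matrix (with pseudocounts) from equal-length sites."""
    columns = []
    for j in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        columns.append({b: counts[b] / total for b in BASES})
    return columns


def gibbs_motif(seqs: List[str], width: int, iterations: int = 2000,
                seed: int = 0) -> List[int]:
    """Return one motif start per sequence (toy Gibbs site sampler).

    Sequences must contain only A, C, G, T and be at least `width` long.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    rng = random.Random(seed)
    seqs = [s.upper() for s in seqs]
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    background = {b: 0.25 for b in BASES}  # uniform background, an assumption
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        others = [s[p:p + width] for k, (s, p) in enumerate(zip(seqs, starts)) if k != i]
        pfm = profile(others)
        weights = []
        for p in range(len(seqs[i]) - width + 1):
            ratio = 1.0
            for j, base in enumerate(seqs[i][p:p + width]):
                ratio *= pfm[j][base] / background[base]
            weights.append(ratio)
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    # Plant a known motif into random 60 bp sequences and try to recover it.
    rng = random.Random(42)
    motif = "TGACGTCA"
    seqs = []
    for _ in range(8):
        bg = "".join(rng.choice(BASES) for _ in range(60))
        pos = rng.randrange(len(bg) - len(motif) + 1)
        seqs.append(bg[:pos] + motif + bg[pos + len(motif):])
    for s, p in zip(seqs, gibbs_motif(seqs, width=len(motif))):
        print(p, s[p:p + len(motif)])
```

A single run can settle on a shifted or spurious motif, so in practice the sampler is usually restarted from several random seeds and the highest-scoring solution is kept.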
REFERENCES
E. S. Lander et al. (International Human Genome Sequencing Consortium). Initial sequencing and analysis of the human genome. in Nature 409 (2001), 860-921. doi: 10.1038/35057062
M. T. Maurano et al. Systematic localization of common disease-associated variation in regulatory DNA. in Science 337 (2012), 1190-1195. doi: 10.1126/science.1222794
M. A. Schaub et al. Linking disease associations with regulatory information in the human genome. in Genome Research 22(9) (2012), 1748-1759. doi: 10.1101/gr.136127.111
A. F. Martinez et al. An ultraconserved brain-specific enhancer within ADGRL3 (LPHN3) underpins attention-deficit/hyperactivity disorder susceptibility. in Biological Psychiatry 80 (2016), 943-954. doi: 10.1016/j.biopsych.2016.06.026
J. Amiel, S. Benko, C. T. Gordon and S. Lyonnet. Disruption of long-distance highly conserved noncoding elements in neurocristopathies. in Annals of the New York Academy of Sciences 1214 (2010), 34-46
C. Braconi et al. Expression and functional role of a transcribed noncoding RNA with an ultraconserved element in hepatocellular carcinoma. in Proceedings of the National Academy of Sciences 108 (2011), 786-791. doi: 10.1073/pnas.1010198108
B. Bao et al. Genetic variants in ultraconserved regions associate with prostate cancer recurrence and survival. in Scientific Reports 6 (2016), 22124. doi: 10.1038/srep22124
F. Zhang and J. R. Lupski. Non-coding genetic variants in human disease. in Human Molecular Genetics 24 (2015), R102-R110. doi: 10.1093/hmg/ddv259
E. V. Koonin. Orthologs, paralogs and evolutionary genomics. in Annual Review of Genetics 39(1) (2005), 309-338. doi: 10.1146/annurev.genet.39.073003.114725
D. Zheng et al. Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. in Genome Research 17 (2007), 839-851. doi: 10.1101/gr.5586307
Y. Niimura and M. Nei. Extensive gains and losses of olfactory receptor genes in mammalian evolution. in PLoS ONE 2(8) (2007), e708. doi: 10.1371/journal.pone.0000708
Z. Zhang, P. Harrison and M. Gerstein. Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. in Genome Research 12(10) (2002), 1466-1482. doi: 10.1101/gr.331902
L. Poliseno et al. A coding-independent function of gene and pseudogene mRNAs regulates tumor biology. in Nature 465 (2010), 1033-1038. doi: 10.1038/nature09144
Non-coding DNA: A Methodology for Detection and Analysis of Pseudogenes