fication of tandem repeats because of its high accuracy.
SRF is a representative signal-processing program for
identifying tandem repeats. It finds repetitions by
converting the target DNA sequence from the time domain
to the frequency domain using the Fourier transform.
Tandem repeat detection using PT, which is called PTF
in this paper, is another signal-processing algorithm
for detecting tandem repeats. However, it uses the
period transform rather than the Fourier transform to
find repetitions.
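As a rough illustration of the frequency-domain idea described above (a minimal sketch, not SRF's actual implementation), the following Python code maps a DNA sequence to four binary indicator signals, one per base, sums their discrete Fourier power spectra, and reads a period estimate off the strongest peak. The indicator mapping, the function names, and the toy sequence are assumptions made for illustration; the paper does not specify them.

```python
import cmath


def indicator_power_spectrum(seq):
    """Sum of the DFT power spectra of the four base-indicator signals.

    Assumption: each base (A, C, G, T) is encoded as a 0/1 indicator
    signal. This is one common encoding, not necessarily the one SRF uses.
    """
    n = len(seq)
    power = [0.0] * n
    for base in "ACGT":
        # 1.0 where this base occurs, 0.0 elsewhere.
        signal = [1.0 if ch == base else 0.0 for ch in seq]
        for k in range(n):
            # Naive O(n^2) DFT; adequate for short toy sequences.
            coeff = sum(
                x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            power[k] += abs(coeff) ** 2
    return power


def dominant_period(seq):
    """Estimate the repeat period from the strongest non-zero frequency."""
    n = len(seq)
    power = indicator_power_spectrum(seq)
    # Skip k = 0 (base composition) and scan only the first half,
    # because the spectrum of a real signal is symmetric.
    k_best = max(range(1, n // 2 + 1), key=lambda k: power[k])
    return n / k_best


if __name__ == "__main__":
    # Eight exact copies of a hypothetical 5-bp unit.
    print(dominant_period("ACGTT" * 8))  # 5.0
```

For an exact repeat of a 5-bp unit, the strongest peak falls at frequency index N/5, so the estimate is 5. On real sequences the strongest peak can sit on a harmonic of the true period, so the value is better treated as a hint than as a final answer.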
We performed experiments to derive the consensus tandem
repeat unit of each tandem repeat of the human genome
using the conventional schemes. Table 2 and Table 3
compare the results of our proposed scheme with those
of the three conventional schemes. Among the
conventional schemes, TRF finds the most accurate
consensus tandem repeat units across the given tandem
repeats, whereas PTF performs poorly at detecting
consensus tandem repeat units. The number of fails in
Table 2 and Table 3 is the number of cases in which no
consensus tandem repeat unit is detected because the
given DNA sequence is not determined to be a tandem
repeat. As a result, PTF is usable only for finding the
consensus tandem repeat units of the tandem repeats of
human chromosome Y. This is because PTF does not
consider insertion and deletion mutations of nucleotide
bases. Although the performance of TRF is similar to
that of TR Analyzer, TRF is inadequate for finding the
consensus tandem repeat unit of tandem repeats that are
long and highly broken, such as TR2 of human chromosome
8, as shown in Table 2 and Table 3. Therefore, although
many other tools could be substituted, TR Analyzer is
the most appropriate tool to date for deriving the
consensus tandem repeat unit.
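The sensitivity to insertions and deletions noted above can be seen in a small sketch. The code below is not PTF; it is a deliberately simple fixed-frame scheme, written here as an assumption for illustration, that cuts a sequence into frames of a known period and takes the majority base at each offset. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def fixed_frame_consensus(seq, period):
    """Majority base at each offset when seq is cut into fixed frames."""
    consensus = []
    for offset in range(period):
        column = seq[offset::period]  # every base at this frame offset
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


def frame_agreement(seq, period):
    """Fraction of bases that match the fixed-frame consensus."""
    unit = fixed_frame_consensus(seq, period)
    matches = sum(1 for i, ch in enumerate(seq) if ch == unit[i % period])
    return matches / len(seq)


if __name__ == "__main__":
    clean = "ACGTT" * 8                              # hypothetical 5-bp unit
    substituted = clean[:12] + "A" + clean[13:]      # one G -> A substitution
    inserted = clean[:12] + "A" + clean[12:]         # one inserted base

    print(frame_agreement(substituted, 5))  # 0.975
    print(frame_agreement(inserted, 5))     # about 0.73
```

A single substitution costs one mismatch, whereas a single insertion shifts every later frame by one position, so bases on one side of the insertion no longer line up with the consensus. A scheme that models indels explicitly, for example through alignment, avoids this frame shift.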
4 CONCLUSIONS AND FURTHER WORK
We proposed a system model for analyzing MTRAs, which
derives the consensus tandem repeat units based on the
homology of the multiple tandem repeats and shows the
structure of an MTRA through a simple diagram
representation. The proposed system model was applied
to four MTRAs of the human genome: chromosome 7
(57,937,500 - 58,056,406 bp), chromosome 8
(46,832,500 - 47,458,334 bp), chromosome 22
(16,505,625 - 16,627,187 bp), and chromosome Y
(25,000 - 117,031 bp). The algorithm for deriving the
consensus tandem repeat unit of a tandem repeat in the
proposed system model can be replaced by a conventional
scheme that finds tandem repeats. However, the
experimental results showed that, for deriving an exact
consensus tandem repeat unit, the proposed algorithm is
the most appropriate to date.
The analysis of MTRAs was based on the hypothesis that
the homologous tandem repeats of an MTRA originate from
the same tandem repeat and that MTRAs are important to
biological phenomena. This hypothesis is plausible
considering the high identity of the homologous tandem
repeats of an MTRA and their highly structured, unique
patterns. However, since the hypothesis should be
verified biologically, we plan to perform biological
experiments on MTRAs together with the systematic
analysis.
REFERENCES
Benson, G. (1999). Tandem repeats finder: a program
to analyze DNA sequences. Nucleic Acids Research,
27(2):573–580.
Brodzik, A. (2007). Quaternionic periodicity transform: an
algebraic solution to the tandem repeat detection prob-
lem. Bioinformatics, 23(6):694–700.
Buchner, M. and Janjarasjitt, S. (2003). Detection and
visualization of tandem repeats in DNA sequences. IEEE
Transactions on Signal Processing, 51(9):2280–2287.
Christian, M., Dennis, J., and John, M. (2001). STRBase:
a short tandem repeat DNA database for the human
identity testing community. Nucleic Acids Research,
29(1):320–322.
Chung, B., Lee, K., Shin, K., Kim, W., Kwon, D., You, R.,
Lee, Y., Cho, K., and Cho, D. (2011). REMiner: a
tool for unbiased mining and analysis of repetitive
elements and their arrangement structures of large
chromosomes. Genomics, 98(5):381–389.
Edgar, R. and Myers, E. (2005). PILER: identification and
classification of genomic repeats. Bioinformatics,
21(Suppl. 1):i152–i158.
Hauth, A. and Joseph, D. (2002). Beyond tandem repeats:
complex pattern structures and distant regions of sim-
ilarity. Bioinformatics, 18(Suppl. 1):S31–S37.
Humberto, C. and David, L. (1998). The multiple sequence
alignment problem in biology. SIAM Journal on Ap-
plied Mathematics, 48(5):1073–1082.
Just, W. (2001). Computational complexity of multiple
sequence alignment with SP-score. Journal of
Computational Biology, 8(6):615–623.
Kazazian, H. (2004). Mobile elements: drivers of genome
evolution. Science, 303(5664):1626–1632.
Kim, W., Lee, K., Shin, K., You, R., Lee, Y., Cho, K., and
Cho, D. (2012). REMiner-II: a tool for rapid
identification and configuration of repetitive element
arrays from large mammalian chromosomes as a single
query. Genomics, 100(3):131–140.
Lipman, D., Altschul, S., and Kececioglu, J. (1989). A tool
for multiple sequence alignment. Proceedings of the