Search of Possible Insertions in Bacterial Genes

Eugene Korotkov, Yulia Suvorova, Maria Korotkova


Nucleotide sequences are known to be heterogeneous, which gives rise to the task of segmenting a sequence into homogeneous parts separated by change points. In this work we investigated a special case of change points in genes: paired change points (PCP). We used a well-known property of coding sequences, triplet periodicity (TP). The sequences of particular interest consist of three successive parts: the first and the last parts have similar TP, and the middle part has a different TP type. We aimed to find genes with PCP and to provide an explanation for the phenomenon. We developed a mathematical method for PCP detection based on a new measure of similarity between TP matrices. Among 66936 studied genes we found 2700 genes with PCP and 6459 genes with a single change point (SCP). We suppose that PCP could be associated with double fusion or insertion events.


Paper Citation

in Harvard Style

Korotkov E., Suvorova Y. and Korotkova M. (2014). Search of Possible Insertions in Bacterial Genes. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014) ISBN 978-989-758-012-3, pages 99-108. DOI: 10.5220/0004721800990108
