cswHMM: A NOVEL CONTEXT SWITCHING HIDDEN MARKOV MODEL FOR BIOLOGICAL SEQUENCE ANALYSIS

Vojtěch Bystrý, Matej Lexa

2012

Abstract

We present a sequence model that goes beyond simple linear patterns to capture a specific type of higher-order relationship found in biological sequences. In particular, we seek models that can account for partially overlaid and interleaved patterns. The proposed context-switching model (cswHMM) is a variable-order hidden Markov model (HMM) whose structure allows control to switch between two or more sub-models. An important feature of the model is that each sub-model stores its last active state, so that when it regains control it continues from where it left off. This is a fundamental difference from the closely related jumping HMMs. A combination of as few as two simple linear HMMs can describe sequences with complicated mixed dependencies. Tests of this approach suggest that combining HMMs for protein sequence analysis, such as pattern-mining-based HMMs or profile HMMs, with the context-switching approach can improve the descriptive ability and performance of the models.
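The state-storing behaviour described in the abstract can be illustrated with a small Python sketch. It is a toy under strong assumptions, not the authors' implementation: each sub-model is a left-to-right chain that emits one fixed symbol per state, there are no emission or transition probabilities, and the names `LinearSubModel`, `generate` and `sample` are hypothetical. It only shows how two linear sub-models that remember their last active state produce an interleaved sequence.

```python
import random
from dataclasses import dataclass


@dataclass
class LinearSubModel:
    """Toy left-to-right sub-model: state i deterministically emits symbols[i]."""
    symbols: str
    position: int = 0  # last active state, kept while another sub-model has control

    def finished(self) -> bool:
        return self.position >= len(self.symbols)

    def emit(self) -> str:
        symbol = self.symbols[self.position]
        self.position += 1
        return symbol


def generate(sub_models, schedule):
    """Emit one symbol per schedule entry from the sub-model with that index."""
    out = []
    for active in schedule:
        model = sub_models[active]
        if model.finished():
            raise ValueError(f"sub-model {active} has no states left")
        out.append(model.emit())
    return "".join(out)


def sample(sub_models, switch_prob, rng):
    """Randomly interleave the sub-models, switching control with probability switch_prob."""
    active = 0
    out = []
    while not all(m.finished() for m in sub_models):
        candidates = [i for i, m in enumerate(sub_models) if not m.finished()]
        must_switch = active not in candidates
        may_switch = len(candidates) > 1 and rng.random() < switch_prob
        if must_switch or may_switch:
            # Hand control to another sub-model; it resumes at its stored position.
            active = rng.choice([i for i in candidates if i != active])
        out.append(sub_models[active].emit())
    return "".join(out)


# Control schedule 0,0,1,1,0,1,0: each sub-model resumes where it stopped.
interleaved = generate([LinearSubModel("ABCD"), LinearSubModel("xyz")],
                       [0, 0, 1, 1, 0, 1, 0])
print(interleaved)  # ABxyCzD

random_mix = sample([LinearSubModel("ABCD"), LinearSubModel("xyz")],
                    switch_prob=0.3, rng=random.Random(0))
print(random_mix)
```

Because the position is stored per sub-model, each pattern appears in order in the output no matter how often control switches. A model that restarted or realigned a sub-model on every switch would not give this guarantee.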


Paper Citation


in Harvard Style

Bystrý V. and Lexa M. (2012). cswHMM: A NOVEL CONTEXT SWITCHING HIDDEN MARKOV MODEL FOR BIOLOGICAL SEQUENCE ANALYSIS. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012), ISBN 978-989-8425-90-4, pages 208-213. DOI: 10.5220/0003780902080213


in Bibtex Style

@conference{bioinformatics12,
author={Vojtěch Bystrý and Matej Lexa},
title={cswHMM: A NOVEL CONTEXT SWITCHING HIDDEN MARKOV MODEL FOR BIOLOGICAL SEQUENCE ANALYSIS},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},
year={2012},
pages={208-213},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003780902080213},
isbn={978-989-8425-90-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)
TI - cswHMM: A NOVEL CONTEXT SWITCHING HIDDEN MARKOV MODEL FOR BIOLOGICAL SEQUENCE ANALYSIS
SN - 978-989-8425-90-4
AU - Bystrý V.
AU - Lexa M.
PY - 2012
SP - 208
EP - 213
DO - 10.5220/0003780902080213