ARRAY-BASED GENOME COMPARISON OF ARABIDOPSIS ECOTYPES USING HIDDEN MARKOV MODELS

Michael Seifert, Ali Banaei, Jens Keilwagen, Michael Florian Mette, Andreas Houben, François Roudier, Vincent Colot, Ivo Grosse, Marc Strickert

2009

Abstract

Arabidopsis thaliana is an important model organism in plant biology with a broad geographic distribution including ecotypes from Africa, America, Asia, and Europe. The natural variation among ecotypes is expected to be reflected to a substantial degree in their genome sequences. Array comparative genomic hybridization (ACGH) can be used to quantify this natural variation at the DNA level. Moreover, ACGH data provide the basis for establishing a genome-wide map of DNA copy number variation across ecotypes. Here, we present a new approach based on Hidden Markov Models (HMMs) to predict copy number variations in ACGH experiments. Compared with the routinely used segMNT algorithm, this approach yields an improved genome-wide characterization of DNA segments with decreased or increased copy numbers. The software and the data set used in this case study can be downloaded from http://dig.ipk-gatersleben.de/HMMs/ACGH/ACGH.html.
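
To illustrate the general idea of HMM-based copy number prediction mentioned in the abstract, the following is a minimal sketch and not the authors' software. It assumes three hidden states (decreased, unchanged, increased), Gaussian emissions over per-probe log-ratios, and hand-picked illustrative parameters (the means, standard deviation, and self-transition probability are all hypothetical). It decodes the most likely state sequence with the standard Viterbi algorithm.

```python
import math

# Hypothetical state labels for the copy number status of each probe.
STATES = ("decreased", "unchanged", "increased")


def log_gauss(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(log_ratios, means=(-1.0, 0.0, 1.0), sd=0.3, stay=0.9):
    """Return the most likely state label for each probe log-ratio.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    n = len(STATES)
    if not log_ratios:
        return []
    log_init = math.log(1.0 / n)                 # uniform start distribution
    log_stay = math.log(stay)                    # remain in the same state
    log_move = math.log((1.0 - stay) / (n - 1))  # switch to another state

    def trans(r, s):
        return log_stay if r == s else log_move

    # Initialisation with the first observation.
    score = [log_init + log_gauss(log_ratios[0], means[s], sd) for s in range(n)]
    back = []  # back-pointers, one list per subsequent observation

    # Recursion in log space to avoid numerical underflow.
    for x in log_ratios[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + trans(r, s))
            score.append(prev[best] + trans(best, s) + log_gauss(x, means[s], sd))
            ptr.append(best)
        back.append(ptr)

    # Backtracking from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


if __name__ == "__main__":
    # Made-up log-ratios for demonstration only.
    demo = [0.1, -0.05, 0.0, 1.1, 0.9, 1.0, 0.05, -0.1, -1.0, -1.2, -0.9]
    for value, label in zip(demo, viterbi(demo)):
        print(f"{value:+.2f}  {label}")
```

In this sketch the high self-transition probability favors contiguous segments, so a single moderately deviating probe is not called as a copy number change on its own. In practice the parameters would be estimated from data (for example with the Baum-Welch algorithm) rather than fixed by hand.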



Paper Citation


in Harvard Style

Seifert M., Banaei A., Keilwagen J., Mette M., Houben A., Roudier F., Colot V., Grosse I. and Strickert M. (2009). ARRAY-BASED GENOME COMPARISON OF ARABIDOPSIS ECOTYPES USING HIDDEN MARKOV MODELS. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009) ISBN 978-989-8111-65-4, pages 3-11. DOI: 10.5220/0001123700030011


in Bibtex Style

@conference{biosignals09,
author={Michael Seifert and Ali Banaei and Jens Keilwagen and Michael Florian Mette and Andreas Houben and François Roudier and Vincent Colot and Ivo Grosse and Marc Strickert},
title={ARRAY-BASED GENOME COMPARISON OF ARABIDOPSIS ECOTYPES USING HIDDEN MARKOV MODELS},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)},
year={2009},
pages={3-11},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001123700030011},
isbn={978-989-8111-65-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)
TI - ARRAY-BASED GENOME COMPARISON OF ARABIDOPSIS ECOTYPES USING HIDDEN MARKOV MODELS
SN - 978-989-8111-65-4
AU - Seifert M.
AU - Banaei A.
AU - Keilwagen J.
AU - Mette M.
AU - Houben A.
AU - Roudier F.
AU - Colot V.
AU - Grosse I.
AU - Strickert M.
PY - 2009
SP - 3
EP - 11
DO - 10.5220/0001123700030011