Summarizing Genome-wide Phased Genotypes using Phased PC Plots

Sergio Torres-Sánchez, Nuria Medina-Medina, María M. Abad-Grau

Abstract

Ordination methods in reduced space, such as principal component (PC) analysis, and their visual representation in PC plots can help uncover important patterns among samples in high-dimensional data sets. Applied to data sets obtained from genome-wide genotyping, they may show biologically relevant relationships among populations, such as population structure and admixture. Extending PC analysis to genome-wide phased genotypes may help to reveal different levels of inbreeding between or within populations, as well as to evaluate the quality of the haplotyping technique used. We have developed a method to perform PC analysis on a data set of genome-wide phased genotypes and to plot the results while keeping information about individuals. The method has been implemented in the computer program PCPhaser. To broaden the method's applicability and reduce development time, PCPhaser transforms the input data set by segregating haplotypes and uses the EIGENSOFT software to perform the PC analysis. Because of this transformation, the method can be applied with any other software able to perform PCA, although PCPhaser is still required to draw the phased PC plots. PCPhaser is Linux-based software that can be downloaded from http://bios.ugr.es/PCPhaser.
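
The transformation described in the abstract, segregating each individual's phased genotype into two haplotypes before running PCA, can be illustrated with a minimal sketch. This is not PCPhaser's code and it does not call EIGENSOFT: the array layout, the 0/1 allele coding, the function names `segregate_haplotypes` and `pca_scores`, and the plain SVD-based PCA (without EIGENSOFT's marker normalization) are all assumptions made for illustration.

```python
import numpy as np


def segregate_haplotypes(phased):
    """Split each individual's phased genotype into two haplotype rows.

    phased: array of shape (n_individuals, n_markers, 2), alleles coded 0/1
    (hypothetical layout). Returns shape (2 * n_individuals, n_markers);
    rows 2*i and 2*i + 1 are the two haplotypes of individual i.
    """
    phased = np.asarray(phased, dtype=float)
    n_ind, n_markers, _ = phased.shape
    # Move the haplotype axis next to the individual axis, then flatten both.
    return phased.transpose(0, 2, 1).reshape(2 * n_ind, n_markers)


def pca_scores(matrix, n_components=2):
    """Plain PCA via SVD of the column-centered matrix (no EIGENSOFT scaling)."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: 3 individuals, 4 markers, 2 haplotypes each.
phased = np.array([
    [[0, 0], [1, 1], [0, 0], [1, 1]],  # both haplotypes identical
    [[0, 1], [1, 0], [0, 1], [1, 0]],  # haplotypes differ at every marker
    [[1, 1], [0, 0], [1, 0], [0, 0]],  # haplotypes differ at one marker
])

haplotypes = segregate_haplotypes(phased)   # one row per haplotype
scores = pca_scores(haplotypes)             # one PC point per haplotype

# Keep the link back to individuals: pairs[i, 0] and pairs[i, 1] are the
# two points that a phased PC plot would draw for individual i.
pairs = scores.reshape(len(phased), 2, -1)
pair_distance = np.linalg.norm(pairs[:, 0] - pairs[:, 1], axis=1)
print(pair_distance)
```

In this sketch each individual contributes two points in PC space instead of one, and `pairs` preserves which two points belong together. How PCPhaser actually renders that pairing in its plots is not specified in the abstract, so the per-individual distance computed here is only one possible way of inspecting it.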


Paper Citation


in Harvard Style

Torres-Sánchez S., Medina-Medina N. and Abad-Grau M. (2014). Summarizing Genome-wide Phased Genotypes using Phased PC Plots. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014) ISBN 978-989-758-012-3, pages 130-135. DOI: 10.5220/0004793501300135


in Bibtex Style

@conference{bioinformatics14,
author={Sergio Torres-Sánchez and Nuria Medina-Medina and María M. Abad-Grau},
title={Summarizing Genome-wide Phased Genotypes using Phased PC Plots},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},
year={2014},
pages={130-135},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004793501300135},
isbn={978-989-758-012-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)
TI - Summarizing Genome-wide Phased Genotypes using Phased PC Plots
SN - 978-989-758-012-3
AU - Torres-Sánchez S.
AU - Medina-Medina N.
AU - Abad-Grau M.
PY - 2014
SP - 130
EP - 135
DO - 10.5220/0004793501300135