Summarizing Genome-wide Phased Genotypes using Phased PC Plots

Sergio Torres-Sánchez, Nuria Medina-Medina, María M. Abad-Grau


Ordination in reduced space, such as principal component (PC) analysis, and its visual representation in PC plots can help to uncover important patterns among samples in high-dimensional data sets. When applied to data sets obtained from genome-wide genotyping, these techniques may reveal biologically relevant relationships among populations, such as population structure and admixture. Extending PC analysis to genome-wide phased genotypes may help to reveal different levels of inbreeding between or within populations, as well as to evaluate the quality of the haplotyping technique used. We have developed a method to perform PC analysis on a data set of genome-wide phased genotypes and to plot the results while keeping information about individuals. The method has been implemented in the computer program PCPhaser. To increase the applicability of the method and reduce development time, PCPhaser transforms the input data set by segregating haplotypes and uses the EIGENSOFT software to perform the PC analysis. Given this transformation, the proposed method can be applied with any other software able to perform PCA, although PCPhaser is still required to draw the phased PC plots. PCPhaser is Linux-based software that can be downloaded from
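
The haplotype segregation described above can be illustrated with a minimal Python sketch. It is not the PCPhaser implementation and does not reproduce the normalization applied by EIGENSOFT. It assumes that phased genotypes are stored as a 0/1 array of shape (individuals, 2, SNPs), treats each haplotype as a separate row, and obtains PC scores from a plain SVD of the column-centered matrix. The function names `segregate_haplotypes`, `pca_scores` and `phased_pc_coordinates` are hypothetical and introduced only for this illustration.

```python
import numpy as np


def segregate_haplotypes(phased):
    """Split each individual into two haplotype rows.

    `phased` is assumed to have shape (n_individuals, 2, n_snps) with 0/1
    alleles. Rows 2*i and 2*i + 1 of the result belong to individual i.
    """
    phased = np.asarray(phased, dtype=float)
    if phased.ndim != 3 or phased.shape[1] != 2:
        raise ValueError("expected shape (n_individuals, 2, n_snps)")
    n_individuals, _, n_snps = phased.shape
    return phased.reshape(2 * n_individuals, n_snps)


def pca_scores(matrix, n_components=2):
    """PC scores from an SVD of the column-centered matrix.

    This is ordinary PCA; any PCA software could take its place.
    """
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def phased_pc_coordinates(phased, n_components=2):
    """Return an array of shape (n_individuals, 2, n_components).

    Entry [i, h] holds the PC coordinates of haplotype h of individual i,
    so the link between haplotypes and individuals is kept for plotting.
    """
    haplotypes = segregate_haplotypes(phased)
    scores = pca_scores(haplotypes, n_components)
    return scores.reshape(-1, 2, n_components)
```

In a plot built from these coordinates, the two points of one individual could be joined by a segment. Under the assumptions of this sketch, an individual whose two haplotypes are identical maps to a single point, so short segments would be consistent with the higher homozygosity expected under inbreeding.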


