Improved whole-chromosome phasing for disease and population genetic studies

O Delaneau, JF Zagury, J Marchini - Nature methods, 2013 - nature.com
To the Editor: Methods that can accurately estimate haplotypes from single-nucleotide polymorphism (SNP) genotype data are important because they are widely used in many areas of genetic analysis. Examples include the creation of haplotype reference panels, pre-phasing (ref. 1) before genotype imputation in genome-wide association studies (GWAS), and population genetic analysis. The task is an inverse problem in which we observe a set of SNP genotypes in a sample, typically using a genome-wide SNP microarray, and wish to infer the underlying haplotypes carried by the study individuals. We recently described the SHAPEIT1 method for inferring haplotype phase from genotype data (ref. 2), which improves accuracy and computational efficiency compared to other methods for sample sizes up to ~1,200. Here we present SHAPEIT2, a method that combines features of SHAPEIT1 and Impute2 (ref. 3) to substantially enhance performance. We use the SHAPEIT1 Markov model that represents the space of haplotypes consistent with a given individual’s genotypes across a whole chromosome. The transition probabilities of this model are estimated by applying the Impute2 ‘surrogate family’ phasing approach in local windows of size W. In each window, K informative haplotypes are chosen to update the transition probabilities of the Markov model. The method generalizes the Impute2 method so that it can be applied across a whole chromosome with linear computational scaling in K (Supplementary Methods).
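
The window-wise choice of K haplotypes can be illustrated with a small sketch. This is not the SHAPEIT2 implementation. It assumes that "informative" means smallest Hamming distance to the individual's current haplotype estimate and that windows are non-overlapping blocks of W SNPs; the text above does not specify either, and the function names `hamming` and `select_window_haplotypes` are hypothetical. Haplotypes are coded as lists of 0/1 alleles.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length allele sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def select_window_haplotypes(
    target: Sequence[int],
    panel: Sequence[Sequence[int]],
    window_size: int,
    k: int,
) -> List[List[int]]:
    """For each window of `window_size` SNPs, return the indices of the `k`
    panel haplotypes closest to `target` within that window.

    Assumption (not from the source): closeness is Hamming distance and
    windows are non-overlapping. Ties are broken by the lower panel index.
    """
    n_snps = len(target)
    chosen: List[List[int]] = []
    for start in range(0, n_snps, window_size):
        end = min(start + window_size, n_snps)
        # Score every panel haplotype on this window only.
        scored = sorted(
            (hamming(target[start:end], hap[start:end]), idx)
            for idx, hap in enumerate(panel)
        )
        chosen.append([idx for _, idx in scored[:k]])
    return chosen


if __name__ == "__main__":
    # Toy data: one current haplotype estimate and a panel of four haplotypes.
    target = [0, 1, 0, 1, 1, 0]
    panel = [
        [0, 1, 0, 0, 0, 1],
        [1, 0, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1],
        [1, 1, 0, 1, 0, 0],
    ]
    print(select_window_haplotypes(target, panel, window_size=3, k=2))
```

Because only K haplotypes per window are carried into the model update, the per-window work in the update step grows with K rather than with the full panel size, which is in the spirit of the linear scaling in K stated above. The selection step shown here still scans the whole panel.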