Genotype Phasing by Entropy Minimization

Introduction

A Single Nucleotide Polymorphism (SNP) is a position in the genome at which two or more of the four possible nucleotides each occur in a non-negligible fraction of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high priority in genomics after the completion of the Human Genome Project. Diploid organisms such as humans carry two non-identical copies of each autosomal chromosome. A description of the SNPs on a single chromosome copy is called a haplotype.

At present, it is prohibitively expensive to directly determine the haplotypes of an individual, but it is relatively easy to obtain the conflated SNP information in the so-called genotype, which records the alleles present at each SNP without specifying which chromosome copy carries each one. Computational methods for genotype phasing, i.e., inferring haplotypes from genotype data, have received much attention in recent years because haplotype information increases the statistical power of disease association tests. ENT is a highly scalable genotype phasing algorithm based on entropy minimization. ENT can phase both unrelated genotypes and related genotypes coming from complex pedigrees.
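To make the idea of phasing by entropy minimization concrete, here is a minimal toy sketch in Python. It is not the ENT implementation: it searches all phasings by brute force, which is only feasible for a handful of short genotypes, and it makes no attempt at pedigrees. The genotype encoding (0 and 1 for the two homozygous states, 2 for heterozygous) and the function names compatible_pairs, entropy, and min_entropy_phasing are assumptions introduced for illustration only.

```python
from collections import Counter
from itertools import product
from math import log2


def compatible_pairs(genotype):
    """List the unordered haplotype pairs that explain one genotype.

    Assumed encoding: '0' and '1' are homozygous sites, '2' is heterozygous.
    """
    het = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    # Try every assignment of alleles to the heterozygous sites.
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        # Sort so that (a, b) and (b, a) count as the same pair.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def entropy(haplotypes):
    """Shannon entropy (in bits) of the empirical haplotype frequencies."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def min_entropy_phasing(genotypes):
    """Brute-force search for the phasing with the lowest haplotype entropy."""
    options = [compatible_pairs(g) for g in genotypes]
    best = None
    for choice in product(*options):
        haps = [h for pair in choice for h in pair]
        score = entropy(haps)
        if best is None or score < best[0]:
            best = (score, list(choice))
    return best


score, phasing = min_entropy_phasing(["22", "00", "11"])
print(score, phasing)
```

In this example the doubly heterozygous genotype "22" can be explained either by the pair 00/11 or by the pair 01/10. The search picks 00/11 because those haplotypes already appear in the other two individuals, so the overall haplotype distribution has lower entropy.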

ENT source code

ENT Web-Interface

Contact Information

gusev@cs.columbia.edu

bpasaniuc@mednet.ucla.edu

ion@engr.uconn.edu

Related Publications

Related Presentations

Acknowledgment and Disclaimer

This material is based upon work supported in part by the National Science Foundation under Grants No. IIS-0546457 and DBI-0543365. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.