Reconstruction of Haplotype Spectra from High-Throughput Sequencing Data

Funding agency: National Science Foundation, Division of Information & Intelligent Systems
Award #: IIS-0916948
Amount: $275,000
PI: Ion I. Mandoiu, Co-PI: Yufeng Wu
Period: 09/2009–08/2013

Abstract:

Recent advances in high-throughput sequencing (HTS) technologies provide opportunities to study genome structure, function, and evolution at an unprecedented scale, and are profoundly transforming genomic research. However, fully realizing the potential of HTS technologies requires sophisticated data analysis methods. This research project aims to develop efficient computational methods for reconstructing the full spectrum of haplotype sequences from HTS data. Working in collaboration with molecular biologists from the University of Connecticut Health Center and the Centers for Disease Control and Prevention, the investigators will develop methods enabling three novel applications of HTS: (a) reconstruction of diploid genome sequences, including the complete haplotype sequence of each copy of a copy number variant (CNV) region; (b) reconstruction of alternative splicing isoform sequences and their frequencies; and (c) reconstruction of viral quasispecies sequences and their frequencies.

Major outcomes of the project will include a comprehensive analytical toolkit for these problems and high-quality open-source software implementations that will be made available free of charge to the research community. The methods will be based on novel probabilistic models that enable accurate reconstruction of haplotype spectra by integrating diverse sources of information, including paired-end reads and panels of reference haplotypes. The project will also develop new theoretically sound optimization techniques, such as minorize-maximize schemes and network flow formulations, yielding efficient algorithms capable of handling the massive datasets generated by high-throughput sequencing technologies. The project aims to provide opportunities for undergraduate and graduate students to participate in bioinformatics research, and will especially encourage the participation of women and members of underrepresented groups.
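Expectation-maximization (EM) is a well-known special case of the minorize-maximize schemes mentioned above. As a hedged illustration only, not the project's actual method, the sketch below estimates the relative frequencies of a fixed set of candidate haplotypes (for example, splicing isoforms or quasispecies variants) from reads that may be compatible with several candidates at once. The function name `em_frequencies` and its input format (a read-by-haplotype matrix of assumed likelihoods P(read | haplotype)) are hypothetical choices made for this example.

```python
from typing import List


def em_frequencies(
    compat: List[List[float]],
    n_iter: int = 1000,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate candidate haplotype frequencies by expectation-maximization.

    Hypothetical input: compat[r][h] is an assumed likelihood P(read r | haplotype h);
    a value of 0 means read r cannot have come from haplotype h.
    """
    if not compat or not compat[0]:
        raise ValueError("need at least one read and one candidate haplotype")
    n_haps = len(compat[0])
    freqs = [1.0 / n_haps] * n_haps  # start from uniform frequencies

    for _ in range(n_iter):
        # E-step: split each read among its compatible haplotypes in
        # proportion to frequency * likelihood.
        expected = [0.0] * n_haps
        for row in compat:
            weights = [f * p for f, p in zip(freqs, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # read explained by no candidate; ignore it
            for h, w in enumerate(weights):
                expected[h] += w / total

        assigned = sum(expected)
        if assigned == 0.0:
            raise ValueError("no read is compatible with any candidate")

        # M-step: new frequencies are the normalized expected read counts.
        new_freqs = [e / assigned for e in expected]
        change = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if change < tol:
            break
    return freqs


if __name__ == "__main__":
    # Three reads unique to haplotype 0, one unique to haplotype 1,
    # and two reads compatible with both.
    reads = [[1, 0], [1, 0], [1, 0], [0, 1], [1, 1], [1, 1]]
    print(em_frequencies(reads))
```

In this toy input the ambiguous reads are apportioned according to the current frequency estimates, so the iteration converges to roughly 0.75 and 0.25 rather than the naive 4/6 and 2/6 obtained by counting every ambiguous read toward both candidates. Each EM iteration does not decrease the likelihood, which is the property the minorize-maximize framework guarantees.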

Publications

Presentations

Software Packages