Reconstruction of Haplotype Spectra from High-Throughput Sequencing Data
Funding agency: National Science Foundation, Division of Information & Intelligent Systems
Award #: IIS-0916948
Amount: $275,000
PI: Ion I. Mandoiu, Co-PI: Yufeng Wu
Period: 09/2009–08/2013
Abstract:
Recent advances in high-throughput sequencing (HTS) technologies provide opportunities to study genome structure, function, and evolution at an unprecedented scale, and are profoundly transforming genomic research. However, fully realizing the potential of HTS technologies requires sophisticated data analysis methods. This research project aims to develop efficient computational methods for reconstructing the full spectrum of haplotype sequences from HTS data. Working in collaboration with molecular biologists from the University of Connecticut Health Center and the Centers for Disease Control and Prevention, the investigators will develop methods enabling three novel applications of HTS, namely (a) reconstruction of diploid genome sequences, including complete haplotype sequences of each CNV copy, (b) reconstruction of alternative splicing isoform sequences and their frequencies, and (c) reconstruction of viral quasispecies sequences and their frequencies.
Major outcomes of the project will include the development of a comprehensive analytical toolkit for these problems, and high-quality open-source software implementations that will be made available free of charge to the research community. The developed methods will be based on novel probabilistic models that allow accurate haplotype spectra reconstruction by integrating diverse sources of information, including paired-end reads and panels of reference haplotypes. The project will also lead to the development of new theoretically sound optimization techniques, such as minorize-maximize schemes and network flow formulations, that will result in efficient algorithms capable of handling the massive datasets generated by high-throughput sequencing technologies. The project aims to provide opportunities for undergraduate and graduate students to participate in bioinformatics research, and will especially encourage participation of women and underrepresented groups.
Software Packages
- ViSpA: Viral Spectrum Assembler
- NGSTools: Java tools for analysis of Next Generation Sequencing data
- IsoEM: Inferring Alternative Splicing Isoform Frequencies from High-Throughput RNA-Seq Data
- IsoDE: Bootstrapping-based differential gene expression analysis for RNA-Seq data with and without replicates
- DGE-EM: Accurate Estimation of Gene Expression Levels from DGE Sequencing Data
- GEDI-ADMX: Genotype Error Detection and Imputation for Admixed Populations
- GeneSeq: LD-based SNP genotype calling from shotgun sequencing reads