Computational methods for genomic data analysis. Topics covered include statistical modeling of biological sequences, probabilistic models of DNA and protein evolution, expectation maximization and Gibbs sampling algorithms, genomic sequence variation, and applications in genomics and genetic epidemiology.

This course is an introduction to the fundamental mathematical models and algorithmic techniques used in bioinformatics. Emphasis will be placed on modeling computational problems arising in biology as graph-theoretic, statistical, or mathematical optimization problems, and on designing, analyzing, and implementing efficient combinatorial algorithms for the latter. Covered algorithmic techniques will include exhaustive search, integer programming, greedy algorithms, dynamic programming, divide-and-conquer, graph algorithms, combinatorial pattern matching, clustering, hidden Markov models, and randomized algorithms. Biological applications covered will include restriction mapping, DNA sequencing, motif finding, pairwise and multiple sequence alignment, gene prediction, evolutionary trees, and genome rearrangements.