Guide to running IsoEM by aligning reads on a library of known isoforms
=======================================================================

  IsoEM takes as input read alignments in SAM format, given in genome
coordinates. However, the best results are obtained when the reads are
aligned to a library of known isoforms rather than directly to the genome,
and the alignments are then converted to genome coordinates. IsoEM comes
with a set of tools that assist with this process.


  In this guide we show all the steps necessary to obtain isoform frequency 
estimates from a set of reads.

A. Download and install IsoEM. You can find it at:
http://dna.engr.uconn.edu/software/IsoEM


B. Download the sample.zip archive from 
http://dna.engr.uconn.edu/software/IsoEM/sample.zip

NOTE: the file is large, because it contains the full human genome 
reference sequence.

The archive contains the following files and directories:
=====================
README-SAMPLE.TXT        - This file
sample.sh                - Script to run all the steps presented below 
sample/
  hg18_ref_genome.fa     - Human genome (hg18) in FASTA format
  reads.fastq            - 1 million 25 bp single-end reads in FASTQ format
  knownGene.gtf          - UCSC known isoforms in GTF format
  knownToEnsembl.txt     \
  knownToGnfAtlas2.txt     Three different mappings of isoforms to genes
  knownToRefSeq.txt      /
=====================


C. Unzip the archive in the same directory where IsoEM is installed!


D. Download and install bowtie from:
http://bowtie-bio.sourceforge.net/index.shtml


E. Take a look at the sample.sh script. The script runs all the required
steps. In summary, it does the following:

1. Extract the nucleotide sequences of all known isoforms from the
genome sequence, using the coordinates given in the .gtf file.

2. Create a bowtie index for the isoform sequences

3. Align the reads on the library of isoforms

4. Convert the alignments from isoform coordinates to genome coordinates

5. Run IsoEM

6. Run Isoviz
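
The coordinate handling behind steps 1 and 4 can be sketched in a few lines
of Python. This is only an illustration of the idea, not the code of the
IsoEM tools or of sample.sh: the function names, the toy chromosome and the
exon list are all made up for the example, and exons are assumed to be
1-based, inclusive (start, end) pairs as in GTF exon records.

```python
# Illustrative sketch only; NOT the IsoEM tools themselves.
# Exon coordinates are 1-based and inclusive, as in GTF exon records.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def isoform_sequence(chrom_seq, exons, strand):
    """Step 1: splice the exons out of the chromosome sequence."""
    spliced = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    return reverse_complement(spliced) if strand == "-" else spliced


def isoform_to_genome(pos, exons, strand):
    """Step 4: map a 1-based isoform position to a 1-based genome position."""
    ordered = sorted(exons)
    total = sum(end - start + 1 for start, end in ordered)
    if not 1 <= pos <= total:
        raise ValueError("position outside the isoform")
    # On the minus strand the isoform runs right to left along the genome.
    offset = pos - 1 if strand == "+" else total - pos
    for start, end in ordered:
        length = end - start + 1
        if offset < length:
            return start + offset
        offset -= length


chrom = "AAACCCGGGTTTACGT"   # hypothetical 16-base chromosome
exons = [(1, 3), (7, 9)]     # two exons separated by a 3-base intron

print(isoform_sequence(chrom, exons, "+"))   # AAAGGG
print(isoform_to_genome(4, exons, "+"))      # 7
print(isoform_to_genome(1, exons, "-"))      # 9
```

A real conversion of alignments also has to split reads that span an exon
junction into several genomic blocks; the sketch maps single positions only.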


  If the script runs correctly, it creates several new files in
the sample/ directory. The most relevant ones are:
===================
sample/genome_aligned_reads.iso_estimates          - estimated FPKMs (Fragments Per Kilobase of
                                                     transcript per Million mapped fragments)
                                                     for all the isoforms in the GTF file

sample/genome_aligned_reads.gene_estimates         - estimated FPKMs for genes

sample/genome_aligned_reads_iso_read_coverage.bed  - read coverage of the isoforms

sample/genome_aligned_reads_isoforms_w_fpkm.gtf    - isoforms with their FPKM values
===================
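
To help read the FPKM values, here is a small Python sketch of the usual
FPKM definition. It is not taken from IsoEM; the function and parameter
names are made up for the example. It assumes that a (possibly fractional)
number of fragments has already been assigned to the isoform.

```python
# Illustrative sketch of the usual FPKM definition; NOT IsoEM's own code.

def fpkm(fragments, isoform_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if isoform_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (isoform_length_bp * total_mapped_fragments)


# 500 fragments on a 2000 bp isoform, out of 1 million mapped fragments
print(fpkm(500, 2000, 1_000_000))   # 250.0
```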

  If you have any questions or suggestions please contact one of:

Marius Nicolae (man09004@engr.uconn.edu)
Serghei Mangul (serghei.mangul@gmail.com)
Sahar Al Seesi (sahar@engr.uconn.edu)
Ion Mandoiu (ion@engr.uconn.edu)
Alex Zelikovsky (alexz@cs.gsu.edu)