Guide to running IsoEM2 and IsoDE2 on the test data set
=======================================================

In this guide we show how to use a MAQC sample data set to test your installation of IsoEM2 and IsoDE2.

IsoEM2 takes as input alignments in SAM format. To perform gene expression quantification and differential expression analysis, first run IsoEM2 on each of the two samples you want to compare. Then run IsoDE2 on the results generated by IsoEM2 for both samples.

To run IsoEM2 and IsoDE2 on the sample data set we provide, follow these steps.

A. Download and install IsoEM2/IsoDE2. The latest version can be found at:
   https://github.com/mandricigor/isoem2

B. Download the sample data archive from
   http://dna.engr.uconn.edu/~software/IsoEM/testdata/IsoEM2IsoDE2-MAQC-Sample.zip

   NOTE: the archive is large because it contains the two SAM files.

   The archive contains the following files:
   =====================
   SAMPLE-README.TXT                   - This file
   sample.sh                           - Script to run all the steps presented below
   hg19Ensembl64.gtf                   - Human Ensembl64 isoforms in GTF format
   hg19Ensembl64TranscriptToGene.txt   - Mappings of isoforms to genes
   POZ-126_269_UHRR_1.sam              - MAQC UHRR Ion Torrent SAM file
   LUC-140_265_HBRR_1.sam              - MAQC HBRR Ion Torrent SAM file
   =====================

C. Unzip the archive.

D. Take a look at the sample.sh script. It is intended to run everything needed. The steps it executes are:
   1. Runs IsoEM2 on the UHRR sample
   2. Runs IsoEM2 on the HBRR sample
   3. Runs IsoDE2 on the IsoEM2 output for both samples

E. Change IsoEM2Path in sample.sh to point to the IsoEM2 installation directory on your system.

F. Run sample.sh.

NOTE:
=====
This is a real data set, and the parameters are realistic ones chosen to generate meaningful output.

If the script runs correctly, you will end up with the following output:
===================
POZ-126_269_UHR_1    - directory with IsoEM2 output for POZ-126_269_UHRR_1.sam. See the directory structure below.
LUC-140_265_HBR_1    - directory with IsoEM2 output for LUC-140_265_HBRR_1.sam. See the directory structure below.
output.txt_isoFPKM   - IsoDE2 output for isoform FPKMs
output.txt_geneFPKM  - IsoDE2 output for gene FPKMs
output.txt_isoTPM    - IsoDE2 output for isoform TPMs
output.txt_geneTPM   - IsoDE2 output for gene TPMs

IsoEM2 output directory structure:

|
- output
|   |
|   - Isoforms
|   |   |
|   |   - iso_fpkm_estimates
|   |   - iso_tpm_estimates
|   - Genes
|       |
|       - iso_fpkm_estimates
|       - iso_tpm_estimates
- ConfidenceIntervals
|   |
|   - iso_fpkm_ci
|   - iso_tpm_ci
|   - gene_fpkm_ci
|   - gene_tpm_ci
- boostrap.tar.gz
===================

If you have any questions or suggestions, please contact:
Sahar Al Seesi (sahar@engr.uconn.edu)
Igor Mandric (imandric1@student.gsu.edu)
Ion Mandoiu (ion@engr.uconn.edu)
Alex Zelikovski (alexz@cs.gsu.edu)
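For reference, steps B to F above can be scripted roughly as follows. This is a minimal sketch, not part of the distributed archive. It assumes that `curl`, `unzip` and GNU `sed` are installed, that the archive unpacks into the current directory, and that sample.sh sets IsoEM2Path on a line starting with `IsoEM2Path=`. The function names `run_sample` and `check_outputs` are hypothetical. `check_outputs` only tests that the top-level outputs listed in this guide exist; it does not validate their contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper script for the IsoEM2/IsoDE2 sample run.
# ASSUMPTIONS (not from the README): curl, unzip and GNU sed are available,
# the archive unpacks into the current directory, and sample.sh contains a
# line of the form "IsoEM2Path=..." that can be rewritten in place.

SAMPLE_URL="http://dna.engr.uconn.edu/~software/IsoEM/testdata/IsoEM2IsoDE2-MAQC-Sample.zip"

# Steps B-F: download the archive, unzip it, point IsoEM2Path at the
# IsoEM2 installation directory, then run sample.sh.
run_sample() {
  local isoem2_dir="$1"   # IsoEM2 installation directory (must not contain '|')
  curl -LO "$SAMPLE_URL" || return 1
  unzip -o IsoEM2IsoDE2-MAQC-Sample.zip || return 1
  # GNU sed in-place edit; BSD/macOS sed needs `sed -i ''` instead.
  sed -i "s|^IsoEM2Path=.*|IsoEM2Path=$isoem2_dir|" sample.sh || return 1
  bash sample.sh
}

# Print each expected top-level output (names as listed in this guide) that
# is missing from DIR, and return the number of missing entries (0 = all found).
check_outputs() {
  local dir="${1:-.}" missing=0 f
  for f in POZ-126_269_UHR_1 LUC-140_265_HBR_1 \
           output.txt_isoFPKM output.txt_geneFPKM \
           output.txt_isoTPM output.txt_geneTPM; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `run_sample /opt/isoem2 && check_outputs .`, where `/opt/isoem2` stands for your own IsoEM2 installation directory. Checking only for the presence of the listed outputs keeps the check independent of the exact file formats, which this guide does not describe.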