SILP2 — ILP-based Maximum Likelihood Genome Scaffolding

Introduction

Scaffolding is the final step of genome assembly, in which contigs are oriented, ordered, and connected into larger scaffolds using read pairs generated from shotgun libraries with large insert lengths. SILP2 is a stand-alone scaffolding tool that generates maximum likelihood scaffolds via integer linear programming (ILP). It achieves high scalability without sacrificing optimality: the large ILP formulations required to scaffold mammalian-size genomes are solved with a non-serial dynamic programming (NSDP) approach that decomposes the scaffolding graph into 3-connected components.
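To make the orientation part of the problem concrete, here is a toy sketch in Python. It is not SILP2's formulation or code: the function name, the link tuples, and the weights are all hypothetical, and exhaustive search stands in for the ILP solver. Each link says whether read pairs suggest two contigs share a strand, and the sketch picks the orientation assignment that agrees with the largest total link weight.

```python
from itertools import product


def best_orientations(n_contigs, links):
    """Exhaustively choose contig orientations that maximize agreeing link weight.

    links is a list of (i, j, same_strand, weight) tuples (a hypothetical
    encoding): same_strand=True means the read pairs joining contigs i and j
    suggest that both lie on the same strand.
    """
    best_score, best_assign = float("-inf"), None
    # Try every 0/1 orientation vector; feasible only for tiny examples.
    for assign in product((0, 1), repeat=n_contigs):
        score = sum(
            weight
            for i, j, same_strand, weight in links
            if (assign[i] == assign[j]) == same_strand
        )
        if score > best_score:
            best_score, best_assign = score, assign
    return best_assign, best_score


# Three contigs with conflicting evidence; the weights are made up.
links = [(0, 1, True, 5.0), (1, 2, False, 3.0), (0, 2, True, 1.0)]
assign, score = best_orientations(3, links)
print(assign, score)
```

Exhaustive search doubles in cost with every added contig, which is why a real scaffolder needs an ILP solver and a graph decomposition such as the one described above.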

Source code and sample test data

A zip file containing the SILP2 source code and a small test dataset can be downloaded here. After unzipping the file, build and test SILP2 by running:

cd SILP2
./build.sh
cd scaf_example
./run.sh

For SILP2 usage, see the included README file.

Note that SILP2 requires bowtie2 to be installed on the default $PATH, as well as the Python packages numpy, networkx, and cplex. IBM ILOG CPLEX is available free of charge for academic use through the IBM Academic Initiative.
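The prerequisites listed above can be checked before building with a short script such as the following. This is a minimal sketch and not part of the SILP2 distribution: the function name missing_dependencies is hypothetical, and the check only confirms that the executable is on $PATH and that the modules can be located, not that their versions are compatible.

```python
import importlib.util
import shutil


def missing_dependencies(executables=("bowtie2",),
                         modules=("numpy", "networkx", "cplex")):
    """Return the names of executables and Python modules that cannot be found."""
    # shutil.which searches the directories listed in $PATH.
    missing = [exe for exe in executables if shutil.which(exe) is None]
    # find_spec returns None when a top-level module is not installed.
    missing += [mod for mod in modules
                if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All listed prerequisites were found.")
```

An empty list means bowtie2 and all three Python packages were located.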

Source code on GitHub

The SILP2 source code can also be downloaded from https://github.com/jim-bo/silp2

Contact Information

james.lindsay@engr.uconn.edu

ion@engr.uconn.edu

alexz@cs.gsu.edu

Related Publications

Related Presentations

Acknowledgment and Disclaimer

This material is based upon work supported in part by the Agriculture and Food Research Initiative Competitive Grant No. 2011-67016-30331 from the USDA National Institute of Food and Agriculture and awards IIS-0916401 and IIS-0916948 from NSF. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.