SILP2 — ILP-based Maximum Likelihood Genome Scaffolding
Introduction
Scaffolding is the final step of the genome assembly process. In this step, contigs are oriented, ordered, and joined into larger scaffolds using read pairs generated from shotgun libraries with large insert sizes. SILP2 is a stand-alone scaffolding tool that generates maximum likelihood scaffolds via integer linear programming (ILP). It scales to mammalian-size genomes without sacrificing optimality. It does this by solving the resulting large ILP formulations with a non-serial dynamic programming (NSDP) approach that decomposes the scaffolding graph into 3-connected components.
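The decomposition idea can be illustrated at its first, coarser level. The sketch below is a minimal illustration and does not reproduce SILP2's code. It assumes the scaffolding graph is a simple undirected graph given as a list of contig pairs. It splits the graph at articulation points into biconnected (2-connected) components using the Hopcroft-Tarjan depth-first search. SILP2 refines further, down to 3-connected components, and this sketch does not attempt that step. The function name `biconnected_components` is hypothetical here. networkx, which SILP2 already requires, ships an equivalent `networkx.biconnected_components`.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def biconnected_components(edges: List[Edge]) -> List[Set[Hashable]]:
    """Split a simple undirected graph into biconnected components.

    Illustrative Hopcroft-Tarjan sketch (recursive, so suited to small graphs).
    Each returned set holds the nodes (contigs) of one component; an
    articulation point appears in every component it belongs to.
    """
    adj: Dict[Hashable, List[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    disc: Dict[Hashable, int] = {}  # DFS discovery time of each node
    low: Dict[Hashable, int] = {}   # earliest discovery time reachable from the subtree
    stack: List[Edge] = []          # edges of components not yet emitted
    components: List[Set[Hashable]] = []
    timer = 0

    def dfs(u: Hashable, parent: Optional[Hashable]) -> None:
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v in adj[u]:
            if v not in disc:
                # Tree edge: descend, then propagate the low value upward.
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # u separates v's subtree, so pop one whole component.
                    comp: Set[Hashable] = set()
                    while True:
                        a, b = stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(comp)
            elif v != parent and disc[v] < disc[u]:
                # Back edge to an ancestor: record it once and update low.
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return components


if __name__ == "__main__":
    # Two triangles sharing contig C, plus a pendant edge E-F.
    g = [("A", "B"), ("B", "C"), ("C", "A"),
         ("C", "D"), ("D", "E"), ("E", "C"), ("E", "F")]
    for comp in biconnected_components(g):
        print(sorted(comp))
```

For the example graph, the function returns three components: {A, B, C}, {C, D, E}, and {E, F}. C and E are the articulation points. A per-component decomposition lets an optimizer work on each piece separately and then combine the partial solutions. That is the general principle behind NSDP.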
Source code and sample test data
A zip file including the SILP2 source code and a small test dataset can be downloaded here. After unzipping the file, build and test SILP2 by running
    cd SILP2
    ./build.sh
    cd scaf_example
    ./run.sh
For usage instructions, see the README file included in the download.
Note that SILP2 requires bowtie2 to be installed and reachable on the default $PATH. It also requires the Python packages numpy, networkx, and cplex (the IBM ILOG CPLEX Python API). IBM ILOG CPLEX is available free of charge for academic use through the IBM Academic Initiative.
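Before running `./build.sh`, it may help to confirm these prerequisites are visible. The script below is a hypothetical pre-flight check and is not part of SILP2. It assumes it is run with the same Python interpreter that will run SILP2. It only checks that each requirement is present and does not check versions or CPLEX licensing. `check_silp2_requirements` is an illustrative name.

```python
import importlib.util
import shutil
from typing import Dict

# Requirements listed on the SILP2 page (illustrative constants, not from SILP2).
REQUIRED_EXECUTABLES = ["bowtie2"]
REQUIRED_MODULES = ["numpy", "networkx", "cplex"]


def check_silp2_requirements() -> Dict[str, bool]:
    """Report whether each SILP2 prerequisite is visible to this interpreter."""
    status: Dict[str, bool] = {}
    for exe in REQUIRED_EXECUTABLES:
        # shutil.which searches the directories listed in $PATH, as a shell would.
        status[exe] = shutil.which(exe) is not None
    for mod in REQUIRED_MODULES:
        # find_spec locates a top-level module without importing it.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_silp2_requirements().items():
        print(f"{name:10s} {'found' if ok else 'MISSING'}")
```

A MISSING entry for `cplex` usually means the CPLEX Python API was not installed into this interpreter's environment, even if CPLEX itself is installed.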
Source code on GitHub
The SILP2 source code can also be downloaded from https://github.com/jim-bo/silp2
Contact Information
Acknowledgment and Disclaimer
This material is based upon work supported in part by the Agriculture and Food Research Initiative Competitive Grant No. 2011-67016-30331 from the USDA National Institute of Food and Agriculture and awards IIS-0916401 and IIS-0916948 from NSF. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.