Distinguisher Selection for DNA Barcoding
String barcoding is a recently introduced technique for genome-based identification of microorganisms such as viruses or bacteria. Identification is performed by spotting or synthesizing the Watson-Crick complements of selected distinguisher strings on a microarray and hybridizing the array to fluorescently labeled DNA extracted from the unknown microorganism. Alternatively, distinguisher complements can be hybridized in solution to the unlabeled DNA of the microorganism, extended with fluorescently labeled dideoxynucleotides in a polymerase-mediated reaction, and then hybridized to a microarray consisting of the distinguishers themselves. In each case, the hybridization pattern can be viewed as a string of k zeros and ones, where k is the number of distinguishers; this string is commonly referred to as the barcode of the microorganism. For unambiguous identification, barcodes corresponding to different microorganisms must be distinct.
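The barcode idea can be illustrated with a minimal Python sketch. It is a simplification and not part of the DNA-BAR package: hybridization is modeled as exact substring occurrence on a single strand, and the function names and toy sequences are hypothetical.

```python
def barcode(genome, distinguishers):
    """Return one '0'/'1' digit per distinguisher.

    Assumption: a distinguisher "hybridizes" exactly when it occurs as a
    substring of the genome; real hybridization is more permissive.
    """
    return "".join("1" if d in genome else "0" for d in distinguishers)


def barcodes_are_distinct(genomes, distinguishers):
    """True when every genome receives a different barcode."""
    codes = [barcode(g, distinguishers) for g in genomes]
    return len(set(codes)) == len(codes)


# Toy sequences, invented for illustration only.
genomes = {
    "virus_a": "ACGTACGGA",
    "virus_b": "ACGTTTGGA",
    "virus_c": "TTGGACCAA",
}
distinguishers = ["ACGT", "TTGG"]

codes = {name: barcode(seq, distinguishers) for name, seq in genomes.items()}
print(codes)  # {'virus_a': '10', 'virus_b': '11', 'virus_c': '01'}
```

Here k = 2 distinguishers are enough to give the three toy sequences distinct barcodes.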
This software package implements a version of the greedy set cover algorithm for distinguisher selection as described in the reference below. The software enforces constraints on distinguisher length, melting temperature, and GC content. Furthermore, the software enforces distinguisher hybridization specificity via an upper bound on the maximum weight of a substring shared by two or more distinguishers, where the weight of a substring is the number of A and T bases plus twice the number of C and G bases.
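The following Python sketch shows the substring weight defined above, together with a generic greedy set cover over pairs of genomes. It is an illustration under stated assumptions, not the package's implementation: candidates are all substrings in a hypothetical length range, presence is exact substring occurrence, and the melting temperature, GC content, and shared-substring weight constraints are not enforced. All function names and default lengths are invented for the example.

```python
from itertools import combinations


def substring_weight(s):
    """Weight = (number of A and T) + 2 * (number of C and G).

    Assumes the input contains only the letters A, C, G, T.
    """
    return sum(2 if base in "CG" else 1 for base in s.upper())


def candidate_substrings(genomes, min_len, max_len):
    """Collect every substring whose length lies in [min_len, max_len]."""
    cands = set()
    for g in genomes:
        for length in range(min_len, max_len + 1):
            for i in range(len(g) - length + 1):
                cands.add(g[i:i + length])
    return sorted(cands)  # sorted so ties are broken deterministically


def greedy_distinguishers(genomes, min_len=3, max_len=4):
    """Greedy set cover: the elements to cover are pairs of genomes.

    A candidate covers a pair when it occurs in exactly one of the two
    genomes. Each round picks the candidate covering the most pairs
    that are still unseparated.
    """
    pairs = set(combinations(range(len(genomes)), 2))
    cands = candidate_substrings(genomes, min_len, max_len)

    def separated(c):
        present = [c in g for g in genomes]
        return {(i, j) for (i, j) in pairs if present[i] != present[j]}

    chosen = []
    while pairs:
        best = max(cands, key=lambda c: len(separated(c)))
        gained = separated(best)
        if not gained:
            raise ValueError("remaining genomes cannot be distinguished")
        chosen.append(best)
        pairs -= gained
    return chosen


print(substring_weight("ACGT"))  # 6
print(greedy_distinguishers(["ACGTACGGA", "ACGTTTGGA", "TTGGACCAA"]))
```

Covering pairs of genomes rather than individual genomes is what makes the resulting barcodes pairwise distinct: once every pair is separated by at least one chosen distinguisher, no two genomes can share a barcode.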
DNA-BAR source code
Acknowledgment and Disclaimer
This material is based upon work supported in part by the National Science Foundation under Grants No. IIS-0546457 and DBI-0543365. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.