Syotti: scalable bait design for DNA enrichment


Alanko, J N, Slizovskiy, I B, Lokshtanov, D, Gagie, T, Noyes, N R & Boucher, C 2022, 'Syotti: scalable bait design for DNA enrichment', Bioinformatics, vol. 38, no. SUPPL 1, pp. 177-184.

Title: Syotti: scalable bait design for DNA enrichment
Author: Alanko, Jarno N.; Slizovskiy, Ilya B.; Lokshtanov, Daniel; Gagie, Travis; Noyes, Noelle R.; Boucher, Christina
Contributor organization: University of Helsinki
Department of Computer Science
Genome-scale Algorithmics research group / Veli Mäkinen
Algorithmic Bioinformatics
Date: 2022-06-24
Language: eng
Number of pages: 8
Belongs to series: Bioinformatics
ISSN: 1367-4803
Abstract: Motivation: Bait enrichment is a protocol that is becoming increasingly ubiquitous as it has been shown to successfully amplify regions of interest in metagenomic samples. In this method, a set of synthetic probes ('baits') are designed, manufactured and applied to fragmented metagenomic DNA. The probes bind to the fragmented DNA and any unbound DNA is rinsed away, leaving the bound fragments to be amplified for sequencing. Metsky et al. demonstrated that bait enrichment is capable of detecting a large number of human viral pathogens within metagenomic samples. Results: We formalize the problem of designing baits by defining the Minimum Bait Cover problem, show that the problem is NP-hard even under very restrictive assumptions, and design an efficient heuristic that takes advantage of succinct data structures. We refer to our method as Syotti. The running time of Syotti shows linear scaling in practice, running at least an order of magnitude faster than state-of-the-art methods, including the method of Metsky et al. At the same time, our method produces bait sets that are smaller than the ones produced by the competing methods, while also leaving fewer positions uncovered. Lastly, we show that Syotti requires only 25 min to design baits for a dataset composed of 3 billion nucleotides from 1000 related bacterial substrains, whereas the method of Metsky et al. shows clearly super-linear running time and fails to process even a subset of 17% of the data in 72 h.
Subject: 1184 Genetics, developmental biology, physiology
113 Computer and information sciences
Peer reviewed: Yes
Rights: cc_by
Usage restriction: openAccess
Self-archived version: publishedVersion

Files in this item

Files Size Format View
btac226.pdf 862.0Kb PDF View/Open

