BARCOSEL: a tool for selecting an optimal barcode set for high-throughput sequencing

dc.contributor University of Helsinki, Research Centre for Ecological Change en
dc.contributor University of Helsinki, Institute of Biotechnology en
dc.contributor University of Helsinki, Institute of Biotechnology en
dc.contributor University of Helsinki, Institute of Biotechnology en
dc.contributor University of Helsinki, Institute of Biotechnology en
dc.contributor University of Helsinki, Institute of Biotechnology en
dc.contributor.author Somervuo, Panu
dc.contributor.author Koskinen, Patrik
dc.contributor.author Mei, Peng
dc.contributor.author Holm, Liisa
dc.contributor.author Auvinen, Petri
dc.contributor.author Paulin, Lars
dc.date.accessioned 2018-08-22T12:34:01Z
dc.date.available 2018-08-22T12:34:01Z
dc.date.issued 2018-07-05
dc.identifier.citation Somervuo, P, Koskinen, P, Mei, P, Holm, L, Auvinen, P & Paulin, L 2018, 'BARCOSEL: a tool for selecting an optimal barcode set for high-throughput sequencing', BMC Bioinformatics, vol. 19, no. 257, 257. https://doi.org/10.1186/s12859-018-2262-7 en
dc.identifier.issn 1471-2105
dc.identifier.other PURE: 115213880
dc.identifier.other PURE UUID: c6c2c2e3-2277-4bc5-8496-b9b58ea6c176
dc.identifier.other WOS: 000437916400001
dc.identifier.other Scopus: 85049595802
dc.identifier.uri http://hdl.handle.net/10138/238990
dc.description.abstract Background: Current high-throughput sequencing platforms provide the capacity to sequence multiple samples in parallel. Samples are labeled by attaching a short sample-specific nucleotide sequence, a barcode, to each DNA molecule prior to pooling them into a mix containing a number of libraries to be sequenced simultaneously. After sequencing, the samples are binned by identifying the barcode sequence within each sequence read. To tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space. An additional constraint, due to both nucleotide usage and basecalling accuracy, is that the proportions of the different nucleotides should be balanced at each barcode position. The number of samples to be mixed in each sequencing run may vary, which raises the problem of how to select the best subset of the barcodes available at a sequencing core facility for each sequencing run. Many tools are available for de novo barcode design, but they are not suitable for subset selection.
Results: We have developed a tool that can be used for three different tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of a user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes. In our approach the selection process is formulated as a minimization problem. We define the cost function and a set of constraints and use integer programming to solve the resulting combinatorial problem. Based on the desired number of barcodes to be selected and the set of candidate sequences given by the user, the necessary constraints are generated automatically and the optimal solution can be found. The method is implemented in the C programming language, and a web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel.
Conclusions: The increasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method allows the user to select a given number of barcodes from a larger existing barcode set so that sequencing errors are tolerated and the nucleotide balance is optimized. The tool is easy to access via a web browser. en
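
The selection problem described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the cost function or the distance measure, so the sketch assumes Hamming distance between equal-length barcodes and a hypothetical balance cost (per-position absolute deviation of nucleotide counts from a uniform split). It also replaces integer programming with an exhaustive search over subsets, which is practical only for small candidate sets. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations

NUCLEOTIDES = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set.

    Returns infinity for sets with fewer than two barcodes.
    """
    return min(
        (hamming(a, b) for a, b in combinations(barcodes, 2)),
        default=float("inf"),
    )


def balance_cost(barcodes):
    """Assumed cost (not taken from the paper): for each barcode position,
    the total absolute deviation of the nucleotide counts from a uniform
    split, summed over all positions. Zero means perfectly balanced."""
    ideal = len(barcodes) / len(NUCLEOTIDES)
    cost = 0.0
    for column in zip(*barcodes):
        counts = Counter(column)
        cost += sum(abs(counts[n] - ideal) for n in NUCLEOTIDES)
    return cost


def select_barcodes(candidates, k, min_distance):
    """Exhaustive stand-in for the integer program: try every k-subset,
    discard those violating the distance constraint, and return the
    subset with the lowest balance cost as (subset, cost).

    Returns (None, inf) if no subset satisfies the constraint.
    """
    best, best_cost = None, float("inf")
    for subset in combinations(candidates, k):
        if min_pairwise_distance(subset) < min_distance:
            continue
        cost = balance_cost(subset)
        if cost < best_cost:
            best, best_cost = subset, cost
    return best, best_cost
```

Under these assumptions, `min_pairwise_distance` and `balance_cost` together correspond to the compatibility check of a user-defined set (task 2), and `select_barcodes(candidates, k, min_distance)` corresponds to subset selection (task 1). Augmenting an existing set (task 3) could be sketched by restricting the search to subsets that contain the existing barcodes.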
dc.format.extent 6
dc.language.iso eng
dc.relation.ispartof BMC Bioinformatics
dc.rights en
dc.subject Barcode en
dc.subject DNA en
dc.subject Integer programming en
dc.subject Multiplexing en
dc.subject Optimization en
dc.subject Sequencing en
dc.subject 1184 Genetics, developmental biology, physiology en
dc.subject 113 Computer and information sciences en
dc.title BARCOSEL: a tool for selecting an optimal barcode set for high-throughput sequencing en
dc.type Article
dc.description.version Peer reviewed
dc.identifier.doi https://doi.org/10.1186/s12859-018-2262-7
dc.type.uri info:eu-repo/semantics/other
dc.type.uri info:eu-repo/semantics/publishedVersion

Files in this item

Files Size Format
s12859_018_2262_7.pdf 704.7Kb PDF
