Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph

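dc.contributor.author Mukherjee, Kingshuk
dc.contributor.author Rossi, Massimiliano
dc.contributor.author Salmela, Leena
dc.contributor.author Boucher, Christina
dc.date.accessioned 2021-06-18T08:36:01Z
dc.date.available 2021-06-18T08:36:01Z
dc.date.issued 2021-05-25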
dc.identifier.citation Mukherjee, K, Rossi, M, Salmela, L & Boucher, C 2021, 'Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph', Algorithms for Molecular Biology, vol. 16, 6.
dc.identifier.other PURE: 165273601
dc.identifier.other PURE UUID: 56ccd5f3-df2c-47db-8459-a790bcabf816
dc.identifier.other WOS: 000654173500001
dc.identifier.other ORCID: /0000-0002-0756-543X/work/95730334
dc.description.abstract Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data: only one publicly available non-proprietary method and one proprietary software package that is available only as an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770-15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics' Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770-15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes, whereas the method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770-15775, 2006) ran successfully only on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at . en
dc.format.extent 13
dc.language.iso eng
dc.relation.ispartof Algorithms for Molecular Biology
dc.rights cc_by
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject Optical mapping
dc.subject Single molecule maps
dc.subject de Bruijn graph
dc.subject Overlap-layout-consensus
dc.subject Genome assembly
dc.subject Mis-assemblies
dc.subject SINGLE-CELL
dc.subject GENOME
dc.subject SEQUENCE
dc.subject VALIDATION
dc.subject ALGORITHM
dc.subject ALIGNMENT
dc.subject ACCURATE
dc.subject 113 Computer and information sciences
dc.title Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph en
dc.type Article
dc.contributor.organization Department of Computer Science
dc.contributor.organization Algorithmic Bioinformatics
dc.contributor.organization Bioinformatics
dc.contributor.organization Helsinki Institute for Information Technology
dc.description.reviewstatus Peer reviewed
dc.relation.issn 1748-7188
dc.rights.accesslevel openAccess
dc.type.version publishedVersion

Files in this item

Files Size Format View
s13015_021_00182_9.pdf 2.381Mb PDF View/Open
