Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph




Permalink

http://hdl.handle.net/10138/331589

Citation

Mukherjee, K, Rossi, M, Salmela, L & Boucher, C 2021, 'Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph', Algorithms for Molecular Biology, vol. 16, 6. https://doi.org/10.1186/s13015-021-00182-9

Title: Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph
Author: Mukherjee, Kingshuk; Rossi, Massimiliano; Salmela, Leena; Boucher, Christina
Contributor: University of Helsinki, Department of Computer Science
University of Helsinki, Algorithmic Bioinformatics
Date: 2021-05-25
Language: eng
Number of pages: 13
Belongs to series: Algorithms for Molecular Biology
ISSN: 1748-7188
URI: http://hdl.handle.net/10138/331589
Abstract: Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data: there exists only one publicly available, non-proprietary method for assembly and one proprietary software package that is distributed as an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770-15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics' Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph-based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes, whereas the method of Valouev et al. (2006) ran successfully only on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at .
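To make the idea of bi-labels over Rmap data more concrete, here is a minimal Python sketch. It is not rmapper's implementation (rmapper is written in C++, and its exact bi-label definition is given in the paper). It assumes that each Rmap is a list of fragment lengths in base pairs, quantizes them into bins of a hypothetical width `bin_size`, and, in the spirit of the paired de Bruijn graph, pairs each k-mer of consecutive fragments with the k-mer that starts `d` fragments later. The names `quantize`, `bilabelled_kmers`, `build_graph`, `k`, `d` and `bin_size` are all illustrative. Sizing errors beyond binning, missing or spurious cut sites, and molecule orientation are ignored.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Label = Tuple[int, ...]
Node = Tuple[Label, Label]  # a bi-label: (first k-mer label, second k-mer label)


def quantize(fragments: List[int], bin_size: int) -> List[int]:
    # Assumption: absorb sizing noise by rounding each fragment length (bp)
    # to the nearest bin index, using integer arithmetic only.
    return [(f + bin_size // 2) // bin_size for f in fragments]


def bilabelled_kmers(q: List[int], k: int, d: int) -> List[Node]:
    # Hypothetical bi-label: the k-mer starting at fragment i paired with
    # the k-mer starting at fragment i + d (paired de Bruijn graph style).
    n = len(q)
    return [(tuple(q[i:i + k]), tuple(q[i + d:i + d + k]))
            for i in range(n - d - k + 1)]


def build_graph(rmaps: List[List[int]], k: int, d: int,
                bin_size: int) -> Dict[Node, Counter]:
    # Each bi-labelled k-mer becomes an edge from its (k-1)-prefix pair
    # to its (k-1)-suffix pair; Counter values record edge multiplicity.
    graph: Dict[Node, Counter] = defaultdict(Counter)
    for frags in rmaps:
        q = quantize(frags, bin_size)
        for a, b in bilabelled_kmers(q, k, d):
            src = (a[:-1], b[:-1])
            dst = (a[1:], b[1:])
            graph[src][dst] += 1
    return graph


if __name__ == "__main__":
    # Two overlapping toy Rmaps (made-up fragment lengths in bp).
    rmaps = [
        [3000, 5000, 2000, 7000, 4000, 6000],
        [5100, 1900, 7050, 3950, 6000, 8000],
    ]
    g = build_graph(rmaps, k=3, d=2, bin_size=1000)
    for src, successors in g.items():
        for dst, mult in successors.items():
            print(src, "->", dst, "multiplicity", mult)
```

With `k = 3`, `d = 2` and 1 kb bins, the two toy Rmaps quantize to `[3, 5, 2, 7, 4, 6]` and `[5, 2, 7, 4, 6, 8]`. The edge they share therefore gets multiplicity 2, and the nodes form a single unbranched path. Pairing distant k-mers in this way is what lets a bi-labelled graph distinguish repeats that a plain k-mer graph would collapse, at the cost of requiring longer Rmaps.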
Subject: Optical mapping
Single molecule maps
de Bruijn graph
Overlap-layout-consensus
Genome assembly
Mis-assemblies
ORDERED RESTRICTION MAPS
SINGLE-CELL
GENOME
SEQUENCE
VALIDATION
ALGORITHM
ALIGNMENT
ACCURATE
113 Computer and information sciences


Files in this item


s13015_021_00182_9.pdf (PDF, 2.381 MB)

