Belazzougui , D , Cunial , F , Karkkainen , J & Makinen , V 2020 , ' Linear-time String Indexing and Analysis in Small Space ' , ACM Transactions on Algorithms , vol. 16 , no. 2 , 17 . https://doi.org/10.1145/3381417
Title: | Linear-time String Indexing and Analysis in Small Space |
Author: | Belazzougui, Djamal; Cunial, Fabio; Karkkainen, Juha; Makinen, Veli |
Contributor: | University of Helsinki, Helsinki Institute for Information Technology University of Helsinki, Helsinki Institute for Information Technology University of Helsinki, Department of Computer Science University of Helsinki, Department of Computer Science |
Date: | 2020-03 |
Language: | eng |
Number of pages: | 54 |
Belongs to series: | ACM Transactions on Algorithms |
ISSN: | 1549-6325 |
URI: | http://hdl.handle.net/10138/325331 |
Abstract: | The field of succinct data structures has flourished over the past 16 years. Starting from the compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size in bits. In many large-scale applications, the construction of the index and its usage need to be considered as one unit of computation. For example, one can compare two genomes by building a common index for their concatenation and by detecting common substructures by querying the index. Efficient string indexing and analysis in small space lies also at the core of a number of primitives in the data-intensive field of high-throughput DNA sequencing. We report the following advances in string indexing and analysis: We show that the BWT of a string T is an element of {1, . . . , sigma}(n) can be built in deterministic O(n) time using just O(n log sigma) bits of space, where sigma We also show how to build many of the existing indexes based on the BWT, such as the compressed suffix array, the compressed suffix tree, and the bidirectional BWT index, in randomized O(n) time and in O(n log sigma) bits of space. The previously fastest construction algorithms for BWT, compressed suffix array and compressed suffix tree, which used O(n log sigma) bits of space, took O(n log log sigma) time for the first two structures and O(n log(epsilon) n) time for the third, where. is any positive constant smaller than one. Alternatively, the BWT could be previously built in linear time if one was willing to spend O(n log sigma log log(sigma) n) bits of space. Contrary to the state-of-the-art, our bidirectional BWT index supports every operation in constant time per element in its output. |
Subject: |
Compact data structures
compressed indexes Burrows-Wheeler transform suffix array suffix tree suffix-link tree compressed suffix array compressed suffix tree bidirectional BWT index partial rank query monotone minimal perfect hash function matching statistics maximal repeat maximal unique match maximal exact match minimal absent word string kernel BURROWS-WHEELER TRANSFORM SUFFIX ARRAYS SUCCINCT REPRESENTATIONS CONSTRUCTION RETRIEVAL TREES ALGORITHMS STORAGE SETS 113 Computer and information sciences |
Rights: |
Total number of downloads: Loading...
Files | Size | Format | View |
---|---|---|---|
1609.06378.pdf | 1.054Mb |
View/ |