Linear-time String Indexing and Analysis in Small Space

Show full item record



Permalink

http://hdl.handle.net/10138/325331

Citation

Belazzougui , D , Cunial , F , Karkkainen , J & Makinen , V 2020 , ' Linear-time String Indexing and Analysis in Small Space ' , ACM Transactions on Algorithms , vol. 16 , no. 2 , 17 . https://doi.org/10.1145/3381417

Title: Linear-time String Indexing and Analysis in Small Space
Author: Belazzougui, Djamal; Cunial, Fabio; Karkkainen, Juha; Makinen, Veli
Contributor: University of Helsinki, Helsinki Institute for Information Technology
University of Helsinki, Helsinki Institute for Information Technology
University of Helsinki, Department of Computer Science
University of Helsinki, Department of Computer Science
Date: 2020-03
Language: eng
Number of pages: 54
Belongs to series: ACM Transactions on Algorithms
ISSN: 1549-6325
URI: http://hdl.handle.net/10138/325331
Abstract: The field of succinct data structures has flourished over the past 16 years. Starting from the compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size in bits. In many large-scale applications, the construction of the index and its usage need to be considered as one unit of computation. For example, one can compare two genomes by building a common index for their concatenation and by detecting common substructures by querying the index. Efficient string indexing and analysis in small space lies also at the core of a number of primitives in the data-intensive field of high-throughput DNA sequencing. We report the following advances in string indexing and analysis: We show that the BWT of a string T is an element of {1, . . . , sigma}(n) can be built in deterministic O(n) time using just O(n log sigma) bits of space, where sigma We also show how to build many of the existing indexes based on the BWT, such as the compressed suffix array, the compressed suffix tree, and the bidirectional BWT index, in randomized O(n) time and in O(n log sigma) bits of space. The previously fastest construction algorithms for BWT, compressed suffix array and compressed suffix tree, which used O(n log sigma) bits of space, took O(n log log sigma) time for the first two structures and O(n log(epsilon) n) time for the third, where. is any positive constant smaller than one. Alternatively, the BWT could be previously built in linear time if one was willing to spend O(n log sigma log log(sigma) n) bits of space. Contrary to the state-of-the-art, our bidirectional BWT index supports every operation in constant time per element in its output.
Subject: Compact data structures
compressed indexes
Burrows-Wheeler transform
suffix array
suffix tree
suffix-link tree
compressed suffix array
compressed suffix tree
bidirectional BWT index
partial rank query
monotone minimal perfect hash function
matching statistics
maximal repeat
maximal unique match
maximal exact match
minimal absent word
string kernel
BURROWS-WHEELER TRANSFORM
SUFFIX ARRAYS
SUCCINCT REPRESENTATIONS
CONSTRUCTION
RETRIEVAL
TREES
ALGORITHMS
STORAGE
SETS
113 Computer and information sciences
Rights:


Files in this item

Total number of downloads: Loading...

Files Size Format View
1609.06378.pdf 1.054Mb PDF View/Open

This item appears in the following Collection(s)

Show full item record