Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections

Show simple item record Siren, Jouni fi Välimäki, Niko fi Mäkinen, Veli fi Navarro, Gonzalo fi 2008-12-04T08:03:31Z fi 2009-05-28T10:35:19Z 2008-12-04T08:03:31Z fi 2009-05-28T10:35:19Z 2008 fi
dc.identifier.citation In Proc. 15th Symposium on String Processing and Information Retrieval, LNCS 5280, pp. 164–175, 2008. fi
dc.identifier.uri fi
dc.description.abstract A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. This paper is devoted to studying ways to store massive sets of highly repetitive sequence collections in space-efficient manner so that retrieval of the content as well as queries on the content of the sequences can be provided time-efficiently. We show that the state-of-the-art entropy-bound full-text self-indexes do not yet provide satisfactory space bounds for this specific task. We engineer some new structures that use run-length encoding and give empirical evidence that these structures are superior to the current structures. fi
dc.publisher Springer-Verlag fi
dc.title Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections fi
dc.type Preprint fi
dc.identifier.laitoskoodi 523 fi
dc.creator.corporateName Tietojenkäsittelytieteen laitos fi
dc.creator.corporateName Department of Computer Science en
dc.creator.corporateName Datavetenskap, Institutionen för sv

Files in this item

Total number of downloads: Loading...

Files Size Format View
paper.pdf 173.7Kb PDF View/Open

This item appears in the following Collection(s)

Show simple item record