Storage and retrieval of individual genomes

Permalink

http://hdl.handle.net/10138/23701

Citation

Mäkinen, V, Navarro, G, Sirén, J & Välimäki, N 2009, Storage and retrieval of individual genomes. in S Batzoglou (ed.), Research in Computational Molecular Biology: 13th Annual International Conference, RECOMB 2009. Lecture Notes in Computer Science, no. 5541, Springer, pp. 121-137, Annual International Conference on Research in Computational Molecular Biology, Tucson, Arizona, United States, 18/05/2009. DOI: 10.1007/978-3-642-02008-7_9

Title: Storage and retrieval of individual genomes
Author: Mäkinen, Veli; Navarro, Gonzalo; Sirén, Jouni; Välimäki, Niko
Editor(s): Batzoglou, Serafim
Contributor: University of Helsinki, Department of Computer Science (-2009)
Part of series: Lecture Notes in Computer Science
Part of series: Research in Computational Molecular Biology: 13th Annual International Conference, RECOMB 2009
ISBN: 978-3-642-02007-0
Abstract: A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, a suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of the suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold, for example on the suffix tree of the human genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only on the length n of the base sequence and the number s of variations in its repeated copies. That is, the space reduction factor is no longer constant, but depends on N / n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in sequencing technologies.
URI: http://hdl.handle.net/10138/23701
Date: 2009
Keywords: 113 Computer and information sciences
Rights:


Files

File(s) Size Format View
recomb2009.pdf 296.8KB PDF Open file
