Storage and retrieval of individual genomes

dc.contributor University of Helsinki, Department of Computer Science en
dc.contributor.author Mäkinen, Veli
dc.contributor.author Navarro, Gonzalo
dc.contributor.author Sirén, Jouni
dc.contributor.author Välimäki, Niko
dc.contributor.editor Batzoglou, Serafim
dc.date.accessioned 2010-12-02T16:41:00Z
dc.date.available 2010-12-02T16:41:00Z
dc.date.issued 2009
dc.identifier.citation Mäkinen, V, Navarro, G, Sirén, J & Välimäki, N 2009, Storage and retrieval of individual genomes. in S Batzoglou (ed.), Research in Computational Molecular Biology: 13th Annual International Conference, RECOMB 2009. Lecture Notes in Computer Science, no. 5541, Springer, pp. 121-137, Annual International Conference on Research in Computational Molecular Biology, Tucson, Arizona, United States, 18/05/2009. https://doi.org/10.1007/978-3-642-02008-7_9 en
dc.identifier.citation conference en
dc.identifier.isbn 978-3-642-02007-0
dc.identifier.other PURE: 9313494
dc.identifier.other PURE UUID: ef6db02d-ab10-4446-8b3b-8f3673930326
dc.identifier.other dawa_publication: 195959
dc.identifier.other Scopus: 67650318250
dc.identifier.other ORCID: /0000-0003-4454-1493/work/28882768
dc.identifier.uri http://hdl.handle.net/10138/23701
dc.description.abstract A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, a suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of the suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold, for example on the suffix tree of the human genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only on the length n of the base sequence and the number s of variations in its repeated copies. That is, the space reduction factor is no longer constant, but depends on N / n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in sequencing technologies. en
dc.format.extent 17
dc.language.iso eng
dc.publisher Springer
dc.relation.ispartof Research in Computational Molecular Biology 13th Annual International Conference, RECOMB 2009
dc.relation.ispartofseries Lecture Notes in Computer Science
dc.rights en
dc.subject 113 Computer and information sciences en
dc.title Storage and retrieval of individual genomes en
dc.type Conference contribution
dc.identifier.doi https://doi.org/10.1007/978-3-642-02008-7_9
dc.type.uri info:eu-repo/semantics/other

Files in this item

Files Size Format View
recomb2009.pdf 296.8Kb PDF View/Open
