Data structures based on k-mers for querying large collections of sequencing data sets

dc.contributor.author Marchet, Camille
dc.contributor.author Boucher, Christina
dc.contributor.author Puglisi, Simon J.
dc.contributor.author Medvedev, Paul
dc.contributor.author Salson, Mikael
dc.contributor.author Chikhi, Rayan
dc.date.accessioned 2021-03-15T05:48:01Z
dc.date.available 2021-03-15T05:48:01Z
dc.date.issued 2021-01
dc.identifier.citation Marchet, C, Boucher, C, Puglisi, SJ, Medvedev, P, Salson, M & Chikhi, R 2021, 'Data structures based on k-mers for querying large collections of sequencing data sets', Genome Research, vol. 31, no. 1. https://doi.org/10.1101/gr.260604.119
dc.identifier.other PURE: 160805155
dc.identifier.other PURE UUID: 217b9119-da89-4660-a12b-b07e81954bd7
dc.identifier.other WOS: 000607253900001
dc.identifier.other ORCID: /0000-0001-7668-7636/work/90906981
dc.identifier.uri http://hdl.handle.net/10138/327976
dc.description.abstract High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches; yet such a feature would be highly useful to investigators. Toward this goal, several computational approaches have been introduced in the last few years to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations. en
dc.format.extent 12
dc.language.iso eng
dc.relation.ispartof Genome Research
dc.rights cc_by
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject DE-BRUIJN GRAPHS
dc.subject ALIGNMENT-FREE
dc.subject SEARCH
dc.subject QUANTIFICATION
dc.subject DATABASES
dc.subject THOUSANDS
dc.subject READS
dc.subject 1182 Biochemistry, cell and molecular biology
dc.subject 1184 Genetics, developmental biology, physiology
dc.title Data structures based on k-mers for querying large collections of sequencing data sets en
dc.type Review Article
dc.contributor.organization Department of Computer Science
dc.contributor.organization Helsinki Institute for Information Technology
dc.contributor.organization Algorithmic Bioinformatics
dc.contributor.organization Bioinformatics
dc.description.reviewstatus Peer reviewed
dc.relation.doi https://doi.org/10.1101/gr.260604.119
dc.relation.issn 1088-9051
dc.rights.accesslevel openAccess
dc.type.version publishedVersion
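
The abstract rests on one idea: each sequencing data set is represented as the set of its k-mers (all substrings of length k), and a query sequence is answered by comparing its k-mers against those sets. The Python sketch below illustrates only that naive, uncompressed baseline, not any of the indexing structures the article reviews. The function names (`kmers`, `build_index`, `query`), the toy reads, and the query rule (report every data set containing at least a fraction `theta` of the query's k-mers) are hypothetical choices made for illustration. For simplicity the sketch also does not merge a k-mer with its reverse complement, which practical k-mer tools often do.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (k-mers) of seq.

    Simplification (assumption): reverse complements are kept distinct.
    """
    seq = seq.upper()
    # range() is empty when len(seq) < k, so short inputs yield no k-mers.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(datasets: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """Map each data set name to the union of the k-mers of its reads."""
    index: Dict[str, Set[str]] = {}
    for name, reads in datasets.items():
        combined: Set[str] = set()
        for read in reads:
            combined |= kmers(read, k)
        index[name] = combined
    return index


def query(index: Dict[str, Set[str]], seq: str, k: int, theta: float) -> List[str]:
    """Return the sorted names of data sets containing >= theta of seq's k-mers.

    The threshold rule is a hypothetical query semantics chosen for illustration.
    """
    q = kmers(seq, k)
    if not q:  # query shorter than k: nothing to look up
        return []
    hits = []
    for name, kmer_set in index.items():
        shared = len(q & kmer_set)  # query k-mers present in this data set
        if shared / len(q) >= theta:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    # Toy data sets with hypothetical reads, indexed with k = 4.
    datasets = {
        "sample_A": ["ACGTACGTGA", "TTGACCA"],
        "sample_B": ["GGGCCCAAAT"],
    }
    idx = build_index(datasets, k=4)
    print(query(idx, "ACGTACG", k=4, theta=0.8))  # ['sample_A']
```

Keeping one explicit set per data set, as this sketch does, is what becomes impractical at the petabyte scale mentioned in the abstract; the approaches surveyed in the article are designed to answer similar membership queries with more compact representations.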

Files in this item

Genome_Res._2021_Marchet_1_12.pdf (PDF, 489.3Kb)
