Data structures based on k-mers for querying large collections of sequencing data sets




Permalink

http://hdl.handle.net/10138/327976

Citation

Marchet, C, Boucher, C, Puglisi, S J, Medvedev, P, Salson, M & Chikhi, R 2021, 'Data structures based on k-mers for querying large collections of sequencing data sets', Genome Research, vol. 31, no. 1. https://doi.org/10.1101/gr.260604.119

Title: Data structures based on k-mers for querying large collections of sequencing data sets
Author: Marchet, Camille; Boucher, Christina; Puglisi, Simon J.; Medvedev, Paul; Salson, Mikael; Chikhi, Rayan
Contributor: University of Helsinki, Department of Computer Science
Date: 2021-01
Language: eng
Number of pages: 12
Belongs to series: Genome Research
ISSN: 1088-9051
URI: http://hdl.handle.net/10138/327976
Abstract: High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches; yet such a feature would be highly useful to investigators. Toward this goal, several computational approaches have been introduced in the last few years to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.
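To make "representing data sets as sets of k-mers" concrete, here is a minimal sketch, not taken from the paper. Each data set, given as a list of reads, is reduced to the set of its length-k substrings. A query sequence is reported as present in a data set when at least a fraction `threshold` of its k-mers occur there; this threshold formulation is assumed here as one common way to phrase the query. The function names `kmers`, `build_index` and `query`, and the toy sample names, are hypothetical. Plain Python sets stand in for the space-efficient structures that practical indexes rely on (for example Bloom filters or colored de Bruijn graphs). Details such as canonical k-mers (merging a k-mer with its reverse complement) are also ignored.

```python
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(datasets: Dict[str, Iterable[str]], k: int) -> Dict[str, Set[str]]:
    """Represent each data set (a collection of reads) as the union of its reads' k-mers.

    Hypothetical helper: exact sets are used here instead of a compressed index.
    """
    index: Dict[str, Set[str]] = {}
    for name, reads in datasets.items():
        ks: Set[str] = set()
        for read in reads:
            ks |= kmers(read, k)
        index[name] = ks
    return index


def query(index: Dict[str, Set[str]], seq: str, k: int, threshold: float) -> List[str]:
    """Return names of data sets containing at least `threshold` of the query's k-mers."""
    q = kmers(seq, k)
    if not q:  # query shorter than k: nothing to look up
        return []
    hits: List[str] = []
    for name, ks in index.items():
        frac = len(q & ks) / len(q)  # fraction of query k-mers found in this data set
        if frac >= threshold:
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Toy data sets (hypothetical reads), indexed with k = 4.
    datasets = {
        "sample_A": ["ACGTACGTGG", "TTGACCA"],
        "sample_B": ["GGGGCCCCAA"],
    }
    index = build_index(datasets, k=4)
    print(query(index, "ACGTACG", k=4, threshold=0.8))  # ['sample_A']
```

Lowering `threshold` trades precision for sensitivity. For instance, the query `ACGTGGGG` shares 3 of its 5 k-mers with `sample_A` and 1 of 5 with `sample_B`, so a threshold of 0.5 reports only `sample_A`, while 0.1 reports both.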
Subject: DE-BRUIJN GRAPHS
ALIGNMENT-FREE
SEARCH
QUANTIFICATION
DATABASES
THOUSANDS
READS
1182 Biochemistry, cell and molecular biology
1184 Genetics, developmental biology, physiology


Files in this item


Files Size Format
Genome_Res._2021_Marchet_1_12.pdf 489.3Kb PDF

