Browsing by Subject "THOUSANDS"

Now showing items 1-2 of 2
  • Marchet, Camille; Boucher, Christina; Puglisi, Simon J.; Medvedev, Paul; Salson, Mikael; Chikhi, Rayan (2021)
    High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, yet such a feature would be highly useful to investigators. Toward this goal, in the last few years several computational approaches have been introduced to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.
  • Mitt, Mario; Kals, Mart; Parn, Kalle; Gabriel, Stacey B.; Lander, Eric S.; Palotie, Aarno; Ripatti, Samuli; Morris, Andrew P.; Metspalu, Andres; Esko, Tonu; Magi, Reedik; Palta, Priit (2017)
    Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF) >= 5% and low-frequency variants (0.5