Browsing by Subject "FORMAT"

Now showing items 1-4 of 4
  • Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L.; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Gambaro, Giovanni; Richards, J. Brent; Durbin, Richard; Timpson, Nicholas J.; Marchini, Jonathan; Soranzo, Nicole; UK10K Consortium; Paunio, Tiina (2015)
    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.
  • Li, Zitong; Kemppainen, Petri; Rastas, Pasi; Merilä, Juha (2018)
    Genomewide association studies (GWAS) aim to identify genetic markers strongly associated with quantitative traits by utilizing linkage disequilibrium (LD) between candidate genes and markers. However, because of LD between nearby genetic markers, the standard GWAS approaches typically detect a number of correlated SNPs covering long genomic regions, making corrections for multiple testing overly conservative. Additionally, the high dimensionality of modern GWAS data poses considerable challenges for GWAS procedures such as permutation tests, which are computationally intensive. We propose a cluster-based GWAS approach that first divides the genome into many large nonoverlapping windows and uses linkage disequilibrium network analysis in combination with principal component (PC) analysis as dimensional reduction tools to summarize the SNP data to independent PCs within clusters of loci connected by high LD. We then introduce single- and multilocus models that can efficiently conduct the association tests on such high-dimensional data. The methods can be adapted to different model structures and used to analyse samples collected from the wild or from biparental F2 populations, which are commonly used in ecological genetics mapping studies. We demonstrate the performance of our approaches with two publicly available data sets from a plant (Arabidopsis thaliana) and a fish (Pungitius pungitius), as well as with simulated data.
  • Bertolotti, Alicia C.; Layer, Ryan M.; Gundappa, Manu Kumar; Gallagher, Michael D.; Pehlivanoglu, Ege; Nome, Torfinn; Robledo, Diego; Kent, Matthew P.; Rosaeg, Line L.; Holen, Matilde M.; Mulugeta, Teshome D.; Ashton, Thomas J.; Hindar, Kjetil; Saegrov, Harald; Floro-Larsen, Bjorn; Erkinaro, Jaakko; Primmer, Craig R.; Bernatchez, Louis; Martin, Samuel A. M.; Johnston, Ian A.; Sandve, Simen R.; Lien, Sigbjorn; Macqueen, Daniel J. (2020)
    Structural variants (SVs) are a major source of genetic and phenotypic variation, but remain challenging to accurately type and are hence poorly characterized in most species. We present an approach for reliable SV discovery in non-model species using whole genome sequencing and report 15,483 high-confidence SVs in 492 Atlantic salmon (Salmo salar L.) sampled from a broad phylogeographic distribution. These SVs recover population genetic structure with high resolution, include an active DNA transposon, widely affect functional features, and overlap more duplicated genes retained from an ancestral salmonid autotetraploidization event than expected. Changes in SV allele frequency between wild and farmed fish indicate polygenic selection on behavioural traits during domestication, targeting brain-expressed synaptic networks linked to neurological disorders in humans. This study offers novel insights into the role of SVs in genome evolution and the genetic architecture of domestication traits, along with resources supporting reliable SV discovery in non-model species.
  • Gershony, Liza C.; Belanger, Janelle M.; Hytönen, Marjo K.; Lohi, Hannes; Oberbauer, Anita M. (2021)
    In dogs, symmetrical lupoid onychodystrophy (SLO) results in nail loss and an abnormal regrowth of the claws. In Bearded Collies, an autoimmune nature has been suggested because certain dog leukocyte antigen (DLA) class II haplotypes are associated with the condition. A genome-wide association study of the Bearded Collie revealed two regions of association that conferred risk for disease: one on canine chromosome (CFA) 12 that encompasses the DLA genes, and one on CFA17. Case-control association was employed on whole genome sequencing data to uncover putative causative variants in SLO within the CFA12 and CFA17 associated regions. Genotype imputation was then employed to refine variants of interest. Although no SLO-associated protein-coding variants were identified on CFA17, multiple variants, many with predicted damaging effects, were identified within potential candidate genes on CFA12. Furthermore, many potentially damaging alleles were fully correlated with the presence of DLA class II risk haplotypes for SLO, suggesting that the variants may reflect DLA class II haplotype association with disease or vice versa. Strong linkage disequilibrium in the region precluded the ability to isolate and assess the individual or combined effect of variants on disease development. Nonetheless, all were predictive of risk for SLO and, with judicious assessment, their application in selective breeding may prove useful to reduce the incidence of SLO in the breed.
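
    The window, LD-cluster and PC summary described in the Li et al. (2018) abstract above can be pictured with a minimal sketch. This is not the authors' implementation: it assumes genotypes coded 0/1/2 in a samples-by-SNPs NumPy array, uses squared Pearson correlation as the LD measure, and stands in for the LD network analysis with plain connected components of a thresholded r² graph. The function names `ld_clusters` and `cluster_pcs`, the window size and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def ld_clusters(genotypes, window=50, r2_threshold=0.8):
    """Group SNP columns into high-LD clusters inside nonoverlapping windows.

    Assumption: LD is measured as squared Pearson correlation, and a cluster
    is a connected component of the graph linking SNP pairs with r^2 at or
    above the threshold. Monomorphic SNPs give NaN correlations and therefore
    end up as singleton clusters.
    """
    n_snps = genotypes.shape[1]
    clusters = []
    for start in range(0, n_snps, window):
        idx = np.arange(start, min(start + window, n_snps))
        if len(idx) == 1:
            clusters.append([int(idx[0])])
            continue
        r2 = np.corrcoef(genotypes[:, idx], rowvar=False) ** 2
        linked = r2 >= r2_threshold
        unseen = set(range(len(idx)))
        while unseen:
            # Depth-first search collects one connected component.
            stack = [unseen.pop()]
            members = []
            while stack:
                i = stack.pop()
                members.append(i)
                neighbours = [j for j in unseen if linked[i, j]]
                unseen.difference_update(neighbours)
                stack.extend(neighbours)
            clusters.append(sorted(int(idx[m]) for m in members))
    clusters.sort()
    return clusters


def cluster_pcs(genotypes, clusters):
    """Return the first principal-component score of each cluster (one column each)."""
    scores = []
    for members in clusters:
        block = genotypes[:, members].astype(float)
        block -= block.mean(axis=0)  # centre each SNP before the SVD
        u, s, _ = np.linalg.svd(block, full_matrices=False)
        scores.append(u[:, 0] * s[0])
    return np.column_stack(scores)
```

    The resulting PC columns could then replace the individual SNPs as predictors in an association model, which reduces the number of tests. Keeping only the first PC per cluster is a simplification made here; how many PCs to retain is a modelling decision this sketch does not address.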