Browsing by Subject "genome assembly"

Now showing items 1-8 of 8
  • Sætre, Camilla Lo Cascio; Eroukhmanoff, Fabrice; Rönkä, Katja; Kluen, Edward; Thorogood, Rose; Torrance, James; Tracey, Alan; Chow, William; Pelan, Sarah; Howe, Kerstin; Jakobsen, Kjetill S.; Tørresen, Ole K. (2021)
    The reed warbler (Acrocephalus scirpaceus) is a long-distance migrant passerine with a wide distribution across Eurasia. This species has fascinated researchers for decades, especially its role as host of a brood parasite, and its capacity for rapid phenotypic change in the face of climate change. Currently, it is expanding its range northwards in Europe, and is altering its migratory behavior in certain areas. Thus, there is great potential to discover signs of recent evolution and its impact on the genomic composition of the reed warbler. Here, we present a high-quality reference genome for the reed warbler, based on PacBio, 10×, and Hi-C sequencing. The genome has an assembly size of 1,075,083,815 bp with a scaffold N50 of 74,438,198 bp and a contig N50 of 12,742,779 bp. BUSCO analysis using aves_odb10 as a model showed that 95.7% of BUSCO genes were complete. We found unequivocal evidence of two separate macrochromosomal fusions in the reed warbler genome, in addition to the previously identified fusion between chromosome Z and a part of chromosome 4A in the Sylvioidea superfamily. We annotated 14,645 protein-coding genes, and a BUSCO analysis of the protein sequences indicated 97.5% completeness. This reference genome will serve as an important resource, and will provide new insights into the genomic effects of evolutionary drivers such as coevolution, range expansion, and adaptations to climate change, as well as chromosomal rearrangements in birds.
  • Varadharajan, Srinidhi; Rastas, Pasi; Löytynoja, Ari; Matschiner, Michael; Calboli, Federico C. F.; Guo, Baocheng; Nederbragt, Alexander J.; Jakobsen, Kjetill S.; Merilä, Juha (2019)
    The Gasterosteidae fish family hosts several species that are important models for eco-evolutionary, genetic, and genomic research. In particular, a wealth of genetic and genomic data has been generated for the three-spined stickleback (Gasterosteus aculeatus), the "ecology's supermodel," whereas the genomic resources for the nine-spined stickleback (Pungitius pungitius) have remained relatively scarce. Here, we report a high-quality chromosome-level genome assembly of P. pungitius consisting of 5,303 contigs (N50 = 1.2 Mbp) with a total size of 521 Mbp. These contigs were mapped to 21 linkage groups using a high-density linkage map, yielding a final assembly with 98.5% BUSCO completeness. A total of 25,062 protein-coding genes were annotated, and about 23% of the assembly was found to consist of repetitive elements. A comprehensive analysis of repetitive elements uncovered centromere-specific tandem repeats and provided insights into the evolution of retrotransposons. A multigene phylogenetic analysis inferred a divergence time of about 26 million years ago (Ma) between nine- and three-spined sticklebacks, which is far older than the commonly assumed estimate of 13 Ma. Compared with the three-spined stickleback, we identified an additional duplication of several genes in the hemoglobin cluster. Sequencing data from populations adapted to different environments indicated potential copy number variations in hemoglobin genes. Furthermore, genome-wide synteny comparisons between three- and nine-spined sticklebacks identified chromosomal rearrangements underlying the karyotypic differences between the two species. The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.
  • Mascher, Martin; Muehlbauer, Gary J.; Rokhsar, Daniel S.; Chapman, Jarrod; Schmutz, Jeremy; Barry, Kerrie; Munoz-Amatriain, Maria; Close, Timothy J.; Wise, Roger P.; Schulman, Alan H.; Himmelbach, Axel; Mayer, Klaus F. X.; Scholz, Uwe; Poland, Jesse A.; Stein, Nils; Waugh, Robbie (2013)
  • Kivikoski, Mikko; Rastas, Pasi; Löytynoja, Ari; Merilä, Juha (2021)
    We describe an integrative approach to improve contiguity and haploidy of a reference genome assembly and demonstrate its impact with practical examples. With two novel features of Lep-Anchor software and a combination of dense linkage maps, overlap detection and bridging long reads, we generated an improved assembly of the nine-spined stickleback (Pungitius pungitius) reference genome. We were able to remove a significant number of haplotypic contigs, detect more genetic variation and improve the contiguity of the genome, especially that of the X chromosome. However, improved scaffolding cannot correct for mosaicism of erroneously assembled contigs, demonstrated by a de novo assembly of a 1.6-Mbp inversion. Qualitatively similar gains were obtained with the genome of the three-spined stickleback (Gasterosteus aculeatus). Since the utility of genome-wide sequencing data in biological research depends heavily on the quality of the reference genome, the improved and fully automated approach described here should be helpful in refining reference genome assemblies.
  • Berner, Daniel; Roesti, Marius; Bilobram, Steven; Chan, Simon K.; Kirk, Heather; Pandoh, Pawan; Taylor, Gregory A.; Zhao, Yongjun; Jones, Steven J. M.; DeFaveri, Jacquelin (2019)
    The threespine stickleback is a geographically widespread and ecologically highly diverse fish that has emerged as a powerful model system for evolutionary genomics and developmental biology. Investigations in this species currently rely on a single high-quality reference genome, but would benefit from the availability of additional, independently sequenced and assembled genomes. We present here the assembly of four new stickleback genomes, based on the sequencing of microfluidic partitioned DNA libraries. The base pair lengths of the four genomes reach 92-101% of the standard reference genome length. Together with their de novo gene annotation, these assemblies offer a resource enhancing genomic investigations in stickleback. The genomes and their annotations are available from the Dryad Digital Repository (
  • BEEHIVE Collaboration; Wymant, Chris; Blanquart, Francois; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J.; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M. Kate; Gunsenheimer-Bartmeyer, Barbara; Gunthard, Huldrych F.; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe (2018)
    Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from
  • Elbers, Jean P.; Rogers, Mark F.; Perelman, Polina L.; Proskuryakova, Anastasia A.; Serdyukova, Natalia A.; Johnson, Warren E.; Horin, Petr; Corander, Jukka; Murphy, David; Burger, Pamela A. (2019)
    Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate-pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, both chromosome conformation capture techniques, such as Hi-C and Dovetail Genomics Chicago libraries, and long-read sequencing, such as Pacific Biosciences and Oxford Nanopore, help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high-quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high-quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi-C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing almost chromosome-level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and then scaffolds were comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi-C libraries increased the longest scaffold over 12-fold, from 9.71 Mbp to 124.99 Mbp, and the scaffold N50 over 50-fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long-read sequencing.
  • Leskinen, Katarzyna; Pajunen, Maria I.; Vilanova, Miguel Vincente Gomez-Raya; Kiljunen, Saija; Nelson, Andrew; Smith, Darren; Skurnik, Mikael (2020)
    YerA41 is a Myoviridae bacteriophage that was originally isolated due to its ability to infect Yersinia ruckeri bacteria, the causative agent of enteric redmouth disease of salmonid fish. Several attempts to determine its genomic DNA sequence using traditional and next generation sequencing technologies failed, indicating that the phage genome is modified in such a way that it is an unsuitable template for PCR amplification and for conventional sequencing. To determine the YerA41 genome sequence, we performed RNA-sequencing from phage-infected Y. ruckeri cells at different time points post-infection. The host-genome specific reads were subtracted and de novo assembly was performed on the remaining unaligned reads. This resulted in nine phage-specific scaffolds with a total length of 143 kb that shared only low level and scattered identity to known sequences deposited in DNA databases. Annotation of the sequences revealed 201 predicted genes, most of which found no homologs in the databases. Proteome studies identified altogether 63 phage particle-associated proteins. The RNA-sequencing data were used to characterize the transcriptional control of YerA41 and to investigate its impact on the bacterial gene expression. Overall, our results indicate that RNA-sequencing can be successfully used to obtain the genomic sequence of non-sequencable phages, providing simultaneous information about the phage–host interactions during the process of infection.
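
Several of the abstracts above report contig and scaffold N50 values. As a reading aid, here is a minimal sketch of how N50 is commonly defined: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not taken from any of the listed papers, and individual assemblers or evaluation tools may treat ties and gaps differently.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Common definition (assumed here): sort lengths from longest to
    shortest and return the length at which the running sum first
    reaches at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, not data from any listed study.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about correctness, which is why the abstracts pair it with completeness measures such as BUSCO.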