Browsing by Subject "GENOMES"

Now showing items 1-20 of 22
  • Acosta, Nidia Obscura; Mäkinen, Veli; Tomescu, Alexandru I. (2018)
    Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes, or edges, of G. Approach: We address this problem with the "safe and complete" framework of Tomescu and Medvedev (Research in computational Molecular biology-20th annual conference, RECOMB 9649: 152-163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as a subwalk in all metagenomic assembly solutions for G. A safe algorithm is called complete if it returns all safe walks of G. Results: We give graph-theoretic characterizations of the safe walks of G, and a safe and complete algorithm finding all safe walks of G. In the node-covering case, our algorithm runs in time O(m^2 + n^3), and in the edge-covering case it runs in time O(m^2 n); n and m denote the number of nodes and edges, respectively, of G. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation.
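The node-covering formulation in this abstract can be illustrated with a minimal sketch. It only checks whether a given collection of circular walks is a valid node-covering solution for a directed graph; it is not the safe-and-complete algorithm of the paper. The function names and the adjacency-set graph representation are assumptions made for this example.

```python
from typing import Dict, List, Set

# Hypothetical representation: each node maps to the set of its out-neighbours.
Graph = Dict[str, Set[str]]


def is_circular_walk(graph: Graph, walk: List[str]) -> bool:
    """Return True if consecutive nodes of `walk` are joined by edges,
    including the closing edge from the last node back to the first."""
    if not walk:
        return False
    successors = walk[1:] + walk[:1]
    return all(nxt in graph.get(cur, set()) for cur, nxt in zip(walk, successors))


def is_node_covering_solution(graph: Graph, walks: List[List[str]]) -> bool:
    """Return True if every walk is circular and together they visit every node."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    covered = {v for walk in walks for v in walk}
    return all(is_circular_walk(graph, w) for w in walks) and covered == nodes


# Toy graph with two cycles sharing node "c": a -> b -> c -> a and c -> d -> c.
toy_graph: Graph = {"a": {"b"}, "b": {"c"}, "c": {"a", "d"}, "d": {"c"}}
print(is_node_covering_solution(toy_graph, [["a", "b", "c"], ["c", "d"]]))
print(is_node_covering_solution(toy_graph, [["a", "b", "c"]]))
```

In the abstract's terms, a walk is safe when it appears as a subwalk in every such solution; enumerating all solutions is exactly what the paper's characterizations avoid.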
  • Rooijers, Koos; Kolmeder, Carolin; Juste, Catherine; Dore, Joel; de Been, Mark; Boeren, Sjef; Galan, Pilar; Beauvallet, Christian; de Vos, Willem M.; Schaap, Peter J. (2011)
  • Kankainen, Matti; Ojala, Teija; Holm, Liisa (2012)
  • Ottman, Noora; Huuskonen, Laura; Reunanen, Justus; Boeren, Sjef; Klievink, Judith; Smidt, Hauke; Belzer, Clara; de Vos, Willem M. (2016)
    Akkermansia muciniphila is a common member of the human gut microbiota and belongs to the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum. Decreased levels of A. muciniphila have been associated with many diseases, and thus it is considered to be a beneficial resident of the intestinal mucus layer. Surface-exposed molecules produced by this organism likely play important roles in colonization and communication with other microbes and the host, but the protein composition of the outer membrane (OM) has not been characterized thus far. Herein we set out to identify and characterize A. muciniphila proteins using an integrated approach of proteomics and computational analysis. Sarkosyl extraction and sucrose density-gradient centrifugation methods were used to enrich and fractionate the OM proteome of A. muciniphila. Proteins from these fractions were identified by LC-MS/MS and candidates for OM proteins derived from the experimental approach were subjected to computational screening to verify their location in the cell. In total we identified 79 putative OM and membrane-associated extracellular proteins, and 23 of those were found to differ in abundance between cells of A. muciniphila grown on the natural substrate, mucin, and those grown on the non-mucus sugar, glucose. The identified OM proteins included highly abundant proteins involved in secretion and transport, as well as proteins predicted to take part in formation of the pili-like structures observed in A. muciniphila. The most abundant OM protein was a 95-kD protein, termed PilQ, annotated as a type IV pili secretin and predicted to be involved in the production of pili in A. muciniphila. To verify its location we purified the His-Tag labeled N-terminal domain of PilQ and generated rabbit polyclonal antibodies. Immunoelectron microscopy of thin sections immunolabeled with these antibodies demonstrated the OM localization of PilQ, supporting its predicted function as a type IV pili secretin in A. muciniphila. As pili structures are known to be involved in the modulation of host immune responses, this provides support for the involvement of OM proteins in the host interaction of A. muciniphila. In conclusion, the characterization of the A. muciniphila OM proteome provides valuable information that can be used for further functional and immunological studies.
  • Pradhan, Barun; Cajuso, Tatiana; Katainen, Riku; Sulo, Paivi; Tanskanen, Tomas; Kilpivaara, Outi; Pitkanen, Esa; Aaltonen, Lauri A.; Kauppi, Liisa; Palin, Kimmo (2017)
    Long interspersed nuclear elements-1 (L1s) are a large family of retrotransposons. Retrotransposons are repetitive sequences that are capable of autonomous mobility via a copy-and-paste mechanism. In most copy events, only the L1 sequence is inserted; however, L1s can also mobilize the flanking non-repetitive region by a process known as 3' transduction. L1 insertions can contribute to genome plasticity and cause potentially tumorigenic genomic instability. However, detecting the activity of a particular source L1 and identifying new insertions stemming from it is a challenging task with current methodological approaches. We developed a long-distance inverse PCR (LDI-PCR) based approach to monitor the mobility of active L1 elements based on their 3' transduction activity. LDI-PCR requires no prior knowledge of the insertion target region. By applying LDI-PCR in conjunction with Nanopore sequencing (Oxford Nanopore Technologies) on one L1 reported to be particularly active in human cancer genomes, we detected 14 out of 15 3' transductions previously identified by whole genome sequencing in two different colorectal tumour samples. In addition, we discovered 25 novel highly subclonal insertions. Furthermore, the long sequencing reads produced by LDI-PCR/Nanopore sequencing enabled the identification of both the 5' and 3' junctions and revealed detailed insertion sequence information.
  • Liu, Ying; Demina, Tatiana; Roux, Simon; Aiewsakun, Pakorn; Kazlauskas, Darius M.; Simmonds, Peter; Prangishvili, David; Oksanen, Hanna; Krupovic, Mart (2021)
    The archaeal tailed viruses (arTV), evolutionarily related to tailed double-stranded DNA (dsDNA) bacteriophages of the class Caudoviricetes, represent the most common isolates infecting halophilic archaea. Only a handful of these viruses have been genomically characterized, limiting our appreciation of their ecological impacts and evolution. Here, we present 37 new genomes of haloarchaeal tailed virus isolates, more than doubling the current number of sequenced arTVs. Analysis of all 63 available complete genomes of arTVs, which we propose to classify into 14 new families and 3 orders, suggests ancient divergence of archaeal and bacterial tailed viruses and points to an extensive sharing of genes involved in DNA metabolism and counter defense mechanisms, illuminating common strategies of virus-host interactions with tailed bacteriophages. Coupling of the comparative genomics with the host range analysis on a broad panel of haloarchaeal species uncovered 4 distinct groups of viral tail fiber adhesins controlling the host range expansion. The survey of metagenomes using viral hallmark genes suggests that the global architecture of the arTV community is shaped through recurrent transfers between different biomes, including hypersaline, marine, and anoxic environments.
  • Silva, Milton; Pratas, Diogo; Pinho, Armando J. (2020)
    Background: The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of DNA sequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for DNA sequence compression. However, they fall short when compared with specific DNA compression tools, such as GeCo2. This limitation is due to the absence of models specifically designed for DNA sequences. In this work, we combine the power of neural networks with specific DNA models. For this purpose, we created GeCo3, a new genomic sequence compressor that uses neural networks for mixing multiple context and substitution-tolerant context models. Findings: We benchmark GeCo3 as a reference-free DNA compressor in 5 datasets, including a balanced and comprehensive dataset of DNA sequences, the Y-chromosome and human mitogenome, 2 compilations of archaeal and virus genomes, 4 whole genomes, and 2 collections of FASTQ data of a human virome and ancient DNA. GeCo3 achieves a solid improvement in compression over the previous version (GeCo2) of 2.4%, 7.1%, 6.1%, 5.8%, and 6.0%, respectively. To test its performance as a reference-based DNA compressor, we benchmark GeCo3 in 4 datasets constituted by the pairwise compression of the chromosomes of the genomes of several primates. GeCo3 improves the compression by 12.4%, 11.7%, 10.8%, and 10.1% over the state of the art. The cost of this compression improvement is some additional computational time (1.7-3 times slower than GeCo2). The RAM use is constant, and the tool scales efficiently, independently of the sequence size. Overall, these values outperform the state of the art. Conclusions: GeCo3 is a genomic sequence compressor with a neural network mixing approach that provides additional gains over top specific genomic compressors. The proposed mixing method is portable, requiring only the probabilities of the models as inputs, providing easy adaptation to other data compressors or compression-based data analysis tools. GeCo3 is released under GPLv3 and is available for free download at
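The idea of mixing the probabilities of several context models can be sketched as follows. This is a simplified stand-in, not GeCo3's method: instead of a neural network it uses a plain weighted average whose weights are updated according to how well each model predicted the last symbol. The class and function names, the model orders, and the `decay` parameter are all assumptions made for this example.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class ContextModel:
    """Order-k context model with Laplace-smoothed symbol counts (illustrative)."""

    def __init__(self, order: int):
        self.order = order
        self.counts = defaultdict(lambda: [1, 1, 1, 1])

    def _context(self, history: str) -> str:
        # An order-0 model has an empty context; otherwise use the last k symbols.
        return history[-self.order:] if self.order else ""

    def probs(self, history: str) -> list:
        counts = self.counts[self._context(history)]
        total = sum(counts)
        return [c / total for c in counts]

    def update(self, history: str, symbol: str) -> None:
        self.counts[self._context(history)][ALPHABET.index(symbol)] += 1


def estimate_bits(sequence: str, orders=(1, 3), decay: float = 0.9) -> float:
    """Estimate the code length (in bits) of `sequence` under a weighted
    average of the models' probabilities. Weights favour recently accurate models."""
    models = [ContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    max_order = max(orders)
    bits = 0.0
    for i, symbol in enumerate(sequence):
        history = sequence[max(0, i - max_order):i]
        s = ALPHABET.index(symbol)
        per_model = [m.probs(history) for m in models]
        # Mixing step: only the models' probabilities are needed as inputs.
        mixed = sum(w * p[s] for w, p in zip(weights, per_model))
        bits += -math.log2(mixed)
        # Reward models that assigned high probability to the observed symbol.
        weights = [(w ** decay) * p[s] for w, p in zip(weights, per_model)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        for m in models:
            m.update(history, symbol)
    return bits


repetitive = "ACGT" * 50
print(estimate_bits(repetitive) / len(repetitive))  # bits per base, below the 2-bit baseline
```

Because the mixer consumes only per-model probabilities, swapping the averaging step for a learned mixer would leave the rest of the loop unchanged, which is the portability property the abstract describes.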
  • Norri, Tuukka; Cazaux, Bastien; Dönges, Saska; Valenzuela, Daniel; Mäkinen, Veli (2021)
    Motivation: Variant calling workflows that utilize a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but scalability combined with seamless integration to existing workflows remains a challenge. Results: We present PanVC with founder sequences, a scalable and accurate variant calling workflow based on a multiple alignment of reference sequences. Scalability is achieved by removing duplicate parts up to a limit into a founder multiple alignment, that is then indexed using a hybrid scheme that exploits general purpose read aligners. Our implemented workflow uses GATK or BCFtools for variant calling, but the various steps of our workflow (e.g. vcf2multialign tool, founder reconstruction) can be of independent interest as a basis for creating novel pangenome analysis workflows beyond variant calling.
  • Baig, Abiyad; McNally, Alan; Dunn, Steven; Paszkiewicz, Konrad H.; Corander, Jukka; Manning, Georgina (2015)
    Background: Campylobacter jejuni is a major zoonotic pathogen, causing gastroenteritis in humans. Invasion is an important pathogenesis trait by which C. jejuni causes disease. Here we report the genomic analysis of 134 strains to identify traits unique to hyperinvasive isolates. Methods: A total of 134 C. jejuni genomes were used to create a phylogenetic tree to position the hyperinvasive strains. Comparative genomics led to the identification of mosaic capsule regions. A pan genome approach led to the discovery of loci, or alleles of loci, unique to the hyperinvasive strains. Results: Phylogenetic analysis showed that the hyperinvasive phenotype is a generalist trait. Despite the fact that hyperinvasive strains are only distantly related based on the whole genome phylogeny, they all possess genes within the capsule region with high identity to capsule genes from C. jejuni subsp. doylei and C. lari. In addition, there were genes unique to the hyperinvasive strains with identity to non-C. jejuni genes, as well as allelic variants of mainly pathogenesis related genes already known in the other C. jejuni. In particular, the sequences of the flagella genes flgD-E and flgL were highly conserved amongst the hyperinvasive strains and divergent from sequences in other C. jejuni. A novel cytolethal distending toxin (cdt) operon was also identified as present in all hyperinvasive strains in addition to the classic cdt operon present in other C. jejuni. Conclusions: Overall, the hyperinvasive phenotype is strongly linked to the presence of orthologous genes from other Campylobacter species in their genomes, notably within the capsule region, in addition to the observed association with unique allelic variants in flagellar genes and the secondary cdt operon, which is unlikely under random sharing of accessory alleles in separate lineages.
  • Qvist, Laura; Niskanen, Markku; Mannermaa, Kristiina; Wutke, Saskia; Aspi, Jouni (2019)
    Background: The Finnhorse was established as a breed more than 110 years ago by combining local Finnish landraces. Since its foundation, the breed has experienced both strong directional selection, especially for size and colour, and severe population bottlenecks that are connected with its initial foundation and subsequent changes in agricultural and forestry practices. Here, we used sequences of the mitochondrial control region and genomic single nucleotide polymorphisms (SNPs) to estimate the genetic diversity and differentiation of the four Finnhorse breeding sections: trotters, pony-sized horses, draught horses and riding horses. Furthermore, we estimated inbreeding and effective population sizes over time to infer the history of this breed. Results: We found a high level of mitochondrial genetic variation and identified 16 of the 18 haplogroups described in present-day horses. Interestingly, one of these detected haplogroups was previously reported only in the Przewalski’s horse. Female effective population sizes were in the thousands, but declines were evident at the times when the breed and its breeding sections were founded. By contrast, nuclear variation and effective population sizes were small (approximately 50). Nevertheless, inbreeding in Finnhorses was lower than in many other horse breeds. Based on nuclear SNP data, genetic differentiation among the four breeding sections was strongest between the draught horses and the three other sections (FST=0.007–0.018), whereas based on mitochondrial DNA data, it was strongest between the trotters and the pony-sized and riding horses (ΦST= 0.054–0.068). Conclusions: The existence of a Przewalski’s horse haplogroup in the Finnhorse provides new insights into the domestication of the horse, and this finding supports previous suggestions of a close relationship between the Finnhorse and eastern primitive breeds. 
    The high level of mitochondrial DNA variation in the Finnhorse supports its domestication from a large number of mares but also reflects that its founding depended on many local landraces. Although inbreeding in Finnhorses was lower than in many other horse breeds, the small nuclear effective population sizes of each of its breeding sections can be considered as a warning sign, which warrants changes in breeding practices.
  • Liu, Zhigao; Korpelainen, Helena (2018)
    Currently, there is a lack of genetic markers capable of effectively detecting polymorphisms in Clematis. Therefore, we developed new markers to investigate inter- and intraspecific diversity in Clematis. Based on the complete chloroplast genome of Clematis terniflora, simple sequence repeats were explored and primer pairs were designed for all ten adequate repeat regions (cpSSRs), which were tested in 43 individuals of 11 Clematis species. In addition, the nuclear ITS region was sequenced in 11 Clematis species. Seven cpSSR loci were found to be polymorphic in the genus and serve as markers that can distinguish different species and be used in different genetic analyses, including cultivar identification to assist the breeding of new ornamental cultivars.
  • Kulmuni, Jonna; Nouhaud, Pierre; Pluckrose, Lucy; Satokangas, Ina; Dhaygude, Kishor; Butlin, Roger K. (2020)
    Speciation underlies the generation of novel biodiversity. Yet, there is much to learn about how natural selection shapes genomes during speciation. Selection is assumed to act against gene flow at barrier loci, promoting reproductive isolation. However, evidence for gene flow and selection is often indirect and we know very little about the temporal stability of barrier loci. Here we utilize haplodiploidy to identify candidate male barrier loci in hybrids between two wood ant species. As ant males are haploid, they are expected to reveal recessive barrier loci, which can be masked in diploid females if heterozygous. We then test for barrier stability in a sample collected 10 years later and use survival analysis to provide a direct measure of natural selection acting on candidate male barrier loci. We find multiple candidate male barrier loci scattered throughout the genome. Surprisingly, a proportion of them are not stable after 10 years, natural selection apparently switching from acting against to favouring introgression in the later sample. Instability of the barrier effect and natural selection for introgressed alleles could be due to environment-dependent selection, emphasizing the need to consider temporal variation in the strength of natural selection and the stability of the barrier effect at putative barrier loci in future speciation work.
  • Hagstrom, Erik; Freyer, Christoph; Battersby, Brendan J.; Stewart, James B.; Larsson, Nils-Goran (2014)
  • Romiguier, Jonathan; Rolland, Jonathan; Morandin, Claire; Keller, Laurent (2018)
    Background: The ants of the Formica genus are classical model species in evolutionary biology. In particular, Darwin used Formica as model species to better understand the evolution of slave-making, a parasitic behaviour where workers of another species are stolen to exploit their workforce. In his book "On the Origin of Species" (1859), Darwin first hypothesized that slave-making behaviour in Formica evolved in incremental steps from a free-living ancestor. Methods: The absence of a well-resolved phylogenetic tree of the genus prevents an assessment of whether relationships among Formica subgenera are compatible with this scenario. In this study, we resolve the relationships among the 4 palearctic Formica subgenera (Formica str. s., Coptoformica, Raptiformica and Serviformica) using a phylogenomic dataset of 945 genes for 16 species. Results: We provide a reference tree resolving the relationships among the main Formica subgenera with high bootstrap supports. Discussion: The branching order of our tree suggests that the free-living lifestyle is ancestral in the Formica genus and that parasitic colony founding could have evolved a single time, probably acting as a pre-adaptation to slave-making behaviour. Conclusion: This phylogenetic tree provides a solid backbone for future evolutionary studies in the Formica genus and slave-making behaviour.
  • Shah, Firoz Hussain; Mali, Tuulia Leena Elina; Lundell, Taina Kristina (2018)
    Basidiomycota fungi in the order Polyporales are specialized in the decomposition of dead wood and woody debris and thereby are crucial players in the degradation of organic matter and cycling of carbon in forest ecosystems. Polyporales wood-decaying species comprise both white rot and brown rot fungi, based on their mode of wood decay. While the white rot fungi are able to attack and decompose all the lignocellulose biopolymers, the brown rot species mainly cause the destruction of wood polysaccharides, with minor modification of the lignin units. The biochemical mechanism of brown rot decay of wood is still unclear and has been proposed to include a combination of nonenzymatic oxidation reactions and carbohydrate-active enzymes. Therefore, a linking approach is needed to dissect the fungal brown rot processes. We studied the brown rot Polyporales species Fomitopsis pinicola by following mycelial growth and enzyme activity patterns and generating metabolites together with Fenton-promoting Fe3+-reducing activity for 3 months in submerged cultures supplemented with spruce wood. Enzyme activities to degrade hemicellulose, cellulose, proteins, and chitin were produced by three Finnish isolates of F. pinicola. Substantial secretion of oxalic acid and a decrease in pH were notable. Aromatic compounds and metabolites were observed to accumulate in the fungal cultures, with some metabolites having Fe3+-reducing activity. Thus, F. pinicola demonstrates a pattern of strong mycelial growth leading to the active production of carbohydrate- and protein-active enzymes, together with the promotion of Fenton biochemistry. Our findings point to fungal species-level "fine-tuning" and variations in the biochemical reactions leading to the brown rot type of wood decay.
    Importance: Fomitopsis pinicola is a common fungal species in boreal and temperate forests in the Northern Hemisphere encountered as a wood-colonizing saprotroph and tree pathogen, causing a severe brown rot type of wood degradation. However, its lignocellulose-decomposing mechanisms have remained undiscovered. Our approach was to explore both the enzymatic activities and nonenzymatic Fenton reaction-promoting activities (Fe3+ reduction and metabolite production) by cultivating three isolates of F. pinicola in wood-supplemented cultures. Our findings on the simultaneous production of versatile enzyme activities, including those of endoglucanase, xylanase, beta-glucosidase, chitinase, and acid peptidase, together with generation of low pH, accumulation of oxalic acid, and Fe3+-reducing metabolites, increase the variations of fungal brown rot decay mechanisms. Furthermore, these findings will aid us in revealing the wood decay proteomic, transcriptomic, and metabolic activities of this ecologically important forest fungal species.
  • Abdullah; Mehmood, Furrukh; Heidari, Parviz; Rahim, Abdur; Ahmed, Ibrar; Poczai, Péter (2021)
    The chloroplast genome evolves through the course of evolution. Various types of mutational events are found within the chloroplast genome, including insertions-deletions (InDels), substitutions, inversions, gene rearrangement, and pseudogenization of genes. The pseudogenization of the chloroplast threonine (trnT-GGU) gene was previously reported in Cryptomeria japonica (Cupressaceae), Pelargonium × hortorum (Geraniaceae), and Anaphalis sinica and Leontopodium leiolepis of the tribe Gnaphalieae (Asteroideae, Asteraceae). Here, we performed a broad analysis of the trnT-GGU gene among the species of 13 subfamilies of Asteraceae and found this gene to be a pseudogene in core Asteraceae (Gymnarrhenoideae, Cichorioideae, Corymbioideae, and Asteroideae), which was linked to an insertion event within the 5' acceptor stem and is not associated with ecological factors such as habit, habitat, and geographical distribution of the species. The pseudogenization of trnT-GGU was not predicted in codon usage, indicating that the superwobbling phenomenon occurs in core Asteraceae in which a single transfer RNA (trnT-UGU) decodes all four codons of threonine. To the best of our knowledge, this is the first evidence of a complete clade of a plant species using the superwobbling phenomenon for translation.
  • Kasurinen, Jutta; Spruit, Cindy M.; Wicklund, Anu; Pajunen, Maria I.; Skurnik, Mikael (2021)
    Bacteriophage vB_EcoM_fHy-Eco03 (fHy-Eco03 for short) was isolated from a sewage sample based on its ability to infect an Escherichia coli clinical blood culture isolate. Altogether, 32 genes encoding hypothetical proteins of unknown function (HPUFs) were identified from the genomic sequence of fHy-Eco03. The HPUFs were screened for toxic properties (toxHPUFs) with a novel, Next Generation Sequencing (NGS)-based approach. This approach identifies toxHPUF-encoding genes through comparison of gene-specific read coverages in DNA from pooled ligation mixtures before electroporation and pooled transformants after electroporation. The performance and reliability of the NGS screening assay were compared with a plating efficiency-based method, and both methods identified the fHy-Eco03 gene g05 product as toxic. While the outcomes of the two screenings were highly similar, the NGS screening assay outperformed the plating efficiency assay in both reliability and efficiency. The NGS screening assay can be used as a high throughput method in the search for new phage-inspired antimicrobial molecules.
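The coverage comparison described in this abstract can be sketched as a log2 ratio of each gene's share of reads after versus before electroporation. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the pseudocount, the cutoff of -2.0, and all read counts below are hypothetical.

```python
import math
from typing import Dict


def depleted_genes(
    before: Dict[str, float],
    after: Dict[str, float],
    log2_cutoff: float = -2.0,   # hypothetical threshold
    pseudocount: float = 1.0,    # avoids division by zero for absent genes
) -> Dict[str, float]:
    """Return genes whose share of reads drops sharply after transformation,
    mapped to their log2(after share / before share)."""
    genes = list(before)
    total_before = sum(before[g] for g in genes) + pseudocount * len(genes)
    total_after = sum(after.get(g, 0.0) for g in genes) + pseudocount * len(genes)
    flagged = {}
    for g in genes:
        share_before = (before[g] + pseudocount) / total_before
        share_after = (after.get(g, 0.0) + pseudocount) / total_after
        log2_ratio = math.log2(share_after / share_before)
        if log2_ratio <= log2_cutoff:
            flagged[g] = log2_ratio
    return flagged


# Made-up read counts: g05 nearly disappears from the pooled transformants.
ligation_pool = {"g01": 1000, "g05": 1000, "g12": 1000}
transformant_pool = {"g01": 1500, "g05": 10, "g12": 1490}
print(sorted(depleted_genes(ligation_pool, transformant_pool)))
```

A gene whose product is toxic to the host should be under-represented among surviving transformants, so a strongly negative ratio marks it as a toxHPUF candidate.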
  • Posth, Cosimo; Zaro, Valentina; Spyrou, Maria A.; Vai, Stefania; Gnecchi-Ruscone, Guido A.; Modi, Alessandra; Peltzer, Alexander; Motsch, Angela; Nagele, Kathrin; Vagene, Ashild J.; Nelson, Elizabeth A.; Radzeviciute, Rita; Freund, Cacilia; Bondioli, Lorenzo M.; Cappuccini, Luca; Frenzel, Hannah; Pacciani, Elsa; Boschin, Francesco; Capecchi, Giulia; Martini, Ivan; Moroni, Adriana; Ricci, Stefano; Sperduti, Alessandra; Turchetti, Maria Angela; Riga, Alessandro; Zavattaro, Monica; Zifferero, Andrea; Heyne, Henrike O.; Fernandez-Dominguez, Eva; Kroonen, Guus J.; McCormick, Michael; Haak, Wolfgang; Lari, Martina; Barbujani, Guido; Bondioli, Luca; Bos, Kirsten; Caramelli, David; Krause, Johannes (2021)
    The origin, development, and legacy of the enigmatic Etruscan civilization from the central region of the Italian peninsula known as Etruria have been debated for centuries. Here we report a genomic time transect of 82 individuals spanning almost two millennia (800 BCE to 1000 CE) across Etruria and southern Italy. During the Iron Age, we detect a component of Indo-European-associated steppe ancestry and the lack of recent Anatolian-related admixture among the putative non-Indo-European-speaking Etruscans. Despite comprising diverse individuals of central European, northern African, and Near Eastern ancestry, the local gene pool is largely maintained across the first millennium BCE. This drastically changes during the Roman Imperial period where we report an abrupt population-wide shift to ~50% admixture with eastern Mediterranean ancestry. Last, we identify northern European components appearing in central Italy during the Early Middle Ages, which thus formed the genetic landscape of present-day Italian populations.
  • Fortino, Vittorio; Smolander, Olli-Pekka; Auvinen, Petri; Tagliaferri, Roberto; Greco, Dario (2014)