Browsing by Subject "ALIGNMENT"


Now showing items 1-20 of 70
  • Mehine, Miika; Khamaiseh, Sara; Ahvenainen, Terhi; Heikkinen, Tuomas; Äyräväinen, Anna; Pakarinen, Päivi; Härkki, Päivi; Pasanen, Annukka; Bützow, Ralf; Vahteristo, Pia (2020)
    Simple Summary: Uterine leiomyomas are benign smooth muscle tumors affecting millions of women globally. On a molecular level, leiomyomas can be classified into three main subtypes, each characterized by mutations affecting either MED12, HMGA2, or FH. Leiomyomas are still widely regarded as a single entity, although early observations suggest that different subtypes behave differently, in terms of both clinical outcomes and therapeutic requirements. The majority of classification studies on leiomyomas have been performed using fresh frozen tissue. Archival formalin-fixed paraffin-embedded (FFPE) tissue represents an invaluable source of biological material that can be studied retrospectively. Methods capable of generating high-quality data from FFPE material are in high demand. Here, we show that 3' RNA sequencing can accurately classify leiomyomas that have been stored as FFPE tissue in hospital archives for years. A targeted 3' RNA sequencing panel could provide researchers and clinicians with a cost-effective and scalable diagnostic tool for classifying smooth muscle tumors. Uterine leiomyomas are benign smooth muscle tumors occurring in 70% of women of reproductive age. The majority of leiomyomas harbor one of three well-established genetic changes: a hotspot mutation in MED12, overexpression of HMGA2, or biallelic loss of FH. The majority of studies have classified leiomyomas by complex and costly methods, such as whole-genome sequencing, or by combining multiple traditional methods, such as immunohistochemistry and Sanger sequencing. The type of specimens and the amount of resources available often determine the choice. A more universal, cost-effective, and scalable method for classifying leiomyomas is needed. The aim of this study was to evaluate whether RNA sequencing can accurately classify formalin-fixed paraffin-embedded (FFPE) leiomyomas. We performed 3' RNA sequencing with 44 leiomyoma and 5 myometrium FFPE samples, revealing that the samples clustered according to the mutation status of MED12, HMGA2, and FH. Furthermore, we confirmed each subtype in a publicly available fresh frozen dataset. These results indicate that a targeted 3' RNA sequencing panel could serve as a cost-effective and robust tool for stratifying both fresh frozen and FFPE leiomyomas. This study also highlights 3' RNA sequencing as a promising method for studying the abundance of unexploited tissue material that is routinely stored in hospital archives.
  • Feola, Sara; Chiaro, Jacopo; Martins, Beatriz; Russo, Salvatore; Fusciello, Manlio; Ylösmäki, Erkko; Bonini, Chiara; Ruggiero, Eliana; Hamdan, Firas; Feodoroff, Michaela; Antignani, Gabriella; Viitala, Tapani; Pesonen, Sari; Grönholm, Mikaela; Branca, Rui M. M.; Lehtiö, Janne; Cerullo, Vincenzo (2022)
    Besides the isolation and identification of major histocompatibility complex I-restricted peptides from the surface of cancer cells, one of the challenges is eliciting an effective antitumor CD8+ T-cell-mediated response as part of a therapeutic cancer vaccine. Therefore, the establishment of a solid pipeline for the downstream selection of clinically relevant peptides and the subsequent creation of therapeutic cancer vaccines are of utmost importance. Indeed, the use of peptides for eliciting specific antitumor adaptive immunity is hindered by two main limitations: the efficient selection of the optimal candidate peptides and the use of a highly immunogenic platform to combine with the peptides to induce effective tumor-specific adaptive immune responses. Here, we describe for the first time a streamlined pipeline for the generation of personalized cancer vaccines starting from the isolation and selection of the most immunogenic peptide candidates expressed on the tumor cells and ending in the generation of efficient therapeutic oncolytic cancer vaccines. This immunopeptidomics-based pipeline was carefully validated in the murine colon tumor model CT26. Specifically, we used state-of-the-art immunoprecipitation and mass spectrometric methodologies to isolate >8000 peptide targets from the CT26 tumor cell line. The selection of the target candidates was then based on two separate approaches: RNAseq analysis and HEX software. The latter is a tool previously developed by Jacopo, 2020, able to identify tumor antigens similar to pathogen antigens in order to exploit molecular mimicry and tumor pathogen cross-reactive T cells in cancer vaccine development. The generated list of candidates (26 in total) was further tested in a functional characterization assay using interferon-gamma enzyme-linked immunospot (ELISpot), reducing the number of candidates to six. 
These peptides were then tested in our previously described oncolytic cancer vaccine platform PeptiCRAd, a vaccine platform consisting of an immunogenic oncolytic adenovirus (OAd) coated with tumor antigen peptides. In our work, PeptiCRAd was successfully used for the treatment of mice bearing CT26, controlling the primary malignant lesion and, most importantly, a secondary, nontreated cancer lesion. These results confirmed the feasibility of applying the described pipeline for the selection of peptide candidates and generation of a therapeutic oncolytic cancer vaccine, filling a gap in the field of cancer immunotherapy and paving the way to translate our pipeline into a human therapeutic approach.
  • Acosta, Nidia Obscura; Mäkinen, Veli; Tomescu, Alexandru I. (2018)
    Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes, or edges, of G. Approach: We address this problem with the "safe and complete" framework of Tomescu and Medvedev (Research in computational Molecular biology-20th annual conference, RECOMB 9649: 152-163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as a subwalk in all metagenomic assembly solutions for G. A safe algorithm is called complete if it returns all safe walks of G. Results: We give graph-theoretic characterizations of the safe walks of G, and a safe and complete algorithm finding all safe walks of G. In the node-covering case, our algorithm runs in time $O(m^2 + n^3)$, and in the edge-covering case it runs in time $O(m^2 n)$; n and m denote the number of nodes and edges, respectively, of G. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation.
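The node-covering formulation in the abstract above can be made concrete with a small checker. The Python sketch below is illustrative only and is not the authors' algorithm: it verifies that a proposed collection of circular walks is a valid solution (every walk follows edges of G cyclically, and together the walks visit every node), and it does not compute safe walks. The function names and the edge-set representation are assumptions made for this example.

```python
def is_circular_walk(edges, walk):
    """Return True if `walk` (a list of nodes, read cyclically) follows edges of G.

    `edges` is a set of (source, target) pairs; the last node must link
    back to the first node for the walk to be circular.
    """
    if not walk:
        return False
    return all(
        (walk[i], walk[(i + 1) % len(walk)]) in edges
        for i in range(len(walk))
    )


def is_node_covering_solution(nodes, edges, walks):
    """Return True if every walk is circular in G and all nodes are covered."""
    if not all(is_circular_walk(edges, walk) for walk in walks):
        return False
    covered = set().union(*walks)
    return set(nodes) <= covered


# Toy graph (hypothetical): two cycles, a-b-c and c-d, sharing node c.
nodes = {"a", "b", "c", "d"}
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "c")}

print(is_node_covering_solution(nodes, edges, [["a", "b", "c"], ["c", "d"]]))
print(is_node_covering_solution(nodes, edges, [["a", "b", "c"]]))
```

In this toy graph the first collection covers all four nodes, while the second leaves node d uncovered and is therefore rejected.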
  • Mukherjee, Kingshuk; Alipanahi, Bahar; Kahveci, Tamer; Salmela, Leena; Boucher, Christina (2019)
    Motivation: Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have been mainly used in a post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single molecule optical maps, called Rmaps, or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data, optical maps and sequence data, which will facilitate the usage of optical maps in the sequence assembly step itself. Results: We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving this problem, which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data.
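As background for the graph mentioned in the abstract above, a de Bruijn graph can be built from reads by linking the (k-1)-mer prefix of every k-mer to its (k-1)-mer suffix. The Python sketch below shows only this standard construction, under the simplifying assumptions of error-free reads and a single strand; it does not implement the authors' Rmap alignment method, and the function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph as a dict: (k-1)-mer -> set of successor (k-1)-mers.

    Each k-mer found in a read contributes one edge from its prefix
    (first k-1 characters) to its suffix (last k-1 characters).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy reads (hypothetical), overlapping on "CGT".
toy_graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
for prefix, successors in sorted(toy_graph.items()):
    print(prefix, "->", sorted(successors))
```

Reads shorter than k contribute no edges, and repeated k-mers collapse into a single edge because successors are stored in a set.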
  • Lamnidis, Thiseas C.; Majander, Kerttu; Jeong, Choongwon; Salmela, Elina; Wessman, Anna; Moiseyev, Vyacheslav; Khartanovich, Valery; Balanovsky, Oleg; Ongyerth, Matthias; Weihmann, Antje; Sajantila, Antti; Kelso, Janet; Pääbo, Svante; Onkamo, Päivi; Haak, Wolfgang; Krause, Johannes; Schiffels, Stephan (2018)
    European population history has been shaped by migrations of people, and their subsequent admixture. Recently, ancient DNA has brought new insights into European migration events linked to the advent of agriculture, and possibly to the spread of Indo-European languages. However, little is known about the ancient population history of north-eastern Europe, in particular about populations speaking Uralic languages, such as Finns and Saami. Here we analyse ancient genomic data from 11 individuals from Finland and north-western Russia. We show that the genetic makeup of northern Europe was shaped by migrations from Siberia that began at least 3500 years ago. This Siberian ancestry was subsequently admixed into many modern populations in the region, particularly into populations speaking Uralic languages today. Additionally, we show that ancestors of modern Saami inhabited a larger territory during the Iron Age, which adds to the historical and linguistic information about the population history of Finland.
  • Holden, Lindsay A.; Arumilli, Meharji; Hytonen, Marjo K.; Hundi, Sruthi; Salojärvi, Jarkko; Brown, Kim H.; Lohi, Hannes (2018)
    Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR), which was assembled using a single purebred Boxer. Although a common practice, this results in many high-quality reads remaining unmapped. To address this, whole-genome sequence data from three breeds, Border Collie (n = 26), Bearded Collie (n = 7), and Entlebucher Sennenhund (n = 8), were analyzed to identify novel, non-CGR genomic contigs using the previously validated pseudo-de novo assembly pipeline. We identified 256,957 novel contigs, and paired-end relationships together with BLAT scores provided 126,555 (49%) high-quality contigs with genomic coordinates containing 4.6 Mb of novel sequence absent from the CGR. These contigs close 12,503 known gaps, including 2.4 Mb containing partially missing sequences for 11.5% of Ensembl, 16.4% of RefSeq and 12.2% of canFam3.1+ CGR annotated genes and 1,748 unmapped contigs containing 2,366 novel gene variants. Examples for six disease-associated genes (SCARF2, RD3, COL9A3, FAM161A, RASGRP1 and DLX6) containing gaps or alternate splice variants missing from the CGR are also presented. These findings from non-reference breeds support the need for improvement of the current Boxer-only CGR to avoid missing important biological information. The inclusion of the missing gene sequences into the CGR will facilitate identification of putative disease mutations across diverse breeds and phenotypes.
  • Cheng, Lu; Walker, Alan W.; Corander, Jukka (2012)
  • Celorio-Mancera, Maria de la Paz; Rastas, Pasi; Steward, Rachel A.; Nylin, Soren; Wheat, Christopher W. (2021)
    The comma butterfly (Polygonia c-album, Nymphalidae, Lepidoptera) is a model insect species, most notably in the study of phenotypic plasticity and plant-insect coevolutionary interactions. In order to facilitate the integration of genomic tools with a diverse body of ecological and evolutionary research, we assembled the genome of a Swedish comma using 10X sequencing, scaffolding with matepair data, genome polishing, and assignment to linkage groups using a high-density linkage map. The resulting genome is 373 Mb in size, with a scaffold N50 of 11.7 Mb and contig N50 of 11.2 Mb. The genome contained 90.1% of single-copy Lepidopteran orthologs in a BUSCO analysis of 5,286 genes. A total of 21,004 gene-models were annotated on the genome using RNA-Seq data from larval and adult tissue in combination with proteins from the Arthropoda database, resulting in a high-quality annotation for which functional annotations were generated. We further documented the quality of the chromosomal assembly via synteny assessment with Melitaea cinxia. The resulting annotated, chromosome-level genome will provide an important resource for investigating coevolutionary dynamics and comparative analyses in Lepidoptera.
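The N50 values quoted above follow the usual definition: the length L such that sequences of length at least L contain at least half of the total assembled bases. The following is a minimal Python sketch of that calculation, using a hypothetical function name and toy lengths rather than data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the assembly size.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy assembly (hypothetical lengths, arbitrary units).
print(n50([2, 3, 4, 5, 6]))
```

Comparing `2 * running` with `total` keeps the arithmetic in integers, so odd assembly sizes do not depend on floating-point rounding.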
  • Aspatwar, Ashok; Barker, Harlan; Aisala, Heidi; Zueva, Ksenia; Kuuslahti, Marianne; Tolvanen, Martti; Primmer, Craig R.; Lumme, Jaakko; Bonardi, Alessandro; Tripathi, Amit; Parkkila, Seppo; Supuran, Claudiu T. (2022)
    A beta-class carbonic anhydrase (CA, EC was cloned from the genome of the Monogenean platyhelminth Gyrodactylus salaris, a parasite of Atlantic salmon. The new enzyme, GsaCAβ, has a significant catalytic activity for the physiological reaction, $\mathrm{CO_2 + H_2O \rightleftharpoons HCO_3^- + H^+}$, with a $k_{\mathrm{cat}}$ of $1.1 \times 10^{5}\ \mathrm{s^{-1}}$ and a $k_{\mathrm{cat}}/K_{\mathrm{m}}$ of $7.58 \times 10^{6}\ \mathrm{M^{-1}\,s^{-1}}$. This activity was inhibited by acetazolamide ($K_{\mathrm{I}}$ of 0.46 µM), a sulphonamide in clinical use, as well as by selected inorganic anions and small molecules. Most tested anions inhibited GsaCAβ at millimolar concentrations, but sulfamide ($K_{\mathrm{I}}$ of 81 µM), N,N-diethyldithiocarbamate ($K_{\mathrm{I}}$ of 67 µM) and sulphamic acid ($K_{\mathrm{I}}$ of 6.2 µM) showed a rather efficient inhibitory action. There are currently very few non-toxic agents effective in combating this parasite. GsaCAβ is subsequently proposed as a new drug target for which effective inhibitors can be designed.
  • Keller, Saskia; Hetzel, Udo; Sironen, Tarja; Korzyukov, Yegor; Vapalahti, Olli; Kipar, Anja; Hepojoki, Jussi (2017)
    Boid inclusion body disease (BIBD) is an often fatal disease affecting mainly constrictor snakes. BIBD has been associated with infection, and more recently with coinfection, by various reptarenavirus species (family Arenaviridae). Thus far BIBD has only been reported in captive snakes, and neither the incubation period nor the route of transmission is known. Herein we provide strong evidence that co-infecting reptarenavirus species can be vertically transmitted in Boa constrictor. In total we examined five B. constrictor clutches with offspring ranging in age from embryos over perinatal abortions to juveniles. The mother and/or father of each clutch were initially diagnosed with BIBD and/or reptarenavirus infection by detection of the pathognomonic inclusion bodies (IB) and/or reptarenaviral RNA. By applying next-generation sequencing and de novo sequence assembly we determined the "reptarenavirome" of each clutch, yielding several nearly complete L and S segments of multiple reptarenaviruses. We further confirmed vertical transmission of the co-infecting reptarenaviruses by species-specific RT-PCR from samples of parental animals and offspring. Curiously, not all offspring obtained the full parental "reptarenavirome". We extended our findings by an in vitro approach; cell cultures derived from embryonal samples rapidly developed IB and promoted replication of some or all parental viruses. In the tissues of embryos and perinatal abortions, viral antigen was sometimes detected, but IB were consistently seen only in the juvenile snakes from the age of 2 mo onwards. In addition to demonstrating vertical transmission of multiple species, our results also indicate that reptarenavirus infection induces BIBD over time in the offspring.
  • Rehman, Umar; Sultana, Nighat; Abdullah; Jamal, Abbas; Muzaffar, Maryam; Poczai, Péter (2021)
    Family Phyllanthaceae belongs to the eudicot order Malpighiales, and its species are herbs, shrubs, and trees that are mostly distributed in tropical regions. Here, we elucidate the molecular evolution of the chloroplast genome in Phyllanthaceae and identify the polymorphic loci for phylogenetic inference. We de novo assembled the chloroplast genomes of three Phyllanthaceae species, i.e., Phyllanthus emblica, Flueggea virosa, and Leptopus cordifolius, and compared them with six other previously reported genomes. All species comprised two inverted repeat regions (size range 23,921–27,128 bp) that separated large single-copy (83,627–89,932 bp) and small single-copy (17,424–19,441 bp) regions. Chloroplast genomes contained 111–112 unique genes, including 77–78 protein-coding, 30 tRNAs, and 4 rRNAs. The deletion/pseudogenization of rps16 genes was found in only two species. High variability was seen in the number of oligonucleotide repeats, while guanine-cytosine contents, codon usage, amino acid frequency, simple sequence repeats, synonymous and non-synonymous substitutions, and transition and transversion substitutions were similar. The transition substitutions were higher in coding sequences than in non-coding sequences. Phylogenetic analysis revealed the polyphyletic nature of the genus Phyllanthus. The polymorphic protein-coding genes, including rpl22, ycf1, matK, ndhF, and rps15, were also determined, which may be helpful for reconstructing the high-resolution phylogenetic tree of the family Phyllanthaceae. Overall, the study provides insight into the chloroplast genome evolution in Phyllanthaceae.
  • Ouwerkerk, Janneke P.; Tytgat, Hanne L. P.; Elzinga, Janneke; Koehorst, Jasper; Van den Abbeele, Pieter; Henrissat, Bernard; Gueimonde, Miguel; Cani, Patrice D.; Van de Wiele, Tom; Belzer, Clara; de Vos, Willem M. (2022)
    Akkermansia muciniphila is a champion of mucin degradation in the human gastrointestinal tract. Here, we report the isolation of six novel strains from healthy human donors and their genomic, proteomic and physiological characterization in comparison to the type-strains A. muciniphila Muc(T) and A. glycaniphila Pyt(T). Complete genome sequencing revealed that, despite their large genomic similarity (>97.6%), the novel isolates clustered into two distinct subspecies of A. muciniphila: Amuc1, which includes the type-strain Muc(T), and AmucU, a cluster of unassigned strains that have not yet been well characterized. CRISPR analysis showed all strains to be unique and confirmed that single healthy subjects can carry more than one A. muciniphila strain. Mucin degradation pathways were strongly conserved amongst all isolates, illustrating the exemplary niche adaptation of A. muciniphila to the mucin interface. This was confirmed by analysis of the predicted glycoside hydrolase profiles and supported by comparing the proteomes of A. muciniphila strain H2, belonging to the AmucU cluster, to Muc(T) and A. glycaniphila Pyt(T) (including 610 and 727 proteins, respectively). While some intrinsic resistance was observed among the A. muciniphila strains, none of these seem to pose strain-specific risks in terms of their antibiotic resistance patterns or a significant risk for the horizontal transfer of antibiotic resistance determinants, opening the way to apply the type-strain Muc(T) or these new A. muciniphila strains as next generation beneficial microbes.
  • Abdullah; Mehmood, Furrukh; Heidari, Parviz; Ahmed, Ibrar; Poczai, Péter (2021)
    The genus Blumea (Asteroideae, Asteraceae) comprises about 100 species, including herbs, shrubs, and small trees. Previous studies have been unable to resolve taxonomic issues and the phylogeny of the genus Blumea due to the low polymorphism of molecular markers. Therefore, suitable polymorphic regions need to be identified. Here, we de novo assembled plastomes of the three Blumea species B. oxyodonta, B. tenella, and B. balsamifera and compared them with 25 other species of Asteroideae after correction of annotations. These species have quadripartite plastomes with similar gene content and genome organization comprising 113 genes, including 80 protein-coding, 29 transfer RNA, and 4 ribosomal RNA genes. The contraction and expansion of inverted repeats also show high similarities among the species. The comparative analysis of codon usage, amino acid frequency, microsatellite repeats, oligonucleotide repeats, and transition and transversion substitutions has revealed high resemblance among the newly assembled species of Blumea. We identified 10 highly polymorphic regions with nucleotide diversity above 0.02, including rps16-trnQ, ycf1, ndhF-rpl32, rps15, petN-psbM, and rpl32-trnL, and they may be suitable for the development of robust, authentic, and cost-effective markers for bar coding and inference of the phylogeny of the genus Blumea. Among these highly polymorphic regions, five regions also co-occurred with oligonucleotide repeats and support use of repeats as a proxy for the identification of polymorphic loci. The phylogenetic analysis revealed a close relationship between Blumea and Pluchea within the tribe Inuleae. Our study supports a sister relationship between “Astereae and Anthemideae,” while Gnaphalieae roots these two tribes, whereas in a previous study a sister relationship was reported between “Senecioneae and Anthemideae” and “Astereae and Gnaphalieae” using nuclear genome sequences. 
The conflicting phylogenetic signals observed at the tribal level between chloroplast and nuclear genome data require further investigation.
  • Abdullah; Henriquez, Claudia L.; Mehmood, Furrukh; Shahzadi, Iram; Ali, Zain; Waheed, Mohammad Tahir; Croat, Thomas B; Poczai, Péter; Ahmed, Ibrar (2020)
    The chloroplast genome provides insight into the evolution of plant species. We de novo assembled and annotated chloroplast genomes of four genera representing three subfamilies of Araceae: Lasia spinosa (Lasioideae), Stylochaeton bogneri, Zamioculcas zamiifolia (Zamioculcadoideae), and Orontium aquaticum (Orontioideae), and performed comparative genomics using these chloroplast genomes. The sizes of the chloroplast genomes ranged from 163,770 bp to 169,982 bp. These genomes comprise 113 unique genes, including 79 protein-coding, 4 rRNA, and 30 tRNA genes. Among these genes, 17–18 genes are duplicated in the inverted repeat (IR) regions, comprising 6–7 protein-coding (including trans-splicing gene rps12), 4 rRNA, and 7 tRNA genes. The total number of genes ranged between 130 and 131. The infA gene was found to be a pseudogene in all four genomes reported here. These genomes exhibited high similarities in codon usage, amino acid frequency, RNA editing sites, and microsatellites. The oligonucleotide repeats and junctions JSB (IRb/SSC) and JSA (SSC/IRa) were highly variable among the genomes. The patterns of IR contraction and expansion were shown to be homoplasious, and therefore unsuitable for phylogenetic analyses. Signatures of positive selection were seen in three genes in S. bogneri, including ycf2, clpP, and rpl36. This study is a valuable addition to the evolutionary history of chloroplast genome structure in Araceae.
  • Beier, Sebastian; Himmelbach, Axel; Colmsee, Christian; Zhang, Xiao-Qi; Barrero, Roberto A.; Zhang, Qisen; Li, Lin; Bayer, Micha; Bolser, Daniel; Taudien, Stefan; Groth, Marco; Felder, Marius; Hastie, Alex; Simkova, Hana; Stankova, Helena; Vrana, Jan; Chan, Saki; Munoz-Amatriain, Maria; Ounit, Rachid; Wanamaker, Steve; Schmutzer, Thomas; Aliyeva-Schnorr, Lala; Grasso, Stefano; Tanskanen, Jaakko; Sampath, Dharanya; Heavens, Darren; Cao, Sujie; Chapman, Brett; Dai, Fei; Han, Yong; Li, Hua; Li, Xuan; Lin, Chongyun; McCooke, John K.; Tan, Cong; Wang, Songbo; Yin, Shuya; Zhou, Gaofeng; Poland, Jesse A.; Bellgard, Matthew I.; Houben, Andreas; Dolezel, Jaroslav; Ayling, Sarah; Lonardi, Stefano; Langridge, Peter; Muehlbauer, Gary J.; Kersey, Paul; Clark, Matthew D.; Caccamo, Mario; Schulman, Alan H.; Platzer, Matthias; Close, Timothy J.; Hansson, Mats; Zhang, Guoping; Braumann, Ilka; Li, Chengdao; Waugh, Robbie; Scholz, Uwe; Stein, Nils; Mascher, Martin (2017)
    Barley (Hordeum vulgare L.) is a cereal grass mainly used as animal fodder and raw material for the malting industry. The map-based reference genome sequence of barley cv. 'Morex' was constructed by the International Barley Genome Sequencing Consortium (IBSC) using hierarchical shotgun sequencing. Here, we report the experimental and computational procedures to (i) sequence and assemble more than 80,000 bacterial artificial chromosome (BAC) clones along the minimum tiling path of a genome-wide physical map, (ii) find and validate overlaps between adjacent BACs, (iii) construct 4,265 non-redundant sequence scaffolds representing clusters of overlapping BACs, and (iv) order and orient these BAC clusters along the seven barley chromosomes using positional information provided by dense genetic maps, an optical map and chromosome conformation capture sequencing (Hi-C). Integrative access to these sequence and mapping resources is provided by the barley genome explorer (BARLEX).
  • Jouhten, Hanne; Ronkainen, Aki; Aakko, Juhani; Salminen, Seppo; Mattila, Eero; Arkkila, Perttu; Satokari, Reetta (2020)
    Fecal microbiota transplantation (FMT) is an effective treatment for recurrent Clostridioides difficile infection (rCDI), and it is also being considered for treating other indications. Metagenomic studies have indicated that commensal donor bacteria may colonize FMT recipients, but cultivation has not been employed to verify strain-level colonization. We combined molecular profiling of Bifidobacterium populations with cultivation, molecular typing, and whole genome sequencing (WGS) to isolate and identify strains that were transferred from donors to recipients. Several Bifidobacterium strains from two donors were recovered from 13 recipients during the 1-year follow-up period after FMT. The strain identities were confirmed by WGS and comparative genomics. Our results show that specific donor-derived bifidobacteria can colonize rCDI patients for at least 1 year, and thus FMT may have long-term consequences for the recipient's microbiota and health. Conceptually, we demonstrate that FMT trials combined with microbial profiling can be used as a platform for discovering and isolating commensal strains with proven colonization capacity for potential therapeutic use.
  • Zhang, Shu; Bai, Xue; Ren, Li-Yuan; Sun, Hui-Hui; Tang, Hui-Ping; Vaario, Lu-Min; Xu, Jianping; Zhang, Yong-Jie (2021)
    Fungi, as eukaryotic organisms, contain two genomes, the mitochondrial genome and the nuclear genome, in their cells. How the two genomes evolve and correlate to each other is debated. Herein, taking the gourmet pine mushroom Tricholoma matsutake as an example, we performed comparative mitogenomic analysis using samples collected from diverse locations and compared the evolution of the two genomes. The T. matsutake mitogenome encodes 49 genes and is rich in repetitive and non-coding DNA. Six genes were invaded by up to 11 group I introns, with one cox1 intron, cox1P372, showing presence/absence dynamics among different samples. Bioinformatic analyses suggested limited or no evidence of mitochondrial heteroplasmy. Interestingly, hundreds of mitochondrial DNA fragments were found in the nuclear genome, with several larger than 500 nt confirmed by PCR assays and read count comparisons, indicating clear evidence of transfer of mitochondrial DNA into the nuclear genome. Nuclear DNA of T. matsutake showed a higher mutation rate than mitochondrial DNA. Furthermore, we found evidence of incongruence between phylogenetic trees derived from mitogenome and nuclear DNA sequences. Together, our results reveal the dynamic genome evolution of the gourmet pine mushroom.
  • Herranen, J.; Markkanen, J.; Muinonen, K. (2017)
    We establish a theoretical framework for solving the equations of motion for an arbitrarily shaped, inhomogeneous dust particle in the presence of radiation pressure. The repeated scattering problem involved is solved using a state-of-the-art volume integral equation-based T-matrix method. A Fortran implementation of the framework is used to solve the explicit time evolution of a homogeneous irregular sample geometry. The results are shown to be consistent with rigid body dynamics and between integrators, and comparable with predictions from an alignment efficiency potential map. Also, we demonstrate the explicit effect of single-particle dynamics on the observed polarization using the obtained orientational results.
  • BEEHIVE Collaboration; Wymant, Chris; Blanquart, Francois; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J.; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M. Kate; Gunsenheimer-Bartmeyer, Barbara; Gunthard, Huldrych F.; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe (2018)
    Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. 
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from
  • Groussin, Mathieu; Poyet, Mathilde; Sistiaga, Ainara; Kearney, Sean M.; Moniz, Katya; Noel, Mary; Hooker, Jeff; Gibbons, Sean M.; Segurel, Laure; Froment, Alain; Mohamed, Rihlat Said; Fezeu, Alain; Juimo, Vanessa A.; Lafosse, Sophie; Tabe, Francis E.; Girard, Catherine; Iqaluk, Deborah; Nguyen, Le Thanh Tu; Shapiro, B. Jesse; Lehtimaki, Jenni; Ruokolainen, Lasse; Kettunen, Pinja P.; Vatanen, Tommi; Sigwazi, Shani; Mabulla, Audax; Dominguez-Rodrigo, Manuel; Nartey, Yvonne A.; Agyei-Nkansah, Adwoa; Duah, Amoako; Awuku, Yaw A.; Valles, Kenneth A.; Asibey, Shadrack O.; Afihene, Mary Y.; Roberts, Lewis R.; Plymoth, Amelie; Onyekwere, Charles A.; Summons, Roger E.; Xavier, Ramnik J.; Alm, Eric J. (2021)
    Industrialization has impacted the human gut ecosystem, resulting in altered microbiome composition and diversity. Whether bacterial genomes may also adapt to the industrialization of their host populations remains largely unexplored. Here, we investigate the extent to which the rates and targets of horizontal gene transfer (HGT) vary across thousands of bacterial strains from 15 human populations spanning a range of industrialization. We show that HGTs have accumulated in the microbiome over recent host generations and that HGT occurs at high frequency within individuals. Comparison across human populations reveals that industrialized lifestyles are associated with higher HGT rates and that the functions of HGTs are related to the level of host industrialization. Our results suggest that gut bacteria continuously acquire new functionality based on host lifestyle and that high rates of HGT may be a recent development in human history linked to industrialization.