Browsing by Subject "SEQUENCES"


Now showing items 1-20 of 52
  • Thompson, Luke R.; Sanders, Jon G.; McDonald, Daniel; Amir, Amnon; Ladau, Joshua; Locey, Kenneth J.; Prill, Robert J.; Tripathi, Anupriya; Gibbons, Sean M.; Ackermann, Gail; Navas-Molina, Jose A.; Janssen, Stefan; Kopylova, Evguenia; Vazquez-Baeza, Yoshiki; Gonzalez, Antonio; Morton, James T.; Mirarab, Siavash; Xu, Zhenjiang Zech; Jiang, Lingjing; Haroon, Mohamed F.; Kanbar, Jad; Zhu, Qiyun; Song, Se Jin; Kosciolek, Tomasz; Bokulich, Nicholas A.; Lefler, Joshua; Brislawn, Colin J.; Humphrey, Gregory; Owens, Sarah M.; Hampton-Marcell, Jarrad; Berg-Lyons, Donna; McKenzie, Valerie; Fierer, Noah; Fuhrman, Jed A.; Clauset, Aaron; Stevens, Rick L.; Shade, Ashley; Pollard, Katherine S.; Goodwin, Kelly D.; Jansson, Janet K.; Gilbert, Jack A.; Knight, Rob; Earth Microbiome Project Consortium; Hultman, Jenni (2017)
    Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity.
  • Alanko, Jarno; Cunial, Fabio; Belazzougui, Djamal; Mäkinen, Veli (2017)
    Background: A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims to partition a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed. Results: We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t + log σ)) time and just 2n + o(n) + O(max{ℓσ log n, K log m}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure. Conclusions: Our experimental results show that our algorithms are practical, that they can exploit multiple cores by a parallel traversal of the suffix-link tree, and that they are competitive with the state of the art in both space and time.
  • Howlader, Mohammad Sajid Ali; Nair, Abhilash; Gopalan, Sujith V.; Merilä, Juha (2015)
    A new species of Microhyla frog from the Nilphamari district of Bangladesh is described and compared with its morphologically similar and geographically proximate congeners. Molecular phylogeny derived from mitochondrial DNA sequences revealed that although the new species, designated here as Microhyla nilphamariensis sp. nov., forms a clade with M. ornata, it is highly divergent from M. ornata and all of its congeners, with 5.7-13.2% sequence divergence at the 16S rRNA gene. The new species can be identified phenotypically by the following set of diagnostic (both qualitative and quantitative) characters: head length is 77% of head width, the distance from the front of the eye to the nostril is roughly six times the nostril-snout length, the internarial distance is roughly five times the nostril-snout length, the interorbital distance is twice the internarial distance, and the distance from the back of the mandible to the back of the eye is 15% of head length. Furthermore, the inner metacarpal tubercle is small and ovoid, whereas the outer metacarpal tubercle is very small and rounded. The toes have rudimentary webbing, digital discs are absent, the inner metatarsal tubercle is small and round, and the outer metatarsal tubercle is ovoid, minute, and indistinct.
  • Kröger, Björn; Penny, Amelia; Shen, Yuefeng; Munnecke, Axel (2020)
    The Late Ordovician succession of the Baltic Basin contains a characteristic fine-grained limestone that is rich in calcareous green algae. This limestone occurs in surface outcrops and drill cores in an extensive belt reaching from Sweden across the Baltic Sea to the Baltic countries. This limestone, which is known in the literature under several different lithological names, is described and interpreted, and the term "Baltic limestone facies" is suggested. The microfacies of selected outcrops in the Åland Islands, Finland and Estonia consists of calcareous green algae as the main skeletal component in a bioclastic mudstone-packstone lithology with a pure micritic matrix. Three types of calcitarch, ranging from c. 100 to 180 µm in diameter, are common. Basinward, the youngest sections of the facies belt contain coral-stromatoporoid patch reefs and Palaeoporella-algal mounds. The Baltic limestone facies can be interpreted as representing the shallow part of an open-marine, low-latitude carbonate platform.
  • Nováková, Eliška; Zablatzká, Lenka; Brus, Jan; Nesrstová, Viktorie; Hanáček, Pavel; Kalendar, Ruslan; Cvrčková, Fatima; Majeský, Ľuboš; Smýkal, Petr (2019)
    Reproductive isolation is an important component of species differentiation. The plastid accD gene coding for the acetyl-CoA carboxylase subunit and the nuclear bccp gene coding for the biotin carboxyl carrier protein were identified as candidate genes governing nuclear-cytoplasmic incompatibility in peas. We examined the allelic diversity in a set of 195 geographically diverse samples of both cultivated (Pisum sativum, P. abyssinicum) and wild (P. fulvum and P. elatius) peas. Based on deduced protein sequences, we identified 34 accD and 31 bccp alleles that are partially geographically and genetically structured. The accD gene is highly variable due to insertions of tandem repeats. P. fulvum and P. abyssinicum have unique alleles and combinations of both genes. On the other hand, partial overlap was observed between P. sativum and P. elatius. Mapping of protein sequence polymorphisms to 3D structures revealed that most of the repeat and indel polymorphisms map to sequence regions that could not be modeled, consistent with this part of the protein being less constrained by requirements for precise folding than the enzymatically active domains. The results of this study are important not only from an evolutionary point of view but are also relevant for pea breeding when using more distant wild relatives.
  • Plunkett, Jevon; Doniger, Scott; Orabona, Guilherme; Morgan, Thomas; Haataja, Ritva; Hallman, Mikko; Puttonen, Hilkka; Menon, Ramkumar; Kuczynski, Edward; Norwitz, Errol; Snegovskikh, Victoria; Palotie, Aarno; Palotie, Leena; Fellman, Vineta; DeFranco, Emily A.; Chaudhari, Bimal P.; McGregor, Tracy L.; McElroy, Jude J.; Oetjens, Matthew T.; Teramo, Kari; Borecki, Ingrid; Fay, Justin; Muglia, Louis (2011)
  • Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; Walker, Alan W.; Francis, Suzanna C.; Fraser, Louise J.; Vehkaperä, Mikko; Lan, Yueheng; Corander, Jukka (2015)
    Motivation: Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. Results: There has been a recent surge of interest in using compressed-sensing-inspired and convex-optimization-based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach in which a standard K-means clustering algorithm partitions a large set of reads into subsets at reasonable computational cost, providing several vectors of first-order statistics instead of a single statistical summary in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is justified by a statistical argument based on a mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. Availability: An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.
  • Tiwari, Ananda; Hokajärvi, Anna-Maria; Santo Domingo, Jorge; Elk, Michael; Jayaprakash, Balamuralikrishna; Ryu, Hodon; Siponen, Sallamaari; Vepsäläinen, Asko; Kauppinen, Ari; Puurunen, Osmo; Artimo, Aki; Perkola, Noora; Huttula, Timo; Miettinen, Ilkka T.; Pitkänen, Tarja (2021)
    Background: Rivers and lakes are used for multiple purposes, such as drinking water (DW) production and recreation, and as recipients of wastewater from various sources. The deterioration of surface water quality by wastewater is well known, but less is known about the bacterial community dynamics in the affected surface waters. Understanding the characteristics of bacterial communities, from the source of contamination through the watershed to the DW production process, may help safeguard human health and the environment. Results: The spatial and seasonal dynamics of bacterial communities, their predicted functions, and potential health-related bacterial (PHRB) reads within the Kokemäenjoki River watershed in southwest Finland were analyzed with 16S rRNA gene amplicon sequencing. Water samples were collected from various sampling points in the watershed, from its major pollution sources (sewage influent and effluent, industrial effluent, mine runoff), and from different stages of a DW treatment process (pre-treatment, groundwater observation well, DW production well) that uses the river water as raw water with artificial groundwater recharge (AGR). The beta-diversity analysis revealed that bacterial communities differed strongly among sample groups (R = 0.92, p < 0.001, ANOSIM). Species richness and evenness indices were highest in surface water (Chao1: 920 ± 10) and gradually decreased during the DW treatment process (DW production well; Chao1: 320 ± 20). Although the phylum Proteobacteria was omnipresent, its relative abundance was higher in sewage and industrial effluents (66-80%) than in surface water (55%). The phyla Firmicutes and Fusobacteria were detected only in sewage samples. Actinobacteria was more abundant in the surface water (≥ 13%) than in other groups (…)
  • Vaario, Lu-Min; Asamizu, Shumpei; Sarjala, Tytti; Matsushita, Norihisa; Onaka, Hiroyasu; Xia, Yan; Kurokochi, Hiroyuki; Morinaga, Shin-Ichi; Huang, Jian; Zhang, Shijie; Lian, Chunlan (2020)
    Tricholoma matsutake is known to be the dominant fungal species in shiro, the soil neighboring matsutake fruitbodies. To understand the mechanisms behind matsutake dominance, we studied the bacterial communities in matsutake-dominant shiro soil and non-shiro soil, and isolated Streptomyces strains from matsutake mycorrhizal root tips, both from shiro soil and from Pinus densiflora seedlings cultivated in shiro soil. Further, we investigated three Streptomyces spp. for their ability to inhibit fungal growth and Pinus densiflora seedling root elongation, as well as two strains for their antifungal and antioxidative properties. Our results showed that Actinobacteria was the most abundant phylum in shiro soil. However, the differences in Actinobacterial community composition (phylum or order level) between shiro and non-shiro soils were not significant, as indicated by PERMANOVA analyses. Streptomyces, a genus of Actinobacteria, was present on the matsutake mycorrhizas, although only as a minority. The two antifungal assays revealed that the broths of the three Streptomyces spp. had inhibitory, neutral, or promoting effects on the growth of different forest soil fungi, as well as on the root elongation of the seedlings. The extracts of two strains, including one isolated from the P. densiflora seedlings, inhibited the growth of either pathogenic or ectomycorrhizal fungi. The effect depended on the medium used to cultivate the strains, but not on the solvent used for the extraction. Two Streptomyces spp. showed antioxidant activity in one of the three assays used, a ferric reducing antioxidant power assay. The observed properties seem to have several functions in matsutake shiro soil and may contribute to protecting the shiro area for T. matsutake dominance.
  • Contu, Lara; Balistreri, Giuseppe; Domanski, Michal; Uldry, Anne-Christine; Mühlemann, Oliver (2021)
    The positive-sense, single-stranded RNA alphaviruses pose a potential epidemic threat. Understanding the complex interactions between viral and host cell proteins is crucial for elucidating the mechanisms underlying successful virus replication strategies and for developing specific antiviral interventions. Here we present the first comprehensive protein-protein interaction map between the proteins of Semliki Forest Virus (SFV), a mosquito-borne member of the alphaviruses, and host cell proteins. Among the many identified cellular interactors of SFV proteins, the enrichment of factors involved in translation and nonsense-mediated mRNA decay (NMD) was striking, reflecting the virus's hijacking of the translation machinery and indicating viral countermeasures for escaping NMD by inhibiting it at later time points during the infectious cycle. In addition to observing a general inhibition of NMD about 4 hours post infection, we also demonstrate that transient expression of the SFV capsid protein is sufficient to inhibit NMD in cells, suggesting that the massive production of capsid protein during the SFV reproduction cycle is responsible for NMD inhibition. Author summary: To take control of the host cell and ensure its own replication, viral proteins interact with a plethora of host cell proteins. Elucidating these virus-host protein interactions is therefore key to understanding the mechanisms a virus uses to hijack the host cell. This study provides the first comprehensive protein-protein interaction map between the proteins of Semliki Forest Virus (SFV), a positive-strand, single-stranded RNA virus of the alphavirus genus. While we previously discovered that the host cell recognizes and degrades the incoming viral genomic RNA through a cellular quality control system called nonsense-mediated mRNA decay (NMD), our interactome study now uncovers the other side of this arms race between SFV and the infected cells: we show that the viral capsid protein can inhibit NMD.
  • Tuomenoksa, Asta; Pajo, Kati; Klippi, Anu (2016)
    This study applies conversation analysis to compare samples of everyday conversation between a person with aphasia (PWA) and a familiar communication partner (CP) before and after intensive language-action therapy (ILAT). Our analysis concentrated on collaborative repair sequences, on the assumption that impairment-focused therapy would translate into a change in the nature of the trouble sources that engender the collaborative repair typical of aphasic conversation. The most frequent repair initiation technique used by the CP was candidate understandings. The function of candidate understandings changed from addressing specific trouble sources pre-ILAT to concluding longer stretches of the PWA's talk post-ILAT. Alongside these findings, we documented a clinically significant increase in the Western Aphasia Battery's aphasia quotient post-ILAT. Our results suggest that, rather than mere frequency counts of conversational behaviours, examining the type and function of repair actions might provide insight into therapy-related changes in conversation following impairment-focused therapy.
  • Nyholm, Outi; Halkilahti, Jani; Wiklund, Gudrun; Okeke, Uche; Paulin, Lars; Auvinen, Petri; Haukka, Kaisa; Siitonen, Anja (2015)
    Background: Shigatoxigenic Escherichia coli (STEC) and enterotoxigenic E. coli (ETEC) cause serious foodborne infections in humans. These two pathogroups are defined by pathogroup-associated virulence genes: stx, encoding Shiga toxin (Stx), for STEC, and elt, encoding heat-labile enterotoxin (LT), and/or est, encoding heat-stable enterotoxin (ST), for ETEC. The study investigated the genomics of STEC/ETEC hybrid strains to determine their phylogenetic position among E. coli and to define the virulence genes they harbor. Methods: The whole genomes of three STEC/ETEC strains possessing both stx and est genes were sequenced using a PacBio RS sequencer. Two of the strains were isolated from patients, one with hemolytic uremic syndrome and one with diarrhea. The third strain was of bovine origin. Core genome analysis of the shared chromosomal genes and comparison with E. coli and Shigella spp. reference genomes were performed to determine the phylogenetic position of the STEC/ETEC strains. In addition, a set of virulence genes and ETEC colonization factors were extracted from the genomes. The production of Stx and ST was studied. Results: The human STEC/ETEC strains clustered with strains representing ETEC, STEC, enteroaggregative E. coli, and commensal and laboratory-adapted E. coli. However, the bovine STEC/ETEC strain formed a remote cluster with two STECs of bovine origin. All three STEC/ETEC strains harbored several other virulence genes apart from stx and est, and lacked ETEC colonization factors. Two STEC/ETEC strains produced both toxins, and one strain produced Stx only. Conclusions: This study shows that pathogroup-associated virulence genes of different E. coli can coexist in strains originating from different phylogenetic lineages. The possibility that virulence genes are associated with several E. coli pathogroups should be taken into account in strain typing and in epidemiological surveillance. The development of novel hybrid E. coli strains may pose a new public health risk, which challenges the traditional diagnostics of E. coli infections.
  • Harjunpää, Katariina; Deppermann, Arnulf; Sorjonen, Marja-Leena (2021)
    Using video-recordings from one day of a theater project for young adults, this paper investigates how the meaning of novel verbal expressions is interactionally constituted and elaborated over the interactional history of a series of activities. We examine how the theater director introduces and instructs the group in the Chekhovian technique of acting, which is based on “imagining with the body,” and how the imaginary elements of the technique are “brought into existence” in the language of the instructions. By tracking shifts in the instructor’s use of the key expressions invisible/imaginary/inner body or movement through a series of exercises, we demonstrate how they are increasingly treated as real and perceivable bodily conduct. The analyses focus on the instructor’s attribution of factual and agentive properties to these expressions, and the changes that these properties undergo over the series of instructions. This case demonstrates the significance of longitudinal processes for the establishment of shared meaning in social interaction. The study thereby contributes to the field of interactional semantics and to longitudinal studies of social interaction.
  • Hyytiäinen, Heidi K.; Jayaprakash, Balamuralikrishna; Kirjavainen, Pirkka V.; Saari, Sampo E.; Holopainen, Rauno; Keskinen, Jorma; Hämeri, Kaarle; Hyvärinen, Anne; Boor, Brandon E.; Täubel, Martin (2018)
    Background: Floor dust is commonly used for microbial determinations in epidemiological studies to estimate early-life indoor microbial exposures. The resuspension of floor dust and its impact on infant microbial exposure have, however, been little explored. The aim of our study was to investigate how floor dust resuspension induced by an infant's crawling motion and an adult walking affects infant inhalation exposure to microbes. Results: We conducted controlled chamber experiments with a simplified mechanical crawling infant robot and an adult volunteer walking over carpeted flooring. We applied bacterial 16S rRNA gene sequencing and quantitative PCR to monitor the microbial content of the infant breathing zone and compared it with the adult breathing zone and with the carpet dust as the source. During crawling, fungal and bacterial levels were, on average, 8- to 21-fold higher in the infant breathing zone than in the adult breathing zone. During walking experiments, the increase in microbial levels in the infant breathing zone was far less pronounced. The correlation between the rank orders of microbial levels in the carpet dust and in the corresponding infant breathing zone sample varied among microbial groups but was mostly moderate. The relative abundance of bacterial taxa was characteristically distinct in carpet dust and in the infant and adult breathing zones during the infant crawling experiments. Bacterial diversity in carpet dust and in the infant breathing zone did not correlate significantly. Conclusions: The microbiota in the infant breathing zone differs, in both absolute quantity and composition, from that of the adult breathing zone and of floor dust. Crawling induces resuspension of floor dust from carpeted flooring, creating a concentrated and localized cloud of microbial content around the infant. Thus, the microbial exposure of infants following dust resuspension is difficult to predict from common house dust or bulk air measurements. Improved approaches for assessing infant microbial exposure, such as sampling at the level of the infant breathing zone, are needed.
  • Feng, Shaohong; Stiller, Josefin; Deng, Yuan; Armstrong, Joel; Fang, Qi; Reeve, Andrew Hart; Xie, Duo; Chen, Guangji; Guo, Chunxue; Faircloth, Brant C.; Petersen, Bent; Wang, Zongji; Zhou, Qi; Diekhans, Mark; Chen, Wanjun; Andreu-Sanchez, Sergio; Margaryan, Ashot; Howard, Jason Travis; Parent, Carole; Pacheco, George; Sinding, Mikkel-Holger S.; Puetz, Lara; Cavill, Emily; Ribeiro, Angela M.; Eckhart, Leopold; Fjeldsa, Jon; Hosner, Peter A.; Brumfield, Robb T.; Christidis, Les; Bertelsen, Mads F.; Sicheritz-Ponten, Thomas; Tietze, Dieter Thomas; Robertson, Bruce C.; Song, Gang; Borgia, Gerald; Claramunt, Santiago; Lovette, Irby J.; Cowen, Saul J.; Njoroge, Peter; Dumbacher, John Philip; Ryder, Oliver A.; Fuchs, Jerome; Bunce, Michael; Burt, David W.; Cracraft, Joel; Meng, Guanliang; Hackett, Shannon J.; Ryan, Peter G.; Jønsson, Knud Andreas; Jamieson, Ian G.; da Fonseca, Rute R.; Braun, Edward L.; Houde, Peter; Mirarab, Siavash; Suh, Alexander; Hansson, Bengt; Ponnikas, Suvi; Sigeman, Hanna; Stervander, Martin; Frandsen, Paul B.; van der Zwan, Henriette; van der Sluis, Rencia; Visser, Carina; Balakrishnan, Christopher N.; Clark, Andrew G.; Fitzpatrick, John W.; Bowman, Reed; Chen, Nancy; Cloutier, Alison; Sackton, Timothy B.; Edwards, Scott V.; Foote, Dustin J.; Shakya, Subir B.; Sheldon, Frederick H.; Vignal, Alain; Soares, Andre E. R.; Shapiro, Beth; Gonzalez-Solis, Jacob; Ferrer-Obiol, Joan; Rozas, Julio; Riutort, Marta; Tigano, Anna; Friesen, Vicki; Dalen, Love; Urrutia, Araxi O.; Szekely, Tamas; Liu, Yang; Campana, Michael G.; Corvelo, Andre; Fleischer, Robert C.; Rutherford, Kim M.; Gemmell, Neil J.; Dussex, Nicolas; Mouritsen, Henrik; Thiele, Nadine; Delmore, Kira; Liedvogel, Miriam; Franke, Andre; Hoeppner, Marc P.; Krone, Oliver; Fudickar, Adam M.; Mila, Borja; Ketterson, Ellen D.; Fidler, Andrew Eric; Friis, Guillermo; Parody-Merino, Angela M.; Battley, Phil F.; Cox, Murray P.; Lima, Nicholas Costa Barroso; Prosdocimi, Francisco; Parchman, Thomas Lee; Schlinger, Barney A.; Loiselle, Bette A.; Blake, John G.; Lim, Haw Chuan; Day, Lainy B.; Fuxjager, Matthew J.; Baldwin, Maude W.; Braun, Michael J.; Wirthlin, Morgan; Dikow, Rebecca B.; Ryder, T. Brandt; Camenisch, Glauco; Keller, Lukas F.; DaCosta, Jeffrey M.; Hauber, Mark E.; Louder, Matthew I. M.; Witt, Christopher C.; McGuire, Jimmy A.; Mudge, Joann; Megna, Libby C.; Carling, Matthew D.; Wang, Biao; Taylor, Scott A.; Del-Rio, Glaucia; Aleixo, Alexandre; Vasconcelos, Ana Tereza Ribeiro; Mello, Claudio V.; Weir, Jason T.; Haussler, David; Li, Qiye; Yang, Huanming; Wang, Jian; Lei, Fumin; Rahbek, Carsten; Gilbert, M. Thomas P.; Graves, Gary R.; Jarvis, Erich D.; Paten, Benedict; Zhang, Guojie (2020)
    Whole-genome sequencing projects are increasingly populating the tree of life and characterizing biodiversity [1-4]. Sparse taxon sampling has previously been proposed to confound phylogenetic inference [5], and it captures only a fraction of the genomic diversity. Here we report a substantial step towards the dense representation of avian phylogenetic and molecular diversity, by analysing 363 genomes from 92.4% of bird families, including 267 newly sequenced genomes produced for phase II of the Bird 10,000 Genomes (B10K) Project. We use this comparative genome dataset in combination with a pipeline that leverages a reference-free whole-genome alignment to identify orthologous regions in greater numbers than has previously been possible and to recognize genomic novelties in particular bird lineages. The densely sampled alignment provides a single-base-pair map of selection, more than doubles the fraction of bases that are confidently predicted to be under conservation, and reveals extensive patterns of weak selection in predominantly non-coding DNA. Our results demonstrate that increasing the diversity of genomes used in comparative studies can reveal more shared and lineage-specific variation, and improve the investigation of genomic characteristics. We anticipate that this genomic resource will offer new perspectives on evolutionary processes in cross-species comparative analyses and assist in efforts to conserve species. A dataset of the genomes of 363 species from the Bird 10,000 Genomes Project shows increased power to detect shared and lineage-specific variation, demonstrating the importance of phylogenetically diverse taxon sampling in whole-genome sequencing.
  • Hui, Nan; Grönroos, Mira; Roslund, Marja I.; Parajuli, Anirudra; Vari, Heli K.; Soininen, Laura; Laitinen, Olli H.; Sinkkonen, Aki; The ADELE Research Group (2019)
    Human activities typically lead to simplified urban biodiversity, which in turn reduces microbial exposure and increases the risk of non-communicable diseases among urban dwellers. To counteract this, we developed a microbial inoculant from forest and agricultural materials that resembles the microbiota of organic soils. Three different sand materials (sieved, safety, and sandbox sand) commonly used in playgrounds and other public spaces were enriched with 5% of the inoculant. Skin microbiota on fingers (identified from bacterial 16S rDNA using Illumina MiSeq sequencing) was compared after touching non-enriched and inoculant-enriched sands. Exposure to the non-enriched materials changed the skin bacterial community composition in distinct ways. When the inoculant was added to the materials, the overall shift in community composition was larger, and the differences between the sand materials almost disappeared. Inoculant-enriched sand materials increased bacterial diversity and richness on skin but did not affect evenness at the OTU level. The Firmicutes/Bacteroidetes ratio was higher after touching inoculant-enriched than non-enriched sand materials. The relative abundance of opportunistic pathogens on skin was 40–50% before touching sand materials, but dropped to 14% and 4% after touching standard and inoculant-enriched sand materials, respectively. When individual genera were analyzed, Pseudomonas sp. and Sphingomonas sp. were more abundant after touching standard, non-enriched sand materials, while only the relative abundance of Chryseobacterium sp. increased after touching the inoculant-enriched materials. As Chryseobacterium is harmless to healthy persons, and as standard landscaping materials and normal skin contain genera that include severe pathogens, the inoculant-enriched materials can be considered safe. Microbial inoculants could be specifically created to increase the proportion of non-pathogenic bacterial taxa and minimize the transfer of pathogenic taxa. We recommend further study into the usability of inoculant-enriched materials and their effects on the bacterial community composition of human skin and on the immune response.
  • Gagie, Travis; Hartikainen, Aleksi; Karhu, Kalle; Kärkkäinen, Juha; Navarro, Gonzalo; Puglisi, Simon J.; Sirén, Jouni (2017)
    Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problem of document listing (find all the documents where a string appears), top-k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.
  • Tonkin-Hill, Gerry; Lees, John A.; Bentley, Stephen D.; Frost, Simon D. W.; Corander, Jukka (2019)
    We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet process mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analyzing an alignment of over 110 000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximize the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and sub-clades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while being significantly faster. The method is made freely available under an open source MIT licence as an easy to use R package at https://github.com/gtonkinhill/fastbaps.
  • Sasic Zoric, Ljiljana; Stahls, Gunilla; Dan, Mihajla (2019)
    Wolbachia is a widespread bacterial endosymbiont among arthropod species. It influences the reproduction of its host species as well as mitochondrial DNA diversity. Until now, only a few studies had detected Wolbachia infections in hoverflies (Diptera: Syrphidae); this is the first broader study aimed at examining the incidence of Wolbachia in the hoverfly genus Merodon. The results indicate an infection rate of 96% and the presence of both Wolbachia supergroups A and B, which are characteristic of most infected arthropod species. Additionally, multiple Wolbachia strains were detected in species of the Merodon aureus group, and the mitochondrial DNA COI-based relationships of the group are discussed in the light of infection. Finally, we discuss plant-mediated horizontal transmission of Wolbachia strains among the studied hoverfly species.
  • Odriozola, Inaki; Abrego, Nerea; Tlaskal, Vojtech; Zrustova, Petra; Morais, Daniel; Vetrovsky, Tomas; Ovaskainen, Otso; Baldrian, Petr (2021)
    Fungal-bacterial interactions play a key role in the functioning of many ecosystems. Thus, understanding their interactive dynamics is of central importance for gaining predictive knowledge of ecosystem functioning. However, it is challenging to disentangle the mechanisms behind species associations from observed co-occurrence patterns, and little is known about the directionality of such interactions. Here, we applied joint species distribution modeling to high-throughput sequencing data on co-occurring fungal and bacterial communities in deadwood to ask whether fungal and bacterial co-occurrences result from shared habitat use (i.e., deadwood's properties) or whether there are fungal-bacterial interactive associations after habitat characteristics are taken into account. Moreover, we tested the hypothesis that the interactions are mainly modulated through fungal communities influencing bacterial communities. For that, we quantified how much the predictive power of the joint species distribution models for the bacterial and fungal communities improved when accounting for the other community. Our results show that fungi and bacteria in deadwood form tight association networks (i.e., some species pairs co-occur more frequently and other species pairs co-occur less frequently than expected by chance) that include common (or opposite) responses to the environment as well as (potentially) biotic interactions. Additionally, we show that information about fungal occurrences and abundances substantially increased the power to predict bacterial abundances, whereas information about bacterial occurrences and abundances increased the power to predict fungal abundances much less. Our results suggest that fungal communities may mainly affect bacteria in deadwood.
    IMPORTANCE Understanding the interactive dynamics between fungal and bacterial communities is important to gain predictive knowledge of ecosystem functioning. However, little is known about the mechanisms behind fungal-bacterial associations and the directionality of species interactions. Applying joint species distribution modeling to high-throughput sequencing data on co-occurring fungal-bacterial communities in deadwood, we found evidence that nonrandom fungal-bacterial associations derive from shared habitat use as well as (potentially) biotic interactions. Importantly, combining cross-validations with conditional cross-validations helped us address the directionality of the biotic interactions, providing evidence that fungal communities may mainly affect bacteria in deadwood. Our modeling approach may help to gain insight into the directionality of interactions between different components of the microbiome in other environments.