Browsing by Subject "DISCOVERY"

  • Pensar, Johan; Talvitie, Topi; Hyttinen, Antti; Koivisto, Mikko (The Association for the Advancement of Artificial Intelligence (AAAI), 2020)
    AAAI Conference on Artificial Intelligence
    We present a novel Bayesian method for the challenging task of estimating causal effects from passively observed data when the underlying causal DAG structure is unknown. To rigorously capture the inherent uncertainty associated with the estimate, our method builds a Bayesian posterior distribution of the linear causal effect, by integrating Bayesian linear regression and averaging over DAGs. For computing the exact posterior for all cause-effect variable pairs, we give an algorithm that runs in time O(3^d d) for d variables, being feasible up to 20 variables. We also give a variant that computes the posterior probabilities of all pairwise ancestor relations within the same time complexity, significantly improving the fastest previous algorithm. In simulations, our Bayesian method outperforms previous methods in estimation accuracy, especially for small sample sizes. We further show that our method for effect estimation is well-adapted for detecting strong causal effects markedly deviating from zero, while our variant for computing posteriors of ancestor relations is the method of choice for detecting the mere existence of a causal relation. Finally, we apply our method on observational flow cytometry data, detecting several causal relations that concur with previous findings from experimental data.
  • Ahluwalia, Tarunveer S.; Schulz, Christina-Alexandra; Waage, Johannes; Skaaby, Tea; Sandholm, Niina; van Zuydam, Natalie; Charmet, Romain; Bork-Jensen, Jette; Almgren, Peter; Thuesen, Betina H.; Bedin, Mathilda; Brandslund, Ivan; Christensen, Cramer K.; Linneberg, Allan; Ahlqvist, Emma; Groop, Per-Henrik; Hadjadj, Samy; Tregouet, David-Alexandre; Jorgensen, Marit E.; Grarup, Niels; Pedersen, Oluf; Simons, Matias; Groop, Leif; Orho-Melander, Marju; McCarthy, Mark I.; Melander, Olle; Rossing, Peter; Kilpeläinen, Tuomas O.; Hansen, Torben (2019)
    Aims/hypothesis Identifying rare coding variants associated with albuminuria may open new avenues for preventing chronic kidney disease and end-stage renal disease, which are highly prevalent in individuals with diabetes. Efforts to identify genetic susceptibility variants for albuminuria have so far been limited, with the majority of studies focusing on common variants. Methods We performed an exome-wide association study to identify coding variants in a two-stage (discovery and replication) approach. Data from 33,985 individuals of European ancestry (15,872 with and 18,113 without diabetes) and 2605 Greenlanders were included. Results We identified a rare (minor allele frequency [MAF]: 0.8%) missense (A1690V) variant in CUBN (rs141640975, β=0.27, p=1.3×10⁻¹¹) associated with albuminuria as a continuous measure in the combined European meta-analysis. The presence of each rare allele of the variant was associated with a 6.4% increase in albuminuria. The rare CUBN variant had an effect that was three times stronger in individuals with type 2 diabetes compared with those without (p(interaction)=7.0×10⁻⁴, β with diabetes=0.69, β without diabetes=0.20) in the discovery meta-analysis. Gene-aggregate tests based on rare and common variants identified three additional genes associated with albuminuria (HES1, CDC73 and GRM5) after multiple testing correction (p(Bonferroni)
  • Van Horebeek, Lies; Hilven, Kelly; Mallants, Klara; Van Nieuwenhuijze, Annemarie; Kelkka, Tiina; Savola, Paula; Mustjoki, Satu; Schlenner, Susan M.; Liston, Adrian; Dubois, Benedicte; Goris, An (2019)
    The role of somatic variants in diseases beyond cancer is increasingly being recognized, with potential roles in autoinflammatory and autoimmune diseases. However, as mutation rates and allele fractions are lower, studies in these diseases are substantially less tolerant of false positives, and bio-informatics algorithms require high replication rates. We developed a pipeline combining two variant callers, MuTect2 and VarScan2, with technical filtering and prioritization. Our pipeline detects somatic variants with allele fractions as low as 0.5% and achieves a replication rate of > 55%. Validation in an independent data set demonstrates excellent performance (sensitivity > 57%, specificity > 98%, replication rate > 80%). We applied this pipeline to the autoimmune disease multiple sclerosis (MS) as a proof-of-principle. We demonstrate that 60% of MS patients carry 2-10 exonic somatic variants in their peripheral blood T and B cells, with the vast majority (80%) occurring in T cells and variants persisting over time. Synonymous variants significantly co-occur with non-synonymous variants. Systematic characterization indicates somatic variants are enriched for being novel or very rare in public databases of germline variants and trend towards being more damaging and conserved, as reflected by higher phred-scaled combined annotation-dependent depletion (CADD) and genomic evolutionary rate profiling (GERP) scores. Our pipeline and proof-of-principle now warrant further investigation of common somatic genetic variation on top of inherited genetic variation in the context of autoimmune disease, where it may offer subtle survival advantages to immune cells and contribute to the capacity of these cells to participate in the autoimmune reaction.
  • Webb, Anne; Cottage, Amanda; Wood, Thomas; Khamassi, Khalil; Hobbs, Douglas; Gostkiewicz, Krystyna; White, Mark; Khazaei, Hamid; Ali, Mohamed; Street, Daniel; Duc, Gerard; Stoddard, Fred L.; Maalouf, Fouad; Ogbonnaya, Francis C.; Link, Wolfgang; Thomas, Jane; O'Sullivan, Donal M. (2016)
    Faba bean (Vicia faba L.) is a globally important nitrogen-fixing legume, which is widely grown in a diverse range of environments. In this work, we mine and validate a set of 845 SNPs from the aligned transcriptomes of two contrasting inbred lines. Each V. faba SNP is assigned by BLAST analysis to a single Medicago orthologue. This set of syntenically anchored polymorphisms was then validated as individual KASP assays, classified according to their informativeness and performance on a panel of 37 inbred lines, and the best performing 757 markers used to genotype six mapping populations. The six resulting linkage maps were merged into a single consensus map on which 687 SNPs were placed on six linkage groups, each presumed to correspond to one of the six V. faba chromosomes. This sequence-based consensus map was used to explore synteny with the most closely related crop species, lentil, and the most closely related fully sequenced genome, Medicago. Large tracts of uninterrupted colinearity were found between faba bean and Medicago, making it relatively straightforward to predict gene content and order in a mapped genetic interval. As a demonstration of this, we mapped a flower colour gene to a 2-cM interval of Vf chromosome 2 which was highly colinear with Mt3. The obvious candidate gene from 78 gene models in the collinear Medicago chromosome segment was the previously characterized MtWD40-1 gene controlling anthocyanin production in Medicago, and resequencing of the Vf orthologue showed a putative causative deletion of the entire 5′ end of the gene.
  • Kohonen, Pekka; Parkkinen, Juuso A.; Willighagen, Egon L.; Ceder, Rebecca; Wennerberg, Krister; Kaski, Samuel; Grafstrom, Roland C. (2017)
    Predicting unanticipated harmful effects of chemicals and drug molecules is a difficult and costly task. Here we utilize a 'big data compacting and data fusion'-concept to capture diverse adverse outcomes on cellular and organismal levels. The approach generates from a transcriptomics data set a 'predictive toxicogenomics space' (PTGS) tool composed of 1,331 genes distributed over 14 overlapping cytotoxicity-related gene space components. Involving ~2.5 × 10⁸ data points and 1,300 compounds to construct and validate the PTGS, the tool serves to: explain dose-dependent cytotoxicity effects, provide a virtual cytotoxicity probability estimate intrinsic to omics data, predict chemically-induced pathological states in liver resulting from repeated dosing of rats, and furthermore, predict human drug-induced liver injury (DILI) from hepatocyte experiments. Analysing 68 DILI-annotated drugs, the PTGS tool outperforms and complements existing tests, leading to a hereto-unseen level of DILI prediction accuracy.
  • Benetto Tiz, Davide; Skok, Žiga; Durcik, Martina; Tomašič, Tihomir; Peterlin Mašič, Lucija; Ilaš, Janez; Draskovits, Gábor; Révész, Tamás; Nyerges, Ákos; Pál, Csaba; Cruz, Cristina D.; Tammela, Päivi Sirpa Marjaana; Žigon, Dušan; Kikelj, Danijel; Zidar, Nace (2019)
    ATP competitive inhibitors of DNA gyrase and topoisomerase IV have great therapeutic potential, but none of the described synthetic compounds has so far reached the market. To optimise the activities and physicochemical properties of our previously reported N-phenylpyrrolamide inhibitors, we have synthesized an improved, chemically variegated selection of compounds and evaluated them against DNA gyrase and topoisomerase IV enzymes, and against selected Gram-positive and Gram-negative bacteria. The most potent compound displayed IC50 values of 6.9 nM against Escherichia coli DNA gyrase and 960 nM against Staphylococcus aureus topoisomerase IV. Several compounds displayed minimum inhibitory concentrations (MICs) against Gram-positive strains in the 1-50 µM range, one of which inhibited the growth of Enterococcus faecalis, Enterococcus faecium, S. aureus and Streptococcus pyogenes with MIC values of 1.56 µM, 1.56 µM, 0.78 µM and 0.72 µM, respectively. This compound has been investigated further on methicillin-resistant S. aureus (MRSA) and on ciprofloxacin non-susceptible and extremely drug resistant strain of S. aureus (MRSA VISA). It exhibited the MIC value of 2.5 µM on both strains, and MIC value of 32 µM against MRSA in the presence of inactivated human blood serum. Further studies are needed to confirm its mode of action. © 2019 Elsevier Masson SAS. All rights reserved.
  • Lek, Monkol; Karczewski, Konrad J.; Minikel, Eric V.; Samocha, Kaitlin E.; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H.; Ware, James S.; Hill, Andrew J.; Cummings, Beryl B.; Tukiainen, Taru; Birnbaum, Daniel P.; Kosmicki, Jack A.; Duncan, Laramie E.; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hollman, Emma; Berghout, Joanne; Cooper, David N.; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I.; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M.; Poplin, Ryan; Rivas, Manuel A.; Ruano-Rubio, Valentin; Rose, Samuel A.; Ruderfer, Douglas M.; Shakir, Khalid; Stenson, Peter D.; Stevens, Christine; Thomas, Brett P.; Tiao, Grace; Tusie-Luna, Maria T.; Weisburd, Ben; Palotie, Aarno; Tuomilehto, Jaakko; Daly, Mark J.; Exome Aggregation Consortium (2016)
    Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
  • Marttinen, Pekka; Myllykangas, Samuel; Corander, Jukka (2009)
  • Muona, Mikko; Ishimura, Ryosuke; Laari, Anni; Ichimura, Yoshinobu; Linnankivi, Tarja; Keski-Filppula, Riikka; Herva, Riitta; Rantala, Heikki; Paetau, Anders; Pöyhönen, Minna; Obata, Miki; Uemura, Takefumi; Karhu, Thomas; Bizen, Norihisa; Takebayashi, Hirohide; McKee, Shane; Parker, Michael J.; Akawi, Nadia; McRae, Jeremy; Hurles, Matthew E.; Kuismin, Outi; Kurki, Mitja I.; Anttonen, Anna-Kaisa; Tanaka, Keiji; Palotie, Aarno; Waguri, Satoshi; Lehesjoki, Anna-Elina; Komatsu, Masaaki; DDD Study (2016)
    The ubiquitin fold modifier 1 (UFM1) cascade is a recently identified evolutionarily conserved ubiquitin-like modification system whose function and link to human disease have remained largely uncharacterized. By using exome sequencing in Finnish individuals with severe epileptic syndromes, we identified pathogenic compound heterozygous variants in UBA5, encoding an activating enzyme for UFM1, in two unrelated families. Two additional individuals with biallelic UBA5 variants were identified from the UK-based Deciphering Developmental Disorders study and one from the Northern Finland Intellectual Disability cohort. The affected individuals (n = 9) presented in early infancy with severe irritability, followed by dystonia and stagnation of development. Furthermore, the majority of individuals display postnatal microcephaly and epilepsy and develop spasticity. The affected individuals were compound heterozygous for a missense substitution, c.1111G>A (p.Ala371Thr; allele frequency of 0.28% in Europeans), and a nonsense variant or c.164G>A that encodes an amino acid substitution p.Arg55His, but also affects splicing by facilitating exon 2 skipping, thus also being in effect a loss-of-function allele. Using an in vitro thioester formation assay and cellular analyses, we show that the p.Ala371Thr variant is hypomorphic with attenuated ability to transfer the activated UFM1 to UFC1. Finally, we show that the CNS-specific knockout of Ufm1 in mice causes neonatal death accompanied by microcephaly and apoptosis in specific neurons, further suggesting that the UFM1 system is essential for CNS development and function. Taken together, our data imply that the combination of a hypomorphic p.Ala371Thr variant in trans with a loss-of-function allele in UBA5 underlies a severe infantile-onset encephalopathy.
  • Oduor, Joseph M. Ochieng; Kadija, Ermir; Nyachieo, Atunga; Mureithi, Marianne W.; Skurnik, Mikael (2020)
    Emergence of antibiotic-resistant bacteria is a serious threat to public health. This is also true for Staphylococcus aureus and other staphylococci. Staphylococcus phages Stab20, Stab21, Stab22, and Stab23 were isolated in Albania. Based on genomic and phylogenetic analysis, they were classified to genus Kayvirus of the subfamily Twortvirinae. In this work, we describe the in-depth characterization of the phages that electron microscopy confirmed to be myoviruses. These phages showed tolerance to pH range of 5.4 to 9.4, to maximum UV radiation energy of 25 µJ/cm², to temperatures up to 45 °C, and to ethanol concentrations up to 25%, and complete resistance to chloroform. The adsorption rate constants of the phages ranged between 1.0 × 10⁻⁹ mL/min and 4.7 × 10⁻⁹ mL/min, and the burst size was from 42 to 130 plaque-forming units. The phages Stab20, 21, 22, and 23, originally isolated using Staphylococcus xylosus as a host, demonstrated varied host ranges among different Staphylococcus strains, suggesting that they could be included in cocktail formulations for therapeutic or bio-control purposes. Phage particle proteomes, consisting on average of ca 60-70 gene products, revealed, in addition to straightforward structural proteins, also the presence of enzymes such as DNA polymerase, helicases, recombinases, exonucleases, and RNA ligase polymer. They are likely to be injected into the bacteria along with the genomic DNA to take over the host metabolism as soon as possible after infection.
  • Ravikumar, Balaguru; Alam, Zaid; Peddinti, Gopal; Aittokallio, Tero (2017)
    The advent of polypharmacology paradigm in drug discovery calls for novel chemoinformatic tools for analyzing compounds' multi-targeting activities. Such tools should provide an intuitive representation of the chemical space through capturing and visualizing underlying patterns of compound similarities linked to their polypharmacological effects. Most of the existing compound-centric chemoinformatics tools lack interactive options and user interfaces that are critical for the real-time needs of chemical biologists carrying out compound screening experiments. Toward that end, we introduce C-SPADE, an open-source exploratory web-tool for interactive analysis and visualization of drug profiling assays (biochemical, cell-based or cell-free) using compound-centric similarity clustering. C-SPADE allows the users to visually map the chemical diversity of a screening panel, explore investigational compounds in terms of their similarity to the screening panel, perform polypharmacological analyses and guide drug-target interaction predictions. C-SPADE requires only the raw drug profiling data as input, and it automatically retrieves the structural information and constructs the compound clusters in real-time, thereby reducing the time required for manual analysis in drug development or repurposing applications. The web-tool provides a customizable visual workspace that can either be downloaded as figure or Newick tree file or shared as a hyperlink with other users. C-SPADE is freely available at
  • Ravikumar, Balaguru; Timonen, Sanna; Alam, Zaid; Parri, Elina; Wennerberg, Krister; Aittokallio, Tero (2019)
    Owing to the intrinsic polypharmacological nature of most small-molecule kinase inhibitors, there is a need for computational models that enable systematic exploration of the chemogenomic landscape underlying druggable kinome toward more efficient kinome-profiling strategies. We implemented Virtual-KinomeProfiler, an efficient computational platform that captures distinct representations of chemical similarity space of the druggable kinome for various drug discovery endeavors. By using the computational platform, we profiled approximately 37 million compound-kinase pairs and made predictions for 151,708 compounds in terms of their repositioning and lead molecule potential, against 248 kinases simultaneously. Experimental testing with biochemical assays validated 51 of the predicted interactions, identifying 19 small-molecule inhibitors of EGFR, HCK, FLT1, and MSK1 protein kinases. The prediction model led to a 1.5-fold increase in precision and 2.8-fold decrease in false-discovery rate, when compared with traditional single-dose biochemical screening, which demonstrates its potential to drastically expedite the kinome-specific drug discovery process.
  • Lamichhane, Santosh; Kemppainen, Esko; Trost, Kajetan; Siljander, Heli; Hyöty, Heikki; Ilonen, Jorma; Toppari, Jorma; Veijola, Riitta; Hyötyläinen, Tuulia; Knip, Mikael; Oresic, Matej (2019)
    Aims/hypothesis Metabolic dysregulation may precede the onset of type 1 diabetes. However, these metabolic disturbances and their specific role in disease initiation remain poorly understood. In this study, we examined whether children who progress to type 1 diabetes have a circulatory polar metabolite profile distinct from that of children who later progress to islet autoimmunity but not type 1 diabetes and a matched control group. Methods We analysed polar metabolites from 415 longitudinal plasma samples in a prospective cohort of children in three study groups: those who progressed to type 1 diabetes; those who seroconverted to one islet autoantibody but not to type 1 diabetes; and an antibody-negative control group. Metabolites were measured using two-dimensional GC high-speed time of flight MS. Results In early infancy, progression to type 1 diabetes was associated with downregulated amino acids, sugar derivatives and fatty acids, including catabolites of microbial origin, compared with the control group. Methionine remained persistently upregulated in those progressing to type 1 diabetes compared with the control group and those who seroconverted to one islet autoantibody. The appearance of islet autoantibodies was associated with decreased glutamic and aspartic acids. Conclusions/interpretation Our findings suggest that children who progress to type 1 diabetes have a unique metabolic profile, which is, however, altered with the appearance of islet autoantibodies. Our findings may assist with early prediction of the disease.
  • Dziubanska-Kusibab, Paulina J.; Berger, Hilmar; Battistini, Federica; Bouwman, Britta A. M.; Iftekhar, Amina; Katainen, Riku; Cajuso, Tatiana; Crosetto, Nicola; Orozco, Modesto; Aaltonen, Lauri A.; Meyer, Thomas F. (2020)
    The mucosal epithelium is a common target of damage by chronic bacterial infections and the accompanying toxins, and most cancers originate from this tissue. We investigated whether colibactin, a potent genotoxin(1) associated with certain strains of Escherichia coli(2), creates a specific DNA-damage signature in infected human colorectal cells. Notably, the genomic contexts of colibactin-induced DNA double-strand breaks were enriched for an AT-rich hexameric sequence motif, associated with distinct DNA-shape characteristics. A survey of somatic mutations at colibactin target sites of several thousand cancer genomes revealed notable enrichment of this motif in colorectal cancers. Moreover, the exact double-strand-break loci corresponded with mutational hot spots in cancer genomes, reminiscent of a trinucleotide signature previously identified in healthy colorectal epithelial cells(3). The present study provides evidence for the etiological role of colibactin in human cancer. A DNA-damage signature induced by colibactin, a toxin expressed by some strains of Escherichia coli, is enriched in human colorectal cancers.
  • Riesgo, Ana; Andrade, Sonia C. S.; Sharma, Prashant P.; Novo, Marta; Perez-Porro, Alicia R.; Vahtera, Varpu; Gonzalez, Vanessa L.; Kawauchi, Gisele Y.; Giribet, Gonzalo (2012)
    Traditionally, genomic or transcriptomic data have been restricted to a few model or emerging model organisms, and to a handful of species of medical and/or environmental importance. Next-generation sequencing techniques have the capability of yielding massive amounts of gene sequence data for virtually any species at a modest cost. Here we provide a comparative analysis of de novo assembled transcriptomic data for ten non-model species of previously understudied animal taxa.
  • Pöhö, Päivi; Lipponen, Katriina; Bespalov, Maxim M.; Sikanen, Tiina; Kotiaho, Tapio; Kostiainen, Risto (2019)
    In this study, the feasibility of direct infusion electrospray ionization microchip mass spectrometry (chip-MS) was compared to the commonly used liquid chromatography-mass spectrometry (LC-MS) in non-targeted metabolomics analysis of human foreskin fibroblasts (HFF) and human induced pluripotent stem cells (hiPSC) reprogrammed from HFF. The total number of the detected features with chip-MS and LC-MS were 619 and 1959, respectively. Approximately 25% of detected features showed statistically significant changes between the cell lines with both analytical methods. The results show that chip-MS is a rapid and simple method that allows high sample throughput from small sample volumes and can detect the main metabolites and classify cells based on their metabolic profiles. However, the selectivity of chip-MS is limited compared to LC-MS and chip-MS may suffer from ion suppression.
  • Khan, Suleiman A.; Faisal, Ali; Mpindi, John Patrick; Parkkinen, Juuso A.; Kalliokoski, Tuomo; Poso, Antti; Kallioniemi, Olli P.; Wennerberg, Krister; Kaski, Samuel (2012)
  • Lamichhane, Santosh; Ahonen, Linda; Dyrlund, Thomas Sparholt; Dickens, Alex M.; Siljander, Heli; Hyöty, Heikki; Ilonen, Jorma; Toppari, Jorma; Veijola, Riitta; Hyötyläinen, Tuulia; Knip, Mikael; Oresic, Matej (2019)
    Previous studies suggest that children who progress to type 1 diabetes (T1D) later in life already have an altered serum lipid molecular profile at birth. Here, we compared cord blood lipidome across the three study groups: children who progressed to T1D (PT1D; n = 30), children who developed at least one islet autoantibody but did not progress to T1D during the follow-up (P1Ab; n = 33), and their age-matched controls (CTR; n = 38). We found that phospholipids, specifically sphingomyelins, were lower in T1D progressors when compared to P1Ab and the CTR. Cholesterol esters remained higher in PT1D when compared to other groups. A signature comprising five lipids was predictive of the risk of progression to T1D, with an area under the receiver operating characteristic curve (AUROC) of 0.83. Our findings provide further evidence that the lipidomic profiles of newborn infants who progress to T1D later in life are different from lipidomic profiles in P1Ab and CTR.
  • IDG-DREAM Drug-Kinase Binding; Cichonska, Anna; Ravikumar, Balaguru; Tanoli, Ziaurrehman; Aittokallio, Tero (2021)
    Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome. The IDG-DREAM Challenge carried out crowdsourced benchmarking of predictive algorithms for kinase inhibitor activities on unpublished data. This study provides a resource to compare emerging algorithms and prioritize new kinase activities to accelerate drug discovery and repurposing efforts.
  • Jones, Martin R.; Pinto, Ernani; Torres, Mariana A.; Dörr, Fabiane; Mazur-Marzec, Hanna; Szubert, Karolina; Tartaglione, Luciana; Dell'Aversano, Carmela; Miles, Christopher O.; Beach, Daniel G.; McCarron, Pearse; Sivonen, Kaarina; Fewer, David P.; Jokela, Jouni; Janssen, Elisabeth M.-L. (2021)
    Harmful cyanobacterial blooms, which frequently contain toxic secondary metabolites, are reported in aquatic environments around the world. More than two thousand cyanobacterial secondary metabolites have been reported from diverse sources over the past fifty years. A comprehensive, publicly accessible database detailing these secondary metabolites would facilitate research into their occurrence, functions and toxicological risks. To address this need we created CyanoMetDB, a highly curated, flat-file, openly accessible database of cyanobacterial secondary metabolites collated from 850 peer-reviewed articles published between 1967 and 2020. CyanoMetDB contains 2010 cyanobacterial metabolites and 99 structurally related compounds. This has nearly doubled the number of entries with complete literature metadata and structural composition information compared to previously available open access databases. The dataset includes microcystins, cyanopeptolins, other depsipeptides, anabaenopeptins, microginins, aeruginosins, cyclamides, cryptophycins, saxitoxins, spumigins, microviridins, and anatoxins among other metabolite classes. A comprehensive database dedicated to cyanobacterial secondary metabolites facilitates: (1) the detection and dereplication of known cyanobacterial toxins and secondary metabolites; (2) the identification of novel natural products from cyanobacteria; (3) research on biosynthesis of cyanobacterial secondary metabolites, including substructure searches; and (4) the investigation of their abundance, persistence, and toxicity in natural environments.