  • Thapa Magar, Purushottam (Helsingin yliopisto, 2021)
    Rapid growth and advancement of next-generation sequencing (NGS) technologies have changed the landscape of genomic medicine. Today, clinical laboratories perform DNA sequencing on a regular basis, and the process is error prone. Erroneous data affect downstream analysis and produce misleading results. Therefore, external quality assessment (EQA) of laboratories working with NGS data is crucial. Validation of small variations such as single nucleotide polymorphisms (SNPs) and InDels (<50 bp) is fairly accurate these days. However, detection and quality assessment of large changes such as copy number variations (CNVs) continue to be a concern. In this work, we aimed to study the feasibility of automated CNV concordance analysis for laboratory EQA services. We benchmarked variants reported by 25 laboratories against the highly curated gold standard for the son (HG002/NA24385) of the Ashkenazim trio from the Personal Genome Project, published by the Genome in a Bottle Consortium (GIAB). We employed two methods to assess CNV concordance: sequence-based comparison with Truvari and an in-house exome-based comparison. For deletion calls of two whole genome sequencing (WGS) submissions, Truvari yielded precision greater than 88% and recall greater than 68%. Conversely, the in-house method's precision and recall peaked at 39% and 7.9%, respectively, for one WGS submission for both deletion and duplication calls. The results indicate that automated CNV concordance analysis of deletion calls for WGS-based callsets might be feasible with Truvari. On the other hand, results for panel-based targeted sequencing showed precision ranging from 0% to 80% and recall ranging from 0% to 5.6% for deletion calls with Truvari. These results suggest that automated concordance analysis of CNVs for targeted sequencing remains a challenge. In conclusion, CNV concordance analysis depends on how the sequence data are generated.
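    The precision and recall figures above can be illustrated with a minimal sketch. It assumes CNV calls are simple (chromosome, start, end) intervals and matches them to a truth set greedily by reciprocal overlap; the function names and the 50% threshold are hypothetical choices for illustration, and this is not Truvari's actual matching logic or the in-house method.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open coordinates; a simplifying assumption
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def concordance(calls: List[Interval], truth: List[Interval],
                min_overlap: float = 0.5) -> Tuple[float, float]:
    """Greedily match each call to one unused truth interval.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    """
    used = set()  # indices of truth intervals already matched
    tp = 0
    for call in calls:
        for i, t in enumerate(truth):
            if i not in used and reciprocal_overlap(call, t) >= min_overlap:
                used.add(i)
                tp += 1
                break
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Made-up example data, not taken from the study
truth = [("chr1", 100, 200), ("chr1", 500, 900), ("chr2", 50, 150)]
calls = [("chr1", 110, 210), ("chr1", 1000, 1100)]
precision, recall = concordance(calls, truth)
```

    In this toy example one of two calls matches one of three truth intervals, so precision is 1/2 and recall is 1/3. A submission with few but accurate calls, as in the targeted-sequencing results, would show the same pattern of moderate precision with low recall.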
  • Aivelo, Tuomas; Tschirren, Barbara (2020)
    Experimental field studies have demonstrated negative fitness consequences of Hen Flea Ceratophyllus gallinae infestations for bird hosts, yet it is currently unclear whether these negative effects are a direct consequence of flea-induced blood loss or a result of flea-borne pathogen transmission. Here we used a 16S rRNA gene sequencing approach to characterize the bacterial microbiota community of Hen Fleas collected from Great Tit Parus major nests and found that Brevibacterium (Actinobacteria), Staphylococcus (Firmicutes), Stenotrophomonas (Proteobacteria), Massilia (Proteobacteria), as well as the arthropod endosymbionts 'Candidatus Lariskella' and 'Candidatus Midichloria' were most abundant. We found evidence for the occurrence of Staphylococcus spp. in Hen Fleas, which may cause opportunistic infections in bird hosts, but not of other known pathogens commonly transmitted by other flea species, such as Bartonella spp. or Rickettsia spp. However, Hen Fleas might transmit other pathogens (e.g. viruses or bacteria that are not currently recognized as bird pathogens), which may contribute to the negative fitness consequences of Hen Flea infestations in addition to direct blood loss or secondary infections of wounds caused by biting fleas.
  • Miller, W.G.; Chapman, M.H.; Yee, E.; Revez, J.; Bono, J.L.; Rossi, M. (2017)
    Campylobacter avium is a thermotolerant Campylobacter species that has been isolated from poultry. C. avium was also the second hippuricase-positive species to be identified within Campylobacter. Here, we present the genome sequence of the C. avium type strain LMG 24591 (= CCUG 56292T), isolated in 2006 from a broiler chicken in Italy.
  • Kringel, Dario; Malkusch, Sebastian; Kalso, Eija; Lötsch, Jorn (2021)
    The genetic background of pain is becoming increasingly well understood, which opens up possibilities for predicting the individual risk of persistent pain and for using tailored therapies adapted to the variant pattern of the patient's pain-relevant genes. The individual variant pattern of pain-relevant genes is accessible via next-generation sequencing, although the analysis of all "pain genes" would be expensive. Here, we report on the development of a cost-effective next-generation sequencing-based pain-genotyping assay comprising the development of a customized AmpliSeq™ panel and bioinformatics approaches that condense the genetic information of pain by identifying the most representative genes. The panel includes 29 key genes that have been shown to cover 70% of the biological functions exerted by a list of 540 so-called "pain genes" derived from transgenic mice experiments. These were supplemented by 43 additional genes that had been independently proposed as relevant for persistent pain. The functional genomics covered by the resulting 72 genes is particularly represented by mitogen-activated protein kinase of extracellular signal-regulated kinase and cytokine production and secretion. The present genotyping assay was established in 61 subjects of Caucasian ethnicity and investigates the functional role of the selected genes in the context of the known genetic architecture of pain without seeking functional associations for pain. The assay identified a total of 691 genetic variants, many of which have reported clinical relevance for pain or in another context. The assay is applicable for small to large-scale experimental setups at contemporary genotyping costs.
  • Korpelainen, Helena; Pietiläinen, Maria (2017)
    In the present study, we conducted DNA metabarcoding (the nuclear ITS2 region) for indoor fungal samples originating from two nursery schools with a suspected mould problem (sampling before and after renovation), from two university buildings, and from an old farmhouse. Good-quality sequences were obtained, and the results showed that DNA metabarcoding provides high resolution in fungal identification. The pooled proportions of sequences representing filamentous ascomycetes, filamentous basidiomycetes, yeasts, and other fungi equalled 62.3%, 8.0%, 28.3%, and 1.4%, respectively, and the total number of fungal genera found during the study was 585. When comparing fungal diversities and taxonomic composition between different types of buildings, no obvious pattern was detected. The average pairwise values of the Sørensen (Chao) indices, which were used to compare similarity in taxon composition among the samples from the two university buildings, two nurseries, and farmhouse, equalled 0.693, 0.736, 0.852, 0.928, and 0.981, respectively, while the mean similarity index for all samples was 0.864. We discovered that drawing explicit conclusions on the relationship between indoor air quality and mycoflora is complicated by the lack of appropriate indicators for air quality and by the occurrence of wide spatial and temporal changes in diversity and composition among samples.
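    As a rough illustration of how an average pairwise similarity of this kind can be computed, the sketch below uses the classic incidence-based Sørensen coefficient, 2|A ∩ B| / (|A| + |B|), on sets of genera. This is a deliberate simplification: the study reports the Chao variant of the index, which is not implemented here, and the function names and sample data are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Set


def sorensen(a: Iterable[str], b: Iterable[str]) -> float:
    """Classic Sørensen coefficient on presence/absence of taxa."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty samples are treated as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def mean_pairwise_sorensen(samples: List[Set[str]]) -> float:
    """Average the coefficient over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(sorensen(x, y) for x, y in pairs) / len(pairs)


# Made-up genus lists, not data from the study
sample_1 = {"Aspergillus", "Penicillium", "Cladosporium"}
sample_2 = {"Penicillium", "Cladosporium", "Alternaria", "Candida"}
sample_3 = {"Penicillium", "Cladosporium", "Aspergillus"}

pair_value = sorensen(sample_1, sample_2)
mean_value = mean_pairwise_sorensen([sample_1, sample_2, sample_3])
```

    Values close to 1 indicate nearly identical taxon composition between samples, which is how the reported indices in the range 0.693 to 0.981 can be read.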
  • Hirvensalo, Päivi; Tornio, Aleksi; Neuvonen, Mikko; Kiander, Wilma; Kidron, Heidi; Paile-Hyvärinen, Maria; Tapaninen, Tuija; Backman, Janne T.; Niemi, Mikko (2019)
    The aim of this study was to investigate how variability in multiple genes related to pharmacokinetics affects fluvastatin exposure. We determined fluvastatin enantiomer pharmacokinetics and sequenced 379 pharmacokinetic genes in 200 healthy volunteers. CYP2C9*3 was associated with significantly increased area under the plasma concentration-time curve (AUC) of both 3R,5S- and 3S,5R-fluvastatin (by 67% and 94% per variant allele copy, P = 3.77 × 10⁻⁹ and P = 3.19 × 10⁻¹²). In contrast, SLCO1B1 c.521T>C was associated with increased AUC of active 3R,5S-fluvastatin only (by 34% per variant allele copy; P = 8.15 × 10⁻⁸). A candidate gene analysis suggested that CYP2C9*2 also affects the AUC of both fluvastatin enantiomers and that SLCO2B1 single nucleotide variations (SNVs) may affect the AUC of 3S,5R-fluvastatin. Thus, SLCO transporters have enantiospecific effects on fluvastatin pharmacokinetics in humans. Genotyping of both CYP2C9 and SLCO1B1 may be useful in predicting fluvastatin efficacy and myotoxicity.
  • Honkimaa, Anni; Kimura, Bryn; Sioofy-Khojine, Amir-Babak; Lin, Jake; Laiho, Jutta; Oikarinen, Sami; Hyöty, Heikki (2020)
    Coxsackie B (CVB) viruses have been associated with type 1 diabetes. We have recently observed that CVB1 was linked to the initiation of the autoimmune process leading to type 1 diabetes in Finnish children. Viral persistency in the pancreas is currently considered one possible mechanism. In the current study, persistent infection was established in pancreatic ductal and beta cell lines (PANC-1 and 1.1B4) using four different CVB1 strains, including the prototype strain and three clinical isolates. We sequenced the 5′ untranslated region (UTR), the regions coding for structural and non-structural proteins, and the second single open reading frame (ORF) protein of all persisting CVB1 strains using next-generation sequencing to identify mutations that are common to all of these strains. One mutation, K257R in VP1, was found in all persisting CVB1 strains. The mutations accumulated mainly in viral structural proteins, especially at the BC, DE, and EF loops and the C-terminus of viral capsid protein 1 (VP1), the puff region of VP2, the knob region of VP3, and the infection-enhancing epitope of VP4. This showed that the capsid region of the viruses sustains various changes during persistency, some of which could be hallmark(s) of persistency.
  • Crowgey, Erin L.; Soini, Tea; Shah, Nidhi; Pauniaho, Satu-Liisa; Lahdenne, Pekka; Wilson, David B.; Heikinheimo, Markku; Druley, Todd E. (2020)
    Purpose: Pediatric germ cell tumors are rare, representing about 3% of childhood malignancies in children younger than 15 years of age, presenting in neonates or adolescents with a greater incidence noted in older adolescents. Aberrations in primordial germ cell proliferation/differentiation can lead to a variety of neoplasms, including teratomas, embryonal carcinoma, choriocarcinoma, and yolk sac tumors. Patients and Methods: Three Finnish families with varying familial germ cell tumors were identified, and whole-genome sequencing was performed using an Illumina sequencing platform. In total, 22 unique subjects across the three families were sequenced. Family 1 proband (female) was affected by malignant ovarian teratoma, Family 2 proband (female) was affected by sacrococcygeal teratoma with yolk sac tumor in the setting of Cornelia de Lange syndrome, and Family 3 proband (male) was affected by malignant testicular teratoma. Rare variants were identified using an autosomal recessive or de novo model of inheritance. Results: For Family 1 proband (female), an autosomal recessive or de novo model of inheritance identified variants of interest in the following genes: CD109, IKBKB, CTNNA3, SUPT6H, MUC5AC, and FRG1. Family 2 proband (female) analysis identified variants of interest in the following genes: LONRF2, ANO7, HS6ST1, PRB2, and DNM2. Family 3 proband (male) analysis identified the following potential genes: CRIPAK, KRTAP5-7, and CACNA1B. Conclusion: Leveraging deep pedigrees and next-generation sequencing, rare germline variants were identified that were enriched in three families from Finland with a history of familial germ cell tumors. The data presented support the importance of germline mutations when analyzing complex cancers with a low somatic mutation landscape.
  • Belyayev, Alexander; Josefiová, Jiřina; Jandová, Michaela; Kalendar, Ruslan; Krak, Karol; Mandák, Bohumil (2019)
    Satellite DNA (satDNA) is the most variable fraction of the eukaryotic genome. Related species share a common ancestral satDNA library, and a change in any library component in a particular lineage results in interspecific differences. Although the general developmental trend is clear, our knowledge of the origin and dynamics of satDNAs is still fragmentary. Here, we explore whole genome shotgun Illumina reads using the RepeatExplorer (RE) pipeline to infer satDNA family life stories in the genomes of Chenopodium species. The seven diploids studied represent separate lineages and provide an example of a species complex typical for angiosperms. Application of the RE pipeline allowed us to identify, by similarity searches, a satDNA family with a basic monomer of ~40 bp and to trace its transformation from the reconstructed ancestral sequence to the species-specific sequences. As a result, three types of satDNA family evolutionary development were distinguished: (i) concerted evolution with mutation and recombination events; (ii) concerted evolution with a trend toward increased complexity and length of the satellite monomer; and (iii) non-concerted evolution, with low levels of homogenization and multidirectional trends. The third type is an example of entire repeatome transformation, thus producing a novel set of satDNA families, and genomes showing non-concerted evolution are proposed as a significant source of genomic diversity.
  • Kondelin, Johanna; Martin, Samantha; Katainen, Riku; Renkonen-Sinisalo, Laura; Lepistö, Anna; Koskensalo, Selja; Böhm, Jan; Mecklin, Jukka-Pekka; Cajuso, Tatiana; Hänninen, Ulrika A.; Välimäki, Niko; Ravantti, Janne; Rajamäki, Kristiina; Palin, Kimmo; Aaltonen, Lauri A. (2021)
    Microsatellite instability (MSI) is caused by defective DNA mismatch repair (MMR), and manifests as accumulation of small insertions and deletions (indels) in short tandem repeats of the genome. Another form of repeat instability, elevated microsatellite alterations at selected tetranucleotide repeats (EMAST), has been suggested to occur in 50% to 60% of colorectal cancers (CRCs), of which approximately one quarter are accounted for by MSI. Unlike for MSI, there is no consensus on the criteria for defining EMAST. EMAST CRCs have been suggested to form a distinct subset of CRCs that has been linked to a higher tumor stage, chronic inflammation, and poor prognosis. EMAST CRCs not exhibiting MSI have been proposed to show instability of di- and trinucleotide repeats in addition to tetranucleotide repeats, but to lack instability of mononucleotide repeats. However, previous studies on EMAST have been based on targeted analysis of small sets of marker repeats, often in relatively few samples. To gain insight into tetranucleotide instability on a genome-wide level, we utilized whole genome sequencing data from 227 microsatellite stable (MSS) CRCs, 18 MSI CRCs, 3 POLE-mutated CRCs, and their corresponding normal samples. As expected, we observed tetranucleotide instability in all MSI CRCs, accompanied by instability of mono-, di-, and trinucleotide repeats. Among MSS CRCs, the number of microsatellite mutations varied between tumors as a continuum, and no distinct subset of tumors with the previously proposed molecular characteristics of EMAST could be observed. Our results suggest that tetranucleotide repeat mutations in non-MSI CRCs represent stochastic mutation events rather than define a distinct CRC subclass.
  • Pour-Aboughadareh, Alireza; Kianersi, Farzad; Poczai, Péter; Moradkhani, Hoda (2021)
    Among cereal crops, wheat has been identified as a major source for human food consumption. Wheat breeders require access to new genetic diversity resources to satisfy the demands of a growing human population for more food with a high quality that can be produced in variable environmental conditions. The close relatives of domesticated wheats represent an ideal gene pool for the use of breeders. The genera Aegilops and Triticum are known as the main gene pool of domesticated wheat, including numerous species with different and interesting genomic constitutions. According to the literature, each wild relative harbors useful alleles which can induce resistance to various environmental stresses. Furthermore, progress in genetic and biotechnology sciences has provided accurate information regarding the phylogenetic relationships among species, which consequently opened avenues to reconsider the potential of each wild relative and to provide a context for how we can employ them in future breeding programs. In the present review, we have sought to represent the level of genetic diversity among the wild relatives of wheat, as well as the breeding potential of each wild species that can be used in wheat-breeding programs.
  • Seppälä, Hanna; Virtanen, Elina; Saarela, Mika; Laine, Pia; Paulin, Lars; Mannonen, Laura; Auvinen, Petri; Auvinen, Eeva (2017)
    Background. Progressive multifocal leukoencephalopathy (PML) is a fatal disease caused by reactivation of JC polyomavirus (JCPyV) in immunosuppressed individuals and lytic infection by neurotropic JCPyV in glial cells. The exact content of neurotropic mutations within individual JCPyV strains has not been studied to our knowledge. Methods. We exploited the capacity of single-molecule real-time sequencing technology to determine the sequence of complete JCPyV genomes in single reads. The method was used to precisely characterize individual neurotropic JCPyV strains of 3 patients with PML without the bias caused by assembly of short sequence reads. Results. In the cerebrospinal fluid sample of a 73-year-old woman with rapid PML onset, 3 distinct JCPyV populations could be identified. All viral populations were characterized by rearrangements within the noncoding regulatory region (NCCR) and 1 point mutation, S267L in the VP1 gene, suggestive of neurotropic strains. One patient with PML had a single neurotropic strain with rearranged NCCR, and 1 patient had a single strain with small NCCR alterations. Conclusions. We report here, for the first time, full characterization of individual neurotropic JCPyV strains in the cerebrospinal fluid of patients with PML. It remains to be established whether PML pathogenesis is driven by one or several neurotropic strains in an individual.
  • van Steenbeek, F. G.; Hytonen, M. K.; Leegwater, P. A. J.; Lohi, H. (2016)
    Since the annotation of its genome a decade ago, the dog has proven to be an excellent model for the study of inherited diseases. A large variety of spontaneous simple and complex phenotypes occur in dogs, providing physiologically relevant models to corresponding human conditions. In addition, gene discovery is facilitated in clinically less heterogeneous purebred dogs with closed population structures because smaller study cohorts and fewer markers are often sufficient to expose causal variants. Here, we review the development of genomic resources from microsatellites to whole-genome sequencing and give examples of successful findings that have followed the technological progress. The increasing amount of whole-genome sequence data warrants better functional annotation of the canine genome to more effectively utilise this unique model to understand genetic contributions in morphological, behavioural and other complex traits.
  • Fiorentino, Michelangelo; Gruppioni, Elisa; Massari, Francesco; Giunchi, Francesca; Altimari, Annalisa; Ciccarese, Chiara; Bimbatti, Davide; Scarpa, Aldo; Iacovelli, Roberto; Porta, Camillo; Virinder, Sarhadi; Tortora, Giampaolo; Artibani, Walter; Schiavina, Riccardo; Ardizzoni, Andrea; Brunelli, Matteo; Knuutila, Sakari; Martignoni, Guido (2017)
    Renal cell cancer (RCC) is characterized by histological and molecular heterogeneity that may account for variable response to targeted therapies. Using a next-generation sequencing (NGS) approach with a pre-designed cancer panel, we retrospectively evaluated the mutation burden of 32 lesions from 22 metastatic RCC patients treated with at least one tyrosine kinase or mTOR inhibitor. We identified mutations in the VHL, PTEN, JAK3, MET, ERBB4, APC, CDKN2A, FGFR3, EGFR, RB1, and TP53 genes. Somatic alterations were correlated with response to therapy. Most mutations hit VHL1 (31.8%), followed by PTEN (13.6%), and JAK3, FGFR and TP53 (9% each). Eight (36%) patients were wild-type, at least for the genes included in the panel. A genotype concordance between primary RCC and its secondary lesion was found in 3/6 cases. Patients were treated with Sorafenib, Sunitinib and Temsirolimus, with partial responses in 4 (18.2%) and disease stabilization in 7 (31.8%). Among the 4 partial responders, 1 (25%) was wild-type and 3 (75%) harbored different VHL1 variants. Among the 7 patients with disease stabilization, 2 (29%) were wild-type, 2 (29%) were PTEN mutated, and single patients (14% each) displayed mutations in VHL1, JAK3 and APC/CDKN2A. Among the 11 non-responders, 7 (64%) were wild-type, 2 (18%) were p53 mutated and 2 (18%) were VHL1 mutated. No significant associations were found among RCC histotype, mutation variants and response to therapies. In the absence of predictive biomarkers for metastatic RCC treatment, an NGS approach may direct individual patients to basket clinical trials according to specific actionable molecular alterations.