Browsing by Subject "VARIABLE SELECTION"

Now showing items 1-11 of 11
  • Häppölä, Paavo; Havulinna, Aki S.; Tasa, Tönis; Mars, Nina; Perola, Markus; Kallela, Mikko; Milani, Lili; Koskinen, Seppo; Salomaa, Veikko; Neale, Benjamin M.; Palotie, Aarno; Daly, Mark; Ripatti, Samuli (2020)
    Health differences among the elderly and the role of medical treatments are topical issues in aging societies. We demonstrate the use of modern statistical learning methods to develop a data-driven health measure based on 21 years of pharmacy purchase and mortality data of 12,047 aging individuals. The resulting score was validated with 33,616 individuals from two fully independent datasets and it is strongly associated with all-cause mortality (HR 1.18 per point increase in score; 95% CI 1.14-1.22; p=2.25e-16). When combined with Charlson comorbidity index, individuals with elevated medication score and comorbidity index had over six times higher risk (HR 6.30; 95% CI 3.84-10.3; AUC=0.802) compared to individuals with a protective score profile. Alone, the medication score performs similarly to the Charlson comorbidity index and is associated with polygenic risk for coronary heart disease and type 2 diabetes.
  • PCAWG Evolution Heterogeneity Work; PCAWG Consortium; Dentro, Stefan C.; Mustonen, Ville (2021)
    Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.
  • Rönkä, Katja; Valkonen, Janne K.; Nokelainen, Ossi; Rojas, Bibiana; Gordon, Swanne; Burdfield-Steel, Emily; Mappes, Johanna (2020)
    Warning signals are predicted to develop signal monomorphism via positive frequency-dependent selection (+FDS), although many aposematic systems exhibit signal polymorphism. To understand this mismatch, we conducted a large-scale predation experiment in four countries, among which the frequencies of hindwing warning coloration of the aposematic moth, Arctia plantaginis, differ. Here we show that selection by avian predators on warning colour is predicted by local morph frequency and predator community composition. We found +FDS to be the strongest in monomorphic Scotland and lowest in polymorphic Finland, where the attack risk of moth morphs depended on the local avian community. +FDS was also found where the predator community was the least diverse (Georgia), whereas in the most diverse avian community (Estonia), hardly any models were attacked. Our results support the idea that spatial variation in predator communities alters the strength or direction of selection on warning signals, thus facilitating a geographic mosaic of selection.
  • Sundin, Iiris; Peltola, Tomi; Micallef, Luana; Afrabandpey, Homayun; Soare, Marta; Majumder, Muntasir Mamun; Daee, Pedram; He, Chen; Serim, Baris; Havulinna, Aki; Heckman, Caroline; Jacucci, Giulio; Marttinen, Pekka; Kaski, Samuel (2018)
    Motivation: Precision medicine requires the ability to predict the efficacies of different treatments for a given individual using high-dimensional genomic measurements. However, identifying predictive features remains a challenge when the sample size is small. Incorporating expert knowledge offers a promising approach to improve predictions, but collecting such knowledge is laborious if the number of candidate features is very large. Results: We introduce a probabilistic framework to incorporate expert feedback about the impact of genomic measurements on the outcome of interest and present a novel approach to collect the feedback efficiently, based on Bayesian experimental design. The new approach outperformed other recent alternatives in two medical applications: prediction of metabolic traits and prediction of sensitivity of cancer cells to different drugs, both using genomic features as predictors. Furthermore, the intelligent approach to collect feedback reduced the workload of the expert to approximately 11%, compared to a baseline approach.
  • Aernouts, Ben; Adriaens, Ines; Diaz-Olivares, Jose; Saeys, Wouter; Mantysaari, Paivi; Kokkonen, Tuomo; Mehtio, Terhi; Kajava, Sari; Lidauer, Paula; Lidauer, Martin H.; Pastell, Matti (2020)
    In high-yielding dairy cattle, severe postpartum negative energy balance is often associated with metabolic and infectious disorders that negatively affect production, fertility, and welfare. Mobilization of adipose tissue associated with negative energy balance is reflected through an increased level of nonesterified fatty acids (NEFA) in the blood plasma. Earlier, identification of negative energy balance through detection of increased blood plasma NEFA concentration required laborious and stressful blood sampling. More recently, attempts have been made to predict blood NEFA concentration from milk samples. In this study, we aimed to develop and validate a model to predict blood plasma NEFA concentration using the milk mid-infrared (MIR) spectra that are routinely measured in the context of milk recording. To this end, blood plasma and milk samples were collected in wk 2, 3, and 20 postpartum for 192 lactations in 3 herds. The blood plasma samples were taken in the morning, and representative milk samples were collected during the morning and evening milk sessions on the same day. To predict plasma NEFA concentration from the milk MIR spectra, partial least squares regression models were trained on part of the observations from the first herd. The models were then thoroughly validated on all other observations of the first herd and on the observations of the 2 independent herds to explore their robustness and wide applicability. The final model could accurately predict lower blood plasma NEFA concentrations, but for blood plasma with more than 1.2 mmol/L NEFA, the model clearly underestimated the true level. Additionally, we found that morning blood plasma NEFA levels were predicted with significantly higher accuracy using MIR spectra of evening milk samples compared with MIR spectra of morning samples, with root mean square error of prediction values of, respectively, 0.182 and 0.197 mmol/L, and R² values of 0.613 and 0.502. These results suggest a time delay between variations in blood plasma NEFA and related milk biomarkers. Based on the MIR spectra of evening milk samples, cows at risk for negative energy status, indicated by detrimental morning blood plasma NEFA levels (>0.6 mmol/L), could be identified with a sensitivity and specificity of, respectively, 0.831 and 0.800. As this model can be applied to millions of historical and future milk MIR spectra, it opens an opportunity for regular metabolic screening and improved resilience phenotyping.
  • Wang, Yingnan; Zhao, Yongxin; Wang, Yu; Li, Zitong; Guo, Baocheng; Merilä, Juha (2020)
    The degree to which adaptation to similar selection pressures is underlain by parallel vs. non-parallel genetic changes is a topic of broad interest in contemporary evolutionary biology. Sticklebacks provide opportunities to characterize and compare the genetic underpinnings of repeated marine-freshwater divergences at both intra- and interspecific levels. While the degree of genetic parallelism in repeated marine-freshwater divergences has been frequently studied in the three-spined stickleback (Gasterosteus aculeatus), much less is known about this in other stickleback species. Using a population transcriptomic approach, we identified both genetic and gene expression variations associated with marine-freshwater divergence in the nine-spined stickleback (Pungitius pungitius). Specifically, we used a genome-wide association study approach, and found that ~1% of the total 173,491 identified SNPs showed marine-freshwater ecotypic differentiation. A total of 861 genes were identified to have SNPs associated with marine-freshwater divergence in nine-spined stickleback, but only 12 of these genes have also been reported as candidates associated with marine-freshwater divergence in the three-spined stickleback. Hence, our results indicate a low degree of interspecific genetic parallelism in marine-freshwater divergence. Moreover, 1,578 genes in the brain and 1,050 genes in the liver were differentially expressed between marine and freshwater nine-spined sticklebacks, ~5% of which have also been identified as candidates associated with marine-freshwater divergence in the three-spined stickleback. However, only a few of these (e.g., CLDND1) appear to have been involved in repeated marine-freshwater divergence in nine-spined sticklebacks. Taken together, the results indicate a low degree of genetic parallelism in repeated marine-freshwater divergence both at intra- and interspecific levels.
  • Okser, Sebastian; Pahikkala, Tapio; Airola, Antti; Salakoski, Tapio; Ripatti, Samuli; Aittokallio, Tero (2014)
  • Sysi-Aho, Marko; Koikkalainen, Juha; Seppanen-Laakso, Tuulikki; Kaartinen, Maija; Kuusisto, Johanna; Peuhkurinen, Keijo; Karkkainen, Satu; Antila, Margareta; Lauerma, Kirsi-Maria Susanna; Reissell, Eeva; Jurkko, Raija; Lotjonen, Jyrki; Heliö, Tiina; Oresic, Matej (2011)
  • Serra, Angela; Fratello, Michele; Cattelani, Luca; Liampa, Irene; Melagraki, Georgia; Kohonen, Pekka; Nymark, Penny; Federico, Antonio; Kinaret, Pia Anneli Sofia; Jagiello, Karolina; Ha, My Kieu; Choi, Jang-Sik; Sanabria, Natasha; Gulumian, Mary; Puzyn, Tomasz; Yoon, Tae-Hyun; Sarimveis, Haralambos; Grafström, Roland; Afantitis, Antreas; Greco, Dario (2020)
    Transcriptomics data are relevant to address a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, the TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the publicly available omics datasets are constantly increasing together with a plethora of different methods that are made available to facilitate their analysis, interpretation and the generation of accurate and stable predictive models. In this review, we present the state-of-the-art of data modelling applied to transcriptomics data in TGx. We show how the benchmark dose (BMD) analysis can be applied to TGx data. We review read across and adverse outcome pathways (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.