Browsing by Subject "LASSO"

Now showing items 1-5 of 5
  • Haghighi, Mona; Johnson, Suzanne Bennett; Qian, Xiaoning; Lynch, Kristian F.; Vehik, Kendra; Huang, Shuai; TEDDY Study Grp; Knip, Mikael (2016)
    Regression models are extensively used in many epidemiological studies to understand the linkage between specific outcomes of interest and their risk factors. However, regression models in general examine the average effects of the risk factors and ignore subgroups with different risk profiles. As a result, interventions are often geared towards the average member of the population, without consideration of the special health needs of different subgroups within the population. This paper demonstrates the value of using rule-based analysis methods that can identify subgroups with heterogeneous risk profiles in a population without imposing assumptions on the subgroups or method. The rules define the risk pattern of subsets of individuals by not only considering the interactions between the risk factors but also their ranges. We compared the rule-based analysis results with the results from a logistic regression model in The Environmental Determinants of Diabetes in the Young (TEDDY) study. Both methods detected a similar suite of risk factors, but the rule-based analysis was superior at detecting multiple interactions between the risk factors that characterize the subgroups. A further investigation of the particular characteristics of each subgroup may detect the special health needs of the subgroup and lead to tailored interventions.
  • Calleja-Rodriguez, Ainhoa; Li, Zitong; Hallingbäck, Henrik R.; Sillanpää, Mikko J.; Wu, Harry X.; Abrahamsson, Sara; Garcia-Gil, Maria Rosario (2019)
    In forest tree breeding, family-based Quantitative Trait Loci (QTL) studies are valuable as methods to dissect the complexity of a trait and as a source of candidate genes. In the field of conifer research, our study contributes to the evaluation of phenotypic and predicted breeding values for the identification of QTL linked to complex traits in a three-generation pedigree population in Scots pine (Pinus sylvestris L.). A total of 11 470 open-pollinated F2 progeny trees established at three different locations were measured for growth and adaptive traits. Breeding values were predicted for their 360 mothers, originating from a single cross of two grandparents. A multilevel LASSO association analysis was conducted to detect QTL using genotypes of the mothers with the corresponding phenotypes and Estimated Breeding Values (EBV). Different levels of genotype-by-environment (G × E) effects among sites in different years were detected for survival and height. Moderate-to-low narrow-sense heritabilities and EBV accuracies were found for all traits and all sites. We identified 18 AFLPs and 12 SNPs associated with QTL for one or more traits. 62 QTL were significant, with percentages of variance explained ranging from 1.7 to 18.9%. In those cases where the same marker was associated with both a phenotypic QTL and an ebvQTL, the ebvQTL always explained a higher proportion of the variance, possibly because EBV are more accurate. Two SNP-QTL showed pleiotropic effects for traits related to hardiness and to seed, cone and flower production. Furthermore, we detected several QTL with significant effects across multiple ages, which could be considered strong candidate loci for early selection. The lack of reproducibility of some QTL detected across sites may be due to environmental heterogeneity reflected by the genotype- and QTL-by-environment effects. © 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license.
  • Kontio, Juho A. J.; Pyhäjärvi, Tanja; Sillanpää, Mikko J. (2021)
    Author summary: Here we built a mathematically justified bridge between 1) parametric approaches and 2) co-expression networks for identifying molecular interactions underlying complex traits. We first shared our concern that methodological improvements around these schemes, adjusting only their power and scalability, are bounded by more fundamental scheme-specific limitations. Subsequently, our theoretical results were exploited to overcome these limitations and find gene-by-gene interactions that neither scheme can capture alone. We also aimed to illustrate how this framework enables the interpretation of co-expression networks in a more parametric sense to achieve systematic insights into complex biological processes more reliably. The main procedure was made suitable for various types of biological applications and high-dimensional data to cover the area of systems biology as broadly as possible. In particular, we chose to illustrate the method's applicability for gene-profile-based risk stratification in cancer research using public acute myeloid leukemia datasets.
    A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces their methodological fusion, which cross-exploits the strengths of the individual approaches via a built-in information-sharing mechanism. This fusion is theoretically based on certain trait-conditioned dependency patterns between two genes, depending on their role in the underlying parametric model. The resulting trait-specific co-expression network estimation method 1) serves to enhance the interpretation of biological networks in a parametric sense, and 2) exploits the underlying parametric model itself in the estimation process. To also account for the substantial amount of intrinsic noise and collinearities often entailed by expression data, a tailored co-expression measure is introduced along with this framework to alleviate related computational problems. A remarkable advance over the reference methods in simulated scenarios substantiates the method's high efficiency. As proof of concept, this synergistic approach is successfully applied in survival analysis with acute myeloid leukemia data, further highlighting the framework's versatility and broad practical relevance.
  • Zou, Yuan; Roos, Teemu (Springer International Publishing AG, 2016)
    Lecture Notes in Computer Science
    Modeling interactions in regression models poses both computational and statistical challenges: the computational resources and the amount of data required to solve them increase sharply with the size of the problem. We focus on logistic regression with categorical variables and propose a method for learning dependencies that are expressed as general Boolean formulas. The computational and statistical challenges are solved by applying a technique called transformed Lasso, which involves a matrix transformation of the original covariates. We compare the method to an earlier related method, LogicReg, and show that our method scales better in terms of the number of covariates as well as the order and complexity of the interactions.
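
To give readers a concrete sense of how a Lasso penalty can pick out an interaction among binary covariates in logistic regression, here is a minimal sketch. It is not the transformed Lasso of the last item, nor the method of any paper listed here: it simply expands the covariates with pairwise AND terms and fits an L1-penalized logistic regression by plain proximal gradient descent. The toy data (a full factorial design over four binary covariates, with the outcome equal to x0 AND x1), the function names, and the penalty and step-size values are all assumptions made for illustration.

```python
import math
from itertools import combinations, product


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expand_with_pairwise_and(row):
    """Return the main effects followed by all pairwise AND interactions."""
    pairs = [row[i] * row[j] for i, j in combinations(range(len(row)), 2)]
    return list(row) + pairs


def fit_l1_logistic(features, labels, penalty=0.05, step=0.5, iterations=5000):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is left unpenalized; every other weight is soft-thresholded
    after each gradient step.
    """
    n, p = len(features), len(features[0])
    intercept, weights = 0.0, [0.0] * p
    for _ in range(iterations):
        grad_b, grad_w = 0.0, [0.0] * p
        for x, y in zip(features, labels):
            z = intercept + sum(w * v for w, v in zip(weights, x))
            err = sigmoid(z) - y  # derivative of the log-loss w.r.t. z
            grad_b += err
            for k in range(p):
                grad_w[k] += err * x[k]
        intercept -= step * grad_b / n
        weights = [
            soft_threshold(w - step * g / n, step * penalty)
            for w, g in zip(weights, grad_w)
        ]
    return intercept, weights


# Toy data (assumed): every combination of four binary covariates,
# with the outcome defined as x0 AND x1.
rows = list(product([0, 1], repeat=4))
labels = [r[0] * r[1] for r in rows]
features = [expand_with_pairwise_and(r) for r in rows]
names = [f"x{i}" for i in range(4)] + [
    f"x{i}&x{j}" for i, j in combinations(range(4), 2)
]

intercept, weights = fit_l1_logistic(features, labels)
selected = {n: round(w, 3) for n, w in zip(names, weights) if abs(w) > 1e-6}
print("intercept:", round(intercept, 3))
print("selected terms:", selected)
```

On this toy design the penalty is expected to leave the `x0&x1` term dominant and drive the other weights to or near zero. The number of candidate terms grows quickly with the number of covariates and the interaction order, which is the scaling problem the abstract above describes.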