SuperDCA for genome-wide epistasis analysis

Permalink

http://hdl.handle.net/10138/237525

Citation

Puranen, S, Pesonen, M, Pensar, J, Xu, Y Y, Lees, J A, Bentley, S D, Croucher, N J & Corander, J 2018, 'SuperDCA for genome-wide epistasis analysis', Microbial Genomics, vol. 4, no. 6, 000184. https://doi.org/10.1099/mgen.0.000184

Title: SuperDCA for genome-wide epistasis analysis
Author: Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Xu, Ying Ying; Lees, John A.; Bentley, Stephen D.; Croucher, Nicholas J.; Corander, Jukka
Contributor: University of Helsinki, Department of Mathematics and Statistics
University of Helsinki, Jukka Corander / Principal Investigator
Date: 2018-06
Language: eng
Number of pages: 12
Belongs to series: Microbial Genomics
ISSN: 2057-5858
URI: http://hdl.handle.net/10138/237525
Abstract: The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4 to 10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
Subject: population genomics
epistasis
linkage disequilibrium
DIRECT-COUPLING ANALYSIS
PROTEIN-STRUCTURE
STRUCTURE PREDICTION
MUTUAL INFORMATION
RESIDUE CONTACTS
SEQUENCE
IDENTIFICATION
MUTATIONS
EVOLUTION
1184 Genetics, developmental biology, physiology
1183 Plant biology, microbiology, virology
Files in this item

mgen000184.pdf (PDF, 3.663 MB)
