GPrank: an R package for detecting dynamic elements from genome-wide time series




Topa, H & Honkela, A 2018, 'GPrank: an R package for detecting dynamic elements from genome-wide time series', BMC Bioinformatics, vol. 19, 367.

Title: GPrank: an R package for detecting dynamic elements from genome-wide time series
Author: Topa, Hande; Honkela, Antti
Contributor organization: Institute for Molecular Medicine Finland
Probabilistic Mechanistic Models for Genomics research group / Antti Honkela
Biostatistics Helsinki
Department of Public Health
Department of Mathematics and Statistics
Helsinki Institute for Information Technology
Statistical and population genetics
Date: 2018-10-04
Language: eng
Number of pages: 6
Belongs to series: BMC Bioinformatics
ISSN: 1471-2105
Abstract: Background: Genome-wide high-throughput sequencing (HTS) time series experiments are a powerful tool for monitoring various genomic elements over time. They can be used to monitor, for example, gene or transcript expression with RNA sequencing (RNA-seq), DNA methylation levels with bisulfite sequencing (BS-seq), or abundances of genetic variants in populations with pooled sequencing (Pool-seq). However, because of high experimental costs, the time series data sets often consist of a very limited number of time points with very few or no biological replicates, posing challenges in the data analysis. Results: Here we present the GPrank R package for modelling genome-wide time series by incorporating variance information obtained either during pre-processing of the HTS data using probabilistic quantification methods or from a beta-binomial model using sequencing depth. GPrank is well suited for analysing both short and irregularly sampled time series. It models each time series with two Gaussian process (GP) models, namely a time-dependent and a time-independent GP model, and compares the evidence the data provide for the two models by computing their Bayes factor (BF). Genomic elements are then ranked by their BFs, so that the temporally most dynamic elements can be identified. Conclusions: Incorporating the variance information helps GPrank avoid false positives without compromising computational efficiency. Fitted models can easily be explored further in a browser. Detecting and visualising the temporally most active elements in the genome can provide a good starting point for downstream analyses aimed at increasing our understanding of the studied processes.
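The ranking idea in the abstract (fit a time-dependent and a time-independent GP to each series, compare them by Bayes factor, sort by BF) can be illustrated with a small sketch. This is not GPrank's R interface: it is a hypothetical Python/NumPy reimplementation of the general technique, and all function names (`log_bayes_factor`, `rank_elements`, etc.) and the example series are invented for illustration. It assumes per-time-point variances are already available (standing in for the variance information from probabilistic quantification or a beta-binomial model, which is not computed here), adds them to the diagonal of both covariance matrices, and fixes the GP hyperparameters instead of estimating them from the data as a full implementation would.

```python
import numpy as np


def rbf_kernel(t, variance, lengthscale):
    """Squared-exponential (RBF) covariance matrix over time points t."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def log_marginal_likelihood(y, K):
    """log N(y | 0, K), computed through a Cholesky factor for stability."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(y) * np.log(2 * np.pi))


def log_bayes_factor(t, y, obs_var, bias_var=1.0, signal_var=1.0,
                     lengthscale=3.0, noise_var=0.01):
    """Hypothetical helper: log BF of a time-dependent vs. a time-independent GP.

    obs_var holds known per-time-point variances; they are added to the
    diagonal of both models' covariances. Hyperparameters are FIXED here
    for illustration, not estimated from the data.
    """
    n = len(t)
    # Shared part: constant offset (bias) plus known and extra noise.
    base = bias_var * np.ones((n, n)) + np.diag(obs_var + noise_var)
    k_static = base                                            # time-independent
    k_dynamic = base + rbf_kernel(t, signal_var, lengthscale)  # time-dependent
    return (log_marginal_likelihood(y, k_dynamic)
            - log_marginal_likelihood(y, k_static))


def rank_elements(t, series, variances):
    """Return (name, log BF) pairs sorted from most to least dynamic."""
    scores = {name: log_bayes_factor(t, series[name], variances[name])
              for name in series}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage with two made-up, irregularly sampled series.
t = np.array([0.0, 1.0, 2.0, 4.0, 8.0, 12.0])
series = {
    "gene_dynamic": np.array([0.0, 1.5, 2.5, 2.0, 0.5, -1.0]),
    "gene_flat": np.array([0.1, -0.1, 0.05, 0.0, -0.05, 0.1]),
}
variances = {name: np.full(len(t), 0.05) for name in series}
ranking = rank_elements(t, series, variances)
print(ranking[0][0])  # gene_dynamic
```

A positive log BF means the data favour the time-dependent model. Because the known variances enter both models' covariances, a series whose changes are small relative to its measurement uncertainty is not pushed up the ranking, which is the intuition behind the abstract's claim that variance information helps avoid false positives.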
Subject: Gaussian process
High-throughput sequencing
Time series
Bayes factor
3111 Biomedicine
1182 Biochemistry, cell and molecular biology
Peer reviewed: Yes
Rights: cc_by
Usage restriction: openAccess
Self-archived version: publishedVersion

Files in this item


File: s12859_018_2370_4.pdf (825.2 KB, PDF)
