Modular discovery of monomeric and dimeric transcription factor binding motifs for large data sets

Permalink

http://hdl.handle.net/10138/236336

Citation

Toivonen, J, Kivioja, T, Jolma, A, Yin, Y, Taipale, J & Ukkonen, E 2018, 'Modular discovery of monomeric and dimeric transcription factor binding motifs for large data sets', Nucleic Acids Research, vol. 46, no. 8. https://doi.org/10.1093/nar/gky027

Title: Modular discovery of monomeric and dimeric transcription factor binding motifs for large data sets
Author: Toivonen, Jarkko; Kivioja, Teemu; Jolma, Arttu; Yin, Yimeng; Taipale, Jussi; Ukkonen, Esko
Other contributor: University of Helsinki, Department of Computer Science
University of Helsinki, Genome-Scale Biology (GSB) Research Program
University of Helsinki, Karolinska Institutet
Date: 2018-05-04
Language: eng
Number of pages: 16
Belongs to series: Nucleic Acids Research
ISSN: 0305-1048
DOI: https://doi.org/10.1093/nar/gky027
URI: http://hdl.handle.net/10138/236336
Abstract: In some dimeric cases of transcription factor (TF) binding, the specificity of dimeric motifs has been observed to differ notably from what would be expected were the two factors to bind to DNA independently of each other. Current motif discovery methods are unable to learn monomeric and dimeric motifs in modular fashion such that deviations from the expected motif would become explicit and the noise from dimeric occurrences would not corrupt monomeric models. We propose a novel modeling technique and an expectation maximization algorithm, implemented as software tool MODER, for discovering monomeric TF binding motifs and their dimeric combinations. Given training data and seeds for monomeric motifs, the algorithm learns in the same probabilistic framework a mixture model which represents monomeric motifs as standard position-specific probability matrices (PPMs), and dimeric motifs as pairs of monomeric PPMs, with associated orientation and spacing preferences. For dimers the model represents deviations from pure modular model of two independent monomers, thus making co-operative binding effects explicit. MODER can analyze in reasonable time tens of Mbps of training data. We validated the tool on HT-SELEX and ChIP-seq data. Our findings include some TFs whose expected model has palindromic symmetry but the observed model is directional.
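The "expected" dimeric model mentioned in the abstract can be illustrated with a minimal sketch. It is not MODER's implementation: the function names, the uniform 0.25 background for gap positions, and the product-and-renormalize rule for overlapping positions are all assumptions made for this illustration. A PPM is taken to be a list of columns, each holding the probabilities of A, C, G, T in that order.

```python
from typing import List

Column = List[float]  # probabilities of A, C, G, T at one position
PPM = List[Column]    # one column per motif position

# Assumption: gap positions between the two half-sites are uninformative.
UNIFORM: Column = [0.25, 0.25, 0.25, 0.25]


def reverse_complement(ppm: PPM) -> PPM:
    """Reverse the position order and swap A<->T and C<->G in every column."""
    return [list(reversed(col)) for col in reversed(ppm)]


def expected_dimer(left: PPM, right: PPM, spacing: int,
                   flip_right: bool = False) -> PPM:
    """Build the dimer PPM expected if the two monomers bound independently.

    spacing >= 0 inserts that many uniform gap columns between the monomers.
    spacing < 0 overlaps the monomers by -spacing columns; overlapping columns
    are multiplied element-wise and renormalized (an assumed combination rule,
    which needs nonzero column products, e.g. PPMs built with pseudocounts).
    """
    if flip_right:
        right = reverse_complement(right)
    if spacing >= 0:
        gap = [list(UNIFORM) for _ in range(spacing)]
        return [list(c) for c in left] + gap + [list(c) for c in right]
    overlap = -spacing
    if overlap > min(len(left), len(right)):
        raise ValueError("overlap is longer than one of the monomers")
    merged = []
    for a, b in zip(left[-overlap:], right[:overlap]):
        product = [x * y for x, y in zip(a, b)]
        total = sum(product)
        merged.append([p / total for p in product])
    return ([list(c) for c in left[:-overlap]] + merged
            + [list(c) for c in right[overlap:]])


def deviation(observed: PPM, expected: PPM) -> PPM:
    """Per-position, per-base difference between observed and expected PPMs."""
    return [[o - e for o, e in zip(oc, ec)]
            for oc, ec in zip(observed, expected)]
```

Under these assumptions, calling `deviation(observed, expected_dimer(left, right, spacing))` on a dimer PPM learned from data would show where the observed specificity departs from independent binding; entries near zero everywhere would indicate a purely modular dimer.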
Subject: 113 Computer and information sciences
Computational Biology
motif detection
3111 Biomedicine
1182 Biochemistry, cell and molecular biology
CHIP-SEQ DATA
EXPECTATION MAXIMIZATION ALGORITHM
EM ALGORITHM
DNA-BINDING
HUMAN GENOME
SITES
SEQUENCE
IDENTIFICATION
SPECIFICITIES
ALIGNMENT

Files in this item

Files Size Format View
gky027.pdf 4.703Mb PDF View/Open