MODER2: First-order Markov Modeling and Discovery of Monomeric and Dimeric Binding Motifs

Permalink

http://hdl.handle.net/10138/316283

Citation

Toivonen, J, Das, P, Taipale, J & Ukkonen, E 2020, 'MODER2: First-order Markov Modeling and Discovery of Monomeric and Dimeric Binding Motifs', Bioinformatics, vol. 36, no. 9, pp. 2690-2696. https://doi.org/10.1093/bioinformatics/btaa045

Title: MODER2: First-order Markov Modeling and Discovery of Monomeric and Dimeric Binding Motifs
Author: Toivonen, Jarkko; Das, Pratyush; Taipale, Jussi; Ukkonen, Esko
Contributor: University of Helsinki, Department of Computer Science; University of Helsinki, ATG - Applied Tumor Genomics; University of Helsinki, Jussi Taipale / Principal Investigator
Date: 2020-05-01
Language: eng
Number of pages: 7
Belongs to series: Bioinformatics
ISSN: 1367-4803
URI: http://hdl.handle.net/10138/316283
Abstract: Motivation: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominant model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing evidence that higher-order models, such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs), perform better. Unlike PPMs, ADMs can model dependencies between adjacent nucleotides. A modeling technique and software tool that would estimate such models simultaneously for both monomers and their dimers have been missing. Results: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm, MODER2, for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparison with some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average.
Subject: 1182 Biochemistry, cell and molecular biology; PROTEIN-DNA INTERACTIONS; TRANSCRIPTION FACTOR; EM ALGORITHM; SITES; IDENTIFICATION; SEQUENCE; RECOGNITION; SPECIFICITY; POSITION; 11832 Microbiology and virology; 113 Computer and information sciences
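
The abstract's distinction between PPMs and ADMs can be illustrated with a minimal Python sketch. This is not the MODER2 algorithm, which learns a mixture of monomeric and dimeric models by expectation maximization; it only estimates the two model types by simple counting. The function names (`fit_ppm`, `fit_adm`, `ppm_prob`, `adm_prob`), the four made-up aligned sites and the pseudocount value are all assumptions chosen for illustration.

```python
from collections import Counter
from math import prod

ALPHABET = "ACGT"


def fit_ppm(sites, pseudo=0.01):
    """Estimate one independent nucleotide distribution per position (a PPM)."""
    length = len(sites[0])
    total = len(sites) + pseudo * len(ALPHABET)
    ppm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        ppm.append({b: (counts[b] + pseudo) / total for b in ALPHABET})
    return ppm


def fit_adm(sites, pseudo=0.01):
    """Estimate a first-order Markov model (an ADM): an initial distribution
    plus, for each later position i, a table P(base at i | base at i-1)."""
    length = len(sites[0])
    initial = fit_ppm([s[0] for s in sites], pseudo)[0]
    transitions = []
    for i in range(1, length):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        table = {}
        for a in ALPHABET:
            row_total = sum(pairs[(a, b)] for b in ALPHABET) + pseudo * len(ALPHABET)
            table[a] = {b: (pairs[(a, b)] + pseudo) / row_total for b in ALPHABET}
        transitions.append(table)
    return initial, transitions


def ppm_prob(ppm, seq):
    """Probability of seq under the PPM: a product of per-position terms."""
    return prod(column[b] for column, b in zip(ppm, seq))


def adm_prob(adm, seq):
    """Probability of seq under the ADM: each base is conditioned on its predecessor."""
    initial, transitions = adm
    p = initial[seq[0]]
    for i, table in enumerate(transitions, start=1):
        p *= table[seq[i - 1]][seq[i]]
    return p


# Toy sites (hypothetical): the middle dinucleotide is always CG or GC, never CC or GG.
sites = ["ACGT", "ACGT", "AGCT", "AGCT"]
ppm = fit_ppm(sites)
adm = fit_adm(sites)

for seq in ("ACGT", "ACCT"):
    print(seq, "PPM:", ppm_prob(ppm, seq), "ADM:", adm_prob(adm, seq))
```

In this toy data the PPM assigns the same probability to ACGT and ACCT, because each column is modeled independently and C and G are equally frequent at both middle positions. The ADM assigns a much lower probability to ACCT, since C is never followed by C in the training sites; this is the adjacent-nucleotide dependency that a PPM cannot represent.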

Files in this item

File: btaa045.pdf (PDF, 898.7 KB)
