Modeling binding specificities of transcription factor pairs with random forests




Antikainen, A. A., Heinonen, M. & Lähdesmäki, H. 2022, 'Modeling binding specificities of transcription factor pairs with random forests', BMC Bioinformatics, vol. 23, no. 1, 212.

Title: Modeling binding specificities of transcription factor pairs with random forests
Author: Antikainen, Anni A.; Heinonen, Markus; Lähdesmäki, Harri
Contributor organization: CAMM - Research Program for Clinical and Molecular Metabolism
University of Helsinki
Faculty of Medicine
Helsinki Institute for Information Technology
Date: 2022-06-03
Language: eng
Number of pages: 17
Belongs to series: BMC Bioinformatics
ISSN: 1471-2105
Abstract:
Background: Transcription factors (TFs) bind regulatory DNA regions with sequence specificity, form complexes and regulate gene expression. In cooperative TF-TF binding, two transcription factors bind to a shared DNA binding site as a pair. Previous work has described pairwise TF-TF-DNA interactions with position weight matrices (PWMs), which, however, may not sufficiently account for the complexity and flexibility of pairwise binding.
Results: We propose two random forest (RF) methods for joint TF-TF binding site prediction: ComBind and JointRF. We train the models on previously published large-scale CAP-SELEX DNA libraries, which comprise DNA sequences enriched for binding of a selected TF pair. JointRF builds a random forest from sub-sequences selected from CAP-SELEX DNA reads using previously proposed pairwise PWMs. JointRF (area under the receiver operating characteristic curve, AUROC, 0.75) outperforms the current state-of-the-art method, i.e., orientation- and spacing-specific pairwise PWMs (AUROC 0.59). JointRF may therefore be used to improve prediction accuracy for pre-determined binding preferences. However, pairwise TF binding is currently considered flexible: a pair may bind DNA in different orientations and with different amounts of dinucleotide gaps or overlap between the two motifs. We therefore developed ComBind, which applies random forests while simultaneously considering multiple orientations and spacings of the two factors. Our approach (AUROC 0.78) outperforms PWMs as well as JointRF (p
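The abstract describes ComBind as considering multiple orientations and spacings of the two factors at once. As a minimal illustration of what such an orientation-by-spacing feature space could look like (a sketch under stated assumptions, not the authors' implementation), the code below scores a DNA sequence against two PWMs for every combination of motif strand and spacing and keeps the best score per combination. The PWM representation (a list of base-to-log-odds dicts), the spacing range, summing both motif scores where the motifs overlap, fixing motif 1 upstream of motif 2, and all function names (`consensus_pwm`, `revcomp_pwm`, `score_at`, `pair_features`) are hypothetical choices made for this example.

```python
import math
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def consensus_pwm(consensus):
    """Hypothetical toy PWM: +1 for the consensus base, -1 otherwise."""
    return [{b: (1.0 if b == c else -1.0) for b in BASES} for c in consensus]


def revcomp_pwm(pwm):
    """Reverse-complement a PWM stored as a list of {base: score} columns."""
    return [{b: col[COMPLEMENT[b]] for b in BASES} for col in reversed(pwm)]


def score_at(pwm, seq, start):
    """Sum the log-odds scores of pwm aligned to seq[start:start + len(pwm)].

    Assumes seq contains only uppercase A/C/G/T.
    """
    return sum(col[seq[start + i]] for i, col in enumerate(pwm))


def pair_features(seq, pwm1, pwm2, min_spacing=-2, max_spacing=5):
    """Best pairwise score for every (orientation, spacing) combination.

    Orientation is a two-character string: strand of motif 1, then motif 2.
    Spacing is the number of bases between the end of motif 1 and the start
    of motif 2; negative values mean the motifs overlap (assumption: scores
    of overlapping positions are simply added). Motif 1 is always upstream.
    """
    features = {}
    strands1 = [("+", pwm1), ("-", revcomp_pwm(pwm1))]
    strands2 = [("+", pwm2), ("-", revcomp_pwm(pwm2))]
    for (o1, m1), (o2, m2) in product(strands1, strands2):
        for spacing in range(min_spacing, max_spacing + 1):
            offset = len(m1) + spacing  # start of motif 2 relative to motif 1
            best = -math.inf  # stays -inf if no placement fits the sequence
            for start in range(len(seq)):
                s2 = start + offset
                if s2 < 0 or start + len(m1) > len(seq) or s2 + len(m2) > len(seq):
                    continue
                total = score_at(m1, seq, start) + score_at(m2, seq, s2)
                best = max(best, total)
            features[(o1 + o2, spacing)] = best
    return features
```

For example, with `consensus_pwm("AC")`, `consensus_pwm("GG")` and the sequence `"TTACGGTTT"`, the highest-scoring feature is `("++", 0)` with score 4.0, because the two motifs sit directly next to each other on the forward strand. Each sequence's dictionary could be flattened into a fixed-order vector (for instance `[feats[k] for k in sorted(feats)]`) and passed to a random forest classifier; whether ComBind uses features of exactly this form is not stated in this record.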
Subject: Transcription factor pair
Random forest
DNA binding site
1182 Biochemistry, cell and molecular biology
11832 Microbiology and virology
Peer reviewed: Yes
Rights: cc_by
Usage restriction: openAccess
Self-archived version: publishedVersion

Files in this item


Modeling_bindin ... rs_with_random_forests.pdf (PDF, 3.853 MB)

