Learning pairwise Markov Network structures with logistic regression

Show full item record



Permalink

http://urn.fi/URN:NBN:fi:hulib-201712125847
Title: Learning pairwise Markov Network structures with logistic regression
Author: Kuronen, Juri
Contributor: University of Helsinki, Faculty of Social Sciences, Department of Social Research
Publisher: Helsingin yliopisto
Date: 2017
Language: eng
URI: http://urn.fi/URN:NBN:fi:hulib-201712125847
http://hdl.handle.net/10138/229585
Thesis level: master's thesis
Discipline: Tilastotiede
Statistics
Statistik
Abstract: This Master’s thesis introduces a new score-based method for learning the structure of a pairwise Markov network without imposing the assumption of chordality on the underlying graph structure by approximating the joint probability distribution using the popular pseudo-likelihood framework. Together with the local Markov property associated with the Markov network, the joint probability distribution is decomposed into node-wise conditional distributions involving only a tiny subset of variables each, getting rid of the problematic intractable normalizing constant. These conditional distributions can be naturally modeled using logistic regression, giving rise to pseudo-likelihood maximization with logistic regression (plmLR) which is designed to be especially well-suited for capturing pairwise interactions by restricting the explanatory variables to main effects (no interaction terms). To deal with overfitting, plmLR is regularized using an extended variant of the Bayesian information criterion. To select the best model out of the vast discrete model space of network structures, a dynamic greedy hill-climbing search algorithm can be readily implemented with the pseudo-likelihood framework where each Markov blanket is learned separately so that the full graph can be composed from the solutions to these subproblems. This work also presents a novel improvement to the algorithm by drastically reducing the search space associated with each node-wise hill-climbing run by first running a set of pairwise queries to isolate only the promising candidates. In experiments on data sets sampled from synthetic pairwise Markov networks, plmLR performs favorably against competing methods with respect to the Hamming distance between the learned and true network structure. Additionally, unlike most logistic regression based methods, plmLR is not limited to binary variables and performs well on learning benchmark network structures based on real-world non-binary models even though plmLR is not designed for their structural form.
Subject: Markov network
Markov network structure learning
logistic regression
pseudo-likelihood
Bayesian information criterion


Files in this item

Files Size Format View

There are no files associated with this item.

This item appears in the following Collection(s)

Show full item record