Learning pairwise Markov Network structures with logistic regression

Show simple item record

dc.contributor Helsingin yliopisto, Valtiotieteellinen tiedekunta, Sosiaalitieteiden laitos fi
dc.contributor University of Helsinki, Faculty of Social Sciences, Department of Social Research en
dc.contributor Helsingfors universitet, Statsvetenskapliga fakulteten, Institutionen för socialvetenskaper sv
dc.contributor.author Kuronen, Juri
dc.date.issued 2017
dc.identifier.uri URN:NBN:fi:hulib-201712125847
dc.identifier.uri http://hdl.handle.net/10138/229585
dc.description.abstract This Master’s thesis introduces a new score-based method for learning the structure of a pairwise Markov network without imposing the assumption of chordality on the underlying graph structure by approximating the joint probability distribution using the popular pseudo-likelihood framework. Together with the local Markov property associated with the Markov network, the joint probability distribution is decomposed into node-wise conditional distributions involving only a tiny subset of variables each, getting rid of the problematic intractable normalizing constant. These conditional distributions can be naturally modeled using logistic regression, giving rise to pseudo-likelihood maximization with logistic regression (plmLR) which is designed to be especially well-suited for capturing pairwise interactions by restricting the explanatory variables to main effects (no interaction terms). To deal with overfitting, plmLR is regularized using an extended variant of the Bayesian information criterion. To select the best model out of the vast discrete model space of network structures, a dynamic greedy hill-climbing search algorithm can be readily implemented with the pseudo-likelihood framework where each Markov blanket is learned separately so that the full graph can be composed from the solutions to these subproblems. This work also presents a novel improvement to the algorithm by drastically reducing the search space associated with each node-wise hill-climbing run by first running a set of pairwise queries to isolate only the promising candidates. In experiments on data sets sampled from synthetic pairwise Markov networks, plmLR performs favorably against competing methods with respect to the Hamming distance between the learned and true network structure. Additionally, unlike most logistic regression based methods, plmLR is not limited to binary variables and performs well on learning benchmark network structures based on real-world non-binary models even though plmLR is not designed for their structural form. en
dc.language.iso eng
dc.publisher Helsingin yliopisto fi
dc.publisher University of Helsinki en
dc.publisher Helsingfors universitet sv
dc.subject Markov network en
dc.subject Markov network structure learning en
dc.subject logistic regression en
dc.subject pseudo-likelihood en
dc.subject Bayesian information criterion en
dc.title Learning pairwise Markov Network structures with logistic regression en
dc.type.ontasot pro gradu -tutkielmat fi
dc.type.ontasot master's thesis en
dc.type.ontasot pro gradu-avhandlingar sv
dc.subject.discipline Tilastotiede fi
dc.subject.discipline Statistics en
dc.subject.discipline Statistik sv
dct.identifier.urn URN:NBN:fi:hulib-201712125847

Files in this item

Files Size Format View
Kuronen_Tilastotiede.pdf 2.698Mb application/pdf View/Open

This item appears in the following Collection(s)

Show simple item record