Learning pairwise Markov Network structures with logistic regression

Näytä kaikki kuvailutiedot



Pysyväisosoite

http://urn.fi/URN:NBN:fi:hulib-201712125847
Julkaisun nimi: Learning pairwise Markov Network structures with logistic regression
Tekijä: Kuronen, Juri
Muu tekijä: Helsingin yliopisto, Valtiotieteellinen tiedekunta, Sosiaalitieteiden laitos
University of Helsinki, Faculty of Social Sciences, Department of Social Research
Helsingfors universitet, Statsvetenskapliga fakulteten, Institutionen för socialvetenskaper
Julkaisija: Helsingin yliopisto
Päiväys: 2017
Kieli: eng
URI: http://urn.fi/URN:NBN:fi:hulib-201712125847
http://hdl.handle.net/10138/229585
Opinnäytteen taso: pro gradu -tutkielmat
Oppiaine: Tilastotiede
Statistics
Statistik
Tiivistelmä: This Master’s thesis introduces a new score-based method for learning the structure of a pairwise Markov network without imposing the assumption of chordality on the underlying graph structure by approximating the joint probability distribution using the popular pseudo-likelihood framework. Together with the local Markov property associated with the Markov network, the joint probability distribution is decomposed into node-wise conditional distributions involving only a tiny subset of variables each, getting rid of the problematic intractable normalizing constant. These conditional distributions can be naturally modeled using logistic regression, giving rise to pseudo-likelihood maximization with logistic regression (plmLR) which is designed to be especially well-suited for capturing pairwise interactions by restricting the explanatory variables to main effects (no interaction terms). To deal with overfitting, plmLR is regularized using an extended variant of the Bayesian information criterion. To select the best model out of the vast discrete model space of network structures, a dynamic greedy hill-climbing search algorithm can be readily implemented with the pseudo-likelihood framework where each Markov blanket is learned separately so that the full graph can be composed from the solutions to these subproblems. This work also presents a novel improvement to the algorithm by drastically reducing the search space associated with each node-wise hill-climbing run by first running a set of pairwise queries to isolate only the promising candidates. In experiments on data sets sampled from synthetic pairwise Markov networks, plmLR performs favorably against competing methods with respect to the Hamming distance between the learned and true network structure. Additionally, unlike most logistic regression based methods, plmLR is not limited to binary variables and performs well on learning benchmark network structures based on real-world non-binary models even though plmLR is not designed for their structural form.
Avainsanat: Markov network
Markov network structure learning
logistic regression
pseudo-likelihood
Bayesian information criterion


Tiedostot

Tiedosto(t) Koko Formaatti Näytä

Tähän julkaisuun ei ole liitetty tiedostoja

Viite kuuluu kokoelmiin:

Näytä kaikki kuvailutiedot