Kuronen, Juri
(Helsingin yliopisto, 2017)
This Master’s thesis introduces a new score-based method for learning the structure of a pairwise Markov network without assuming that the underlying graph is chordal. The method approximates the joint probability distribution using the popular pseudo-likelihood framework. Using the local Markov property of the Markov network, the joint distribution is decomposed into node-wise conditional distributions, each involving only a small subset of the variables, which eliminates the intractable normalizing constant. These conditional distributions can be naturally modeled using logistic regression, giving rise to pseudo-likelihood maximization with logistic regression (plmLR). By restricting the explanatory variables to main effects (no interaction terms), plmLR is designed to be especially well suited for capturing pairwise interactions. To counter overfitting, plmLR is regularized using an extended variant of the Bayesian information criterion.
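For orientation, the decomposition described above can be written out as follows. This is only a sketch in assumed notation: the symbols $d$ (number of variables), $\mathrm{mb}(j)$ (Markov blanket of node $j$), $\beta$ (regression coefficients) and $\sigma$ (logistic function) are not taken from the thesis, and the binary case is shown purely for illustration.

```latex
% Pseudo-likelihood: product of node-wise conditionals, each restricted
% to the Markov blanket by the local Markov property (assumed notation).
p(x_1, \dots, x_d) \;\approx\; \prod_{j=1}^{d} p\bigl(x_j \mid x_{\mathrm{mb}(j)}\bigr)

% Binary illustration with main effects only (no interaction terms):
p\bigl(x_j = 1 \mid x_{\mathrm{mb}(j)}\bigr)
  = \sigma\Bigl(\beta_{j0} + \sum_{k \in \mathrm{mb}(j)} \beta_{jk}\, x_k\Bigr),
\qquad \sigma(t) = \frac{1}{1 + e^{-t}}
```

Each factor is normalized over the values of a single variable $x_j$, so no global normalizing constant appears.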
To select the best model from the vast discrete space of network structures, a dynamic greedy hill-climbing search algorithm can be readily implemented within the pseudo-likelihood framework: each Markov blanket is learned separately, and the full graph is composed from the solutions to these subproblems. This work also presents a novel improvement to the algorithm that drastically reduces the search space of each node-wise hill-climbing run by first running a set of pairwise queries to isolate only the promising candidates.
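The search strategy described above might be sketched roughly as follows. This is not the thesis implementation: the function names, the `score` callable (assumed to return a penalized node-wise score where higher is better), the add/remove move set, and the AND/OR rule for combining Markov blankets into a graph are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Sequence, Set

# Hypothetical scoring interface: score(node, neighbour_set) -> float,
# higher is better (e.g. a penalized node-wise log-likelihood).
Score = Callable[[int, FrozenSet[int]], float]


def prescreen(node: int, others: Iterable[int], score: Score) -> Set[int]:
    """Pairwise queries: keep candidates whose single-neighbour model
    scores better than the empty model for this node."""
    base = score(node, frozenset())
    return {c for c in others if score(node, frozenset({c})) > base}


def hill_climb_blanket(node: int, candidates: Set[int], score: Score) -> FrozenSet[int]:
    """Greedy hill climbing over single additions and removals; stops
    when no single move improves the current score."""
    current: FrozenSet[int] = frozenset()
    best = score(node, current)
    while True:
        moves = [current | {c} for c in candidates - current]
        moves += [current - {c} for c in current]
        if not moves:
            return current
        top = max(moves, key=lambda s: score(node, s))
        top_score = score(node, top)
        if top_score <= best:
            return current
        current, best = top, top_score


def compose_graph(blankets: Dict[int, FrozenSet[int]], rule: str = "and") -> Set[FrozenSet[int]]:
    """Combine node-wise blankets into undirected edges. With "and" an
    edge needs both endpoints to select each other; with "or" one
    direction is enough (assumed combination rules)."""
    edges: Set[FrozenSet[int]] = set()
    for i, mb in blankets.items():
        for j in mb:
            mutual = i in blankets.get(j, frozenset())
            if rule == "or" or mutual:
                edges.add(frozenset({i, j}))
    return edges


def learn_structure(nodes: Sequence[int], score: Score, rule: str = "and") -> Set[FrozenSet[int]]:
    """Solve one Markov blanket subproblem per node, then compose the graph."""
    blankets = {}
    for v in nodes:
        cands = prescreen(v, [u for u in nodes if u != v], score)
        blankets[v] = hill_climb_blanket(v, cands, score)
    return compose_graph(blankets, rule)
```

The prescreening step shrinks the candidate set before the hill-climbing run, so each run evaluates far fewer neighbour sets. In an actual application, `score` would be replaced by a regularized logistic-regression fit of the node on the candidate neighbours.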
In experiments on data sets sampled from synthetic pairwise Markov networks, plmLR performs favorably against competing methods with respect to the Hamming distance between the learned and true network structures. Additionally, unlike most logistic-regression-based methods, plmLR is not limited to binary variables, and it performs well at learning benchmark network structures based on real-world non-binary models, even though it is not designed for their structural form.
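As a brief illustration of the evaluation metric, the Hamming distance between two undirected graphs is commonly taken to be the number of edges present in exactly one of them. The sketch below assumes that definition and a hypothetical edge-list representation; the thesis may compute it differently.

```python
from typing import Iterable, Tuple


def hamming_distance(edges_a: Iterable[Tuple[int, int]],
                     edges_b: Iterable[Tuple[int, int]]) -> int:
    """Count edges that appear in exactly one of two undirected graphs."""
    # frozenset makes (i, j) and (j, i) the same undirected edge.
    set_a = {frozenset(e) for e in edges_a}
    set_b = {frozenset(e) for e in edges_b}
    return len(set_a ^ set_b)  # size of the symmetric difference
```

A distance of zero means the learned structure matches the true one exactly; each missing or spurious edge adds one.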