Zhou, Fang
(Helsingin yliopisto, 2012)
We propose network abstraction as a research area. It is motivated by the growth of networks in many areas of life. Consider, for instance, networks of thousands of genes, millions of people, or billions of web pages. They are too large to be directly analyzed by users. The aim of network abstraction is to summarize a large network as a smaller one. An abstracted network can then help users to see the overall topology of a large network, or to understand the connections of distant nodes.
The general network abstraction task is: given a large network, transform it into a smaller one, which contains in some well-specified sense the most relevant information. In this thesis, we analyze this research area and propose methods to solve some instances of the problem. The methods also provide different trade-offs between the graph quality and simplicity, as well as between result quality and efficiency.
More specifically, we propose two approaches to abstracting a network. The first one is to simplify a weighted network by removing edges under the constraint that distances between all pairs of nodes are preserved. We first empirically show that a number of edges can be removed from real biological networks without losing any graph connectivity. We next relax the constraint of fully preserving original graph connectivity, extend lossless network simplification to lossy network simplification, and demonstrate that many more edges can be removed with little loss of quality.
The second approach we give for network abstraction is to compress a weighted network by grouping nodes and edges. We propose novel methods and experimentally show that real graphs can be compressed efficiently with relatively little error. We next consider graphs with weights also on the nodes, and utilize them as node importances to extend the definition of weighted graph compression. We present new compression operations and demonstrate that the compressed graph can preserve more information related to more important nodes.
Furthermore, we propose the idea of using node weights and compression to summarize the metabolisms in a set of organisms, and apply the methodology to better understand the metabolic biodiversity between Archaea and Eubacteria, the two most fundamental branches of life.