THESIS
2017

xii, 105 pages : illustrations ; 30 cm

**Abstract**
Data in several applications can be represented as an uncertain graph, whose edges are
labelled with a probability of existence. Exact query processing on uncertain graphs is
prohibitive for most applications, as it involves evaluation over an exponential number of
instantiations. Thus, typical approaches employ Monte-Carlo sampling, which i) draws a
number of possible graphs (samples), ii) evaluates the query on each of them, and iii) aggregates
the individual answers to generate the final result. However, this approach can also
be extremely time consuming for large uncertain graphs commonly found in practice. To
overcome the high cost, in this thesis we propose two approaches. The first extracts deterministic
representative instances that capture structural properties of the u...[

Read more ]

Data in several applications can be represented as an uncertain graph, whose edges are
labelled with a probability of existence. Exact query processing on uncertain graphs is
prohibitive for most applications, as it involves evaluation over an exponential number of
instantiations. Thus, typical approaches employ Monte-Carlo sampling, which i) draws a
number of possible graphs (samples), ii) evaluates the query on each of them, and iii) aggregates
the individual answers to generate the final result. However, this approach can also
be extremely time consuming for large uncertain graphs commonly found in practice. To
overcome the high cost, in this thesis we propose two approaches. The first extracts deterministic
representative instances that capture structural properties of the uncertain graph.
The query and mining tasks can then be efficiently processed using deterministic algorithms
on these representatives. The second approach sparsifies the uncertain graph (i.e., reduces
the number of its edges) and redistributes its probabilities, minimizing the information loss.
Then, Monte-Carlo sampling applied to the reduced graph becomes much more efficient.

For the first approach, in order to maintain data utility, the representative instance should
preserve structural characteristics of the uncertain graph. We start with representatives that capture the expected vertex degrees, as this is a fundamental property of the graph topology.
We then generalize the notion of vertex degree to the concept of n-clique cardinality, i.e.,
the number of cliques of size n that contain a vertex. For the first problem, we propose two
methods: Average Degree Rewiring (ADR), which is based on random edge rewiring, and
Approximate B-matching (ABM), which applies graph matching techniques. For the second
problem, we develop a greedy approach and a game theoretic framework. We experimentally
demonstrate, with real uncertain graphs, that indeed the representative instances can be used
to answer, efficiently and accurately, queries based on several metrics such as shortest path
distance, clustering coefficient and betweenness centrality.

For the second approach, inspired by the literature in deterministic graph sparsification,
we aim at sparsified uncertain graphs that maintain the expected cut sizes of the original
graph. In addition, in order to reduce the uncertainty of the graph and the variance of the performed queries, we aim at decreasing the entropy of the input graph. To achieve these goals, we propose a framework that consists of two phases: The first (backbone generation), creates
a deterministic backbone graph. The second, (probability assignment), generates probabilities
for the edges of the backbone graph that capture the expected cuts, while avoiding the probability values that incur high entropy. Based on this framework, we propose two methods: Gradient Decent Backbone (GDB), which assigns probabilities based on gradient decent, and Expectation Maximization Degree (EMD), which iteratively updates the backbone graph and assigns probabilities until convergence. Moreover, we modify state-of-the-art
methods of deterministic graph sparsification to comply with the uncertain setting, and we
experimentally demonstrate that they underperform compared to the proposed techniques
in a variety of graph related queries, including shortest path distance, page rank, clustering
coefficient and reliability.

## Post a Comment

Your email address will not be published.