THESIS
2022
1 online resource (xviii, 117 pages) : illustrations (some color)
Abstract
Graph data is prevalent thanks to its ability to model many real-life applications woven
from huge and complex networks of relations and interactions. Recently, graph neural
networks (GNNs), as a powerful machine learning tool, have come into the spotlight for
their great success in modeling the tangled relations in graph data. However, despite
their promising performance, the major challenge that limits the adoption and deployment
of GNNs on large-scale real-world graphs lies in the inability to utilize all the data in
finite time and in the limited scalability of the algorithms themselves.
To mitigate the memory requirement, tremendous efforts have been made towards
sampling-based mini-batch GNN training. Various sampling strategies have been proposed
to improve the training efficiency and scalability of current GNN models over large-scale
graphs, such as node-wise sampling, layer-wise importance sampling, and the fundamentally
different subgraph sampling. Although existing samplers help GNNs scale to large
graphs, the information lost by not sufficiently extracting higher-order neighbors
inevitably limits the expressive power of these shallow architectures. Moreover, the size
of the graph data and the complexity of the model can easily exceed the capacity of a
single device.
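To make the node-wise strategy concrete, the following is a minimal sketch of neighbor
sampling for mini-batch construction, expanding a seed set layer by layer with a fixed
fanout per GNN layer. All helper names here are illustrative, not the thesis's actual
sampler.

    # Minimal sketch of node-wise neighbor sampling for mini-batch GNN
    # training (hypothetical helper names; not the thesis's own method).
    import random

    def sample_neighbors(adj, node, fanout):
        """Uniformly sample up to `fanout` neighbors of `node`.
        `adj` is a dict: node id -> list of neighbor ids."""
        nbrs = adj.get(node, [])
        if len(nbrs) <= fanout:
            return list(nbrs)
        return random.sample(nbrs, fanout)

    def build_minibatch(adj, seeds, fanouts):
        """Expand a seed set layer by layer, one fanout per GNN layer.
        Returns the sampled blocks, innermost layer first."""
        blocks, frontier = [], list(seeds)
        for fanout in fanouts:
            block = {v: sample_neighbors(adj, v, fanout) for v in frontier}
            # Next frontier = all sampled neighbors (deduplicated).
            frontier = list({u for nbrs in block.values() for u in nbrs})
            blocks.append(block)
        return blocks

    # Toy usage: a 2-layer GNN with fanouts (2, 2) over a small graph.
    adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
    print(build_minibatch(adj, seeds=[0], fanouts=[2, 2]))

Because each layer keeps only a bounded number of neighbors, the mini-batch memory
footprint stays fixed regardless of graph size, which is exactly what makes such samplers
attractive for large graphs.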
To accommodate ever-increasing graph data and model sizes, distributed GNN processing
is the inevitable remedy. Yet a distributed implementation of GNN learning algorithms,
especially an efficient one, is far from trivial. Aside from the substantial memory
footprint, distributed GNNs are both memory-intensive and compute-intensive due to the
coupling of irregular neighborhood access with an iterative learning procedure.
Consequently, the intensity and volume of data communication, together with workload
balancing, make distributed GNNs even more challenging. Prior efforts towards distributed
GNN training come at the price of heavy preprocessing, complex workflows, sampling
overhead, information loss, and the lack of convergence guarantees.
In this thesis, we investigate scaling GNN training on large graphs in pursuit of both
effectiveness and efficiency. First, we advance the effectiveness of sampling-based GNNs
in a single-device environment: we propose an adaptive, structure-aware importance
subgraph sampler that lets adaptive GNNs benefit from higher-order neighbor information.
Evaluation on eight benchmark datasets demonstrates the competitive performance of our
approach against six state-of-the-art approaches.
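As a rough illustration of importance-weighted subgraph sampling, the sketch below draws
nodes with probability proportional to degree (a simple stand-in structural score) and
induces the subgraph on the sampled set. The thesis's adaptive, structure-aware sampler
is more elaborate; everything here is illustrative only.

    # Minimal sketch of importance-weighted subgraph sampling, using
    # node degree as the structural score (illustrative names only).
    import random

    def sample_subgraph(adj, budget):
        """Draw up to `budget` distinct nodes with probability
        proportional to degree, then induce the subgraph on them."""
        nodes = list(adj)
        weights = [len(adj[v]) for v in nodes]
        sampled = set()
        while len(sampled) < min(budget, len(nodes)):
            sampled.add(random.choices(nodes, weights=weights, k=1)[0])
        # Induced subgraph: keep edges with both endpoints sampled.
        return {v: [u for u in adj[v] if u in sampled] for v in sampled}

    adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
    print(sample_subgraph(adj, budget=3))

Biasing the draw towards structurally important nodes keeps well-connected regions
intact inside each sampled subgraph, which is what preserves higher-order neighbor
information compared with uniform sampling.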
Second, we study efficient distributed GNN processing over large graphs while preserving
effectiveness. We identify cross-device data communication, induced by the data movement
in GNN aggregation, as a major enemy of efficient distributed GNN processing. To avoid
this communication, we abstract decentralized GNN processing as sequential matrix
multiplications and use cached historical embeddings. Theoretically, we show bounded
approximation errors for embeddings and gradients, with a convergence guarantee.
Empirically, evaluation on five large-scale benchmark datasets with prevalent GNN models
under four different system setups shows the generality and superior efficiency of our
approach while preserving effectiveness, compared to five state-of-the-art systems. The
performance with little or no accuracy loss is consistent with our theoretical findings.
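The core idea of the historical-embedding cache can be sketched as follows: during
aggregation, embeddings of neighbors owned by other devices are read from a local cache
of stale values instead of being fetched over the network. This is a minimal sketch under
assumed data structures, not the system's actual implementation.

    # Minimal sketch of aggregation with a historical-embedding cache:
    # remote neighbors come from a local cache of stale embeddings, so
    # no cross-device communication happens (illustrative only).
    def aggregate(node, adj, local_emb, cache, is_local):
        """Mean-aggregate neighbor embeddings, substituting cached
        historical values for neighbors owned by other devices."""
        vals = []
        for u in adj[node]:
            if is_local(u):
                vals.append(local_emb[u])   # fresh, on-device value
            else:
                vals.append(cache[u])       # stale but communication-free
        return sum(vals) / len(vals)

    adj = {0: [1, 2], 1: [0], 2: [0]}
    local_emb = {0: 1.0, 1: 3.0}            # nodes on this device
    cache = {2: 5.0}                        # historical copy of a remote node
    print(aggregate(0, adj, local_emb, cache,
                    is_local=lambda u: u in local_emb))

The staleness of the cached values is what the bounded approximation-error and
convergence analysis in the thesis has to account for.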
Finally, we conclude this thesis with future directions and challenges.