THESIS
2019
xiii, 93 pages : illustrations ; 30 cm
Abstract
In the past decade, deep neural networks (DNNs) have produced superior results in a wide range of machine learning applications. However, the structures of these networks are usually dense and handcrafted by human experts. Learning sparse structures from data for DNNs remains a challenging problem. In this thesis, we investigate learning two types of sparse structures for DNNs: dynamic sparse structures, which are conditioned on each individual input sample, and static sparse structures, which are learned from data and then fixed across input samples. Learning these sparse structures is expected to ease overfitting, reduce time and space complexity, and improve the interpretability of deep models.
For learning dynamic sparse structures, the central problem is how to configure the network structure for each individual input sample on the fly. We propose a new framework called GaterNet for this problem in convolutional neural networks (CNNs). It is the first framework in the literature for learning dynamic sparse structures for CNNs. GaterNet uses a dedicated sub-network to generate binary gates from the input and, based on the gate values, prunes filters in the CNN for that specific input. The result is a dynamic CNN that processes different samples with different sparse structures. Our experiments show that, with the help of this dynamic pruning, the generalization performance of the CNN can be significantly improved.
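A minimal sketch of the per-input gating idea in PyTorch. The gater architecture, the straight-through binarization, and all layer sizes below are illustrative assumptions, not the exact GaterNet design from the thesis:

```python
import torch
import torch.nn as nn

class Gater(nn.Module):
    """Illustrative gater sub-network: maps an input image to one
    binary gate per filter of the main CNN's convolutional layer."""
    def __init__(self, num_filters):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, num_filters)

    def forward(self, x):
        logits = self.fc(self.features(x).flatten(1))
        # Hard binary gates in the forward pass, sigmoid gradients in the
        # backward pass (straight-through estimator), so the gater stays trainable.
        soft = torch.sigmoid(logits)
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()

class GatedConvNet(nn.Module):
    """Main CNN whose filters are pruned per input by the gater's output."""
    def __init__(self, num_filters=32, num_classes=10):
        super().__init__()
        self.gater = Gater(num_filters)
        self.conv = nn.Conv2d(3, num_filters, kernel_size=3, padding=1)
        self.head = nn.Linear(num_filters, num_classes)

    def forward(self, x):
        gates = self.gater(x)                    # (batch, num_filters), values in {0, 1}
        feats = torch.relu(self.conv(x))         # (batch, num_filters, H, W)
        feats = feats * gates[:, :, None, None]  # zero out pruned filters per sample
        pooled = feats.mean(dim=(2, 3))
        return self.head(pooled)

# Usage: each image in the batch gets its own sparse set of active filters.
model = GatedConvNet()
logits = model(torch.randn(8, 3, 32, 32))
```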
For learning static sparse structures, we propose two methods, Tree Skeleton Expansion (TSE) and Tree Receptive Field Growing (TRFG), for standard feedforward neural networks (FNNs). Although many previous methods have been proposed for CNNs, little has been done for FNNs in the literature, and there are applications where CNNs are not applicable and FNNs are the only choice among neural networks. In TSE, we assume that the data is generated from a multi-layer probabilistic graphical model (PGM). We construct a tree-structured PGM to model the data, use its structure as a skeleton, and expand the connections in the skeleton to form a deep sparse structure for FNNs. TSE is fast, and the resulting sparse models achieve performance comparable to dense FNNs with far fewer parameters.
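As an illustration of how a skeleton can translate into a sparse FNN layer, here is a minimal PyTorch-style sketch using a fixed binary connectivity mask. The mask values and the way connections are chosen are assumptions for illustration, not the exact TSE expansion procedure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkeletonSparseLinear(nn.Module):
    """Linear layer whose connectivity is restricted by a fixed binary mask,
    e.g. one derived from a tree skeleton. Illustrative only: the real TSE
    expansion rule is described in the thesis, not reproduced here."""
    def __init__(self, mask):
        super().__init__()
        out_dim, in_dim = mask.shape
        self.register_buffer("mask", mask)  # fixed sparse connectivity pattern
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        # Only connections allowed by the skeleton carry (trainable) weight.
        return F.linear(x, self.weight * self.mask, self.bias)

# Usage: a toy skeleton where hidden unit 0 connects to inputs {0, 1}
# and hidden unit 1 connects to inputs {2, 3}.
mask = torch.tensor([[1., 1., 0., 0.],
                     [0., 0., 1., 1.]])
layer = SkeletonSparseLinear(mask)
out = layer(torch.randn(5, 4))
```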
In TRFG, we are inspired by convolutional layers, where each unit is connected to a group of strongly correlated units in a spatially local region. Since general data has no such explicit spatial structure, we propose to build a tree-structured PGM over the input units such that strongly correlated units are close to each other in the tree. We then construct the next layer by introducing a unit for each local region in the PGM. The process can be repeated on each layer, leading to a deep sparse FNN. Experiments show that TRFG can efficiently capture the salient correlations at different layers and learn sparse models with better performance and interpretability than dense FNNs.
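A hedged sketch of a TRFG-style layer-growing step. It uses a maximum spanning tree over absolute pairwise correlations as a stand-in for the thesis's tree-structured PGM learning, and treats each node together with its neighbours as one local region; both choices are illustrative assumptions rather than the method as defined in the thesis:

```python
import numpy as np
import networkx as nx

def build_tree(X):
    """Tree over input units where strongly correlated units end up adjacent:
    maximum spanning tree over absolute pairwise correlations
    (a stand-in for the thesis's tree-structured PGM learning step)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    g = nx.Graph()
    d = X.shape[1]
    for i in range(d):
        for j in range(i + 1, d):
            g.add_edge(i, j, weight=corr[i, j])
    return nx.maximum_spanning_tree(g)

def local_regions(tree):
    """One 'local region' per tree node: the node and its neighbours.
    Each region becomes one unit in the next layer, connected only to
    the region's members (a sparse receptive field)."""
    return [sorted({node, *tree.neighbors(node)}) for node in tree.nodes]

def region_masks(regions, input_dim):
    """Binary connectivity mask (next_layer_units x input_dim) for a sparse layer."""
    mask = np.zeros((len(regions), input_dim))
    for u, region in enumerate(regions):
        mask[u, region] = 1.0
    return mask

# Usage: grow one sparse layer from 100 samples of 20 input units; the same
# procedure can then be repeated on the new layer's outputs to go deeper.
X = np.random.randn(100, 20)
tree = build_tree(X)
mask = region_masks(local_regions(tree), X.shape[1])
```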