THESIS
2019
xiii, 123 pages : illustrations ; 30 cm
Abstract
Deep neural network models, though very powerful and highly successful, are computationally expensive in terms of both space and time. Recently, there have been many attempts at compressing these networks. Such attempts greatly reduce the network size and make it possible to deploy deep models in resource-constrained environments.
In this thesis, we focus on two kinds of network compression methods: quantization and sparsification. We first propose to directly minimize the loss w.r.t. the quantized weights by using the proximal Newton algorithm. We provide a closed-form solution for binarization, as well as an efficient approximate solution for ternarization and m-bit (where m > 2) quantization. To speed up distributed training of weight-quantized networks, we then propose to use gradient quantization to reduce the communication cost, and theoretically study how the combination of weight and gradient quantization affects convergence. In addition, since previous quantization methods usually perform poorly on LSTMs, we study why training quantized LSTMs is difficult, and show that popular normalization schemes can help stabilize their training.
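As a concrete illustration of the closed-form binarization step, the sketch below (NumPy) scales sign(w) by a curvature-weighted average of the weight magnitudes. The function name, the diagonal curvature vector d, and its use as a proximal-Newton weighting are assumptions not spelled out in this abstract; this is a minimal sketch, not the thesis's exact algorithm.

    import numpy as np

    def loss_aware_binarize(w, d, eps=1e-8):
        # Binarize a weight vector w to alpha * sign(w), where the scaling
        # alpha is a curvature-weighted average of |w|:
        #     alpha = ||d * w||_1 / ||d||_1.
        # d is a per-weight diagonal curvature estimate (an assumption here;
        # e.g. the second-moment statistic kept by an adaptive optimizer).
        d = np.maximum(d, eps)              # keep the curvature estimate positive
        alpha = np.sum(d * np.abs(w)) / np.sum(d)
        b = np.sign(w)
        b[b == 0] = 1.0                     # break ties for exactly-zero weights
        return alpha * b

    # Toy usage: with uniform curvature the result is mean(|w|) * sign(w).
    w = np.array([0.7, -0.2, 0.05, -1.3])
    d = np.ones_like(w)
    print(loss_aware_binarize(w, d))        # [ 0.5625 -0.5625  0.5625 -0.5625]

With a non-uniform d, weights sitting in high-curvature directions contribute more to the scaling, which is what makes the step loss-aware rather than a plain magnitude average.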
While weight quantization reduces redundancy in the weight representation, network sparsification reduces redundancy in the number of weights. To achieve a higher compression rate, we extend the previous quantization-only formulation to a more general network compression framework that allows simultaneous quantization and sparsification.
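The sketch below shows what a joint step might look like in the simplest case: prune the smallest-magnitude weights of a layer and binarize the survivors with a single scaling factor. The thresholding rule, the scaling, and the function name are illustrative assumptions, not the exact formulation developed in the thesis.

    import numpy as np

    def quantize_and_sparsify(w, sparsity=0.5):
        # Jointly prune and binarize one layer's weights: weights whose
        # magnitude falls at or below the `sparsity` quantile are set to
        # zero, and the survivors are replaced by alpha * sign(w), with
        # alpha the mean magnitude of the survivors.
        thr = np.quantile(np.abs(w), sparsity)
        mask = (np.abs(w) > thr).astype(w.dtype)
        kept = np.abs(w)[mask == 1]
        alpha = kept.mean() if kept.size else 0.0
        return alpha * np.sign(w) * mask

    # Toy usage: half the weights are zeroed, the rest share one scale.
    w = np.array([0.9, -0.1, 0.3, -0.05, 1.2, 0.02])
    print(quantize_and_sparsify(w, sparsity=0.5))   # [ 0.8 -0.   0.8 -0.   0.8  0. ]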
Finally, we find that sparse deep neural networks obtained by pruning resemble biological networks in many ways. Inspired by the power-law distributions found in many biological networks, we show that these pruned deep networks also exhibit power-law properties, and that these properties can be exploited for faster learning and smaller networks in continual learning.
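As an illustration of the kind of power-law analysis referred to here, the sketch below estimates a power-law exponent for the degree distribution of a pruned layer using the standard maximum-likelihood estimator of Clauset et al.; the function name and the choice of estimator are assumptions for illustration, not necessarily the procedure used in the thesis.

    import numpy as np

    def degree_powerlaw_exponent(mask, k_min=1):
        # Degree of each input/output unit = number of surviving connections
        # touching it in the pruned 0/1 mask.  The exponent is estimated with
        # the discrete MLE of Clauset et al.:
        #     alpha = 1 + n / sum(log(k_i / (k_min - 0.5)))
        # over all degrees k_i >= k_min.
        degrees = np.concatenate([mask.sum(axis=0), mask.sum(axis=1)])
        degrees = degrees[degrees >= k_min]
        n = degrees.size
        return 1.0 + n / np.sum(np.log(degrees / (k_min - 0.5)))

    # Toy usage: a random 90%-sparse mask for a 256x256 layer.  (A random
    # mask does not actually follow a power law; this only exercises the code.)
    rng = np.random.default_rng(0)
    mask = (rng.random((256, 256)) < 0.1).astype(float)
    print(degree_powerlaw_exponent(mask, k_min=1))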