THESIS
2021
1 online resource (xii, 131 pages) : illustrations (some color)
Abstract
Deep Convolutional Neural Networks (CNNs) have achieved substantial advances in a wide range of vision tasks. However, the superior performance of CNNs usually requires powerful hardware with abundant computation and memory resources. As the demand to run vision tasks on mobile devices grows, the limited storage and computing power of these devices prevents high-performance models from being widely deployed. To bridge this gap, we are motivated to compress neural networks so that they can be deployed on mobile devices.
In general, there are four major approaches for neural network compression. The first is designing a more compact neural network, manually or through neural architecture search (NAS). The second is quantizing the weights and activations in the neural network. The third is pruning the redundant channels, and the fourth is distilling the knowledge in an over-parameterized teacher network to a compact student network.
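As a concrete illustration of the second approach, the sketch below shows a common baseline scheme for quantizing weights: symmetric uniform quantization with a single per-tensor scale. This is a generic example for exposition, not the specific algorithm studied in the dissertation.

```python
import numpy as np

def uniform_quantize(x, bits=8):
    """Quantize a float tensor to signed integers using a symmetric
    per-tensor scale (a common baseline quantization scheme)."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = np.abs(x).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.03, 1.0], dtype=np.float32)
q, s = uniform_quantize(w)
w_hat = dequantize(q, s)
```

Storing `q` (8-bit integers) instead of `w` (32-bit floats) cuts weight storage by 4x, at the cost of a small reconstruction error bounded by half the scale.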
In this dissertation, we focus on improving the algorithms behind specific compression methods, or combinations of them. We first propose Bi-Real Net to enhance the accuracy of binary neural networks, i.e., the extreme case of quantization in which weights and activations are constrained to one bit. Building on this binarization algorithm, we apply knowledge distillation to further improve accuracy, and we investigate optimization strategies tailored to binarized networks. We then use neural architecture search to find architectures that are well suited to binarization, and extend the same search algorithm to discover good pruning schemes for channel pruning. Lastly, we study the dependence of neural architecture search algorithms on the training data.
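The core operation in a binary neural network is binarizing real values to {-1, +1} with a sign function, whose zero gradient is replaced by a surrogate during backpropagation. The sketch below uses the generic straight-through estimator for illustration; the actual Bi-Real Net work refines both the forward shortcut structure and the backward approximation of the sign function, so this is only a simplified stand-in.

```python
import numpy as np

def binarize_forward(x):
    """Forward pass: sign(x) in {-1, +1}.
    Zero is mapped to +1 by convention."""
    return np.where(x >= 0, 1.0, -1.0)

def binarize_backward(x, grad_out):
    """Backward pass: straight-through estimator.
    The true gradient of sign() is zero almost everywhere, so the
    incoming gradient is passed through unchanged where |x| <= 1
    and zeroed elsewhere (a clipped identity surrogate)."""
    return grad_out * (np.abs(x) <= 1.0).astype(np.float32)

x = np.array([-0.5, 0.7, 0.0, 2.3], dtype=np.float32)
b = binarize_forward(x)          # binary activations in {-1, +1}
g = binarize_backward(x, np.ones_like(x))  # surrogate gradients
```

Because each binarized weight needs only one bit and multiplications reduce to XNOR operations, binary networks offer large savings in both memory and computation, which is what makes improving their accuracy worthwhile.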