THESIS
2021
1 online resource (xii, 95 pages) : illustrations (chiefly color)
Abstract
As Deep Neural Networks (DNNs) become increasingly popular, there is a growing trend to accelerate DNN applications on hardware platforms such as GPUs and FPGAs to gain higher performance and efficiency. However, designing hardware-efficient architectures and tuning their performance is time-consuming, owing to the strong background required in hardware details, the large design space, and the expensive cost of evaluating each design point. In this work, we propose comprehensive frameworks for DNN design optimization on FPGA and GPU, respectively. For FPGA, a novel data structure, LoopTree, is proposed to provide a high-level abstraction of the OpenCL-based DNN design. A coarse-grained model and a fine-grained model are developed to predict the performance of a LoopTree so that its design space can be explored efficiently. Average estimation errors of 8.87% and 4.8% are observed for the coarse-grained and fine-grained models, respectively, which are much smaller than the widely used estimation based on operation statistics. For the GPU counterpart, we automatically generate designs from templates and the corresponding parameter combinations. A novel framework based on transfer learning and a Guided Genetic Algorithm (GGA) is proposed to speed up the hardware tuning process. Our experiments show that we achieve performance superior to state-of-the-art work, such as the auto-tuning framework TVM and the hand-optimized library cuDNN, while reducing the search time by 8.96x and 4.58x compared with the XGBoost tuner and the GA tuner in TVM.
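The tuning loop can be pictured roughly as in the minimal Python sketch below: a genetic search over GPU kernel template parameters whose selection step is steered by a learned cost model instead of measuring every candidate on hardware. The parameter space, the toy cost proxy, and all names are illustrative assumptions, not the thesis's actual GGA or transfer-learning implementation.

    # Illustrative sketch only: genetic search over hypothetical template knobs,
    # ranked by a stand-in cost model rather than on-device measurement.
    import random

    PARAM_SPACE = {
        "tile_x": [4, 8, 16, 32],
        "tile_y": [4, 8, 16, 32],
        "unroll": [1, 2, 4],
        "vectorize": [1, 2, 4, 8],
    }

    def predicted_cost(config):
        # Stand-in for a transfer-learned cost model; a toy analytic proxy here.
        work = config["tile_x"] * config["tile_y"]
        return abs(work - 256) + 10.0 / (config["unroll"] * config["vectorize"])

    def random_config():
        return {k: random.choice(v) for k, v in PARAM_SPACE.items()}

    def mutate(config):
        child = dict(config)
        key = random.choice(list(PARAM_SPACE))
        child[key] = random.choice(PARAM_SPACE[key])
        return child

    def crossover(a, b):
        return {k: random.choice([a[k], b[k]]) for k in PARAM_SPACE}

    def guided_ga(pop_size=16, generations=20):
        population = [random_config() for _ in range(pop_size)]
        for _ in range(generations):
            # Rank candidates by the cost model to avoid measuring each one.
            population.sort(key=predicted_cost)
            parents = population[: pop_size // 2]
            children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                        for _ in range(pop_size - len(parents))]
            population = parents + children
        return min(population, key=predicted_cost)

    print(guided_ga())

In this kind of setup, only the few survivors of each generation would need real hardware measurements, which is what makes a model-guided GA cheaper than exhaustive or purely random search.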
We also investigate auto-pruning to obtain more compact neural networks that ease hardware constraints. We observe that the widely used reinforcement learning (RL) algorithm has become the timing bottleneck of the auto-pruning process. Therefore, we propose a framework that significantly accelerates the RL algorithm by taking advantage of the pruning history from other pruning scenarios. Experiments show that our framework accelerates the auto-pruning process by 1.5x to 2.5x for ResNet-20, and by 1.81x to 2.375x for other neural networks such as ResNet-56, ResNet-18, and MobileNet v1.
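The idea of reusing pruning history can be sketched as warm-starting the per-layer pruning-ratio search from ratios found on a previously pruned network, then refining them. The reward proxy, layer count, and simple perturbation loop below are assumptions for illustration and do not reproduce the thesis's RL agent or its acceleration scheme.

    # Illustrative sketch only: warm-start a pruning-ratio search from history
    # found on a similar network, then refine with a simple accept/reject loop.
    import random

    # Hypothetical per-layer pruning ratios from an earlier pruning scenario.
    HISTORY_RATIOS = [0.3, 0.5, 0.5, 0.6, 0.4]

    def reward(ratios):
        # Toy proxy: reward overall sparsity, penalize overly aggressive layers.
        sparsity = sum(ratios) / len(ratios)
        penalty = sum(max(0.0, r - 0.7) for r in ratios)
        return sparsity - 2.0 * penalty

    def refine(init_ratios, steps=200, sigma=0.05):
        best = list(init_ratios)
        best_r = reward(best)
        for _ in range(steps):
            cand = [min(0.9, max(0.0, r + random.gauss(0.0, sigma))) for r in best]
            r = reward(cand)
            if r > best_r:
                best, best_r = cand, r
        return best, best_r

    # Compare warm start from history against starting from scratch.
    warm, warm_r = refine(HISTORY_RATIOS)
    cold, cold_r = refine([0.0] * len(HISTORY_RATIOS))
    print("warm-start reward:", round(warm_r, 3), "cold-start reward:", round(cold_r, 3))

The warm-started search begins near a known-good configuration, so it typically needs far fewer expensive evaluation episodes to reach a comparable result, which is the intuition behind reusing history across pruning scenarios.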