THESIS
2019
xi, 55 pages : illustrations (some color) ; 30 cm
Abstract
Although convolutional neural networks (CNNs) have been applied to a vast range of applications, deploying CNNs on a portable system is challenging due to the enormous data volume, intensive computation, and frequent memory access.
To reduce the data volume and memory traffic, many approaches have been proposed to lower CNN complexity, such as pruning and quantization. However, existing accelerator designs adopt channel-dimension tiling, which requires a regular channel count; after pruning, the channel count may become highly irregular, incurring heavy zero padding. As for quantization, simple aggressive bit reduction usually results in a large accuracy drop. To address these challenges, row-based tiling in the kernel dimension is adopted, which adapts to different kernel shapes and significantly reduces zero padding. Moreover, a configurable processing-unit (PU) design is developed that can dynamically group or split PUs to enable efficient resource sharing. For quantization, the recently proposed incremental network quantization (INQ) algorithm is adopted, which constrains weights to a low-bit power-of-2 format. We further propose an approximate-shifter-based processing element (PE) design as the building block of the PUs to facilitate the convolution computation. For evaluation, an RTL-based accelerator running INQ-quantized AlexNet is realized on a standalone FPGA. Compared with state-of-the-art designs, our accelerator achieves 1.87x higher performance, which demonstrates the efficiency of the proposed design methods.
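The power-of-2 weight format is what lets a shifter-based PE avoid multipliers: multiplying an activation by a weight ±2^e reduces to a bit shift. A minimal sketch of this idea, assuming an illustrative exponent range (the function names and parameters below are hypothetical, not the thesis's actual PE design):

```python
import numpy as np

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Round weights to the nearest signed power of 2 (INQ-style format).
    The exponent range [min_exp, max_exp] is an illustrative assumption."""
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.clip(np.round(np.log2(np.maximum(mag, 2.0 ** (min_exp - 1)))),
                  min_exp, max_exp)
    q = sign * 2.0 ** exp
    q[mag == 0] = 0.0  # exact zeros stay zero (pruned weights)
    return q

def shift_mul(act, exp):
    """Multiply an integer activation by 2**exp using shifts only,
    as a shifter-based PE would in place of a full multiplier."""
    return act << exp if exp >= 0 else act >> -exp
```

For example, a weight of 0.3 quantizes to 2^-2 = 0.25, so the PE computes the product of an activation with it as a right shift by 2.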
Apart from reducing the data volume, reducing the intensive computation of CNNs is also critical for acceleration. Spectral-domain convolution has been proposed to simplify the compute-intensive convolution layers. However, operating in the spectral domain introduces domain incompatibility with the other layers. To address this challenge, a spectral-domain approximate activation is proposed and combined with the recently proposed spectral-domain pooling to resolve the domain incompatibility. Lastly, the proposed activation algorithm is evaluated in TensorFlow on the CIFAR-10 dataset and achieves an approximately 3% accuracy improvement over the latest spectral-domain activation algorithms.
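The appeal of spectral-domain processing comes from the convolution theorem: element-wise multiplication in the frequency domain implements circular convolution in the spatial domain. A minimal NumPy sketch of that equivalence (illustrative only; it ignores the padding needed for linear convolution, and the pooling and activation layers whose incompatibility the proposed method targets):

```python
import numpy as np

def spectral_conv2d(x, k):
    """Circular 2D convolution via the convolution theorem:
    FFT both operands, multiply element-wise, inverse FFT."""
    h, w = x.shape
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k, s=(h, w))))

def circular_conv2d_direct(x, k):
    """Reference: direct circular convolution (O(n^2 k^2) per output map)."""
    h, w = x.shape
    y = np.zeros((h, w))
    for m in range(h):
        for n in range(w):
            for i in range(k.shape[0]):
                for j in range(k.shape[1]):
                    y[m, n] += k[i, j] * x[(m - i) % h, (n - j) % w]
    return y
```

The two functions agree to floating-point precision. An element-wise nonlinearity such as ReLU, however, has no equally simple frequency-domain form, which is exactly the domain incompatibility that a spectral-domain approximate activation must work around.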