THESIS
2021
1 online resource (x, 40 pages) : illustrations (some color)
Abstract
Convolutional neural networks (CNNs) have achieved performance near or exceeding that of humans
in computer vision, yet their large computational and memory requirements make
them difficult to deploy in both large data centers and embedded systems. One main
factor behind this computational and memory burden is the large number of floating point
(FP) operations. Fixed-point (FXP) quantization is a promising way to reduce the resource
requirements of CNNs, yet low bitwidth implementations require fine-tuning to
recover accuracy. Another way to reduce the number of arithmetic operations is to apply
fast algorithms such as the Winograd filtering algorithm. However, the numerical errors
of the Winograd filtering algorithm incur a CNN accuracy penalty when combined with low
bitwidth arithmetic.
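To illustrate the kind of fast algorithm referred to above, the sketch below works through the 1-D Winograd minimal filtering case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 a direct computation needs; the extra additions and fractional transform constants are also where the numerical error mentioned above enters. The function name winograd_f23 and the test values are illustrative, not taken from the thesis.

    import numpy as np

    def winograd_f23(d, g):
        # F(2,3): two outputs of a 3-tap correlation over a 4-element
        # input tile using 4 multiplications (direct computation uses 6).
        d0, d1, d2, d3 = d
        g0, g1, g2 = g
        m1 = (d0 - d2) * g0
        m2 = (d1 + d2) * (g0 + g1 + g2) / 2
        m3 = (d2 - d1) * (g0 - g1 + g2) / 2
        m4 = (d1 - d3) * g2
        return np.array([m1 + m2 + m3, m2 - m3 - m4])

    # Check against the direct correlation y[i] = sum_k d[i + k] * g[k].
    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([0.5, -1.0, 0.25])
    direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                       d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
    assert np.allclose(winograd_f23(d, g), direct)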
In this thesis, we propose a CNN accelerator that uses a novel block floating point
(BFP) scheme to reduce the bitwidth down to 10 bits while supporting the Winograd filtering
algorithm. First, we derive our block floating point processing element (PE) design from a
fused floating point dot-product unit. Our VLSI synthesis results show that our PE design
reduces area and power by 27.11% and 44.86%, respectively.
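As a rough, software-level sketch of the arithmetic such a PE performs (assuming the usual BFP convention that every value in a block shares one exponent and keeps an integer mantissa; the function bfp_dot and the example operands below are hypothetical, not the thesis's RTL design), a block-level dot product reduces to an integer multiply-accumulate followed by a single exponent addition, in contrast to the per-element alignment a floating point fused dot-product unit must perform:

    import numpy as np

    def bfp_dot(mant_a, exp_a, mant_b, exp_b):
        # Integer multiply-accumulate over the mantissas of two BFP blocks,
        # then one shared scale factor 2^(exp_a + exp_b) for the result.
        acc = int(np.dot(mant_a.astype(np.int64), mant_b.astype(np.int64)))
        return acc * 2.0 ** (exp_a + exp_b)

    # Hypothetical blocks: integer mantissas with shared exponents -8 and -7.
    a_mant, a_exp = np.array([37, -105, 64, 12]), -8
    b_mant, b_exp = np.array([-20, 55, 91, -3]), -7
    print(bfp_dot(a_mant, a_exp, b_mant, b_exp))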
Second, we develop our novel block floating point scheme to combine quantization
with the Winograd filtering algorithm. Our block floating point quantization enables integer
arithmetic for both Winograd encoding and channel accumulation within BFP blocks,
reducing the hardware cost of both operations.
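The sketch below shows one plausible form such a quantization step could take (the helper quantize_block, its 10-bit mantissa width, and the sample tile are assumptions for illustration, not the exact scheme of the thesis): once a block shares a single exponent, the Winograd input encoding, which consists only of additions and subtractions, and the channel-wise accumulation of partial sums can both operate directly on the integer mantissas.

    import numpy as np

    def quantize_block(x, mant_bits=10):
        # Quantize a block of FP values to integer mantissas plus one shared
        # exponent; mant_bits includes the sign bit.
        max_mag = np.max(np.abs(x))
        if max_mag == 0:
            return np.zeros_like(x, dtype=np.int32), 0
        shared_exp = int(np.ceil(np.log2(max_mag))) - (mant_bits - 1)
        mant = np.round(x * 2.0 ** (-shared_exp)).astype(np.int32)
        limit = 2 ** (mant_bits - 1) - 1
        return np.clip(mant, -limit - 1, limit), shared_exp

    # Winograd F(2,3) input encoding B^T d uses only adds/subtracts, so it can
    # run on the integer mantissas; partial sums within the block stay integer.
    tile = np.array([0.12, -0.48, 0.03, 0.31])
    mant, shared_exp = quantize_block(tile)
    encoded = np.array([mant[0] - mant[2], mant[1] + mant[2],
                        mant[2] - mant[1], mant[1] - mant[3]])
    print(mant, shared_exp, encoded)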
Third, we implement our PE design and block floating point scheme in an end-to-end
CNN accelerator on an FPGA. We evaluate three alternative BFP schemes and select BFP10+F2
Winograd as the best balance between accuracy and throughput. Compared with a
baseline FP16 design, our BFP quantization reduces LUT usage by 50.1%, registers by
48.3%, BRAM by 27.3%, and DSPs by 43.8%, while achieving a 32.1% higher clock frequency. Finally,
we perform case studies with different CNNs and show that the accuracy drop is within
1% of the FP32 network.