THESIS
2020
viii, 35 pages : illustrations ; 30 cm
Abstract
The Winograd minimal filtering algorithm has been used in recent years to accelerate compute-bound
Convolutional Neural Networks (CNNs) on FPGAs. However, most Winograd accelerators are designed to
process convolutions efficiently only for limited filter sizes such as 3 × 3. Larger filters are
normally decomposed into small 3 × 3 tiles using a low-efficiency algorithm called overlap-and-add
(OLA) Winograd, which prevents applications that rely heavily on large filters, such as image
super-resolution or neural architecture search (NAS), from achieving further speed-up. This work
addresses the problem by proposing a novel decomposition algorithm, named nested Winograd, to
replace OLA-Winograd. We also show that the proposed algorithm can be easily integrated into
existing 3 × 3 Winograd accelerators with only slight architectural changes, by proposing a
runtime-reconfigurable Winograd accelerator that processes convolutions with arbitrary filter
sizes and strides. An FPGA implementation of nested Winograd with the Winograd kernel F(3,3)
shows a 1.42-, 1.44-, and 3.28-times throughput improvement over the FPGA implementation with
OLA-Winograd when processing the 5 × 5, 7 × 7, and 9 × 9 convolution layers of different CNN
benchmarks.
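As background to the abstract, the two standard techniques it contrasts can be illustrated in 1D. The sketch below shows the classic F(2,3) Winograd minimal filtering transform, which computes two outputs of a 3-tap convolution with 4 multiplications instead of 6, and an overlap-and-add (OLA) split of a long filter into 3-tap tiles whose shifted partial outputs are summed. This is a minimal illustration of the textbook algorithms only; the thesis's nested Winograd decomposition is not detailed in the abstract, and all function names here are illustrative.

```python
# Illustrative sketch: standard 1D Winograd F(2,3) minimal filtering and
# overlap-and-add (OLA) filter decomposition. NOT the thesis's nested
# Winograd algorithm, whose details are not given in the abstract.

def direct_conv(d, g):
    """Direct 'valid' 1D convolution (correlation form), for reference."""
    n = len(d) - len(g) + 1
    return [sum(d[i + k] * g[k] for k in range(len(g))) for i in range(n)]

def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap filter from a 4-sample input tile,
    using 4 multiplications instead of the 6 a direct computation needs."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def ola_conv(d, g, tile=3):
    """Overlap-and-add: split a long filter into `tile`-tap pieces,
    convolve each piece, and accumulate the shifted partial outputs."""
    n = len(d) - len(g) + 1
    y = [0.0] * n
    for off in range(0, len(g), tile):
        piece = g[off:off + tile]
        for i in range(n):
            y[i] += sum(d[i + off + k] * piece[k] for k in range(len(piece)))
    return y

d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
g3 = [1.0, -2.0, 3.0]
g5 = [1.0, -2.0, 3.0, -4.0, 5.0]

print(winograd_f23(d[:4], g3))  # matches direct_conv(d[:4], g3)
print(ola_conv(d, g5))          # matches direct_conv(d, g5)
```

In an OLA-Winograd accelerator, each 3-tap (or 3 × 3) piece inside the OLA loop would itself be computed with the Winograd transform; the inefficiency the abstract refers to comes from the redundant input reads and partial-sum accumulation across tiles, which the proposed nested Winograd decomposition is designed to avoid.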