THESIS
2021
1 online resource (xv, 46 pages) : illustrations (some color)
Abstract
Image classification is a fundamental problem in computer vision. Although deep neural networks can surpass human vision in classifying images, most of these networks require expensive computational resources.
In this thesis, a neural network layer based on the Spatial Transformer Network is proposed to improve the efficiency of neural networks. We call this structure the “spatial transform bottleneck”, by analogy to the widely used bottleneck layer in ResNet: where the ResNet bottleneck reduces the channel dimension, the proposed structure applies a spatial transformation to reduce the spatial dimension. The proposed layer performs three operations on the input feature maps. First, a spatial transformer resamples the input features into a lower-dimensional space. Then, features in the lower-dimensional space are extracted by a sequence of computationally expensive operations. Finally, an inverse spatial transformer restores the result to the original space. By performing the key operations in a lower-dimensional space, the efficiency of the network is improved.
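The three-step layer described above (resample down, compute, resample back) can be sketched in NumPy. This is a simplified, hypothetical illustration, not the thesis's implementation: it uses a single-channel 2-D feature map, a fixed identity warp `theta` in place of a learned localization network, and a placeholder `body` for the expensive operations.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Build a sampling grid: for each output pixel (in normalized
    [-1, 1] coordinates), compute its source location in the input."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (out_h, out_w, 3)
    return coords @ theta.T                                  # (out_h, out_w, 2)

def bilinear_sample(feat, grid):
    """Sample a 2-D feature map at the normalized grid locations."""
    H, W = feat.shape
    x = (grid[..., 0] + 1) * (W - 1) / 2        # to pixel coordinates
    y = (grid[..., 1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    return (feat[y0, x0]         * (1 - wx) * (1 - wy)
          + feat[y0, x0 + 1]     * wx       * (1 - wy)
          + feat[y0 + 1, x0]     * (1 - wx) * wy
          + feat[y0 + 1, x0 + 1] * wx       * wy)

def spatial_transform_bottleneck(feat, theta, scale=2, body=lambda z: z):
    """1) spatial transformer resamples input into a smaller space,
       2) the expensive `body` runs there at reduced cost,
       3) the inverse transformer restores the original space."""
    H, W = feat.shape
    small = bilinear_sample(feat, affine_grid(theta, H // scale, W // scale))
    small = body(small)
    # Invert the 2x3 affine map by lifting it to a 3x3 homogeneous matrix.
    theta_inv = np.linalg.inv(np.vstack([theta, [0, 0, 1]]))[:2]
    return bilinear_sample(small, affine_grid(theta_inv, H, W))

feat = np.random.rand(16, 16)
theta = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # identity warp
out = spatial_transform_bottleneck(feat, theta, scale=2)
print(out.shape)  # (16, 16): original resolution restored
```

With `scale=2`, `body` sees a feature map with a quarter of the spatial locations, which is where the efficiency gain comes from; the bilinear down/up resampling is lossy, so the restored map is an approximation of computing `body` at full resolution.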
Furthermore, by using spatial transformers with a lightweight localization network, the network can either compute the whole feature maps at a lower resolution or compute local features at a relatively high resolution. To minimize the overhead of the localization network, we propose to construct it with a Transformer instead of the widely used Convolutional Neural Network.
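A Transformer-based localization network's job is to map the input feature map to the six affine parameters theta. The toy NumPy sketch below is a hypothetical stand-in for the thesis's design: it tokenizes the feature map into patches, applies one single-head self-attention layer, mean-pools, and predicts theta through a linear output layer that is zero-initialized with an identity bias (a common Spatial Transformer trick, so the warp starts as a no-op).

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over the token rows of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)   # row-wise softmax
    return A @ V

def transformer_localization(feat, params, patch=4):
    """Tokenize the feature map into patches, run one attention layer,
    mean-pool the tokens, and predict the 2x3 affine parameters theta."""
    H, W = feat.shape
    tokens = (feat.reshape(H // patch, patch, W // patch, patch)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, patch * patch))   # (num_tokens, patch*patch)
    Wq, Wk, Wv, W_out, b_out = params
    h = self_attention(tokens, Wq, Wk, Wv).mean(axis=0)
    return (W_out @ h + b_out).reshape(2, 3)

d = 16  # token dimension (= patch * patch here)
params = (rng.normal(size=(d, d)) * 0.1,          # Wq
          rng.normal(size=(d, d)) * 0.1,          # Wk
          rng.normal(size=(d, d)) * 0.1,          # Wv
          np.zeros((6, d)),                       # W_out zero-initialized ...
          np.array([1., 0., 0., 0., 1., 0.]))     # ... so theta starts at identity
theta = transformer_localization(rng.random((16, 16)), params)
print(theta)  # [[1. 0. 0.]
              #  [0. 1. 0.]]
```

Because attention pools globally over a short token sequence, such a head can stay small regardless of input resolution, which is consistent with the overhead argument made above; the thesis's actual architecture may differ.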
To illustrate the effectiveness of the proposed spatial transform bottleneck, we apply it to ResNet for image classification. Depending on the configuration, a ResNet using the spatial transform bottleneck reduces operations by up to 42% while achieving performance comparable to the baselines.
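To see where savings of this magnitude can come from, consider back-of-the-envelope FLOP arithmetic for a convolution: its cost scales with the number of spatial locations, so halving the resolution inside the bottleneck cuts the wrapped layers' cost by 4x. The layer sizes below are illustrative, not the thesis's configuration.

```python
def conv_flops(h, w, c_in, c_out, k=3):
    """Approximate FLOPs for one k x k convolution: two FLOPs
    (multiply + add) per output element per kernel weight."""
    return 2 * h * w * c_in * c_out * k * k

full = conv_flops(56, 56, 256, 256)   # at full resolution
half = conv_flops(28, 28, 256, 256)   # inside the bottleneck, at 1/2 resolution
print(f"reduction inside the bottleneck: {1 - half / full:.0%}")  # 75%
```

The whole-network saving is smaller than this per-layer figure, since only the wrapped layers run at reduced resolution and the spatial transformers add some overhead; for example, if roughly 56% of a baseline's FLOPs sat in wrapped layers, a 75% per-layer cut would yield about a 42% overall reduction, the same order as the reported number.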