THESIS
2022
1 online resource (xiv, 133 pages) : illustrations (some color)
Abstract
Deep Learning has emerged as a milestone in the machine learning community due to its remarkable performance on a variety of tasks, such as computer vision and natural language processing. It has been demonstrated that the architecture of a neural network significantly influences its performance, and it is therefore important to determine the architecture carefully. Typically, methods for neural architecture design fall into two categories. The first designs neural architectures with search methods, which aim to discover promising architectures automatically; for example, the NASNet architecture is found within a defined search space using a reinforcement learning algorithm. The second designs neural architectures manually based on domain knowledge; most practical architectures, such as ResNet and the Transformer, are proposed based on prior knowledge. In this thesis, we provide a comprehensive discussion of neural architecture design from these two perspectives.
Firstly, we introduce a neural architecture search algorithm based on Bayesian optimization, named BONAS. In the search phase, a GCN embedding extractor and a Bayesian sigmoid regressor constitute the surrogate model for Bayesian optimization, and candidate architectures in the search space are selected according to the acquisition function. In the query phase, we merge the selected candidates into a super network and evaluate each architecture via a weight-sharing mechanism. The proposed BONAS discovers strong architectures while balancing exploitation and exploration.
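For illustration only, the following PyTorch sketch shows one way such a surrogate might be assembled (all names and hyperparameters here are assumptions, not the thesis implementation): a small GCN embeds a cell DAG given its normalized adjacency and one-hot operations, a Bayesian linear head with a sigmoid link predicts accuracy in logit space, and an upper-confidence-bound acquisition scores candidates.

import torch
import torch.nn as nn

class GCNEmbedder(nn.Module):
    """Two-layer GCN that embeds a cell DAG (normalized adjacency + one-hot ops)."""
    def __init__(self, num_ops, hidden=64):
        super().__init__()
        self.lin1 = nn.Linear(num_ops, hidden)
        self.lin2 = nn.Linear(hidden, hidden)

    def forward(self, adj, ops):
        # adj: (N, N) normalized adjacency; ops: (N, num_ops) one-hot operation matrix
        h = torch.relu(adj @ self.lin1(ops))
        h = torch.relu(adj @ self.lin2(h))
        return h.mean(dim=0)  # graph-level embedding

class BayesianSigmoidRegressor:
    """Bayesian linear regression on embeddings with a sigmoid link for accuracy."""
    def __init__(self, dim, noise=0.1, prior=1.0):
        self.dim, self.noise, self.prior = dim, noise, prior
        self.mean = torch.zeros(dim)
        self.cov = prior * torch.eye(dim)

    def fit(self, X, y):
        # Closed-form posterior over the linear weights on logit-transformed accuracies.
        y_logit = torch.logit(y.clamp(1e-4, 1 - 1e-4))
        precision = torch.eye(self.dim) / self.prior + X.T @ X / self.noise
        self.cov = torch.linalg.inv(precision)
        self.mean = self.cov @ X.T @ y_logit / self.noise

    def acquisition(self, x, beta=1.0):
        # Upper confidence bound in logit space, mapped back through the sigmoid.
        mu = x @ self.mean
        var = x @ self.cov @ x
        return torch.sigmoid(mu + beta * var.sqrt())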
Secondly, we focus on the self-attention module in the Transformer and propose a differentiable architecture search method to find important attention patterns. In contrast to prior work, we find that the diagonal elements in the attention map can be dropped without harming performance. To explain this observation, we provide a theoretical proof from the perspective of universal approximation. Furthermore, we obtain a series of attention masks for efficient architecture design with the proposed search method.
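A minimal sketch of this kind of differentiable mask search, under assumed names and a sigmoid relaxation (not necessarily the thesis formulation): every position in the attention map receives a learnable gate that is relaxed during search and thresholded afterwards, so uninformative connections such as the diagonal can be dropped.

import torch
import torch.nn as nn

class MaskedSelfAttention(nn.Module):
    """Self-attention whose connections are gated by learnable mask logits."""
    def __init__(self, dim, seq_len):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.mask_logits = nn.Parameter(torch.zeros(seq_len, seq_len))  # architecture parameters
        self.scale = dim ** -0.5

    def forward(self, x, hard=False):
        # x: (batch, seq_len, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = (q @ k.transpose(-2, -1)) * self.scale
        gate = torch.sigmoid(self.mask_logits)
        if hard:
            # After search: keep only confident connections (the diagonal may be dropped).
            scores = scores.masked_fill(gate <= 0.5, float('-inf'))
        else:
            # During search: soft multiplicative mask, softmax(scores + log gate).
            scores = scores + torch.log(gate + 1e-9)
        return torch.softmax(scores, dim=-1) @ v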
Thirdly, we attempt to understand the feed-forward module in the Transformer within a unified framework. Specifically, we introduce the concept of memory tokens and establish the relationship between the feed-forward module and self-attention. Moreover, we propose a novel architecture named uni-attention, which contains all four types of attention connections in our framework. Uni-attention outperforms previous baselines given the same number of memory tokens.
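The memory-token view can be sketched roughly as follows (an assumed formulation for illustration, with hypothetical names): the feed-forward computation is read as attention from input tokens to a fixed set of learned memory slots, with one parameter matrix acting as keys and the other as values.

import torch
import torch.nn as nn

class MemoryAttention(nn.Module):
    """Input tokens attend to a fixed set of learned memory slots."""
    def __init__(self, dim, num_memory_tokens):
        super().__init__()
        self.mem_keys = nn.Parameter(torch.randn(num_memory_tokens, dim) * dim ** -0.5)
        self.mem_values = nn.Parameter(torch.randn(num_memory_tokens, dim) * dim ** -0.5)

    def forward(self, x):
        # x: (batch, seq, dim)
        scores = x @ self.mem_keys.T             # (batch, seq, num_memory_tokens)
        weights = torch.softmax(scores, dim=-1)  # analogue of the FFN activation
        return weights @ self.mem_values         # (batch, seq, dim)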
Finally, we investigate the over-smoothing phenomenon in the full Transformer architecture. We provide a theoretical analysis that relates self-attention to the graph domain. Specifically, we find that layer normalization plays an important role in the over-smoothing problem, and we verify this empirically. To alleviate this issue, we propose hierarchical fusion architectures so that the output representations are more diverse.
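As a rough illustration under assumed definitions (not the thesis code), the snippet below combines a simple token-similarity probe for over-smoothing with a hierarchical fusion head that mixes intermediate layer outputs using learned weights, keeping the final representation more diverse.

import torch
import torch.nn as nn
import torch.nn.functional as F

def token_similarity(h):
    # h: (seq, dim); mean pairwise cosine similarity -- values near 1 indicate over-smoothing.
    h = F.normalize(h, dim=-1)
    sim = h @ h.T
    n = h.size(0)
    return (sim.sum() - n) / (n * (n - 1))

class HierarchicalFusion(nn.Module):
    """Fuses intermediate layer outputs with learned softmax weights."""
    def __init__(self, num_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, seq, dim) tensors, one per Transformer layer.
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * hi for wi, hi in zip(w, layer_outputs))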