THESIS
2021
1 online resource (xvii, 113 pages) : illustrations (some color)
Abstract
Attention is an essential mechanism by which creatures perceive the world. Psychologists describe
it as the allocation of limited cognitive processing resources. Neural attention is a
technique motivated by cognitive attention. It selectively processes information from its sources
by computing input-dependent dynamic weights that boost the information from relevant portions.
Neural attention was first proposed in deep learning in 2014 and has undergone extensive
development in the decade since. A milestone in this development is the proposal of the Transformer
in 2017, the first deep architecture constructed solely on attention mechanisms,
without any recurrence or convolution. Attention has led to great success in many
areas, such as natural language processing, computer vision, and social networks, and has
become an essential component of neural networks.
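
To make the mechanism concrete, the following is a minimal Python/NumPy sketch of scaled dot-product attention, the core operation of the Transformer; the function name and shapes are illustrative and not taken from the thesis.

    import numpy as np

    def attention(Q, K, V):
        # Q: (n_q, d) queries; K: (n_k, d) keys; V: (n_k, d_v) values.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])               # query-key similarity
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)                 # softmax: input-dependent dynamic weights
        return w @ V                                          # weighted sum emphasizes relevant portions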
While much research on attention focuses on exploring its uses in an ever-increasing range of applications,
there is also significant interest in enhancing attention in current models. The motivations
are to keep networks from being distracted by irrelevant information and to improve their
interpretability. Existing works fall into two categories: one enhances attention purely by
improving the attention mechanisms themselves, and the other enhances attention by exploiting patterns in
the data. In this thesis, we focus on enhancing attention in deep natural language processing (NLP) models. The contributions of this thesis are as follows:
First, we propose the Gated Attention Network (GA-Net), a novel sparse attention network
for sequence data. GA-Net combines the techniques of attention and dynamic network configuration:
it dynamically selects a subset of elements to attend to and filters out irrelevant
elements. In addition, an efficient end-to-end learning method using Gumbel-Softmax is designed
to relax the binary gates and enable back-propagation, facilitating GA-Net training. GA-Net achieves better performance in text classification tasks than all baseline models
with global or local attention, and obtains better interpretability.
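
The following is a minimal sketch of how a Gumbel-Softmax relaxation can turn a hard keep/drop decision into a differentiable one; GA-Net's actual gating network is more elaborate, and all names below are illustrative.

    import numpy as np

    def gumbel_softmax_gate(logits, tau=0.5):
        # logits: (n, 2) unnormalized scores for [drop, keep] per element.
        u = np.random.uniform(1e-9, 1.0, size=logits.shape)
        g = -np.log(-np.log(u))                       # Gumbel(0, 1) noise
        y = (logits + g) / tau                        # lower tau -> closer to hard 0/1 gates
        y = np.exp(y - y.max(axis=-1, keepdims=True))
        y = y / y.sum(axis=-1, keepdims=True)         # softmax over {drop, keep}
        return y[:, 1]                                # soft "keep" gate in (0, 1)

Gates of this kind can be applied to attention scores so that elements assigned a near-zero gate are effectively filtered out, while gradients still flow through the relaxed gate during training.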
Second, we propose DeepRapper, a Transformer-based autoregressive language model
that carefully models rhymes and rhythms for rap generation. To enhance attention, DeepRapper generates rap lyrics in reverse order; in addition, it utilizes rhyme representations
and constraints. To our knowledge, DeepRapper is the first system to generate rap
with both rhymes and rhythms. Both objective and subjective evaluations demonstrate that
DeepRapper generates creative and high-quality raps with good rhymes and rhythms.
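
As a toy illustration of the reverse-order idea (not DeepRapper's actual tokenization or model), reversing each lyric line means the rhyming word at the line's end is produced first, so an autoregressive model can condition the rest of the line on the rhyme:

    def reverse_for_rhyme(line):
        # Reverse token order so the rhyming final word is generated first.
        return " ".join(reversed(line.split()))

    print(reverse_for_rhyme("keep the flow tight and the rhymes right"))
    # -> right rhymes the and tight flow the keep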