THESIS
2021
1 online resource (xvii, 113 pages) : illustrations (some color)
Abstract
Attention is an essential mechanism by which creatures perceive the world. Psychologists describe
it as the allocation of limited cognitive processing resources. Neural attention is a
technique motivated by cognitive attention. It selectively processes information from its sources
by computing input-dependent dynamic weights that boost the information from relevant portions.
Neural attention was first proposed in deep learning in 2014 and has undergone extensive
development in the decade since. A milestone in this development is the proposal of the Transformer
in 2017, the first deep architecture constructed solely on attention mechanisms,
without any recurrence or convolution. Attention has led to great success in many
areas, such as natural language processing, computer vision, and social networks, and has
become an essential component of neural networks.
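
To make the mechanism concrete, the following is a minimal Python/NumPy sketch of scaled dot-product attention, the core operation of the Transformer; the function name and shapes are illustrative and not taken from the thesis.

    import numpy as np

    def attention(Q, K, V):
        # Q: (n_q, d) queries; K: (n_k, d) keys; V: (n_k, d_v) values.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])               # query-key similarity
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)                 # softmax: input-dependent dynamic weights
        return w @ V                                          # weighted sum emphasizes relevant portions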
While much research on attention focuses on exploring its uses in an ever-increasing range of applications,
there is also significant interest in enhancing attention in current models. The motivations
are to keep networks from being distracted by irrelevant information and to improve their
interpretability. Existing works fall into two categories: one enhances attention purely by
improving the attention mechanisms themselves, and the other enhances attention by exploiting patterns in
the data. In this thesis, we focus on enhancing attention in deep natural language processing (NLP) models. The contributions of this thesis are as follows:
First, we propose the Gated Attention Network (GA-Net), a novel sparse attention network
for sequence data. GA-Net combines the techniques of attention and dynamic network configuration:
it dynamically selects a subset of elements to attend to and filters out irrelevant
elements. In addition, an efficient end-to-end learning method using Gumbel-Softmax is designed
to relax the binary gates and enable back-propagation, facilitating GA-Net training. GA-Net achieves better performance in text classification tasks than all baseline models
with global or local attention, and obtains better interpretability.
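
The following is a minimal sketch of how a Gumbel-Softmax relaxation can turn a hard keep/drop decision into a differentiable one; GA-Net's actual gating network is more elaborate, and all names below are illustrative.

    import numpy as np

    def gumbel_softmax_gate(logits, tau=0.5):
        # logits: (n, 2) unnormalized scores for [drop, keep] per element.
        u = np.random.uniform(1e-9, 1.0, size=logits.shape)
        g = -np.log(-np.log(u))                       # Gumbel(0, 1) noise
        y = (logits + g) / tau                        # lower tau -> closer to hard 0/1 gates
        y = np.exp(y - y.max(axis=-1, keepdims=True))
        y = y / y.sum(axis=-1, keepdims=True)         # softmax over {drop, keep}
        return y[:, 1]                                # soft "keep" gate in (0, 1)

Gates of this kind can be applied to attention scores so that elements assigned a near-zero gate are effectively filtered out, while gradients still flow through the relaxed gate during training.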
Second, we propose DeepRapper, a Transformer-based autoregressive language model
that carefully models rhymes and rhythms for rap generation. To enhance attention, DeepRapper generates rap lyrics in reverse order; in addition, it utilizes rhyme representations
and constraints. To our knowledge, DeepRapper is the first system to generate rap
with both rhymes and rhythms. Both objective and subjective evaluations demonstrate that
DeepRapper generates creative and high-quality raps with good rhymes and rhythms.
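
As a toy illustration of the reverse-order idea (not DeepRapper's actual tokenization or model), reversing each lyric line means the rhyming word at the line's end is produced first, so an autoregressive model can condition the rest of the line on the rhyme:

    def reverse_for_rhyme(line):
        # Reverse token order so the rhyming final word is generated first.
        return " ".join(reversed(line.split()))

    print(reverse_for_rhyme("keep the flow tight and the rhymes right"))
    # -> right rhymes the and tight flow the keep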