THESIS
2019
xiii, 85 pages : illustrations ; 30 cm
Abstract
Dialogue systems are designed to communicate with humans via natural language and to help people in many aspects of their lives. Task-oriented dialogue systems, in particular, aim to accomplish users' goals (e.g., restaurant reservation or ticket booking) in a minimal number of conversational turns. The earliest systems were built by experts from large numbers of hand-crafted rules and templates, which was costly and limited their coverage. Therefore, data-driven statistical dialogue systems, including powerful neural-based systems, have received considerable attention over the last few decades as a way to reduce this cost and provide robustness.
One of the main challenges in building neural task-oriented dialogue systems is modeling long dialogue contexts and external knowledge. Some neural dialogue systems are modularized. Although they are known to be stable and easy to interpret, they usually require expensive human labels for each component and introduce unwanted dependencies between modules. On the other hand, end-to-end approaches learn hidden dialogue representations automatically and directly retrieve or generate system responses. They require much less human involvement, especially in dataset construction. However, most existing models struggle to incorporate large amounts of such information into end-to-end learning frameworks.
In this thesis, we focus on learning task-oriented dialogue systems with deep learning models, an important research direction in natural language processing. We leverage the neural copy mechanism and memory-augmented neural networks to address the challenge of modeling and exploiting this information in conversation. We demonstrate the effectiveness of our strategy by achieving state-of-the-art performance in multi-domain dialogue state tracking, retrieval-based dialogue systems, and generation-based dialogue systems.
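As a rough illustration of the copy mechanism referred to throughout this thesis, a pointer-generator-style output distribution can be sketched as follows. This is a minimal sketch under common assumptions; the function and variable names are illustrative, not the exact implementation used in later chapters.

```python
import torch
import torch.nn.functional as F

def copy_augmented_distribution(vocab_logits, attn_scores, p_gen, src_token_ids):
    """Mix a generation distribution over the vocabulary with a copy
    distribution over source (dialogue history / KB) tokens.

    vocab_logits:  (batch, vocab_size)  decoder output logits
    attn_scores:   (batch, src_len)     attention scores over source tokens
    p_gen:         (batch, 1)           probability of generating vs. copying
    src_token_ids: (batch, src_len)     vocabulary ids of the source tokens
    """
    p_vocab = F.softmax(vocab_logits, dim=-1)   # generate from the vocabulary
    p_copy = F.softmax(attn_scores, dim=-1)     # copy from source positions

    # Scatter the copy probabilities back onto vocabulary ids so that
    # both distributions live in the same output space.
    copy_dist = torch.zeros_like(p_vocab)
    copy_dist.scatter_add_(1, src_token_ids, p_copy)

    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist
```

The gate p_gen lets the decoder interpolate between generating a word from the vocabulary and copying a word that already appears in the dialogue context or knowledge base.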
We first improve the performance of the dialogue state tracking module, which is the core module in modularized dialogue systems. Unlike most existing dialogue state trackers, which are over-dependent on a domain ontology and lack knowledge sharing across domains, our proposed model, the transferable dialogue state generator (TRADE), leverages a copy mechanism to remove the dependency on a predefined ontology, share knowledge between domains, and memorize long dialogue contexts. We also evaluate our system in a more challenging setting, unseen-domain dialogue state tracking. We empirically show that TRADE enables zero-shot dialogue state tracking and can adapt to new domains from only a few examples without forgetting previously learned domains.
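A highly simplified sketch of such a state generator, which decodes a slot value token by token for a given (domain, slot) pair conditioned on the encoded dialogue history, is shown below. It reuses the copy_augmented_distribution helper from the previous sketch; the class, layer choices, and attention form are illustrative assumptions, not the exact TRADE implementation.

```python
import torch
import torch.nn as nn

class SlotValueGenerator(nn.Module):
    """Toy decoder that generates a slot value token by token,
    conditioned on an encoded dialogue history (illustrative only)."""

    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.vocab_proj = nn.Linear(hidden_size, vocab_size)
        self.gen_gate = nn.Linear(hidden_size, 1)

    def step(self, prev_token, hidden, enc_outputs, src_token_ids):
        # prev_token: (batch,)   enc_outputs: (batch, src_len, hidden)
        emb = self.embedding(prev_token).unsqueeze(1)          # (batch, 1, hidden)
        out, hidden = self.gru(emb, hidden)                    # out: (batch, 1, hidden)
        attn_scores = torch.bmm(out, enc_outputs.transpose(1, 2)).squeeze(1)
        p_gen = torch.sigmoid(self.gen_gate(out.squeeze(1)))   # (batch, 1)
        vocab_logits = self.vocab_proj(out.squeeze(1))         # (batch, vocab)
        dist = copy_augmented_distribution(
            vocab_logits, attn_scores, p_gen, src_token_ids)
        return dist, hidden
```

Because values are generated or copied rather than classified against a fixed candidate list, no per-slot ontology is required, which is what makes zero-shot and few-shot transfer to new domains possible.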
Second, we utilize two memory-augmented neural networks, the recurrent entity network and the dynamic query memory network, to improve end-to-end retrieval-based dialogue learning. They are able to capture sequential dependencies in dialogue and memorize long-term information. We also propose a recorded delexicalization copy strategy that simplifies the problem by replacing real entity values with ordered entity types. Our models are shown to surpass other retrieval baselines, especially when the conversation has a large number of turns.
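The recorded delexicalization idea can be illustrated with a small helper that replaces entity values in an utterance with ordered entity-type placeholders and records the mapping so that the real values can be copied back later. The placeholder format and function name below are assumptions made for illustration.

```python
def delexicalize(utterance, entities):
    """Replace known entity values with ordered type placeholders and
    record the mapping used to restore real values in responses.

    entities: list of (value, entity_type) pairs known from the KB.
    """
    mapping = {}
    counts = {}
    for value, etype in entities:
        if value in utterance:
            counts[etype] = counts.get(etype, 0) + 1
            placeholder = f"@{etype}_{counts[etype]}"
            mapping[placeholder] = value
            utterance = utterance.replace(value, placeholder)
    return utterance, mapping


# Example usage with hypothetical KB entries:
utt, mapping = delexicalize(
    "book a table at Nirala for 7pm",
    [("Nirala", "restaurant"), ("7pm", "time")],
)
# utt     -> "book a table at @restaurant_1 for @time_1"
# mapping -> {"@restaurant_1": "Nirala", "@time_1": "7pm"}
```

Working over ordered entity types rather than raw values shrinks the output space the retrieval model must handle and makes responses reusable across different knowledge-base entries.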
Lastly, we tackle end-to-end generation-based dialogue learning with two successively proposed models, the memory-to-sequence model (Mem2Seq) and the global-to-local memory pointer network (GLMP). Mem2Seq is the first model to combine multi-hop memory attention with the idea of the copy mechanism, which allows an agent to effectively incorporate knowledge base information into a generated response. It can be trained faster and outperforms other baselines on three different task-oriented dialogue datasets, including human-human dialogues. Moreover, GLMP is an extension of Mem2Seq that further introduces the concepts of response sketching and double-pointer copying. We empirically show that GLMP surpasses Mem2Seq in terms of both automatic and human evaluation, and achieves state-of-the-art performance.
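A rough sketch of multi-hop memory attention over knowledge-base and dialogue-history entries, of the kind Mem2Seq and GLMP build on, is shown below. It follows the standard end-to-end memory network formulation; the layer sizes, names, and number of hops are illustrative assumptions rather than the thesis implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHopMemory(nn.Module):
    """End-to-end memory network style multi-hop attention over a
    memory of knowledge-base / dialogue-history token ids (sketch)."""

    def __init__(self, vocab_size, embed_size, hops=3):
        super().__init__()
        self.hops = hops
        # One embedding matrix per hop plus one, with adjacent weight reuse.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(vocab_size, embed_size) for _ in range(hops + 1)])

    def forward(self, memory_ids, query):
        # memory_ids: (batch, mem_size)   token ids stored in memory
        # query:      (batch, embed_size) initial query vector
        attn = None
        for k in range(self.hops):
            m_k = self.embeddings[k](memory_ids)        # input memory
            c_k = self.embeddings[k + 1](memory_ids)    # output memory
            scores = torch.bmm(m_k, query.unsqueeze(2)).squeeze(2)
            attn = F.softmax(scores, dim=-1)            # attention over memory slots
            o_k = torch.bmm(attn.unsqueeze(1), c_k).squeeze(1)
            query = query + o_k                         # refine the query each hop
        # The final hop's attention can also serve as a pointer
        # distribution for copying memory entries into the response.
        return query, attn
```

In a Mem2Seq-style decoder, the refined query drives vocabulary generation while the final-hop attention acts as the pointer used for copying; GLMP's sketch-and-fill idea additionally generates a response template whose slots are filled by such pointers.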