This thesis investigates the controllability of deep learning-based, end-to-end, generative dialogue
systems in both task-oriented and chit-chat scenarios. In particular, we study the different
aspects of controlling generative dialogue systems, including controlling styles and topics and
continuously adding and combining dialogue skills.
In the three decades since the first dialogue system was commercialized, the basic architecture
of such systems has remained substantially unchanged, consisting of four pipelined components:
natural language understanding (NLU), dialogue state tracking (DST), a dialogue manager (DM),
and natural language generation (NLG). The dialogue manager, the critical component of the
modularized system, controls the response content and style. This module is usually programmed
with rules and is designed to be highly controllable and easily extendable.
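To make the pipeline concrete, below is a minimal sketch of how the four components could fit together for a single restaurant-booking turn; the function names, the DialogueState dataclass, and the canned NLU and NLG outputs are illustrative assumptions, not drawn from any particular system.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    intent: str = ""
    slots: dict = field(default_factory=dict)

def nlu(utterance: str) -> dict:
    # In a real system this is a trained classifier/tagger; here it is a stub
    # that maps the user utterance to an intent and slot values.
    return {"intent": "book_restaurant", "slots": {"cuisine": "italian"}}

def dst(state: DialogueState, nlu_output: dict) -> DialogueState:
    # Accumulate the new intent and slots into the running dialogue state.
    state.intent = nlu_output["intent"]
    state.slots.update(nlu_output["slots"])
    return state

def dialogue_manager(state: DialogueState) -> str:
    # Hand-written rules decide the next system action; this is what makes
    # the modularized pipeline controllable and easy to extend.
    if "area" not in state.slots:
        return "request_area"
    return "recommend_restaurant"

def nlg(action: str) -> str:
    # Surface-realize the chosen action as a natural-language response.
    templates = {
        "request_area": "Which part of town would you like?",
        "recommend_restaurant": "I found a great Italian place for you.",
    }
    return templates[action]

state = DialogueState()
action = dialogue_manager(dst(state, nlu("Find me an Italian restaurant")))
print(nlg(action))
```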
With the emergence of powerful "deep learning" architectures, end-to-end generative dialogue
systems have been proposed to optimize overall system performance and simplify training.
However, these systems cannot be controlled and extended as easily as the modularized dialogue
manager. This is because a single neural model, usually a large pre-trained
language model (e.g., GPT-2), is used, making it hard to surgically change desirable attributes (e.g.,
style, topics, etc.). More importantly, uncontrollable dialogue systems can generate offensive
and even toxic responses.
Therefore, in this thesis, we study controllable methods for end-to-end generative dialogue
systems in task-oriented and chit-chat scenarios. Throughout the chapters, we describe 1) how to control the style and topics of chit-chat models, 2) how to continuously control and extend
task-oriented dialogue systems, and 3) how to compose and control multi-skill dialogue models.
To elaborate, we first propose a residual adapter model to control style and topics in conversational
models such as DialoGPT, Meena, and Blender-Bot. Our proposed model adds less than
1.5% task-specific parameters per style/topic, making it deployable in online systems. We run
a comprehensive automatic and human evaluation to show controllability of the generated responses
in terms of style and topics, without losing fluency and without requiring dialogue-specific
datasets.
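As a rough illustration of the approach, the sketch below shows a bottleneck residual adapter of the kind described above, written in PyTorch; the class name ResidualAdapter, the bottleneck size of 64, and the hidden size of 768 (GPT-2 small) are illustrative assumptions rather than the exact configuration used in the thesis.

```python
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Bottleneck adapter added on top of a frozen transformer layer.

    The down-projection/up-projection pair keeps the number of
    task-specific parameters small relative to the base model.
    """
    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.activation = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter only learns a small correction
        # to the frozen layer's output, steering style/topic.
        residual = hidden_states
        x = self.layer_norm(hidden_states)
        x = self.up(self.activation(self.down(x)))
        return residual + x

# One adapter per style/topic; the pre-trained backbone stays frozen.
hidden = torch.randn(2, 10, 768)              # (batch, seq_len, hidden)
positive_style_adapter = ResidualAdapter(768)
steered = positive_style_adapter(hidden)
```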
Secondly, we propose a highly controllable architectural method based on residual adapters
for continuously updating task-oriented dialogue systems with new features based on the user's
needs, e.g., adding new slots and intents, or even completely new domains. Moreover, we analyze
the trade-off between performance, number of parameters, and episodic memory size across
other methods (regularization, rehearsal, and architectural).
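A minimal sketch of how such continual updates could be organized is given below, assuming one adapter per task or domain on top of a frozen backbone and reusing the ResidualAdapter class from the previous sketch; the class and method names are hypothetical.

```python
import torch.nn as nn

class ContinualAdapterModel(nn.Module):
    """Frozen backbone plus one residual adapter per task/domain.

    New dialogue features (new slots, intents, or whole domains) are added
    by training a fresh adapter, while the backbone and the previously
    learned adapters stay untouched, avoiding catastrophic forgetting.
    """
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # never update the backbone
        self.hidden_size = hidden_size
        self.adapters = nn.ModuleDict()      # task_id -> adapter

    def add_task(self, task_id: str) -> None:
        # Only this new adapter is trainable when learning the new task.
        self.adapters[task_id] = ResidualAdapter(self.hidden_size)

    def forward(self, inputs, task_id: str):
        hidden_states = self.backbone(inputs)
        return self.adapters[task_id](hidden_states)

# Example: adding a brand-new "hotel" domain after deployment.
# model = ContinualAdapterModel(backbone=pretrained_lm, hidden_size=768)
# model.add_task("restaurant")
# model.add_task("hotel")
```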
Finally, we propose a novel theoretical framework to control end-to-end dialogue models
with multiple composable and controllable skills. We empirically show the effectiveness of using
specialized parameters on combined chit-chat and task-oriented datasets.
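One possible way to compose such skill-specific parameters is sketched below, using a small gating network that mixes the outputs of several adapters (again reusing the ResidualAdapter class from above); this particular gating scheme is an illustrative assumption, not necessarily the composition mechanism proposed in the thesis.

```python
import torch
import torch.nn as nn

class SkillComposer(nn.Module):
    """Soft combination of several skill-specific adapters.

    A small gating network scores each skill from the dialogue context,
    and the adapter outputs are mixed according to those scores, so a
    single response can draw on, e.g., both a task-oriented skill and a
    chit-chat skill.
    """
    def __init__(self, hidden_size: int, num_skills: int):
        super().__init__()
        self.adapters = nn.ModuleList(
            ResidualAdapter(hidden_size) for _ in range(num_skills)
        )
        self.gate = nn.Linear(hidden_size, num_skills)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden)
        # Score skills from the mean-pooled context representation.
        weights = torch.softmax(self.gate(hidden_states.mean(dim=1)), dim=-1)
        skill_outputs = torch.stack(
            [adapter(hidden_states) for adapter in self.adapters], dim=-1
        )                                     # (batch, seq_len, hidden, skills)
        return (skill_outputs * weights[:, None, None, :]).sum(dim=-1)

composer = SkillComposer(hidden_size=768, num_skills=3)
mixed = composer(torch.randn(2, 10, 768))     # same shape as the input
```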