THESIS
2021
1 online resource (xvi, 107 pages) : illustrations (some color)
Abstract
This thesis investigates a fully data-driven, end-to-end dialogue learning framework for building
versatile conversational agents that are able to understand human emotion, converse with
empathy, access external knowledge, and assist humans in complex tasks.
Dialogue systems, also known as conversational agents, are computer systems designed to interact with
humans in natural language. Conventional dialogue systems are highly modularized, typically
consisting of multiple components for language understanding, dialogue management, and natural
language generation. Although these systems perform reliably in well-designed
domains, they rely on extensive domain-specific rules, which limit their flexibility and scalability.
As an effective alternative to rule-based methods, data-driven end-to-end deep learning
approaches have emerged. Neural dialogue systems, typically based on sequence-to-sequence
architectures, can leverage a massive amount of conversational data to learn dialogue reasoning
and response generation strategies jointly with minimal predefined rules and achieve promising
performance.
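The sequence-to-sequence training signal described above can be sketched as follows. This is a minimal, hypothetical illustration (not the thesis's implementation): the model maximizes the likelihood of each response token conditioned on the dialogue history and the tokens generated so far, and the toy uniform "decoder" stands in for a neural network.

```python
import math

def response_nll(prob_fn, history, response):
    """Negative log-likelihood of a response under a conditional token model.

    prob_fn(history, prefix, token) -> P(token | history, prefix); any
    callable will do -- here it stands in for a neural decoder.
    """
    nll = 0.0
    prefix = []
    for token in response:
        p = prob_fn(history, tuple(prefix), token)
        nll -= math.log(p)  # accumulate per-token cross-entropy
        prefix.append(token)
    return nll

# A stand-in "decoder": uniform over a tiny vocabulary, so every token
# contributes log(4) to the loss regardless of context.
VOCAB = ["hi", "there", "how", "are"]
uniform = lambda history, prefix, token: 1.0 / len(VOCAB)

loss = response_nll(uniform, history=["hello"], response=["hi", "there"])
print(round(loss, 4))  # 2 tokens * log(4), about 2.7726
```

Minimizing this loss over large conversation corpora is what lets the model learn response strategies jointly, without hand-written rules.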
Despite the rapid development of neural dialogue systems, various challenges remain. Firstly,
large-scale conversational models trained on online conversations (e.g., Reddit) usually lack
empathy and consistent characteristics. Secondly, the end-to-end architecture inherently makes
it difficult for the models to interact with external knowledge and to complete tasks. Last but
not least, most dialogue models are optimized to a single conversational skill such as empathy,
and ignore others (e.g., knowledge).
In this thesis, we tackle these challenges and introduce a versatile dialogue system based on
generative language pre-training and parameter-efficient transfer learning methods. We
demonstrate the effectiveness of our approaches in empathy modeling, knowledge acquisition, and
continual dialogue skill integration.
To endow dialogue models with empathy, we propose a two-stage multi-task fine-tuning approach.
In the first stage, a pre-trained language model (e.g., GPT) is fine-tuned on a persona-aware
dialogue dataset with response modeling and contrastive learning objectives. In the second
stage, the model is fine-tuned on empathetic conversations with a custom persona and an additional
emotion recognition objective. Based on this method, we develop a web demo, namely
the empathetic chatbot CAiRE, for interacting with real users and learning from user feedback.
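The multi-task objective used in each fine-tuning stage can be sketched as a weighted combination of the primary response-modeling loss and a stage-specific auxiliary loss. The weighting scheme and the numeric values below are illustrative assumptions, not the thesis's exact configuration.

```python
def multitask_loss(response_loss, aux_loss, aux_weight=0.5):
    """Weighted sum of the primary and auxiliary training objectives."""
    return response_loss + aux_weight * aux_loss

# Stage 1: response modeling + a contrastive objective on persona-aware data.
stage1 = multitask_loss(response_loss=2.1, aux_loss=0.8, aux_weight=0.5)
# Stage 2: response modeling + emotion recognition on empathetic dialogues.
stage2 = multitask_loss(response_loss=1.7, aux_loss=1.2, aux_weight=0.5)
print(round(stage1, 2), round(stage2, 2))  # 2.5 2.3
```

The auxiliary weight trades off how strongly the secondary skill (contrastive consistency, then emotion recognition) shapes the shared parameters relative to plain response modeling.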
To enable dialogue models to efficiently interface with external knowledge, we introduce a minimalist
task-oriented dialogue learning framework (MinTL). Our approach leverages the prior
knowledge of pre-trained language models and effective dialogue state tracking formulation for
jointly learning user-model and model-knowledge-base interactions while requiring minimal
human supervision. MinTL outperforms strong baselines in both the full-training and the
simulated low-resource settings. In addition, we extend MinTL to more realistic task-oriented
dialogue scenarios and show the robustness of our approach.
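The dialogue-state-tracking formulation can be sketched as text generation of *updates*: rather than re-generating the full belief state every turn, the model emits only the changes, which are merged into the previous state before querying the knowledge base. The "slot=value; slot=value" update format below is an illustrative assumption, not MinTL's exact representation.

```python
def apply_state_update(belief_state, update_text):
    """Merge generated "slot=value; slot=value" updates into the state."""
    state = dict(belief_state)  # keep untouched slots from earlier turns
    for pair in update_text.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        slot, _, value = pair.partition("=")
        state[slot.strip()] = value.strip()
    return state

state = {"area": "north"}
# User turn: "actually make it an expensive place in the centre"
state = apply_state_update(state, "area=centre; price=expensive")
print(state)  # {'area': 'centre', 'price': 'expensive'}
```

Because only the delta is generated, the supervision needed per turn stays small, which is what makes the low-resource setting tractable.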
Finally, we combine all the aforementioned dialogue skills into a unified versatile generative
dialogue model, named AdapterBot. AdapterBot uses a fixed language model backbone for response
generation and multiple lightweight residual adapters for modeling dialogue skills. Each
adapter can be trained independently, thus allowing continual integration of new skills without retraining
the entire model. We empirically show the competitive performance of AdapterBot on
a diverse set of dialogue tasks.
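A residual adapter of the kind AdapterBot relies on can be sketched as a small bottleneck network whose output is added back to the frozen backbone's hidden states; only the adapter's down- and up-projections are trained. The dimensions and initialization below are made-up illustrations, not the thesis's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, bottleneck = 8, 2  # backbone width, adapter bottleneck width

W_down = rng.normal(scale=0.1, size=(hidden, bottleneck))  # trainable
W_up = rng.normal(scale=0.1, size=(bottleneck, hidden))    # trainable

def adapter(h):
    """Residual adapter: h + up(relu(down(h))); the backbone stays frozen."""
    z = np.maximum(h @ W_down, 0.0)  # down-project + ReLU nonlinearity
    return h + z @ W_up              # up-project + residual connection

h = rng.normal(size=(1, hidden))     # a hidden state from the frozen backbone
out = adapter(h)
print(out.shape)  # (1, 8)
```

Because each skill lives in its own small adapter while the backbone is shared and frozen, a new skill can be added by training one adapter, without touching the weights that other skills depend on.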