THESIS
2022
1 online resource (xvi, 126 pages) : illustrations (chiefly color)
Abstract
Conversational models aim to generate readable textual responses to input queries.
In recent years, conversational models have typically been built using deep neural networks;
such models are called deep conversational models (DCMs). DCMs can be used for task-oriented
dialogues or chit-chat conversations. In this thesis, we investigate ways to enhance DCMs
for chit-chat conversations via input augmentation and data source expansion.
A DCM maps an input query to one or more output responses. It has been observed in previous
work that the output is often uninformative and lacks diversity for two reasons: (1)
the input does not contain sufficient information to determine an appropriate output,
and (2) the DCM is trained via likelihood maximization and hence captures only the most
salient input-output relationships. One common method to address the first issue is to
include background documents as additional inputs to the DCM, and one popular way
to deal with the second issue is to include retrieved responses to similar input queries
as additional inputs to the DCM. In this thesis, we advance the state of the art in both
lines of work. For the first line of work, we propose an output-anticipated
memory module to enable the DCM to better attend to the relevant information in the
background documents. For the second line of work, we develop a memory module to
extract relationships between clusters of similar inputs and clusters of outputs (such
relationships are more robust than those between individual inputs and outputs), and use
these relationships to improve the performance of the DCM.
Nowadays, DCMs are often trained on huge corpora. However, there are still low-resource
scenarios. One example is an online chatbot that needs to quickly adapt to
a new user after a few rounds of conversation. Our third contribution in this thesis is
a meta-learning-based method to help with the adaptation by utilizing data from the
user's friends, who are expected to have similar interests and expectations. There are also
applications, such as airline-booking conversations, where public data are limited
and private data containing sensitive information are abundant. Our fourth contribution in
this thesis is a teacher-student framework to train DCMs on both private and public data
while ensuring the privacy of sensitive information in the private data.