THESIS
2023
1 online resource (xi, 57 pages) : color illustrations
Abstract
Task-oriented dialogue systems (TODS) have existed for a few decades in the form
of virtual assistants such as Amazon Alexa, Apple Siri, Microsoft XiaoIce, and others.
Traditional approaches adopted neural networks in a modular architecture, combining
components that harness a wide range of Natural Language Processing (NLP)
capabilities: intent classification of the user's query as Natural Language Understanding
(NLU), Dialogue State Tracking (DST) to maintain a global semantic state of the dialogue,
and response generation as Natural Language Generation (NLG). More recent
approaches are steering towards end-to-end systems that rely on a single module
to carry out the complete TODS pipeline. This task is therefore notably difficult to tackle
in a zero-shot scenario, where a wide range of capabilities and domain knowledge is
required for the system to accomplish its goal.
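To make the modular structure concrete, the sketch below shows in Python how the components described above (NLU for intent classification, DST for tracking the dialogue state, and NLG for response generation) could be chained within a single dialogue turn. It is purely illustrative and not taken from the thesis; all class and method names (IntentClassifier-style components, update, query, generate) are hypothetical placeholders.

    # Illustrative sketch of one turn in a modular TODS (not from the thesis).
    # The nlu, dst, db, and nlg objects are hypothetical placeholder components.
    class ModularTODS:
        def __init__(self, nlu, dst, db, nlg):
            self.nlu = nlu    # NLU: intent classification of the user query
            self.dst = dst    # DST: maintains the global semantic dialogue state
            self.db = db      # backend lookup over domain entities
            self.nlg = nlg    # NLG: surface realization of the system response

        def respond(self, user_utterance, dialogue_state):
            intent = self.nlu.classify(user_utterance)
            dialogue_state = self.dst.update(dialogue_state, user_utterance, intent)
            results = self.db.query(dialogue_state)
            response = self.nlg.generate(dialogue_state, results)
            return response, dialogue_state

Each component in such a pipeline is typically trained on its own task-specific data, which is precisely the dependency the zero-shot setting studied in this thesis removes.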
Recently, with the emergence of large language models, the field of NLP has witnessed
a strong paradigm shift in the way many existing tasks are tackled. The scalability of
transformer models, along with progress in computing hardware, has enabled language
models to scale up to billions and even hundreds of billions of parameters. These large
language models undergo extensive pre-training and acquire an enormous amount of
general knowledge, to the point that many tasks which previously required task-specific
data, such as question answering or summarization, can now be tackled with little to no
additional data. In addition, researchers have very recently observed the appearance of
emergent abilities in large language models beyond a certain scale, abilities that were
not anticipated beforehand. Although the exploration of large language models is still in
its early stages, there is a need to investigate more complex and composite tasks, such
as task-oriented dialogue. Many people using large language models wrongly believe
that these models are all-powerful and can easily perform any given task, unaware of
the hidden drawbacks of such direct interactions, such as hallucination and
unfaithful information, in the context of task-oriented dialogue.
In this thesis, we investigate the potential of instruction-tuned large language models
(LLMs) to act as end-to-end task-oriented dialogue systems in a zero-shot scenario,
meaning without model parameter updates and with no additional task-specific or
domain-specific data. We propose InstructTODS, the first framework to efficiently leverage
these models for end-to-end task-oriented dialogue. Through our investigation,
we show that InstructTODS performs on par with state-of-the-art fine-tuned TODS
baselines, while removing the resource and training requirements and remaining
adaptable across domains and tasks.
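As a rough illustration of this zero-shot, end-to-end setting, the sketch below prompts an instruction-tuned LLM directly with the dialogue history, with no parameter updates and no task- or domain-specific data. It is a minimal sketch under assumed interfaces, not the InstructTODS implementation itself; llm_complete is a hypothetical stand-in for any instruction-tuned LLM completion API.

    # Minimal zero-shot sketch (not the InstructTODS implementation).
    # llm_complete is a hypothetical stand-in for an instruction-tuned LLM API.
    INSTRUCTION = (
        "You are a task-oriented dialogue assistant. Track the user's goal, "
        "ask for any missing information, and respond helpfully and faithfully."
    )

    def zero_shot_turn(llm_complete, dialogue_history, user_utterance):
        history = "\n".join(dialogue_history + [f"User: {user_utterance}"])
        prompt = f"{INSTRUCTION}\n\n{history}\nSystem:"
        # A single LLM call stands in for the separate NLU, DST, and NLG modules.
        return llm_complete(prompt)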