THESIS
2021
1 online resource (xv, 105 pages) : illustrations (some color)
Abstract
Multilingualism is the ability of a speaker to communicate natively in more than one language.
In multilingual communities, speakers commonly switch languages within a conversation, a phenomenon
called code-switching, which creates a demand for multilingual dialogue and speech recognition
systems. However, understanding code-switched utterances is very challenging for these systems
because the model has to adapt to diverse code-switching styles.
In recent years, deep learning approaches have enabled natural language systems to approach
human-level performance on languages with huge amounts of training data. However, they are
unable to support numerous low-resource languages, particularly mixed languages. Moreover,
code-switching, despite being a frequent phenomenon, is characteristic of spoken language and
thus lacks the transcriptions required for training deep learning models. Conventional approaches
to the low-resource problem in code-switching, on the other hand, focus on applying linguistic
theories to statistical models. The constraints defined in these theories are useful, but they
cannot be postulated as universal rules for all code-switching scenarios, especially for
syntactically divergent language pairs such as English and Mandarin.
In this thesis, we address the aforementioned issues by proposing language-agnostic multi-task
training methods. First, we introduce a meta-learning-based approach, meta-transfer learning,
in which information is judiciously transferred from high-resource monolingual speech data
to the code-switching domain. Meta-transfer learning quickly adapts the model to the code-switching
task from a number of monolingual tasks by learning to learn in a multi-task learning
fashion. Second, we propose a novel multilingual meta-embeddings approach that effectively
represents code-switching data by acquiring useful knowledge learned in other languages, learning
the commonalities of closely related languages, and leveraging lexical composition. The method
is far more efficient than contextualized pre-trained multilingual models. Third, we introduce
multi-task learning to integrate syntactic information into a language model as a transfer
learning strategy, so that the model learns where to code-switch.
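The core idea behind meta-embeddings can be illustrated with a small sketch: each word carries embeddings from several monolingual spaces, each is projected into a shared dimension, and a learned attention vector decides how much weight each language receives. All names, dimensions, and values below are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def meta_embedding(word_vecs, proj_mats, attn_vec):
    """Combine per-language embeddings of one word into a single vector.

    word_vecs: per-language embeddings (possibly of different dimensions)
    proj_mats: per-language projection matrices into a shared dimension
    attn_vec:  learned vector scoring each projected embedding
    """
    projected = [W @ v for W, v in zip(proj_mats, word_vecs)]  # shared dim
    scores = np.array([attn_vec @ p for p in projected])       # one score per language
    weights = softmax(scores)                                  # attention weights
    meta = sum(w * p for w, p in zip(weights, projected))      # weighted sum
    return meta, weights

# Toy setup: an English embedding (dim 4) and a Mandarin embedding (dim 6),
# both projected into a shared space of dimension 3.
en_vec, zh_vec = rng.normal(size=4), rng.normal(size=6)
W_en, W_zh = rng.normal(size=(3, 4)), rng.normal(size=(3, 6))
attn = rng.normal(size=3)

emb, weights = meta_embedding([en_vec, zh_vec], [W_en, W_zh], attn)
```

Because the attention weights are computed per word, the model can lean on the English space for English tokens and the Mandarin space for Mandarin tokens within the same code-switched sentence.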
To further alleviate the issue of data scarcity and the limitations of linguistic theory, we propose
a data augmentation method using Pointer-Gen, a neural network with a copy mechanism that
learns code-switching points from monolingual parallel sentences, and we use the augmented
data for multilingual transfer learning. This removes the need for linguistic theory: the model
captures code-switching points by attending to input words and aligning parallel words, without
requiring any word alignments or constituency parsers. More importantly, the model can be
effectively used for syntactically divergent language pairs, such as English and Mandarin, and
it outperforms the linguistic theory-based models.
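A single decoding step of a pointer-generator network can be sketched as follows: the final word distribution mixes generating from a fixed vocabulary with copying source words through the attention weights. The vocabulary size, attention values, and mixing probability below are toy assumptions for illustration, not values from the thesis.

```python
import numpy as np

def pointer_gen_dist(p_vocab, attention, src_ids, p_gen):
    """Mix a vocabulary distribution with a copy distribution.

    p_vocab:   distribution over the fixed vocabulary
    attention: attention weights over source positions (sum to 1)
    src_ids:   vocabulary id of the word at each source position
    p_gen:     probability of generating (vs. copying)
    """
    final = p_gen * p_vocab
    # Copy mechanism: attention mass on a source position flows to the
    # vocabulary id of the word at that position (ids may repeat).
    np.add.at(final, src_ids, (1.0 - p_gen) * attention)
    return final

p_vocab = np.array([0.1, 0.2, 0.3, 0.4])   # 4-word toy vocabulary
attention = np.array([0.5, 0.25, 0.25])    # 3 source positions
src_ids = np.array([2, 0, 2])              # vocab id of each source word
dist = pointer_gen_dist(p_vocab, attention, src_ids, p_gen=0.6)
# dist is still a valid distribution; id 2 gains mass from two source positions
```

Because attention alone decides where copy probability flows, no explicit word alignments or parsers are needed, which is what makes the approach applicable to syntactically divergent pairs like English and Mandarin.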
In essence, we effectively tackle the data scarcity issue by introducing multilingual transfer
learning methods that transfer knowledge from high-resource languages to the code-switching
domain, and we compare their effectiveness with that of conventional methods based on linguistic
theories.