THESIS
2012
xiii, 69 p. : ill. ; 30 cm
Abstract
We present, for the first time, an end-to-end speech transcription and translation system with cross-lingual language modeling based on weighted finite-state transducers (WFSTs). The system decodes speech in a resource-poor source language into both a transcription in that language and a translation in a resource-rich target language. The proposed cross-lingual language modeling approach uses phrase-level translation, which comprises phrase-level transduction and syntactic reordering. Phrase-level transduction can perform n-to-m cross-lingual transduction, whereas word-level transduction allows only n-to-n transduction. Syntactic reordering models the syntactic discrepancies between the resource-poor and resource-rich languages.
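To make the contrast concrete, the Python sketch below illustrates the difference between word-level and phrase-level transduction. The phrase table, the Jyutping/pinyin tokens, and the greedy longest-match segmentation are hypothetical and for illustration only; they are not the transducers built in the thesis.

# Hypothetical phrase table mapping n Cantonese (Jyutping) tokens to m
# Mandarin (pinyin) tokens; weights are illustrative negative log costs.
PHRASE_TABLE = {
    ("ngo5", "dei6"): (("wo", "men"), 0.2),   # "we"        (2 -> 2)
    ("mou5",):        (("mei", "you"), 0.3),  # "not have"  (1 -> 2)
    ("cin2",):        (("qian",), 0.1),       # "money"     (1 -> 1)
}

# Word-level table: exactly one output token per input token.
WORD_TABLE = {"ngo5": "wo", "dei6": "men", "mou5": "mei", "cin2": "qian"}

def phrase_transduce(tokens, max_len=3):
    """Greedy longest-match phrase-level (n-to-m) transduction."""
    out, cost, i = [], 0.0, 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASE_TABLE:
                target, weight = PHRASE_TABLE[key]
                out.extend(target)
                cost += weight
                i += n
                break
        else:
            out.append(tokens[i])  # pass unknown tokens through unchanged
            i += 1
    return out, cost

def word_transduce(tokens):
    """Word-level transduction: one output token per input token."""
    return [WORD_TABLE.get(t, t) for t in tokens]

cantonese = ["ngo5", "dei6", "mou5", "cin2"]
print(phrase_transduce(cantonese))  # (['wo', 'men', 'mei', 'you', 'qian'], ~0.6)
print(word_transduce(cantonese))    # ['wo', 'men', 'mei', 'qian']

Here the single Cantonese token mou5 maps to the two Mandarin tokens mei you, an n-to-m mapping that a one-token-per-word table cannot express.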
As such, we can leverage the statistics of a resource-rich language to improve the language model of a resource-poor language, yielding a truly cross-lingual language model. This cross-lingual language model simultaneously improves speech recognition performance in the resource-poor language and provides a translation from the resource-poor language into the resource-rich language. In this thesis, we focus on the recognition of Cantonese, a non-standard Chinese language without a standardized written form, and its translation into standard Chinese.
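One simple way to picture how resource-rich statistics can help score a resource-poor hypothesis is sketched below: a Cantonese hypothesis is scored by a small Cantonese bigram model and, after phrase-level transduction into Mandarin, by a larger Mandarin bigram model, with the two log scores interpolated. The probabilities, the weight lam, and the function names are assumptions for illustration; the thesis integrates these components as WFSTs rather than as explicit Python dictionaries.

import math

# Toy bigram LMs over romanized tokens (hypothetical probabilities,
# for illustration only). "<s>" marks the sentence start.
CANTONESE_LM = {  # sparse: estimated from a small amount of Cantonese data
    ("<s>", "ngo5"): 0.10, ("ngo5", "dei6"): 0.30,
    ("dei6", "mou5"): 0.05, ("mou5", "cin2"): 0.08,
}
MANDARIN_LM = {   # denser: estimated from a large amount of Mandarin data
    ("<s>", "wo"): 0.12, ("wo", "men"): 0.40, ("men", "mei"): 0.06,
    ("mei", "you"): 0.50, ("you", "qian"): 0.10,
}

def bigram_logprob(tokens, lm, floor=1e-4):
    """Sum of log bigram probabilities, with a floor for unseen bigrams."""
    score, prev = 0.0, "<s>"
    for tok in tokens:
        score += math.log(lm.get((prev, tok), floor))
        prev = tok
    return score

def cross_lingual_score(cantonese_hyp, mandarin_hyp, lam=0.5):
    """Interpolate the native (resource-poor) LM score with the
    resource-rich LM score of the transduced hypothesis."""
    return (lam * bigram_logprob(cantonese_hyp, CANTONESE_LM)
            + (1.0 - lam) * bigram_logprob(mandarin_hyp, MANDARIN_LM))

# A Cantonese hypothesis and its phrase-level transduction into Mandarin.
print(cross_lingual_score(["ngo5", "dei6", "mou5", "cin2"],
                          ["wo", "men", "mei", "you", "qian"]))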
The cross-lingual language model is trained from a large amount of data in the resource-rich language (e.g., Mandarin), a small amount of data in the resource-poor language (e.g., Cantonese), and some parallel data between the two languages. Evaluations on Cantonese speech recognition and Cantonese-to-Mandarin translation tasks show that the proposed cross-lingual language modeling significantly improves both recognition and translation performance: up to a 12.5% relative word error rate (WER) reduction over the baseline language model interpolation, and a 6.6% relative WER reduction together with an 18.5% relative bilingual evaluation understudy (BLEU) score improvement over the best word-level transduction approach. The model can be further generalized to speech translation between any source and target language pair via the same transcription and translation framework.
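For clarity on how the relative figures are computed, the short sketch below works through the arithmetic. The absolute WER and BLEU values are hypothetical; only the relative percentages correspond to those reported above.

def relative_reduction(baseline, improved):
    """Relative reduction of an error metric such as WER, in percent."""
    return 100.0 * (baseline - improved) / baseline

def relative_improvement(baseline, improved):
    """Relative improvement of a quality metric such as BLEU, in percent."""
    return 100.0 * (improved - baseline) / baseline

# Hypothetical absolute scores chosen only to illustrate the arithmetic:
# a 12.5% relative WER reduction, e.g. 40.0% WER brought down to 35.0% WER.
print(relative_reduction(40.0, 35.0))    # 12.5
# An 18.5% relative BLEU improvement, e.g. 20.0 BLEU raised to 23.7 BLEU.
print(relative_improvement(20.0, 23.7))  # 18.5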