THESIS
2008
x, 75 leaves : ill. ; 30 cm
Abstract
In this thesis, we study the task of automatically learning bilingual semantic frames from a Chinese-English parallel corpus. Bilingual semantic frames, the mappings of core semantic arguments (roles) for a predicate pair in a bi-sentence, have the potential to improve the translation quality of Statistical Machine Translation (SMT) systems.
As a prerequisite, we first report our research on the subtask of Chinese Semantic Role Labeling (SRL). We present our implementation of two new state-of-the-art Chinese shallow semantic parsers, based on the Support Vector Machine (SVM) and Maximum Entropy classification techniques. We also present a full-scale comparison of features and of classifier performance, and propose several important new features for this subtask.
We then propose to learn bilingual semantic frames from a parallel corpus of translated sentence pairs. We first present our observations on a reference set manually extracted from the parallel corpus. We find that a considerable 15.73% of semantic argument mappings are mismatches rather than direct mappings; that is, core semantic argument i in Chinese is not aligned to argument i in English.
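To make the notion of a mismatch concrete, the following minimal sketch counts the share of argument mappings whose role labels differ across the two languages. The function name and the toy mappings are hypothetical and are not taken from the thesis's reference set.

```python
def mismatch_rate(mappings):
    """Return the fraction of (chinese_role, english_role) pairs whose labels differ."""
    if not mappings:
        return 0.0
    mismatches = sum(1 for zh_role, en_role in mappings if zh_role != en_role)
    return mismatches / len(mappings)


# Hypothetical toy mappings for illustration only (not data from the thesis).
toy_mappings = [
    ("ARG0", "ARG0"),  # direct mapping
    ("ARG1", "ARG1"),  # direct mapping
    ("ARG1", "ARG2"),  # mismatch: argument 1 in Chinese aligns to argument 2 in English
    ("ARG0", "ARG0"),  # direct mapping
]

rate = mismatch_rate(toy_mappings)
```

On this toy set, one of the four mappings is a mismatch.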
We then present a conventional model, SYN_ALIGN, that acquires bilingual semantic frames from the results of semantic role projection based on syntactic constituent alignment. The evaluation shows that SYN_ALIGN achieves only modest performance (44.80% F-measure) because of its brittle assumption that all semantic arguments in one language map directly to their syntactic counterparts in the other language. We therefore propose a novel model, ARG_ALIGN, that learns bilingual semantic frames using a phrasal similarity measure over semantic roles that are automatically produced by two monolingual semantic parsers. ARG_ALIGN surpasses SYN_ALIGN by about 25 points in F-measure and has an F-measure upper bound of 86%.
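The general idea of aligning arguments by phrasal similarity can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the similarity measure, so the sketch substitutes a cosine score over lexicon-translated bags of words, and the lexicon, function names, threshold, and example arguments are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy bilingual lexicon (Chinese word -> English translations).
LEXICON = {
    "他": {"he"},
    "一": {"a", "one"},
    "本": set(),
    "书": {"book", "books"},
}


def translate_bag(zh_tokens):
    """Map Chinese tokens to a bag of their candidate English translations."""
    bag = Counter()
    for token in zh_tokens:
        for en_word in LEXICON.get(token, ()):
            bag[en_word] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def align_arguments(zh_args, en_args, threshold=0.1):
    """Pair each Chinese role with its most similar English role above a threshold.

    Assumption: a greedy best-match rule, chosen here for brevity.
    """
    pairs = []
    for zh_role, zh_tokens in zh_args.items():
        zh_bag = translate_bag(zh_tokens)
        best_role, best_score = None, threshold
        for en_role, en_tokens in en_args.items():
            score = cosine(zh_bag, Counter(en_tokens))
            if score > best_score:
                best_role, best_score = en_role, score
        if best_role is not None:
            pairs.append((zh_role, best_role))
    return pairs


# Role-labeled arguments as two monolingual parsers might output them (toy data).
zh_args = {"ARG0": ["他"], "ARG1": ["一", "本", "书"]}
en_args = {"ARG0": ["he"], "ARG1": ["a", "book"]}
frame = align_arguments(zh_args, en_args)
```

Because the pairing depends on the content of the arguments rather than on their labels or syntactic positions, a sketch like this can also return mismatched pairs such as ARG0 to ARG1 when the role labels differ across the two languages.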
Our experimental results suggest that, for integrating bilingual semantic frames into an SMT system, ARG_ALIGN is a much better solution for acquiring such frames.