THESIS
2017
ix, 41 pages : illustrations ; 30 cm
Abstract
Word embeddings have attracted much attention recently because they represent
words simply and generalize well to many downstream tasks.
In contrast to alphabetic writing systems such as English, Chinese is written with
characters that are often composed of subcharacter components, which are also
semantically informative. In this thesis, we propose an approach to jointly embed
Chinese words as well as their characters and fine-grained subcharacter components.
We use three likelihoods to evaluate whether the context words, characters, and
components can predict the current target word, and we collect 13,253 subcharacter
components to demonstrate that existing approaches to decomposing Chinese characters
are insufficient. Evaluation on intrinsic word similarity and word analogy tasks, as
well as on extrinsic downstream classification tasks, demonstrates the superior
performance of our model.
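
As a rough illustration of the three-likelihood idea described in the abstract, the Python sketch below sums three log-likelihoods of a target word, one conditioned on the context words, one on the context characters, and one on the context subcharacter components. Every name in it (`average`, `log_softmax_score`, `joint_log_likelihood`, `emb`, `out`) is hypothetical, and the choices of averaging the context vectors and of using a full softmax over the vocabulary are assumptions made for illustration; the abstract does not specify these details.

```python
import math

def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def log_softmax_score(hidden, out, target):
    """log p(target | hidden) under a full softmax over the words in `out`."""
    scores = {
        word: sum(h * v for h, v in zip(hidden, vec))
        for word, vec in out.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return scores[target] - log_z

def joint_log_likelihood(target, ctx_words, ctx_chars, ctx_comps, emb, out):
    """Sum of three log-likelihoods of `target`, one per context granularity.

    `emb` maps each context unit (word, character, or component) to its input
    vector; `out` maps each vocabulary word to its output vector. Both are
    hypothetical lookup tables, not structures taken from the thesis.
    """
    total = 0.0
    for units in (ctx_words, ctx_chars, ctx_comps):
        hidden = average([emb[u] for u in units])  # assumed: mean of context vectors
        total += log_softmax_score(hidden, out, target)
    return total
```

With all vectors set to zero, each of the three terms reduces to the log of a uniform distribution over the vocabulary, which gives a simple sanity check for the sketch.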