THESIS
2002
v, 77 leaves : ill. ; 30 cm
Abstract
In this thesis, we address the problem of word sense disambiguation (WSD) in natural language processing through Sense Pruning, using a knowledge-based approach. Traditional WSD methods provide only one meaning for each word in a passage. However, we believe that textual information alone may not be sufficient to determine the exact meaning of each word; the remaining ambiguity has to be resolved when higher-level knowledge becomes available. Thus, we propose that the objective of WSD is to reduce the number of plausible meanings of a word as much as possible through "Sense Pruning". After Sense Pruning, each word is associated with a list of plausible meanings. We would like to keep the truly correct sense of each word on its own meaning list and yet keep the number of possible meanings of a whole sentence as small as possible.
We applied Sense Pruning to Chinese WSD, making use of HowNet. HowNet is a knowledge base that describes all entities in its database by a set of unambiguous sememes. It provides information about the relationships between concepts or their attributes, in which concepts are represented by the sememes. One of our contributions is integrating various kinds of knowledge from HowNet for Sense Pruning, such as relations between sememes, information structures in Chinese, relations of object and attribute, and characteristics of functional words. Based on HowNet, four additional databases were developed for Sense Pruning in this thesis.
We evaluated our Sense Pruning algorithm on the Corpus of Sinica from Taiwan. Two criteria were used for the evaluation: the recall rate and the reduction in the number of possible meanings of a sentence. We also studied the effects of the size of the analytical window and of the analytical unit, as well as the speed of the algorithm. In summary, Sense Pruning achieves a recall rate of 97% while reducing the number of possible meanings of a sentence by 48% when a whole sentence is taken as the analytical unit.
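The two evaluation criteria can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that recall is the fraction of words whose correct sense survives pruning, and that the number of possible meanings of a sentence is the product of the per-word candidate counts; the abstract does not state either definition. The function names and the toy sense labels are hypothetical.

```python
from math import prod


def recall_rate(pruned, gold):
    """Fraction of words whose gold sense is still on the pruned list.

    Assumption: recall is measured per word, not per sentence.
    """
    kept = sum(1 for senses, g in zip(pruned, gold) if g in senses)
    return kept / len(gold)


def sentence_meanings(sense_lists):
    """Number of readings of a sentence.

    Assumption: every combination of per-word senses counts as one
    reading, so the total is the product of the list lengths.
    """
    return prod(len(senses) for senses in sense_lists)


def reduction(original, pruned):
    """Relative drop in the number of sentence readings after pruning."""
    return 1 - sentence_meanings(pruned) / sentence_meanings(original)


# Toy three-word sentence with hypothetical sense labels.
original = [["s1", "s2"], ["t1", "t2", "t3"], ["u1"]]
pruned = [["s1"], ["t1", "t3"], ["u1"]]
gold = ["s1", "t2", "u1"]  # the second word's correct sense was pruned away

print(recall_rate(pruned, gold))    # 2 of 3 gold senses kept
print(reduction(original, pruned))  # readings drop from 6 to 2
```

In this toy case, pruning removes two thirds of the sentence readings but loses one correct sense, which shows the trade-off between the two criteria.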