THESIS
2002
xii, 55 leaves : ill. ; 30 cm
Abstract
Parsing, the task of identifying the syntactic components of a sentence, e.g., noun and verb phrases, is one of the fundamental tasks in natural language processing. Many natural language applications, e.g., machine translation, speech recognition, and information extraction, would benefit from, or even require, high-accuracy parsing as a preprocessing step.
Due to the ambiguity of natural language, a sentence often has many possible parse trees. The challenge in building a high-performance parser is choosing the best parse tree from among all possible parses. The maximum entropy probability model combines diverse pieces of contextual evidence to estimate the probability of a class occurring within a given linguistic context, providing a principled way to resolve ambiguity in natural language.
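As a rough illustration of the maximum entropy form described above (not the thesis's implementation), the model scores each class by exponentiating a weighted sum of active contextual features and normalizing over all classes. The feature names and weights below are hypothetical:

```python
import math

def maxent_prob(weights, features, classes):
    """Estimate P(class | context) as a normalized exponential of
    weighted feature sums -- the maximum entropy (log-linear) form."""
    scores = {c: math.exp(sum(weights.get((f, c), 0.0) for f in features))
              for c in classes}
    z = sum(scores.values())  # normalization constant over all classes
    return {c: s / z for c, s in scores.items()}

# Hypothetical POS-tagging context: learned weights for the word "record"
weights = {("word=record", "NN"): 1.2, ("prev_tag=DT", "NN"): 0.8,
           ("word=record", "VB"): 0.9, ("prev_tag=TO", "VB"): 1.1}

# After a determiner, "record" is more likely a noun than a verb
probs = maxent_prob(weights, ["word=record", "prev_tag=DT"], ["NN", "VB"])
```

In a real maximum entropy parser the weights are estimated from the training corpus (e.g., by iterative scaling) so that the model's expected feature counts match the empirical ones; here they are fixed constants only to show how evidence from several features combines into one probability.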
The Chinese Maximum Entropy Parser accepts a segmented sentence and generates a set of scored candidate parse trees in three stages: part-of-speech tagging, text chunking, and tree building. This thesis describes the parser in detail, along with experiments performed on the Chinese Penn Treebank Xinhua news corpus. The evaluation uses the standard PARSEVAL measures. Given a training set of 3369 hand-annotated parsed sentences, the parser achieves 77.28% recall, 75.30% precision, and 74.27% f-score on a held-out test set of 345 sentences. These results show that the parser achieves state-of-the-art performance using a universal statistical modeling technique based on the maximum entropy principle.
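The PARSEVAL measures mentioned above compare the labeled brackets (constituent spans) of a predicted parse against the gold-standard parse. A minimal sketch, assuming brackets are represented as (label, start, end) tuples (the tuple encoding and the sample spans are illustrative, not from the thesis):

```python
from collections import Counter

def parseval(gold, predicted):
    """PARSEVAL-style labeled-bracket precision, recall, and F-score.
    gold and predicted are lists of (label, start, end) tuples; multiset
    intersection counts each matching bracket at most once per occurrence."""
    g, p = Counter(gold), Counter(predicted)
    matched = sum((g & p).values())        # brackets common to both parses
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)   # harmonic mean of P and R
    return precision, recall, f

# Toy example: the predicted VP span is off by one word, so 2 of 3 match
gold = [("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)]
pred = [("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)]
p, r, f = parseval(gold, pred)  # p = r = f = 2/3
```

Standard evaluations use the EVALB implementation of these measures; the function above only shows the core bracket-matching idea.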