THESIS
2016
xii, 63 pages : illustrations (some color) ; 30 cm
Abstract
Disfluencies in spoken language remain a challenge for both language processing
applications and human perception. Identifying and removing disfluencies is therefore
a beneficial step toward improving spoken language understanding tasks, and it is particularly important for closed captioning of video conferences.
We investigated several approaches to disfluency identification and removal, ranging
from rule-based methods and translation models to supervised classification using either Conditional Random Fields (CRF) or Deep Neural Networks (DNN).
Because supervised classification requires a large amount of human annotation and
labeling, the rule-based approach and the translation model let us rely on less human labeling.
In the rule-based approach, we used regular expressions to match simple disfluent
words and obtained a Bilingual Evaluation Understudy (BLEU) score of 81.01, a measure
of reconstruction quality.
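A minimal sketch of the rule-based idea, assuming a small hypothetical filler list (uh, um, uhm, er, ah) and treating any immediately repeated word as a disfluency. The thesis's actual regular expressions are not given, so the patterns below are illustrative only.

```python
import re

# Hypothetical patterns; the thesis's actual rule set is not reproduced here.
FILLERS = re.compile(r"\b(?:uh|um|uhm|er|ah)\b,?\s*", re.IGNORECASE)  # filled pauses
REPEATS = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)          # "I I" -> "I"

def remove_simple_disfluencies(text: str) -> str:
    """Drop filled pauses and immediate word repetitions from one transcript line."""
    text = FILLERS.sub("", text)     # remove fillers first so "I uh I" becomes "I I"
    text = REPEATS.sub(r"\1", text)  # then collapse the exposed repetition
    # Collapse whitespace left behind by the deletions.
    return re.sub(r"\s+", " ", text).strip()

print(remove_simple_disfluencies("so um I I think uh we should go"))
# so I think we should go
```

Rules this simple are cheap and need no labeled data, but they are purely lexical: the repeat pattern would also collapse legitimate doublings such as "had had".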
Second, we used a Weighted Finite State Transducer (WFST) to translate phrases containing complex disfluencies into more fluent ones and obtained a BLEU score of 72.02.
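To illustrate how a WFST can rewrite disfluent phrases, here is a toy transducer decoded by a best-path search in the tropical semiring (weights along a path add, and the cheapest path wins). Its states, weights, and the wildcard label `*` (an arc that accepts any token and copies it to the output) are hypothetical simplifications, not the transducer built in the thesis; a practical system would typically be built with a WFST toolkit such as OpenFst.

```python
SIGMA = "*"  # hypothetical wildcard label: accepts any token; as output, copies it

# Toy transducer: state -> list of (input label, output label or None for epsilon, weight, next state).
ARCS = {
    0: [(SIGMA, SIGMA, 1.0, 0),   # copy an ordinary word at cost 1.0
        ("you", None, 0.25, 1),   # start deleting the editing phrase "you know"
        ("I", None, 0.25, 2)],    # start deleting the editing phrase "I mean"
    1: [("know", None, 0.25, 0)],
    2: [("mean", None, 0.25, 0)],
}
START, FINAL = 0, {0}

def best_path(tokens):
    """Cheapest output under tropical (min, +) weights, or None if no path accepts."""
    chart = {START: (0.0, [])}  # state -> (cost, output) of the best partial path
    for tok in tokens:
        nxt = {}
        for state, (cost, out) in chart.items():
            for ilabel, olabel, weight, dst in ARCS.get(state, []):
                if ilabel not in (tok, SIGMA):
                    continue
                if olabel == SIGMA:
                    emitted = out + [tok]      # wildcard output: copy the input token
                elif olabel is None:
                    emitted = out              # epsilon output: delete the token
                else:
                    emitted = out + [olabel]   # explicit rewrite
                # Keep only the cheapest partial path reaching each state (Viterbi).
                if dst not in nxt or cost + weight < nxt[dst][0]:
                    nxt[dst] = (cost + weight, emitted)
        chart = nxt
    finals = [chart[s] for s in chart if s in FINAL]
    return min(finals, key=lambda c: c[0])[1] if finals else None

print(best_path("we I mean they left".split()))
# ['we', 'they', 'left']
```

Copying a token costs 1.0, while deleting a recognized editing phrase costs 0.5 in total, so the decoder prefers the deletion path whenever the full phrase appears; a lone "I" has no continuation from state 2 and is simply copied. Like any lexical rule, this toy would also delete "you know" when it is meant literally.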
For the first time, we applied a DNN with the same feature set as a CRF baseline system and obtained better performance for one disfluency type, repeat, with a precision of 81.7% and a recall of 82.0%. Furthermore, the weighted overall F1 score (the harmonic mean of precision and recall) across disfluency types improved by 3.7% when we added word vector features, trained by a neural network, to the CRF system.
We also constructed a hybrid DNN-CRF system using a voting algorithm. For the false start disfluency type, our F1 score is 52.2%, a significant improvement over the CRF baseline's 36.7%.
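The voting algorithm behind the hybrid is not specified beyond its name, so the following is only one plausible reading: a token-level weighted vote in which each tagger contributes its predicted label's confidence and the label with the highest total wins. The label names (`FS` for false start, `O` for fluent) and the toy predictions are hypothetical. The sketch also shows how per-type precision, recall, and F1 are computed, and checks the F1 implied by the reported repeat scores.

```python
from collections import defaultdict

def vote(*taggers):
    """Token-level weighted vote. Each tagger is a list of (label, confidence) pairs."""
    merged = []
    for token_preds in zip(*taggers):
        scores = defaultdict(float)
        for label, conf in token_preds:
            scores[label] += conf
        # Ties go to the label seen first, i.e. the earlier tagger's choice.
        merged.append(max(scores, key=scores.get))
    return merged

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def prf(gold, pred, label):
    """Token-level precision, recall and F1 for one disfluency label."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1(precision, recall)

# Toy data (hypothetical): gold labels and two taggers' (label, confidence) outputs.
gold = ["FS", "FS", "O", "O", "O", "O"]
dnn = [("FS", .9), ("FS", .8), ("O", .8), ("FS", .55), ("O", .9), ("O", .8)]
crf = [("FS", .6), ("O", .6), ("O", .9), ("O", .8), ("O", .8), ("FS", .6)]

for name, pred in [("DNN", [l for l, _ in dnn]),
                   ("CRF", [l for l, _ in crf]),
                   ("vote", vote(dnn, crf))]:
    print(name, prf(gold, pred, "FS"))

# F1 implied by the reported repeat scores (P = 81.7%, R = 82.0%).
print(round(f1(0.817, 0.820), 4))
# 0.8185
```

In this toy run the vote keeps both false-start tokens while overriding each tagger's false alarm, because the more confident tagger wins every disagreement. On real data, any gain from such a scheme depends on how well the two taggers' confidences are calibrated against each other.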
Finally, carefully combining all three approaches improved the BLEU score from 81.01 to 82.80.
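BLEU scores a system's output by its n-gram overlap with a fluent reference. The sketch below follows the standard corpus-level definition (clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty, scaled to 100). One reference per sentence, whitespace tokenization, and no smoothing are assumptions here, since the evaluation setup used in the thesis is not described, so its numbers are not directly reproducible with this code.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (x100) with one reference per hypothesis and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)

ref = "we should go to the meeting".split()
print(corpus_bleu([ref], [ref]))
# 100.0
print(corpus_bleu(["we should uh go to the meeting".split()], [ref]))
```

The second call scores lower than the first because the leftover "uh" breaks every n-gram that spans it, which is why removing disfluencies raises BLEU against a fluent reference.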