THESIS
2000
x, 43 leaves : ill. ; 30 cm
Abstract
In Mandarin speech recognition, initial-final subword units are commonly used. According to the Frequency Dictionary of Modern Chinese[4], among the top 9000 most frequent words, 26.7% are unigrams, 69.8% are bigrams, 2.7% are trigrams, 0.0007% are 4-grams, and 0.0002% are 5-grams. Another study[19] showed that, in general, 75% of Chinese words are bigrams, 14% are trigrams, and 6% are n-grams with n > 3. Each character is monosyllabic. With initial-final segmentation, a Chinese word then consists of only two to six units. This is short compared with English words, which contain about seven phonemes on average. For this reason, utterance verification of Chinese keywords performs worse than that of English keywords, particularly for short Chinese utterances. In this thesis, we propose three methods that improve overall performance for both long and short Chinese keyword utterances.
To improve confidence scoring for keyword verification, we propose a state-independent Log Likelihood Ratio (LLR) that discriminates between the scores of correct recognitions and misrecognitions. A 13% improvement is obtained with the state-independent LLR at a 10% false rejection rate. Moreover, to set the optimal rejection threshold, we propose a dynamic threshold setting method in which each keyword has an individual threshold. This method gives up to a 10% improvement in the false acceptance rate. Initial-final HMMs are popular for Chinese speech recognition. However, since most Chinese keywords are very short, keyword recognition accuracy suffers because each keyword contains only a few initials and finals. We propose using higher-resolution subword units for HMM-based Chinese keyword verification. In addition to the initial unit, the finals are split into two segments as well. A 10% error rate reduction is obtained compared with the baseline system for short utterances when the false rejection rate is fixed at 25%.
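The abstract does not give the formula for the state-independent LLR or the rule for setting thresholds, so the following is only a minimal sketch of the general idea, under stated assumptions: a frame-averaged log likelihood ratio between a keyword model and an alternative model, compared against a per-keyword rejection threshold. The function names, the keywords, the scores, and the threshold values are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, Sequence


def average_llr(target_loglik: Sequence[float], alt_loglik: Sequence[float]) -> float:
    """Frame-averaged log likelihood ratio (assumed form, not the thesis's exact measure)."""
    if len(target_loglik) != len(alt_loglik) or not target_loglik:
        raise ValueError("need two non-empty, equal-length score sequences")
    # Per-frame difference of log-likelihoods, averaged over the utterance.
    total = sum(t - a for t, a in zip(target_loglik, alt_loglik))
    return total / len(target_loglik)


def verify(keyword: str, llr: float, thresholds: Dict[str, float],
           default: float = 0.0) -> bool:
    """Accept the hypothesis when its LLR meets the keyword's own threshold."""
    return llr >= thresholds.get(keyword, default)


# Hypothetical per-frame log-likelihoods (made-up numbers for illustration).
target = [-4.0, -3.5, -5.0, -4.5]   # scores under the keyword model
alt = [-5.0, -4.5, -5.5, -5.0]      # scores under an alternative model

# Hypothetical per-keyword thresholds; how they are tuned is not specified here.
thresholds = {"bei3jing1": 0.5, "shang4hai3": 1.0}

score = average_llr(target, alt)
accepted = verify("bei3jing1", score, thresholds)
```

With one threshold per keyword, the same confidence score can be accepted for one keyword and rejected for another, which is the kind of flexibility a per-keyword threshold is meant to provide.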