THESIS
2003
xii, 103 leaves : ill. ; 30 cm
Abstract
One important issue in speech recognition is the ability to handle variations caused by unseen speakers or noise. One approach is to adapt the trained acoustic models to the testing environment. Maximum likelihood linear regression (MLLR) is one such technique for handling speaker variations. Can speaker adaptation techniques also be applied to noise variations? To answer this question, an in-depth analysis of the effects of speaker and noise variations on speech was performed. It shows that speaker variation leads to a shift and a rotation of the cepstral features, while additive noise causes a shift in the cepstral features and a reduction in the feature variance.
This finding is consistent with the recently proposed feature vector normalization technique, which uses the utterance mean and variance to compensate for the shift and the reduction in dynamic range. This empirical normalization has been shown to improve recognition performance under additive noise and channel effects. However, the estimates may not be optimal. Instead, we propose a normalization that maximizes the likelihood, together with two approximations of different computational complexities. Because the effect of additive noise depends on the signal-to-noise ratio, its influence can vary across phonetic units. We therefore generalize the framework to allow different normalization factors for different phonetic classes. Applying the ML normalization to the Aurora 3 corpus, we obtain a relative improvement of 10% over the ETSI standard baseline.
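The empirical normalization described above corresponds to per-utterance cepstral mean and variance normalization (CMVN): subtracting the utterance mean compensates the additive shift, and dividing by the standard deviation restores the reduced dynamic range. A minimal sketch, where the feature-array shape and the epsilon guard are illustrative choices rather than details from the thesis:

```python
import numpy as np

def cmvn(features):
    """Per-utterance cepstral mean and variance normalization.

    features: (T, D) array of T frames of D-dimensional cepstral
    coefficients. Subtracting the per-dimension mean removes the
    shift; dividing by the per-dimension standard deviation undoes
    the variance (dynamic-range) reduction caused by additive noise.
    """
    mu = features.mean(axis=0)        # (D,) mean over frames
    sigma = features.std(axis=0)      # (D,) std over frames
    return (features - mu) / (sigma + 1e-8)   # epsilon avoids divide-by-zero
```

The ML normalization proposed in the thesis replaces these empirical per-utterance estimates with values chosen to maximize the likelihood under the acoustic models.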
The frame-skipping Viterbi algorithm (FSVA) has recently been shown to be useful for handling impulsive noise. However, handling impulsive noise and additive noise together has not been addressed. As an extension of the ML normalization, we integrate FSVA into the ML normalization framework. This integration yields a relative improvement of 25% when impulsive noise is present.
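The abstract does not detail FSVA; as a hedged illustration only, one simple way to skip frames during Viterbi decoding is to carry the path scores through flagged frames unchanged, as if those frames were absent. The interface below and the skip mask are assumptions for the sketch; detecting which frames are corrupted by impulsive noise is a separate problem:

```python
import numpy as np

def viterbi_frame_skip(log_pi, log_A, log_B, skip_mask):
    """Viterbi decoding that ignores frames flagged by skip_mask.

    log_pi: (S,) initial state log probabilities.
    log_A:  (S, S) transition log probs, log_A[i, j] = log P(j | i).
    log_B:  (T, S) per-frame emission log scores.
    skip_mask: (T,) booleans; True marks a frame to skip (e.g. one
    suspected of impulsive noise). A skipped frame contributes no
    emission or transition score: path scores are carried forward.
    """
    T, S = log_B.shape
    delta = log_pi + (0.0 if skip_mask[0] else log_B[0])
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        if skip_mask[t]:
            psi[t] = np.arange(S)          # each state keeps its own score
            continue
        scores = delta[:, None] + log_A    # scores[i, j]: come from i, go to j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    # backtrace the best state sequence
    path = np.empty(T, dtype=int)
    path[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):
        path[t - 1] = psi[t, path[t]]
    return path
```

With a corrupted middle frame marked in `skip_mask`, the decoder recovers the path it would have found on the clean frames alone, which is the intuition behind combining frame skipping with noise-compensated features.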