THESIS
1998
v, 70 leaves : ill. ; 30 cm
Abstract
Automatic speech recognition is very sensitive to mismatches between training and testing conditions, e.g., differences in background noise level and data acquisition equipment. Recognition accuracy can be enhanced by using more robust speech features, or by adapting the testing data to the training conditions, or vice versa. In this study, we combine the HMM MFCC-based model transform proposed by Vaseghi and Milner in 1993 and the Maximum Likelihood (ML) framework of Sankar and Lee in 1996 with additional adaptation of the dynamic and acceleration features. Hanson and Applebaum reported in 1990 that dynamic and acceleration features are more reliable in noisy environments. Our experiments show that performing the dynamic feature adjustment yields a further 5% improvement in recognition accuracy when the Signal-to-Noise Ratio (SNR) is less than 10 dB. Furthermore, using the ML framework eliminates the need to segment noise from the noisy adaptation data, and it also compensates for inaccuracies in the estimation of the compensation coefficients as well as for those introduced by the approximations used in deriving the transformation. Finally, we adopt the Cepstral Mean Bias technique, which simply adds a bias to all mixture means, to tackle a more general noise environment containing both additive background noise and constant channel distortion.
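As a minimal sketch (not code from the thesis itself), the dynamic and acceleration features mentioned in the abstract are conventionally computed as first- and second-order regression coefficients over the static MFCC trajectory; the function name, window size, and array shapes below are illustrative assumptions:

```python
import numpy as np

def delta(features, window=2):
    """Regression (dynamic) coefficients over a (T, D) feature matrix:
    d_t = sum_{k=1..K} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..K} k^2),
    with edge frames padded by repetition (window size K is a common choice,
    not specified by the thesis)."""
    T = features.shape[0]
    denom = 2 * sum(k * k for k in range(1, window + 1))
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    out = np.zeros_like(features, dtype=float)
    for t in range(T):
        for k in range(1, window + 1):
            out[t] += k * (padded[t + window + k] - padded[t + window - k])
    return out / denom

# Static MFCCs -> dynamic (delta) -> acceleration (delta-delta),
# stacked into the usual 39-dimensional observation vector.
mfcc = np.random.randn(100, 13)   # placeholder static features
d = delta(mfcc)                   # dynamic coefficients
a = delta(d)                      # acceleration coefficients
obs = np.hstack([mfcc, d, a])
```

Adapting these streams alongside the static cepstra is what the abstract refers to as "adaptation of the dynamic and acceleration features"; applying the same regression to a constant trajectory gives zero, which is why these features are less affected by stationary mismatch.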