THESIS
2001
xv, 93 leaves : ill. ; 30 cm
Abstract
Recently, multi-band automatic speech recognition (MBASR) has been proposed by Bourlard et al. and Hermansky et al. to improve robustness in noisy environments. It is motivated by the empirical findings of Harvey Fletcher of Bell Labs, whose thorough study of human speech recognition suggested that partial recognition takes place in separate frequency sub-bands and that the sub-band decisions are then recombined to arrive at a global decision. The study found that the full-band error rate is empirically equal to the product of the sub-band error rates. This implies that humans can recognize speech correctly as long as recognition in at least one sub-band is correct.
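Read literally, the product rule says e_full = e_1 * e_2 * ... * e_K for K sub-bands. The short sketch below, using hypothetical error rates for four sub-bands, shows why one reliable band is enough under that rule; the function name and the numbers are illustrative only.

```python
import math
from typing import Sequence


def full_band_error(sub_band_errors: Sequence[float]) -> float:
    """Fletcher's empirical product rule: the full-band error rate equals
    the product of the sub-band error rates."""
    for e in sub_band_errors:
        if not 0.0 <= e <= 1.0:
            raise ValueError("error rates must lie in [0, 1]")
    return math.prod(sub_band_errors)


# Hypothetical error rates for four sub-bands (illustrative values only)
errors = [0.5, 0.4, 0.6, 0.5]
print(f"{full_band_error(errors):.2f}")  # 0.06

# If one sub-band is recognized perfectly, the full-band error vanishes
print(full_band_error([0.5, 0.0, 0.6, 0.5]))  # 0.0
```

Since every factor lies in [0, 1], the product can never exceed the smallest single-band error rate.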
In the MBASR framework proposed by Bourlard et al. and Hermansky et al., the full frequency band is divided into sub-bands and a separate speech recognizer is built for each sub-band. During recognition, the decisions of the individual sub-band recognizers are recombined at some phonetic or linguistic level.
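A minimal sketch of the divide-and-recombine idea, assuming a frame of 24 filterbank channels split into four contiguous sub-bands, with recombination done on whole-hypothesis scores (the coarsest possible level). The band edges, hypothesis labels, scores and weights are all hypothetical, and the per-band recognizers are replaced by precomputed log-likelihoods.

```python
from typing import Dict, List, Sequence


def split_into_sub_bands(frame: Sequence[float], band_edges: Sequence[int]) -> List[List[float]]:
    """Split one frame of filterbank channels into contiguous sub-bands.
    band_edges are hypothetical channel boundaries, e.g. [0, 6, 12, 18, 24]."""
    return [list(frame[lo:hi]) for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def recombine(sub_band_scores: List[Dict[str, float]], weights: Sequence[float]) -> str:
    """Return the hypothesis with the highest weighted sum of sub-band
    log-likelihoods (one dict of hypothesis -> log-likelihood per sub-band)."""
    hypotheses = sub_band_scores[0].keys()

    def combined(h: str) -> float:
        return sum(w * scores[h] for w, scores in zip(weights, sub_band_scores))

    return max(hypotheses, key=combined)


frame = [float(c) for c in range(24)]  # stand-in for 24 log filterbank energies
bands = split_into_sub_bands(frame, [0, 6, 12, 18, 24])
print([len(b) for b in bands])  # [6, 6, 6, 6]

# Hypothetical scores: sub-band 0 is corrupted by noise and prefers "two".
scores = [
    {"one": -120.0, "two": -110.0},
    {"one": -95.0, "two": -130.0},
    {"one": -100.0, "two": -125.0},
    {"one": -105.0, "two": -115.0},
]
print(recombine(scores, [0.1, 0.3, 0.3, 0.3]))  # one
```

Giving the noisy band a small weight lets the three clean bands outvote it; trusting only the noisy band (weights [1, 0, 0, 0]) would pick "two" instead.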
In this thesis, we study an MBASR system with the following features. First, it supports continuous speech recognition. Second, it allows sub-band information to be recombined asynchronously at any modeling unit. Third, the sub-band information is recombined with optimized weights.
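Asynchronous recombination can be pictured with a product-state ("composed") HMM for one modeling unit. We assume this construction here as one common way to realize the idea; the thesis's exact composition scheme may differ. Each composed state is a pair (i, j) of sub-band states, so the two sub-bands may change state at different frames inside the unit, while forcing entry at (0, 0) and exit at the final pair brings them back into step at the unit boundary. The sketch assumes two sub-bands, left-to-right sub-band HMMs, independent sub-band transitions inside the unit, and hypothetical weights w1, w2 for linear recombination of per-frame sub-band log-likelihoods.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

NEG_INF = float("-inf")
State = Tuple[int, int]


def compose_transitions(a1: List[List[float]], a2: List[List[float]]):
    """Build the product-state log transition table for two sub-band HMMs.
    State (i, j) means sub-band 1 is in state i and sub-band 2 in state j.
    Sub-band transitions are assumed independent inside the unit."""
    states: List[State] = list(product(range(len(a1)), range(len(a2))))
    log_a: Dict[Tuple[State, State], float] = {}
    for (i, j), (k, l) in product(states, states):
        p = a1[i][k] * a2[j][l]
        log_a[(i, j), (k, l)] = math.log(p) if p > 0 else NEG_INF
    return states, log_a


def viterbi_composed(a1, a2, loglik1, loglik2, w1: float, w2: float) -> float:
    """Best-path log score of the composed unit model.
    loglik1[t][i], loglik2[t][j]: per-frame sub-band log-likelihoods.
    Entry is forced at (0, 0) and exit at the last state pair, so the
    sub-bands resynchronize at the unit boundary."""
    states, log_a = compose_transitions(a1, a2)
    first, last = states[0], states[-1]

    def emit(t: int, s: State) -> float:
        # Linear recombination of sub-band log-likelihoods
        return w1 * loglik1[t][s[0]] + w2 * loglik2[t][s[1]]

    delta = {s: NEG_INF for s in states}
    delta[first] = emit(0, first)
    for t in range(1, len(loglik1)):
        delta = {
            s: max(delta[r] + log_a[r, s] for r in states) + emit(t, s)
            for s in states
        }
    return delta[last]


# Two 2-state left-to-right sub-band HMMs (hypothetical parameters)
a = [[0.5, 0.5], [0.0, 1.0]]
# Sub-band 1 moves to its second state at frame 1, sub-band 2 only at frame 2
ll1 = [[-1.0, -5.0], [-5.0, -1.0], [-5.0, -1.0]]
ll2 = [[-1.0, -5.0], [-1.0, -5.0], [-5.0, -1.0]]
score = viterbi_composed(a, a, ll1, ll2, 0.5, 0.5)
print(round(score, 3))  # -5.079, via the path (0,0) -> (1,0) -> (1,1)
```

The best path lets sub-band 1 advance one frame before sub-band 2, which a model that moves both bands in lock-step could not represent.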
An HMM composition framework is introduced as the backbone of our multi-band system to address the sub-band asynchrony issue; continuous speech recognition is also easy to realize under this framework. Sub-band log-likelihoods are recombined linearly, and the sub-band weights are optimized on simulated noisy speech using a string-based minimum classification error (MCE) criterion, with competing strings derived from the N-best algorithm.
Finally, the multi-band system is evaluated in various noisy environments. Our experimental results suggest that the multi-band system can match the performance of the best sub-band recognizer in all evaluated noisy environments. With ideal end-points, it performs as well as or better than a full-band system in most evaluated noisy environments. With non-ideal end-points, it performs slightly worse than a full-band system on the real noises we evaluated.
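The sub-band weight optimization can be sketched as follows, assuming the textbook form of string-based MCE: the recombined score of a string is g = sum_k w_k * L_k, the misclassification measure d compares the correct string's score with a soft maximum over the N-best competing strings, and a sigmoid turns d into a smooth loss minimized by gradient descent. The smoothing parameters eta and gamma, the learning rate, the toy data, and the non-negativity and sum-to-one constraint on the weights are illustrative assumptions, not settings reported in the thesis.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_score(weights: Sequence[float], logliks: Sequence[float]) -> float:
    """Linear recombination of one string's sub-band log-likelihoods."""
    return sum(w * l for w, l in zip(weights, logliks))


def mce_loss_and_grad(weights, correct, competitors, eta=1.0, gamma=1.0):
    """String-based MCE loss for one utterance and its gradient w.r.t. the weights.
    correct: sub-band log-likelihoods of the reference string.
    competitors: sub-band log-likelihoods of each N-best competing string.
    eta, gamma: smoothing parameters (assumed illustrative values)."""
    g0 = combined_score(weights, correct)
    gs = [combined_score(weights, c) for c in competitors]
    m = max(eta * g for g in gs)  # shift for a stable log-sum-exp
    exps = [math.exp(eta * g - m) for g in gs]
    z = sum(exps)
    g_comp = (m + math.log(z / len(gs))) / eta  # soft maximum of competitor scores
    d = -g0 + g_comp  # misclassification measure
    loss = sigmoid(gamma * d)
    dloss_dd = gamma * loss * (1.0 - loss)
    post = [e / z for e in exps]  # soft weight of each competitor
    grad = [
        dloss_dd * (-correct[k] + sum(p * c[k] for p, c in zip(post, competitors)))
        for k in range(len(weights))
    ]
    return loss, grad


def train_weights(data, weights, lr=0.01, epochs=50) -> List[float]:
    """Per-utterance gradient descent on the MCE loss. After each step the
    weights are clipped at zero and renormalized to sum to one (assumed)."""
    w = list(weights)
    for _ in range(epochs):
        for correct, competitors in data:
            _, grad = mce_loss_and_grad(w, correct, competitors)
            w = [max(0.0, wk - lr * gk) for wk, gk in zip(w, grad)]
            total = sum(w)
            w = [wk / total for wk in w] if total > 0 else list(weights)
    return w


# Toy utterance: sub-band 0 is noisy and favours the single competing string.
data = [([-110.0, -100.0], [[-100.0, -120.0]])]
w0 = [0.5, 0.5]
w = train_weights(data, w0, lr=0.05)
print(w[1] > 0.5)  # True: weight shifts toward the clean sub-band
```

The gradient for each weight is the difference between the competitors' (soft-weighted) log-likelihood and the reference's log-likelihood in that sub-band, so the update moves weight away from sub-bands whose scores favour competing strings.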