THESIS
2003
xix, 123 leaves : ill. ; 30 cm
Abstract
In most speech recognition systems, acoustic features are extracted from the whole frequency spectrum of a speech signal, so corruption of the signal in one particular frequency band affects the whole feature vector. Such full-band (FB) speech recognition systems are known not to be robust in noisy environments. To address this problem, multi-band (MB) speech recognition was recently proposed. In the MB approach, the whole frequency spectrum is divided into sub-bands and a model is trained for each sub-band. During recognition, the sub-band results are recombined to yield a global decision. The hope is that the adverse effect of noise can be mitigated with appropriate recombination logic.
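As a rough illustration of the recombination step, the sketch below combines per-sub-band log-likelihoods with a weighted sum and picks the best-scoring label. The abstract does not specify the recombination logic, so the linear weighting, the function name recombine_subbands, and the example scores are assumptions made only for illustration.

```python
from typing import Dict, List


def recombine_subbands(loglikes: Dict[str, List[float]], weights: List[float]) -> str:
    """Return the label with the highest weighted sum of sub-band log-likelihoods.

    loglikes maps each candidate label to one log-likelihood per sub-band.
    weights holds one non-negative weight per sub-band (hypothetical values).
    """
    scores = {}
    for label, band_scores in loglikes.items():
        if len(band_scores) != len(weights):
            raise ValueError("one log-likelihood per sub-band is required")
        # Weighted sum in the log domain, i.e. a weighted product of likelihoods.
        scores[label] = sum(w * s for w, s in zip(weights, band_scores))
    return max(scores, key=scores.get)


# Hypothetical scores: the second sub-band is noisy and penalizes "one" heavily.
example = {"one": [-10.0, -50.0], "two": [-12.0, -20.0]}
equal_weights_choice = recombine_subbands(example, [0.5, 0.5])
clean_band_only_choice = recombine_subbands(example, [1.0, 0.0])
```

Setting the weight of the corrupted sub-band to zero changes the decision in this example, which is the kind of effect the recombination logic is meant to exploit.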
The main contribution of this thesis is a new acoustic modeling method which we call the frequency-stream-tying hidden Markov model (FSTHMM). The FSTHMM is motivated by the MB HMM generated by HMM composition. An FSTHMM has multiple streams of acoustic features, and each stream consists of acoustic features extracted from one sub-band. These streams are tied across states, as in an MB HMM created by HMM composition, to allow asynchronous transitions of different sub-band acoustic features. As the number of model parameters remains nearly the same as that of an FB system, the FSTHMM can be viewed as an effective modeling technique to expand the state space without increasing the number of model parameters. Unlike the MB HMM, the sub-band parameters of an FSTHMM are jointly optimized during training. The sub-band weights can be trained afterward to further improve the system's performance.
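The state-space expansion can be pictured with a small sketch. It assumes, for illustration only, that every sub-band has the same number of states and that asynchrony between sub-bands is unconstrained, so each composite state is a tuple of sub-band states and its emission score is a weighted sum of tied stream log-likelihoods. The function names and numbers are hypothetical and are not taken from the thesis.

```python
from itertools import product
from typing import List, Sequence, Tuple


def compose_states(num_bands: int, states_per_band: int) -> List[Tuple[int, ...]]:
    """Enumerate composite states as tuples holding one sub-band state per band."""
    return list(product(range(states_per_band), repeat=num_bands))


def composite_log_emission(
    state: Sequence[int],
    stream_loglikes: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of tied stream log-likelihoods for one composite state.

    stream_loglikes[b][s] is the log-likelihood of band b's features under
    sub-band state s; every composite state that uses (b, s) shares that entry.
    """
    return sum(w * stream_loglikes[b][s] for b, (s, w) in enumerate(zip(state, weights)))


# Hypothetical sizes: 2 sub-bands with 3 states each.
states = compose_states(num_bands=2, states_per_band=3)
num_composite_states = len(states)          # 3 ** 2 composite states
num_tied_stream_distributions = 2 * 3       # one distribution per (band, state) pair
```

Under these assumptions the number of composite states grows as a power of the number of sub-bands, while the number of distinct stream distributions grows only linearly, which matches the claim that the state space expands without a comparable growth in parameters.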
In a series of experiments on the Aurora2 corpus, it is shown that FSTHMMs outperform their MB HMM counterparts. The FSTHMMs outperform the corresponding FB HMMs on clean data but not on noisy data, which shows that our FSTHMMs should be applied to matched conditions only. Using the multi-condition training method, FSTHMMs give a slight improvement over their FB HMM counterparts on test data with a high signal-to-noise ratio (SNR). For data with a low SNR, the performance of FSTHMMs varies across noise environments.