THESIS
2001
iv, 60 leaves : ill. ; 30 cm
Abstract
We describe an architecture for speech-recognition-based interactive toys and discuss the strategy we have adopted to handle the fact that the speech recognizer must serve users whose ages range from children to adults. Large variations in vocal tract length between children and adults can significantly degrade the performance of speech recognizers. We propose a model-based approach to vocal tract length normalization (VTLN) and compare it with the more common feature-based approach. Our results indicate that there is very little difference in performance between the two schemes. We also compare the performance of three warping factor detection schemes previously proposed for feature-based VTLN and demonstrate that these schemes are also effective for model-based VTLN. Assuming enough memory is available to store pre-warped models, the computational cost of model warping is lower than that of feature warping. Under certain conditions, the computational cost may also be lower even if the warped models must be computed dynamically. In addition, the model warping approach requires no changes to the standard Mel-Frequency Cepstral Coefficient (MFCC) front-end. This technique is well suited for embedded systems using DSP and/or custom VLSI chips.
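As a rough illustration of what "warping" and "warping factor detection" mean in VTLN generally, the sketch below applies a piecewise-linear warp to the frequency axis and picks a warping factor by grid search. It is not taken from the thesis: the piecewise-linear form, the `knee` parameter, the grid of candidate factors, and the `score_fn` callback (a stand-in for whatever acoustic score a recognizer would supply) are all assumptions made for illustration, and the abstract does not say which three detection schemes were compared.

```python
def warp_frequency(f, alpha, f_max, knee=0.8):
    """Piecewise-linear frequency warp (illustrative assumption, not the thesis method).

    Frequencies below a breakpoint are scaled by alpha; above it, a second
    linear segment is used so that f_max still maps to f_max.
    """
    # Move the breakpoint down when alpha > 1 so the scaled value stays below f_max.
    f0 = knee * f_max * min(1.0, 1.0 / alpha)
    if f <= f0:
        return alpha * f
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f - f0)


def pick_warp_factor(score_fn, alphas):
    """Return the candidate warping factor with the highest score.

    score_fn is a hypothetical callback: given alpha, it returns a score
    (for example, a log-likelihood) for the utterance warped by alpha.
    """
    return max(alphas, key=score_fn)


if __name__ == "__main__":
    # Hypothetical grid of candidate factors: 0.88, 0.90, ..., 1.12.
    grid = [round(0.88 + 0.02 * i, 2) for i in range(13)]
    # Toy score that peaks at 1.10; a real system would score audio instead.
    best = pick_warp_factor(lambda a: -(a - 1.10) ** 2, grid)
    print("selected warping factor:", best)
    print("4 kHz warped with that factor:", warp_frequency(4000.0, best, 8000.0))
```

In a feature-based scheme a warp of this kind would be applied to the filterbank frequencies in the front-end for each speaker, whereas the model-based scheme described in the abstract leaves the MFCC front-end unchanged and transforms the acoustic models instead.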