THESIS
2000
xi, 59 leaves : ill. ; 30 cm
Abstract
The challenge of speaker adaptation is to reliably fine-tune models of a general population to fit the characteristics of a particular speaker using little data. In recent years, various adaptation techniques for hidden Markov models (HMMs) with Gaussian mixtures have been proposed, such as MAP estimation, MLLR transformation and vector field smoothing. When adaptation data are sparse, most Gaussians in the HMMs are unobserved. Adaptation can then be done by grouping similar Gaussians together to form regression classes and transforming the Gaussians group by group. The grouping is usually done at the full-space level. However, if the adaptation data are allocated too unevenly across the full-space regression classes, some of the estimated transformations will be less reliable. Hence, in this dissertation, we propose to perform the grouping at a finer acoustic level: the subspace Gaussian level. Full-space Gaussians are first projected onto orthogonal subspaces, and the resulting subspace Gaussians within each subspace are clustered to form subspace regression classes (SSRCs). The motivation for using SSRCs is that clustering at a lower dimension results in lower distortion, so fewer regression classes are needed. In addition, the use of SSRCs reduces the number of transformation parameters during adaptation. Although reducing the number of transformation parameters lowers the complexity of the transformation, fewer parameters are preferable when adaptation data are scarce. Furthermore, SSRCs can offer a more even distribution of adaptation data over the full-space Gaussians, which alleviates the problem of poorly re-estimated Gaussians.
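The grouping step described above can be illustrated with a minimal sketch. It is not the procedure used in the thesis, whose projection and clustering criterion are not given in this abstract. The sketch assumes that the orthogonal subspaces are contiguous blocks of feature dimensions, that each Gaussian is represented by its mean vector only, and that plain k-means is the clustering method; the names `split_into_subspaces`, `kmeans` and `build_ssrcs` are hypothetical.

```python
import random


def split_into_subspaces(mean, n_subspaces):
    """Split a full-space mean vector into contiguous, equally sized blocks.

    Assumption: axis-aligned blocks stand in for the orthogonal subspaces.
    """
    dim = len(mean)
    if dim % n_subspaces != 0:
        raise ValueError("dimension must be divisible by n_subspaces")
    width = dim // n_subspaces
    return [tuple(mean[i * width:(i + 1) * width]) for i in range(n_subspaces)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means (an assumed clustering method); returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


def build_ssrcs(means, n_subspaces, classes_per_subspace):
    """Return labels[s][g]: the regression class of Gaussian g in subspace s."""
    projected = [split_into_subspaces(m, n_subspaces) for m in means]
    return [kmeans([p[s] for p in projected], classes_per_subspace, seed=s)
            for s in range(n_subspaces)]


# Four toy 4-dimensional Gaussian means, split into two 2-dimensional subspaces.
means = [
    (0.0, 0.0, 5.0, 5.0),
    (0.1, 0.0, -5.0, -5.0),
    (10.0, 10.0, 5.1, 5.0),
    (10.0, 10.1, -5.0, -5.1),
]
ssrc_labels = build_ssrcs(means, n_subspaces=2, classes_per_subspace=2)
```

In this toy example the first two Gaussians share a regression class in the first subspace but fall into different classes in the second, which is the finer granularity that grouping per subspace allows compared with a single full-space grouping.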
We conducted several experiments on rapid speaker adaptation to evaluate the effectiveness of SSRCs. The results show that subspace regression classes improve on traditional full-space regression classes.
A variant of the procedure is also proposed for task adaptation, and it achieves similar success.