THESIS
2007
xvii, 144 leaves : ill. ; 30 cm
Abstract
Recently, acoustic-based language identification (LID) systems have been gaining attention because they do not require transcribed training data. More importantly, with shifted-delta-cepstral (SDC) features, their performance is competitive with that of the more complex phonotactic approach. Most acoustic-based LID systems use Gaussian mixture models (GMMs). In this thesis, we construct a state-of-the-art GMM-based LID system and obtain good improvements from discriminative training. Discriminative training is applied to three different parts of the LID process: sequence modeling, acoustic model learning, and front-end feature extraction.
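To make the GMM-based setup concrete, here is a minimal sketch of the usual acoustic LID decision rule, assumed here rather than taken from the thesis. Each language has its own diagonal-covariance GMM. The utterance is scored by its average per-frame log-likelihood under each model, and the highest-scoring language wins. The function names, the `(weights, means, variances)` tuple layout, and the language labels are hypothetical.

```python
import numpy as np


def gmm_frame_loglik(x, w, mu, var):
    """log p(x_t) for each frame t under a diagonal-covariance GMM.

    x: (T, D) feature frames; w: (M,) mixture weights; mu, var: (M, D).
    """
    diff = x[:, None, :] - mu[None, :, :]                             # (T, M, D)
    log_comp = (np.log(w)[None, :]
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)[None, :]
                - 0.5 * np.sum(diff ** 2 / var[None, :, :], axis=2))  # (T, M)
    m = log_comp.max(axis=1, keepdims=True)                           # stable log-sum-exp
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))


def identify_language(x, models):
    """Return the language whose GMM gives the highest average frame log-likelihood."""
    scores = {lang: gmm_frame_loglik(x, *gmm).mean() for lang, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-language setup with 2-D features.
models = {
    "lang_a": (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2))),
    "lang_b": (np.array([1.0]), np.array([[6.0, 6.0]]), np.ones((1, 2))),
}
x = np.random.default_rng(0).normal(0.5, 0.3, size=(20, 2))
best, _ = identify_language(x, models)
print(best)  # lang_a
```

Averaging over frames, rather than summing, keeps scores comparable across utterances of different lengths. Real systems typically also apply score normalization or a back-end classifier, which this sketch omits.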
While most of the language-specific information in an acoustic-based LID system comes from separation in the acoustic space, the sequence of GMM component indices does provide additional information. In this work, we generalized to LID the SVM-based sequence modeling approach recently proposed for speaker verification. The SVM assigns model weights according to their ability to discriminate between languages and inherently smooths the model to avoid over-fitting. To further improve the token sequence representation, we extended the SVM-based sequence modeling to n-best GMM token sequences.
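The sketch below shows one plausible way to turn an utterance into a fixed-length vector that an SVM can consume. It makes several assumptions. Each frame is tokenized by the index of its highest-scoring GMM component, and the vector holds unigram and bigram relative frequencies of those tokens. For `n_best > 1`, every top-n token in a frame is counted, and every pairing of top-n tokens in adjacent frames is counted as a bigram. The abstract does not specify the exact n-best expansion, so this rule and the names `component_loglik` and `token_ngram_vector` are hypothetical.

```python
import numpy as np


def component_loglik(x, w, mu, var):
    """log(w_m * N(x_t; mu_m, diag(var_m))) for every frame t and component m -> (T, M)."""
    diff = x[:, None, :] - mu[None, :, :]
    return (np.log(w)[None, :]
            - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)[None, :]
            - 0.5 * np.sum(diff ** 2 / var[None, :, :], axis=2))


def token_ngram_vector(x, w, mu, var, n_best=1):
    """Unigram + bigram relative frequencies of the top-n GMM component indices per frame."""
    M = len(w)
    ll = component_loglik(x, w, mu, var)
    top = np.argsort(-ll, axis=1)[:, :n_best]   # (T, n_best) token indices, best first
    uni = np.zeros(M)
    bi = np.zeros((M, M))
    for t in range(len(top)):
        for a in top[t]:
            uni[a] += 1
            if t + 1 < len(top):
                for b in top[t + 1]:            # all pairings with the next frame's tokens
                    bi[a, b] += 1
    uni /= max(uni.sum(), 1)
    bi /= max(bi.sum(), 1)
    return np.concatenate([uni, bi.ravel()])    # length M + M*M
```

One vector per training utterance, labeled by language, could then be passed to a linear SVM, for example scikit-learn's `LinearSVC` in one-vs-rest mode. The learned per-dimension weights play the role of the "model weights" described above. The bigram block grows as M², so large GMMs would in practice call for sparse storage.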
Discriminative acoustic training criteria, such as maximum mutual information (MMI), have been proposed for learning the acoustic model parameters of LID systems; however, they are typically hard to train and computationally expensive. In this thesis, we proposed an iterative discriminative acoustic training technique that extends the GMM to emphasize misclassified training data. This method can be viewed as a simplified variant of the boosting techniques widely used in pattern recognition. In addition to being relatively simple and fast, it yields a relative improvement close to that obtained with MMI training.
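One way to read "extends the GMM to emphasize the misclassified training data" is sketched below. The specific rule is an assumption, since the abstract gives no details. In each round, every training utterance is classified with the current models. For each language, the frames of its misclassified utterances are pooled, a new diagonal Gaussian is fitted to them, and that Gaussian is appended to the language's GMM with a small hypothetical mixture weight `new_weight`. The variance floor `var_floor` guards against degenerate components. All function names are illustrative.

```python
import numpy as np


def gmm_avg_loglik(x, w, mu, var):
    """Average per-frame log-likelihood of x (T, D) under a diagonal GMM."""
    diff = x[:, None, :] - mu[None, :, :]
    log_comp = (np.log(w)[None, :]
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)[None, :]
                - 0.5 * np.sum(diff ** 2 / var[None, :, :], axis=2))
    m = log_comp.max(axis=1, keepdims=True)
    return (m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))).mean()


def classify(x, gmms):
    """Language whose GMM scores the utterance highest."""
    return max(gmms, key=lambda lang: gmm_avg_loglik(x, *gmms[lang]))


def extend_gmm(gmm, frames, new_weight=0.1, var_floor=1e-3):
    """Append one Gaussian fitted to `frames`, rescaling old weights so they sum to 1."""
    w, mu, var = gmm
    return (np.append(w * (1.0 - new_weight), new_weight),
            np.vstack([mu, frames.mean(axis=0)]),
            np.vstack([var, np.maximum(frames.var(axis=0), var_floor)]))


def boost_round(gmms, utterances, labels, new_weight=0.1):
    """One boosting-like iteration: each language's GMM grows toward its own errors."""
    new = dict(gmms)
    for lang in gmms:
        missed = [x for x, y in zip(utterances, labels)
                  if y == lang and classify(x, gmms) != lang]
        if missed:
            new[lang] = extend_gmm(gmms[lang], np.vstack(missed), new_weight)
    return new
```

Unlike AdaBoost, this version does not build a weighted ensemble of separate classifiers. It folds the extra capacity directly into each language's mixture, which keeps scoring unchanged and cheap. In practice one would presumably run a few rounds and stop once held-out error stops improving, because each added component can also over-fit.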
Shifted-delta-cepstral (SDC) features, which can be viewed as a linear transformation of a sequence of cepstral feature vectors, have yielded significant performance improvements in LID. We have developed a framework that uses heteroscedastic linear discriminant analysis (HLDA) to obtain a better-optimized linear transformation, which may further improve feature selection.
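The claim that SDC features are a linear transformation of stacked cepstra can be checked directly. The sketch uses the common N-d-P-k parameterization, defaulting to the widely used 7-1-3-7 setting, which is assumed here because the abstract does not state the thesis's parameters. Block i of the SDC vector at frame t is c[t + iP + d] − c[t + iP − d]. The sketch computes this directly, then builds the equivalent fixed matrix A that acts on a window of stacked frames, and confirms the two agree. The function names are hypothetical.

```python
import numpy as np


def sdc(c, d=1, P=3, k=7):
    """Shifted delta cepstra computed directly. c: (T, N) cepstra -> (T_out, N*k)."""
    T, _ = c.shape
    out = []
    for t in range(d, T - (k - 1) * P - d):
        blocks = [c[t + i * P + d] - c[t + i * P - d] for i in range(k)]
        out.append(np.concatenate(blocks))
    return np.array(out)


def sdc_matrix(N, d=1, P=3, k=7):
    """Fixed linear map from a stacked context window (L*N values) to the SDC vector (N*k)."""
    L = (k - 1) * P + 2 * d + 1             # window covers frames t-d .. t+(k-1)P+d
    A = np.zeros((N * k, L * N))
    I = np.eye(N)
    for i in range(k):
        plus = i * P + 2 * d                 # window index of frame t + iP + d
        minus = i * P                        # window index of frame t + iP - d
        A[i * N:(i + 1) * N, plus * N:(plus + 1) * N] += I
        A[i * N:(i + 1) * N, minus * N:(minus + 1) * N] -= I
    return A


def stacked(c, d=1, P=3, k=7):
    """Flattened context windows aligned with the frames produced by sdc()."""
    L = (k - 1) * P + 2 * d + 1
    T = c.shape[0]
    return np.array([c[t - d:t - d + L].ravel() for t in range(d, T - (k - 1) * P - d)])


c = np.random.default_rng(1).normal(size=(50, 7))
direct = sdc(c)
via_matrix = stacked(c) @ sdc_matrix(7).T
print(direct.shape, np.allclose(direct, via_matrix))  # (30, 49) True
```

Because SDC is just the fixed ±1 pattern in A applied to the stacked window, a discriminatively estimated projection such as one obtained from HLDA could in principle replace A over the same stacked context. Estimating that projection is not shown here.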
The combination of the above discriminative approaches reduced the LID equal error rate from 4.3% to 2.8%, a relative improvement of more than 30%.
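As a quick check, the relative reduction in equal error rate (EER) implied by these two figures is consistent with the stated "more than 30%":

```latex
\text{relative improvement}
  = \frac{\mathrm{EER}_{\text{baseline}} - \mathrm{EER}_{\text{combined}}}{\mathrm{EER}_{\text{baseline}}}
  = \frac{4.3\% - 2.8\%}{4.3\%}
  = \frac{1.5}{4.3}
  \approx 34.9\%
```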