Optimizing the performance of GPD-based discriminative training in speech recognition
by Lam Wai Bun
M.Phil. Electrical and Electronic Engineering
iii, 63 leaves : ill. ; 30 cm
Speech recognition is becoming increasingly important in daily life. Many applications are starting to use this technology to become more effective. One important part of the speech recognition process is the training of speech models, which directly affects the performance of the recognition system.
The Maximum Likelihood (ML) approach is widely used because of its simplicity and ease of calculation. However, it has some disadvantages. To attain high performance, a large number of training utterances is required, and these are costly and time-consuming to collect. In addition, the objective of the ML approach is not necessarily consistent with the objective of speech recognition, which is to minimize recognition error, unless the form of the source distribution and the model distribution are the same, a condition rarely satisfied in practice.
The Minimum Classification Error (MCE) rate training approach has been proposed recently to overcome these problems. In particular, segmental Generalized Probabilistic Descent (GPD)-based discriminative training updates the model parameter set to minimize an empirical estimate of the recognition error. However, this approach depends on several parameters that degrade its performance if set improperly. Moreover, the inclusion of "hard" decision rules makes the algorithm highly dependent on the initial models used.
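As an illustration of the general idea only, the following is a minimal sketch of one GPD-style update under the commonly cited MCE formulation: a smoothed misclassification measure is passed through a sigmoid loss, and the parameters take a small gradient step on that loss. It is not the segmental procedure of the thesis. The linear discriminants, the function names, and the parameters `eta`, `gamma`, and `step` are all assumptions made for this example; they merely stand in for the kind of tunable settings the abstract refers to.

```python
import math


def misclassification_measure(scores, k, eta=1.0):
    """d_k = -g_k + (1/eta) * log(mean over j != k of exp(eta * g_j))."""
    rivals = [s for j, s in enumerate(scores) if j != k]
    m = max(rivals)  # shift by the maximum for numerical stability
    mean_exp = sum(math.exp(eta * (s - m)) for s in rivals) / len(rivals)
    return -scores[k] + m + math.log(mean_exp) / eta


def sigmoid_loss(d, gamma=1.0):
    """Smooth 0-1 loss: near 0 when d << 0 (correct), near 1 when d >> 0."""
    return 1.0 / (1.0 + math.exp(-gamma * d))


def gpd_step(weights, x, k, step=0.1, eta=1.0, gamma=1.0):
    """One gradient step on the sigmoid loss for a sample x of class k.

    Assumption for this sketch: each class j has a linear discriminant
    g_j(x) = w_j . x, rather than an HMM-based score.
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    d = misclassification_measure(scores, k, eta)
    loss = sigmoid_loss(d, gamma)
    dl_dd = gamma * loss * (1.0 - loss)  # derivative of the sigmoid loss

    rivals = [j for j in range(len(weights)) if j != k]
    m = max(scores[j] for j in rivals)
    z = sum(math.exp(eta * (scores[j] - m)) for j in rivals)

    new_weights = []
    for j, w in enumerate(weights):
        # dd/dg_j is -1 for the correct class, a softmax weight for rivals
        dd_dg = -1.0 if j == k else math.exp(eta * (scores[j] - m)) / z
        new_weights.append(
            [w_i - step * dl_dd * dd_dg * x_i for w_i, x_i in zip(w, x)]
        )
    return new_weights, loss


# Usage: two classes, one training sample belonging to class 0.
w = [[0.0, 0.0], [0.0, 0.0]]
w, loss_before = gpd_step(w, [1.0, 0.0], k=0)
w, loss_after = gpd_step(w, [1.0, 0.0], k=0)
```

In this toy setting, repeated steps on the same sample push the correct class score up and the rival score down, so the smoothed loss decreases. The sketch also suggests why parameter choice matters: a large `gamma` makes the sigmoid nearly a step function, so the gradient vanishes for samples far from the decision boundary, while a large `eta` makes the rival term approach a hard maximum over competing classes.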
In this thesis, brief guidelines on parameter setting for segmental training are given, and a novel "soft" GPD-based discriminative training approach is described. This "soft" discriminative training uses a new discriminative function to eliminate "hard" decision rules. Unfortunately, experimental results show that the recognition performance of models trained with soft discriminative training is not as good as that of models trained with the segmental approach. Further experiments elucidate possible reasons for this and suggest possible directions for future research.