THESIS
2007
xi, 88 leaves : ill. ; 30 cm
Abstract
One of the most common language verification (LV) approaches is phonotactic language verification, in which spoken utterances are first tokenized by a phoneme recognizer into a sequence of phoneme tokens, and verification is based solely on differences in the token sequence distributions, that is, the phonotactics.
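As a rough illustration of phonotactic scoring, the sketch below trains add-one-smoothed bigram models on token sequences and scores a test sequence by its average log-likelihood ratio. The bigram order, the smoothing, and the names `train_bigram` and `llr_score` are assumptions made for this example; the abstract does not specify the models used in the thesis.

```python
import math
from collections import Counter


def train_bigram(sequences, vocab):
    """Return a function log_prob(prev, cur) from add-one-smoothed bigram counts."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    v = len(vocab)

    def log_prob(prev, cur):
        # Add-one smoothing keeps unseen bigrams at non-zero probability.
        return math.log((pair_counts[(prev, cur)] + 1) / (context_counts[prev] + v))

    return log_prob


def llr_score(tokens, target_lp, background_lp):
    """Average per-bigram log-likelihood ratio of target vs. background model."""
    pairs = list(zip(tokens, tokens[1:]))
    total = sum(target_lp(a, b) - background_lp(a, b) for a, b in pairs)
    return total / max(len(pairs), 1)


# Toy token sequences (hypothetical data, two-symbol "phoneme" inventory).
vocab = ["a", "b"]
target_lp = train_bigram([list("ababab")], vocab)
background_lp = train_bigram([list("aabbaabb")], vocab)

score_like_target = llr_score(list("abab"), target_lp, background_lp)
score_like_background = llr_score(list("aabb"), target_lp, background_lp)
```

A positive score favors the claimed (target) language; the decision threshold applied to the score sets the trade-off between the two error types.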
In this thesis, a source-channel model is proposed to represent the conversion of speech into a phoneme sequence by the tokenization process. Depending on the channel characteristics, discrimination between languages can be significantly reduced after passing through the channel, and this reduction can be related to the mutual information between the source and output sequences. Because phoneme recognition accuracy is widely used to characterize the performance of a phoneme recognizer, we analyze its effect on language discrimination, along with that of other types of phoneme recognizer behavior.
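The loss of discrimination through the channel can be illustrated with a minimal sketch. It assumes memoryless (unigram) phoneme sources, a symmetric confusion matrix parameterized only by recognition accuracy, and Kullback-Leibler divergence as the discrimination measure. These are simplifications chosen for this example, not necessarily the thesis's model, and all names and numbers are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def pass_through_channel(p, channel):
    """Output distribution when channel[i][j] = P(recognized j | spoken i)."""
    n_out = len(channel[0])
    return [sum(p[i] * channel[i][j] for i in range(len(p))) for j in range(n_out)]


def symmetric_channel(n, accuracy):
    """Confusion matrix with `accuracy` on the diagonal, errors spread evenly."""
    off = (1.0 - accuracy) / (n - 1)
    return [[accuracy if i == j else off for j in range(n)] for i in range(n)]


# Hypothetical phoneme distributions for two languages.
lang_a = [0.7, 0.2, 0.1]
lang_b = [0.1, 0.2, 0.7]

channel = symmetric_channel(3, accuracy=0.6)
out_a = pass_through_channel(lang_a, channel)
out_b = pass_through_channel(lang_b, channel)

divergence_before = kl_divergence(lang_a, lang_b)
divergence_after = kl_divergence(out_a, out_b)
```

By the data processing inequality, `divergence_after` can never exceed `divergence_before`. Lowering `accuracy` toward chance shrinks it further, and `accuracy=1.0` leaves it unchanged.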
A verification task can be viewed as a special case of hypothesis testing, so the Neyman-Pearson theorem and other information-theoretic analyses apply. Under this framework, different factors affecting the trade-off between type I and type II errors are studied. For LV, the commonly used performance metric is the equal error rate (EER), the operating point at which the type I and type II error rates are equal. In this thesis, we propose the use of sequence divergence as a measure for characterizing language similarity. We then analyze how LV EER is influenced by language similarity as well as by other factors, such as test duration and the amount of training material.
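For readers unfamiliar with the EER, the following sketch estimates it from two sets of verification scores by sweeping the decision threshold over the observed values. The function name, the acceptance rule (score at or above the threshold), and the toy scores are assumptions made for this example. Which of the two error rates counts as type I depends on the choice of null hypothesis, so the code uses the neutral terms miss rate and false-acceptance rate.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) at the threshold where the two error rates are closest.

    A trial is accepted when its score is >= threshold.
    """
    best_gap, best_eer, best_threshold = None, None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss rate: target-language trials that are rejected.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False-acceptance rate: non-target trials that are accepted.
        false_accept = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_accept)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (miss + false_accept) / 2.0
            best_threshold = t
    return best_eer, best_threshold


# Toy scores (hypothetical): one target trial and one non-target trial overlap.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

With finitely many trials the two error rates may never be exactly equal, so the sketch reports their average at the closest crossing.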
Simulations are performed to validate the theoretical predictions. In addition, LV experiments using real speech data are carried out. Discrepancies between the two are analyzed and can be attributed to factors such as invalid modeling assumptions and inaccurate model estimation.