Improved speaker verification with discrimination power weighting
by Chan Siu Man
M.Phil. Electrical and Electronic Engineering
xv, 95 leaves : ill. (some col.) ; 30 cm
Biometric authentication has become increasingly popular. Speaker verification, sometimes called voiceprint authentication, verifies a speaker's identity by voice. One major advantage of speaker verification is that it works over the telephone, on which many business transactions are conducted.
Different speech sounds contain different amounts of information about a speaker's identity. The traditional verification approach weights a phoneme's importance by its duration. In this thesis, a more systematic investigation of the contributions of different phonetic units to speaker verification is conducted. A new posterior probability transformation weighting approach is proposed.
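The contrast between duration weighting and explicit phoneme weighting can be illustrated with a minimal sketch. It assumes each phoneme segment carries a frame count and a mean per-frame verification score; the function names, the segment layout, and the weight values are hypothetical and are not taken from the thesis.

```python
def duration_weighted_score(segments):
    """Average segment scores, weighting each segment by its frame count.

    segments: list of (phoneme, n_frames, mean_frame_score) tuples.
    This is equivalent to averaging the scores of all frames.
    """
    total_frames = sum(n for _, n, _ in segments)
    return sum(n * s for _, n, s in segments) / total_frames


def phoneme_weighted_score(segments, weights):
    """Same average, but each segment is also scaled by a per-phoneme weight.

    weights: dict mapping phoneme -> non-negative weight (hypothetical values).
    """
    total = sum(weights[p] * n for p, n, _ in segments)
    return sum(weights[p] * n * s for p, n, s in segments) / total


# Illustrative data only: a short vowel with a high score and a long
# fricative with a low score.
segments = [("aa", 10, 1.0), ("s", 30, 0.0)]
weights = {"aa": 3.0, "s": 1.0}

baseline = duration_weighted_score(segments)
reweighted = phoneme_weighted_score(segments, weights)
```

With these made-up numbers the long fricative dominates the duration-weighted score, while the extra weight on the vowel raises the combined score.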
We started by analyzing the contributions of different phonetic units with the effect of duration variation removed. We confirmed that even if all the sound units (phonemes) are normalized to the same length, the amount of speaker information varies significantly between phonemes. We then investigated three weighting schemes: i) Kullback-Leibler (KL) distance, ii) posterior probability transformation (PPT), and iii) the generalized linear model (GLM). These schemes were applied to both text-dependent and text-independent speaker verification tasks.
This study finds that, while duration is an essential indicator of phoneme importance, a proper weighting of the scores of different phonemes can improve verification performance. The KL distance approach and posterior probability transformation (PPT) give similar weighting factors. However, PPT also applies different shifts to the verification scores of phonemes, in effect creating phoneme-dependent thresholds, which consistently gives better performance. Furthermore, because PPT estimates the speaker posterior probability, the resulting verification score is more meaningful. PPT can be considered a special case of GLM, which is an alternative way to estimate phoneme weights. The experiments in this study demonstrated that applying posterior probability transformation resulted in approximately 15% relative improvement in equal error rate on text-dependent verification using the YOHO corpus and 5% improvement on text-independent verification on the SPIDRE corpus.
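The idea of a per-phoneme weight plus a per-phoneme shift can be sketched as follows. This is only an illustration under a strong assumption: that the transformation takes a logistic form, which the abstract does not specify. The parameter values, the function names, and the choice to average the transformed scores are all hypothetical.

```python
import math


def sigmoid(x):
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def transformed_score(phoneme, score, params):
    """Apply a phoneme-specific scale (weight) and shift to a raw score."""
    scale, shift = params[phoneme]
    return scale * score + shift


def utterance_posterior(segments, params):
    """Average the transformed scores and squash them into a probability.

    segments: list of (phoneme, raw_score) pairs.
    Averaging is an assumption made for this sketch.
    """
    logits = [transformed_score(p, s, params) for p, s in segments]
    return sigmoid(sum(logits) / len(logits))


def implied_threshold(phoneme, params):
    """Raw score at which this phoneme's transformed score crosses zero."""
    scale, shift = params[phoneme]
    return -shift / scale


# Hypothetical (scale, shift) pairs; in practice they would be estimated
# from training data.
PARAMS = {"aa": (2.0, -1.0), "s": (0.5, 0.0)}

posterior = utterance_posterior([("aa", 0.5), ("s", 0.0)], PARAMS)
```

Under this form the scale acts as the phoneme weight, and the shift moves the point at which a phoneme's evidence turns from "reject" to "accept", which is one way to read the phoneme-dependent thresholds described above.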