THESIS
2000
xii, 88 leaves : ill. ; 30 cm
Abstract
Automatic speech recognition technology has developed rapidly in the past decade. Applications of this technology are now widespread, such as voice dialing on mobile phones and enquiry systems on the hotlines of many large companies. As automatic speech recognition systems become more popular, the range and variety of people using them increase. Speaker variability, especially accent, poses a challenge to these systems.
Automatic speech recognition systems are usually trained on data from one or several accent groups. When the user's accent differs from the training accent(s), the performance of the system degrades. This is attributed to both acoustic and phonological differences between languages. There are many languages in the world, and some are closer to each other than others. For example, English and German are closer to each other than either is to Chinese. Languages differ in their phoneme inventories, grammar, stress patterns and so on. People acquire a certain speaking style from their native language. Because Asian languages such as Chinese are very different from English, there are great differences in speaking style between their speakers.
We propose a simpler and faster automatic speech recognition system that exploits such differences using accent classification and accent adaptation. We show that some acoustic features, such as energy, formants and pitch, are sensitive to Cantonese and other Asian accents. Using these features, we can build an efficient accent classifier. The classification accuracy between native English and Cantonese accents is 88.51%. We further reduce the complexity of the classifier by i) using a single HMM to model an accent instead of a set of phoneme class models and ii) choosing a more discriminative feature set through feature selection methods. Both methods retain the discriminative power of the original classifier while operating faster. Using sequential backward search (SBS) and a single HMM, the classification accuracy between native English and Cantonese accents improves to 92.57% and the number of parameters input to the classifier is halved.
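The feature selection step can be illustrated with a minimal Python sketch of sequential backward search. This is a generic illustration rather than the thesis implementation: the function name, the feature names and the toy scoring function are assumptions made for the example. In practice the score would be something like the held-out classification accuracy of the accent classifier trained on the candidate feature subset.

```python
from typing import Callable, List, Sequence


def sequential_backward_search(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    target_size: int,
) -> List[str]:
    """Greedily drop one feature at a time until target_size remain.

    At each step, the feature whose removal leaves the highest-scoring
    subset is discarded. `score` is supplied by the caller (hypothetical
    here; e.g. validation accuracy of a classifier using the subset).
    """
    selected = list(features)
    if not 1 <= target_size <= len(selected):
        raise ValueError("target_size must be between 1 and len(features)")

    while len(selected) > target_size:
        best_score = None
        best_drop = None
        for feature in selected:
            candidate = [f for f in selected if f != feature]
            candidate_score = score(candidate)
            if best_score is None or candidate_score > best_score:
                best_score = candidate_score
                best_drop = feature
        selected.remove(best_drop)
    return selected


# Toy usage: the weights below are invented for illustration only.
TOY_WEIGHTS = {"energy": 3.0, "pitch": 2.0, "formant1": 1.0, "noise": -1.0}


def toy_score(subset: List[str]) -> float:
    return sum(TOY_WEIGHTS[f] for f in subset)


if __name__ == "__main__":
    print(sequential_backward_search(list(TOY_WEIGHTS), toy_score, 2))
```

Because the search is greedy, it evaluates far fewer subsets than an exhaustive search, but it is not guaranteed to find the globally best subset.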
After the accent of the speaker is known, the system uses the corresponding accent-adapted lexicon and acoustic models. We present a fast method to obtain an accent-adapted lexicon using linguistic knowledge alone. By adding the possible pronunciations of a word in the accent group to the lexicon, we can normalize the phonological differences between accents. Finally, the acoustic differences are normalized by applying MLLR adaptation to the acoustic models. We propose a method of acoustic adaptation that does not use accented data. People carry their speaking style from their mother language into a second language. Such adapted models can therefore be developed easily using only source language data, which is usually available. Our experiments show that adaptation with source language data gives performance comparable to adaptation with accented data.
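The lexicon adaptation idea can be sketched as follows, assuming the linguistic knowledge is expressed as simple phoneme substitution rules. The function name, the phoneme symbols and the example rule are hypothetical and are not taken from the thesis. The sketch keeps each original pronunciation and adds an accented variant alongside it.

```python
from typing import Dict, List, Tuple

Pronunciation = Tuple[str, ...]
Lexicon = Dict[str, List[Pronunciation]]


def adapt_lexicon(lexicon: Lexicon, substitutions: Dict[str, str]) -> Lexicon:
    """Return a new lexicon with accented pronunciation variants added.

    `substitutions` maps a phoneme of the target language to the phoneme
    a speaker of the accent group is assumed to produce instead. These
    rules are illustrative assumptions, not rules from the thesis.
    """
    adapted: Lexicon = {}
    for word, pronunciations in lexicon.items():
        variants = list(pronunciations)  # keep the original entries
        for pron in pronunciations:
            variant = tuple(substitutions.get(p, p) for p in pron)
            if variant not in variants:  # avoid duplicate entries
                variants.append(variant)
        adapted[word] = variants
    return adapted


if __name__ == "__main__":
    base = {"three": [("TH", "R", "IY")], "see": [("S", "IY")]}
    rules = {"TH": "F"}  # hypothetical substitution rule
    print(adapt_lexicon(base, rules))
```

Words that contain none of the substituted phonemes are left unchanged, so the lexicon grows only where the accent is expected to alter the pronunciation.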
In summary, we have shown an effective method of accent classification that uses a single HMM as the accent model and selected prosodic features as the input parameters to the HMM. We present a fast way to obtain an accent-specific lexicon using linguistic knowledge. We also show that accent-adapted acoustic models can be obtained easily by MLLR adaptation without accented data.