THESIS
2020
xiii, 126 pages : illustrations ; 30 cm
Abstract
Perceiving target speech in a multi-speaker environment is a common task. Consequently, the
ability to recognize a target syllable (vowel and consonant) in the presence of another syllable
is essential for speech perception. The human ability to recognize individual vowels presented
simultaneously has been the subject of many studies in English. Nonetheless, the recognition of
mixed consonants and vowels presented simultaneously has received little attention. There are
few such studies in English, and no similar studies in Mandarin could be found.
This study examines syllable recognition when listening to concurrent pairs in Mandarin. In
Experiment 1, the recognition of concurrent Mandarin vowels was studied. The accuracy of
recognizing two concurrent tonal vowels increased logarithmically with the spectral contrast
between the two vowels. This finding is new, as past studies showed only linear increases.
Experiments 2 and 3 extended the range of concurrent syllables by adding an initial
consonant to the vowels, forming “consonant + vowel (CV)” syllables. Results
indicated that increasing both consonant and vowel spectral contrast can improve recognition
performance. In addition, recognition accuracy was evaluated separately for consonants,
vowels, and tones. A power-function model was fitted to relate consonant, vowel, and tone
recognition performance to syllable recognition performance. The weighting coefficients of
the model revealed that consonants contributed more than vowels and tones to concurrent
syllable recognition. Moreover, a deep-learning model trained to separate speech was used
to separate concurrent syllables.
A comparison of human performance with that of the deep-learning model indicated that both
human and machine performance were significantly affected by spectral differences in
concurrent vowels. Interestingly, in the separation of concurrent syllables involving different
categories of consonants, similar findings were obtained for the model when recognizing
syllables that differed in their consonants. The effect of spectral contrast was not significant
when both vowel differences and consonant differences were present. Possible insights for
model training are discussed.