THESIS
2002
xvii, 125 leaves : ill. ; 30 cm
Abstract
The performance of automatic speech recognition (ASR) is known to degrade under noise corruption. Such corruption includes additive noise, such as air-conditioning noise, which affects the whole speech waveform, and short-time noise, such as the slamming of a door or frame loss in network communication, which corrupts only a portion of the speech. Much previous research has focused on techniques that model the additive noise across the whole utterance. However, for short-time noise that affects only a portion of the speech, an alternative approach is to perform ASR on the unaffected (clean) parts of the speech signal.
In this thesis we propose a robust algorithm that automatically discards badly scored observations and thus performs ASR based on the clean observations only. Analogous to the use of the trimmed mean as a robust alternative to the sample mean, we search for the most likely state sequence when K frames are ignored. In the same way that the Viterbi algorithm finds the most likely state sequence by dynamic programming, we developed a dynamic programming algorithm, called the frame-skipping Viterbi algorithm (FSVA), to implement this idea. To find the best number of skips, K, for different noise environments, we developed a method called log-likelihood ratio thresholding (LLRT), which gives performance comparable to that obtained with the best value of K.
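The abstract does not state the FSVA recursion, so the following is only a minimal sketch of the idea under our own assumptions, not the thesis's implementation. We assume an HMM given as log initial probabilities `log_pi`, log transition probabilities `log_A` and per-frame log observation scores `log_B` (all names, and the function `frame_skipping_viterbi`, are hypothetical); we assume a skipped frame still occupies a state and pays its transition score but contributes no observation score; and we assume exactly K frames are skipped. The trellis of the ordinary Viterbi algorithm is extended with a skip counter k.

```python
import math
from typing import List, Tuple

NEG_INF = -math.inf


def frame_skipping_viterbi(
    log_pi: List[float],
    log_A: List[List[float]],
    log_B: List[List[float]],
    K: int,
) -> Tuple[float, List[int], List[int]]:
    """Most likely state path when exactly K frames' observation scores are ignored.

    Hypothetical interface (an assumption, not taken from the thesis):
      log_pi[j]   log initial probability of state j
      log_A[i][j] log transition probability from state i to state j
      log_B[t][j] log observation likelihood of frame t in state j
    Returns (best score, state path, sorted indices of skipped frames).
    """
    T, N = len(log_B), len(log_pi)
    if T == 0 or not 0 <= K <= T:
        raise ValueError("need T >= 1 and 0 <= K <= T")
    # delta[t][k][j]: best score of a partial path ending in state j at frame t
    # with exactly k of the frames 0..t skipped.
    # back[t][k][j]: (previous state, whether frame t was skipped).
    delta = [[[NEG_INF] * N for _ in range(K + 1)] for _ in range(T)]
    back = [[[(-1, False)] * N for _ in range(K + 1)] for _ in range(T)]
    for j in range(N):
        delta[0][0][j] = log_pi[j] + log_B[0][j]  # frame 0 scored
        if K >= 1:
            delta[0][1][j] = log_pi[j]  # frame 0 skipped: no observation score
            back[0][1][j] = (-1, True)
    for t in range(1, T):
        for k in range(K + 1):
            for j in range(N):
                # Option 1: score frame t; the skip count k is unchanged.
                keep, i_keep = max((delta[t - 1][k][i] + log_A[i][j], i) for i in range(N))
                keep += log_B[t][j]
                # Option 2 (assumed skip model): the path still passes through
                # state j, so the transition is counted, but log_B[t][j] is dropped.
                skip, i_skip = NEG_INF, -1
                if k >= 1:
                    skip, i_skip = max((delta[t - 1][k - 1][i] + log_A[i][j], i) for i in range(N))
                if skip > keep:
                    delta[t][k][j], back[t][k][j] = skip, (i_skip, True)
                else:
                    delta[t][k][j], back[t][k][j] = keep, (i_keep, False)
    # Terminate in the best final state with exactly K skips, then trace back.
    score, j = max((delta[T - 1][K][s], s) for s in range(N))
    path, skipped, k = [0] * T, [], K
    for t in range(T - 1, -1, -1):
        path[t] = j
        j, was_skipped = back[t][k][j]
        if was_skipped:
            skipped.append(t)
            k -= 1
    return score, path, sorted(skipped)


if __name__ == "__main__":
    # Toy 2-state example: frame 2 is "corrupted" and scores badly in both states.
    lp = math.log
    log_pi = [lp(0.5), lp(0.5)]
    log_A = [[lp(0.9), lp(0.1)], [lp(0.1), lp(0.9)]]
    log_B = [[-1.0, -5.0]] * 2 + [[-20.0, -20.0]] + [[-1.0, -5.0]] * 2
    score, path, skipped = frame_skipping_viterbi(log_pi, log_A, log_B, K=1)
    print(path, skipped)  # [0, 0, 0, 0, 0] [2]
```

Setting K = 0 recovers the ordinary Viterbi recursion. The skip dimension multiplies the cost by K + 1, giving O(T K N^2) time instead of O(T N^2). As with a trimmed mean, which drops extreme samples before averaging, the frames to discard are chosen jointly with the state path rather than by a fixed per-frame rule.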
The FSVA is evaluated in two different noise environments: (i) GSM channels with white-noise or comfort-noise replacement, and (ii) short-time additive noise under different conditions. In all cases, the FSVA outperforms the Viterbi algorithm.
The Viterbi algorithm has a wide range of applications beyond ASR, such as coding and pattern recognition tasks. These tasks could potentially also benefit from the FSVA.