THESIS
1996
viii, 94 leaves : ill. ; 30 cm
Abstract
In this research, we experimentally study the effectiveness of more than 30 objective measures in estimating the subjective quality of Cantonese and Mandarin speech in the analog cellular system AMPS (Advanced Mobile Phone System). These measures fall into five classes: Signal-to-Noise Ratio based, Linear-Prediction-Coefficient based, Spectral Distance based, psychoacoustically motivated, and other recent measures. It was found that all these measures require perfect synchronization of the original and distorted speech, which is difficult to achieve. In addition, an output-based measure was devised based on the visual appearance of the spectrogram, and its results were found to be good.
In the experiment, a set of phonetically balanced sentences is designed for both Cantonese and Mandarin. Each sentence is spoken by two male and two female native speakers and recorded on DAT to form the original speech database. These sentences are then sent through the wireless channel via an AMPS cellular phone installed in a car. Designated routes are planned in order to capture various distortions, such as high/low power, fast/slow fading and multipath fading. Simultaneously, in the laboratory, the distorted speech is recorded from a normal telephone to a DAT deck. The distorted speech is then synchronized and scaled with the corresponding original speech. Subjective quality measurement is done through a listener survey to obtain the Mean Opinion Score (MOS). The figure of merit is the statistical correlation with the MOS: the aim is to find the objective measures that correlate best with it.
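The synchronize, scale and correlate steps described above can be sketched in a few lines. This is only an illustrative sketch, not the procedure used in the thesis: it assumes alignment by maximizing cross-correlation over a bounded lag, a least-squares gain for scaling, and the Pearson coefficient as the correlation statistic. The function names (best_lag, align_and_scale, pearson) are hypothetical.

```python
import math

def best_lag(ref, deg, max_lag):
    """Lag (in samples) of deg relative to ref that maximizes cross-correlation.
    Assumption: the true delay lies within +/- max_lag samples."""
    def xcorr(lag):
        return sum(ref[i] * deg[i + lag]
                   for i in range(len(ref)) if 0 <= i + lag < len(deg))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

def align_and_scale(ref, deg, max_lag):
    """Shift deg onto ref's time axis, then apply a least-squares gain."""
    lag = best_lag(ref, deg, max_lag)
    aligned = [deg[i + lag] if 0 <= i + lag < len(deg) else 0.0
               for i in range(len(ref))]
    # Gain g minimizing sum((ref - g * aligned)^2).
    num = sum(r * a for r, a in zip(ref, aligned))
    den = sum(a * a for a in aligned)
    gain = num / den if den else 1.0
    return lag, [gain * a for a in aligned]

def pearson(x, y):
    """Pearson correlation between objective scores x and MOS values y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy signal: deg is ref delayed by 2 samples and attenuated by half.
ref = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0]
deg = [0, 0, 0, 0, 0.5, 1, 1.5, 1, 0.5, 0]
lag, restored = align_and_scale(ref, deg, max_lag=4)
```

On this toy input the recovered lag is 2 samples and the rescaled signal matches the reference. With real recordings, each objective measure would be computed on the aligned pair, and pearson would then be applied across all sentences to compare that measure's scores against the MOS values.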
It was found that for the Cantonese database, psychoacoustically motivated measures and two recently developed measures are superior to the other measures. For the best four measures, Bark Spectral Distance (BSD), Mel Spectral Distance (MSD), Coherence Function (COH) and Information Index (RII), the correlations are 0.89, 0.88, 0.86 and 0.82, respectively. Other reasonably good measures are Log Spectral Distance (0.79) and Frequency Variant Segmental SNR (0.70). For the Mandarin database, the best four measures are Mel Spectral Distance (0.71), Bark Spectral Distance (0.69), Frequency Weighted Log Spectral Distance (0.60) and Output Based Measure (0.76). In conclusion, some objective measures are statistically good at reflecting the subjective speech quality of the AMPS cellular phone as perceived by users.