THESIS
2014
xii, 110 pages : illustrations ; 30 cm
Abstract
In triphone-based acoustic modeling, it is difficult to robustly model infrequent
triphones due to their lack of training samples. Naive maximum-likelihood (ML) estimation
of infrequent triphone models produces poor triphone models and eventually
affects the overall performance of an automatic speech recognition (ASR) system.
Among different techniques proposed to solve the infrequent triphone problem, the
most widely used method in current ASR systems is state tying because of its effectiveness
in reducing model size and achieving good recognition results. However, state
tying inevitably introduces quantization errors since triphones tied to the same state are
not distinguishable in that state. This thesis addresses the problem by the use of distinct
acoustic modeling where every modeling unit has a unique model and a distinct
acoustic score.
The main contribution of this thesis is the formulation of the estimation of triphone
models as an adaptation problem through our proposed distinct acoustic modeling
framework named eigentriphone modeling. The rationale behind eigentriphone
modeling is that a basis is derived from the frequent triphones and then each triphone
is modeled as a point in the space spanned by the basis. The eigenvectors in the basis
represent the most important context-dependent characteristics among the triphones,
and thus the infrequent triphones can be robustly modeled with few training samples.
Furthermore, the proposed framework is very flexible and can be applied to other modeling
units. Since grapheme-based modeling is useful in automatic speech recognition
of under-resourced languages, we further apply our distinct acoustic modeling framework
to estimate context-dependent grapheme models and we call our new method
eigentrigrapheme modeling. Experimental evaluation of eigentriphone modeling was
carried out on the Wall Street Journal word recognition task and the TIMIT phoneme
recognition task. Experimental evaluation of eigentrigrapheme modeling was carried
out on four official South African under-resourced languages. It is shown that distinct
acoustic modeling using the proposed eigentriphone framework consistently performs
better than the conventional tied-state HMMs.
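
The basis idea described above can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not give the estimation details, so the function names (`pca_basis`, `estimate_in_eigenspace`), the toy 3-dimensional vectors standing in for triphone model parameters, and the simple `penalty` shrinkage on the weights are all assumptions made for illustration. The sketch derives an eigenvector basis from well-trained "frequent" vectors and then represents a noisy "infrequent" vector as a point in the space spanned by that basis.

```python
import math


def pca_basis(vectors, k, iters=100):
    """Return (mean, basis) where basis holds up to k leading eigenvectors
    of the covariance of `vectors`, found by power iteration with deflation."""
    n, dim = len(vectors), len(vectors[0])
    mu = [sum(col) / n for col in zip(*vectors)]
    centered = [[x - m for x, m in zip(v, mu)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    basis = []
    for _ in range(k):
        vec = [float(i + 1) for i in range(dim)]  # fixed, asymmetric start vector
        for _step in range(iters):
            nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # no variance left to explain
                vec = None
                break
            vec = [x / norm for x in nxt]
        if vec is None:
            break
        # Eigenvalue estimate, then remove this direction from the covariance.
        lam = sum(vec[i] * cov[i][j] * vec[j]
                  for i in range(dim) for j in range(dim))
        cov = [[cov[i][j] - lam * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]
        basis.append(vec)
    return mu, basis


def estimate_in_eigenspace(sample, mu, basis, penalty=0.0):
    """Model `sample` as mu + sum_k w_k * e_k.
    `penalty` shrinks the weights toward zero (an assumed, simplified
    stand-in for a regularized estimator)."""
    diff = [x - m for x, m in zip(sample, mu)]
    est = list(mu)
    for e in basis:
        w = sum(d * b for d, b in zip(diff, e)) / (1.0 + penalty)
        est = [s + w * b for s, b in zip(est, e)]
    return est


# Toy "frequent triphone" parameter vectors that vary along one direction.
frequent = [[0.0, 0.0, 0.0], [2.0, 2.0, 0.0], [4.0, 4.0, 0.0],
            [1.0, 1.0, 0.0], [3.0, 3.0, 0.0]]
mu, basis = pca_basis(frequent, k=1)

# Noisy estimate for an "infrequent triphone" with few training samples.
noisy = [3.5, 2.5, 1.0]
smoothed = estimate_in_eigenspace(noisy, mu, basis)
```

In this toy case the noisy vector is pulled back onto the single direction along which the frequent vectors vary, while still receiving its own distinct coordinates rather than being merged with another unit, which is the contrast with state tying that the abstract draws.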