THESIS
2015
xiv, 114 pages : illustrations ; 30 cm
Abstract
Multi-task learning (MTL) learns multiple tasks together and improves the performance
of all of them by exploiting the extra information each task provides to the others
through a shared internal representation. Related secondary tasks act as regularizers
that help improve the generalization performance of each task over learning it alone;
the effect is more prominent when the amount of training data is relatively small.
Recently, deep neural networks (DNNs) have been widely used for acoustic modeling in
automatic speech recognition (ASR). The hidden layers of a DNN provide an ideal
internal representation for the shared knowledge. The main contribution of this thesis
is the proposal of three methods that apply MTL to DNN acoustic modeling by exploiting
extra information from related tasks, while imposing the guideline that the secondary
tasks should not require additional language resources. This guideline is important
when language resources are limited.
In the first method, phone and grapheme acoustic models are trained together
within the same deep neural network. The extra information consists of the
phone-to-grapheme mappings, which are confirmed by analysis and visualization of the
implicit phone-to-grapheme correlation matrix computed from the model parameters. The
training convergence curves also show that MTL training generalizes better to unseen
data than common single-task learning does. Moreover, two extensions are proposed to
further improve the performance.
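To make the shared-representation idea concrete, the following is a minimal sketch,
not the thesis's actual implementation: it assumes PyTorch, and the layer sizes, state
counts, and task weight beta are all hypothetical. It shows a DNN whose hidden layers
are shared between a phone-state task and a grapheme-state task, with one softmax
output layer per task and the two cross-entropy losses summed during training.

    import torch.nn as nn
    import torch.nn.functional as F

    class PhoneGraphemeMTLDNN(nn.Module):
        """Shared hidden layers with one output head per task (sizes hypothetical)."""
        def __init__(self, feat_dim=440, hidden_dim=1024,
                     n_phone_states=3000, n_grapheme_states=1500):
            super().__init__()
            # The shared hidden layers hold the internal representation
            # that both tasks regularize.
            self.shared = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            )
            # Task-specific softmax output layers.
            self.phone_head = nn.Linear(hidden_dim, n_phone_states)
            self.grapheme_head = nn.Linear(hidden_dim, n_grapheme_states)

        def forward(self, x):
            h = self.shared(x)
            return self.phone_head(h), self.grapheme_head(h)

    def mtl_loss(model, feats, phone_targets, grapheme_targets, beta=1.0):
        # Joint objective: primary phone task plus a weighted secondary
        # grapheme task acting as a regularizer.
        phone_logits, grapheme_logits = model(feats)
        return (F.cross_entropy(phone_logits, phone_targets)
                + beta * F.cross_entropy(grapheme_logits, grapheme_targets))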
State tying, to some extent, relieves the data scarcity problem in context-dependent
acoustic modeling. However, quantization errors are inevitably introduced. The second
MTL method in this thesis aims at robust modeling of a large set of distinct
context-dependent acoustic units. More specifically, distinct triphone states are
trained together with a smaller set of tied states, benefiting from a better inductive
bias to reach a better optimum. In return, they embed more contextual information into
the hidden layers of the MTL-DNN acoustic models.
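As an illustration of how the secondary tied-state targets come for free, here is a
small sketch under the same assumed PyTorch setup; the state counts and the tying_map
lookup table are hypothetical stand-ins for a real decision-tree state-tying map.

    import torch
    import torch.nn.functional as F

    # Hypothetical state-tying lookup: distinct triphone-state id -> tied-state id.
    # In practice this would come from the decision-tree clustering used for tying.
    N_DISTINCT, N_TIED = 15000, 2000
    tying_map = torch.randint(0, N_TIED, (N_DISTINCT,))

    def triphone_mtl_loss(distinct_logits, tied_logits, distinct_targets, beta=1.0):
        # The tied-state targets are derived from the distinct-triphone targets
        # by a table lookup, so the secondary task needs no extra annotation.
        tied_targets = tying_map[distinct_targets]
        return (F.cross_entropy(distinct_logits, distinct_targets)
                + beta * F.cross_entropy(tied_logits, tied_targets))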
Our last method works in a multilingual setting where data from multiple languages
are available. Multilingual acoustic modeling is improved by learning a universal
phone set (UPS) modeling task together with the language-specific triphone modeling
tasks, which helps implicitly map the phones of the languages to one another.
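A sketch of the corresponding network shape, again under the assumed PyTorch setup
with hypothetical sizes and language names: one shared hidden stack, one
triphone-state head per language, and a UPS head trained on frames from every
language.

    import torch.nn as nn

    class MultilingualMTLDNN(nn.Module):
        def __init__(self, feat_dim=440, hidden_dim=1024,
                     n_states_per_lang=None, n_ups_phones=120):
            super().__init__()
            if n_states_per_lang is None:
                # Hypothetical language inventory and tied-state counts.
                n_states_per_lang = {"lang_a": 3000, "lang_b": 2500}
            self.shared = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            )
            # One language-specific triphone head per language ...
            self.lang_heads = nn.ModuleDict(
                {lang: nn.Linear(hidden_dim, n)
                 for lang, n in n_states_per_lang.items()})
            # ... plus one universal phone set (UPS) head shared by all languages.
            self.ups_head = nn.Linear(hidden_dim, n_ups_phones)

        def forward(self, x, lang):
            h = self.shared(x)
            # Each frame trains its own language's head and the UPS head;
            # the shared layers implicitly align phones across languages.
            return self.lang_heads[lang](h), self.ups_head(h)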
The MTL methods were shown to be effective on a broad range of data sets. The
contributions of this thesis include the three proposed MTL methods and the heuristic
guidelines we impose for finding helpful secondary tasks. With these successful
explorations, we hope to stimulate more interest in MTL for improving ASR; our results
show that it is promising for wider applications.