THESIS
2015
xiv, 114 pages : illustrations ; 30 cm
Abstract
Multi-task learning (MTL) learns multiple tasks together and improves the performance
of all of them by exploiting the extra information each task provides to the others
through a shared internal representation. Related secondary tasks act as regularizers
that help improve the generalization performance of each task over learning it alone;
the effect is more prominent when the amount of training data is relatively small.
Recently, deep neural networks (DNNs) have been widely used for acoustic modeling in
automatic speech recognition (ASR). The hidden layers of a DNN provide an ideal
internal representation for the shared knowledge. The main contribution of this thesis
is the proposal of three methods that apply MTL to DNN acoustic modeling by exploiting
extra information from related tasks, while imposing the guideline that the secondary
tasks should not require additional language resources. This guideline is important
when language resources are limited.
In the first method, phone and grapheme acoustic models are trained together
within the same deep neural network. The extra information consists of the
phone-to-grapheme mappings, which are confirmed by analysis and visualization of the
implicit phone-to-grapheme correlation matrix computed from the model parameters. The
training convergence curves also show that MTL training generalizes better to unseen
data than common single-task learning does. Moreover, two extensions are proposed to
further improve the performance.
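To make the shared-representation idea concrete, the following is a minimal sketch,
not the thesis's actual implementation: it assumes PyTorch, and the layer sizes, state
counts, and task weight beta are all hypothetical. It shows a DNN whose hidden layers
are shared between a phone-state task and a grapheme-state task, with one softmax
output layer per task and the two cross-entropy losses summed during training.

    import torch.nn as nn
    import torch.nn.functional as F

    class PhoneGraphemeMTLDNN(nn.Module):
        """Shared hidden layers with one output head per task (sizes hypothetical)."""
        def __init__(self, feat_dim=440, hidden_dim=1024,
                     n_phone_states=3000, n_grapheme_states=1500):
            super().__init__()
            # The shared hidden layers hold the internal representation
            # that both tasks regularize.
            self.shared = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            )
            # Task-specific softmax output layers.
            self.phone_head = nn.Linear(hidden_dim, n_phone_states)
            self.grapheme_head = nn.Linear(hidden_dim, n_grapheme_states)

        def forward(self, x):
            h = self.shared(x)
            return self.phone_head(h), self.grapheme_head(h)

    def mtl_loss(model, feats, phone_targets, grapheme_targets, beta=1.0):
        # Joint objective: primary phone task plus a weighted secondary
        # grapheme task acting as a regularizer.
        phone_logits, grapheme_logits = model(feats)
        return (F.cross_entropy(phone_logits, phone_targets)
                + beta * F.cross_entropy(grapheme_logits, grapheme_targets))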
State tying, to some extent, relieves the data scarcity problem in context-dependent
acoustic modeling. However, quantization errors are inevitably introduced. The second
MTL method in this thesis aims at robust modeling of a large set of distinct
context-dependent acoustic units. More specifically, distinct triphone states are
trained together with a smaller set of tied states, benefiting from a better inductive
bias to reach a better optimum. In return, they embed more contextual information into
the hidden layers of the MTL-DNN acoustic models.
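As an illustration of how the secondary tied-state targets come for free, here is a
small sketch under the same assumed PyTorch setup; the state counts and the tying_map
lookup table are hypothetical stand-ins for a real decision-tree state-tying map.

    import torch
    import torch.nn.functional as F

    # Hypothetical state-tying lookup: distinct triphone-state id -> tied-state id.
    # In practice this would come from the decision-tree clustering used for tying.
    N_DISTINCT, N_TIED = 15000, 2000
    tying_map = torch.randint(0, N_TIED, (N_DISTINCT,))

    def triphone_mtl_loss(distinct_logits, tied_logits, distinct_targets, beta=1.0):
        # The tied-state targets are derived from the distinct-triphone targets
        # by a table lookup, so the secondary task needs no extra annotation.
        tied_targets = tying_map[distinct_targets]
        return (F.cross_entropy(distinct_logits, distinct_targets)
                + beta * F.cross_entropy(tied_logits, tied_targets))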
Our last method works in a multilingual setting where data from multiple languages
are available. Multilingual acoustic modeling is improved by learning a universal
phone set (UPS) modeling task together with the language-specific triphone modeling
tasks, which helps implicitly map the phones of the languages to one another.
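A sketch of the corresponding network shape, again under the assumed PyTorch setup
with hypothetical sizes and language names: one shared hidden stack, one
triphone-state head per language, and a UPS head trained on frames from every
language.

    import torch.nn as nn

    class MultilingualMTLDNN(nn.Module):
        def __init__(self, feat_dim=440, hidden_dim=1024,
                     n_states_per_lang=None, n_ups_phones=120):
            super().__init__()
            if n_states_per_lang is None:
                # Hypothetical language inventory and tied-state counts.
                n_states_per_lang = {"lang_a": 3000, "lang_b": 2500}
            self.shared = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            )
            # One language-specific triphone head per language ...
            self.lang_heads = nn.ModuleDict(
                {lang: nn.Linear(hidden_dim, n)
                 for lang, n in n_states_per_lang.items()})
            # ... plus one universal phone set (UPS) head shared by all languages.
            self.ups_head = nn.Linear(hidden_dim, n_ups_phones)

        def forward(self, x, lang):
            h = self.shared(x)
            # Each frame trains its own language's head and the UPS head;
            # the shared layers implicitly align phones across languages.
            return self.lang_heads[lang](h), self.ups_head(h)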
The MTL methods were shown to be effective on a broad range of data sets. The
contributions of this thesis include the three proposed MTL methods and the heuristic
guidelines we impose for finding helpful secondary tasks. With these successful
explorations, we hope to stimulate more interest in MTL for improving ASR; our results
show that it is promising for wider applications.