THESIS
2014
viii, [63] pages : illustrations ; 30 cm
Abstract
We show for the first time how to use a multilingual acoustic modeling approach for Hindi
speech recognition under low resource conditions using deep neural networks (DNNs) and Hindi-English phone mapping. The system can leverage statistics from a resource-rich language,
English, to improve the acoustic modeling of a resource-poor language, Hindi.
Hindi is one of the most widely spoken languages in the world, spoken by over 300
million people natively and by another 155 million as a second language. As with many Indian
languages, however, it is difficult to obtain large amounts of recorded and transcribed Hindi speech
data to train a large vocabulary continuous speech recognition (LVCSR) system. As a result,
there is a growing demand for techniques that utilize the data that is available in abundance for
resource-rich languages to help build speech recognition systems for Hindi.
In this thesis, we propose an approach for acoustic modeling for Hindi speech by borrowing
from English data. As a first step, we start with a deep neural network that has been fine-tuned on English. We then replace its output layer with one corresponding to Hindi, keeping
the hidden layers from the English network, and then fine-tune the whole network on
Hindi. This process is repeated by adding more English data. We see consistent gains in phone
recognition accuracy from training the hidden layers with larger amounts of English data.
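The layer-swapping step described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the layer sizes (39 input features, two hidden layers of 64 units, 40 English and 48 Hindi phone classes), the sigmoid/softmax architecture, and the helper names init_layer, init_dnn, forward and swap_output_layer are all assumptions made for illustration. The sketch shows only how the hidden layers are carried over and the output layer is replaced; the subsequent fine-tuning of the whole network on Hindi data is omitted.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def init_dnn(rng, sizes):
    """Build a list of (weights, biases) pairs, one per layer."""
    return [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Sigmoid hidden layers followed by a softmax over phone classes."""
    h = x
    for w, b in params[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    w, b = params[-1]
    logits = h @ w + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

def swap_output_layer(params, n_target_classes, rng):
    """Copy the hidden layers and attach a freshly initialised output layer."""
    hidden = [(w.copy(), b.copy()) for w, b in params[:-1]]
    n_hidden = params[-1][0].shape[0]
    return hidden + [init_layer(rng, n_hidden, n_target_classes)]

rng = np.random.default_rng(0)
# Stand-in for a network already trained on English (sizes are hypothetical).
english_dnn = init_dnn(rng, [39, 64, 64, 40])
# Same hidden layers, new output layer sized for a hypothetical Hindi phone set.
hindi_dnn = swap_output_layer(english_dnn, 48, rng)
```

After the swap, every layer of hindi_dnn, hidden and output alike, would be updated during fine-tuning on the Hindi data, which is what distinguishes this scheme from using the English network as a fixed feature extractor.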
Using the deep neural network approach on 0.7 hours of Hindi data, recognition accuracy
improved from 34% with the baseline Gaussian Mixture Model - Hidden Markov Model (GMM-HMM)
models to 54%. More significantly, we show that by using increasing amounts of English data,
with the multilingual DNN approach, we are able to increase Hindi phone recognition accuracy
from 55% with 1 hour of English data to a final 60% recognition accuracy with 14 hours of
English data. Our results have surpassed the current state-of-the-art for Hindi phone recognition
under low resource conditions by 6%. We also extended our work on Hindi phone recognition to
Hindi word recognition by building a 3-gram Hindi language model with a vocabulary size of
500K Hindi words. The Hindi word recognition performance on our test corpus is 53%, which is
the best result to date for Hindi LVCSR under low resource conditions.