Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/426105
Title: Strategies for Handling Large Vocabulary and Data Sparsity Problems for Tamil Speech Recognition
Researcher: Madhavaraj, A
Guide(s): Ramakrishnan, A G
Keywords: Engineering; Engineering and Technology; Engineering Electrical and Electronic
University: Indian Institute of Science Bangalore
Completed Date: 2020
Abstract: This thesis covers the design and development of every building block of a very large vocabulary continuous speech recognition (LVCSR) system, along with the experiments conducted to improve its performance. To our knowledge, this is the first report of such a full-fledged Tamil LVCSR system: we could not find any journal or refereed conference paper that builds a Tamil LVCSR system and reports recognition results on large-scale open-source speech recognition test datasets. A large read speech corpus of 217 hours has been collected and annotated at the sentence level for the development of the LVCSR system. Of this, 160 hours are used for training the LVCSR system, 50 hours as the test set, and the publicly available 7 hours of OpenSLR-65 data released by Google as the development set. The major contributions of the thesis are:
- Collecting a large amount of Tamil speech and text data, editing the transcriptions to match the spoken utterances, and using them to develop a Tamil LVCSR system based on deep neural networks (DNNs) and graphical models.
- Handling the unlimited vocabulary problem in Tamil through a subword modeling technique that uses novel subword dictionary creation and word segmentation methods, implemented efficiently in the weighted finite state transducer (WFST) framework.
- Addressing the data sparsity problem by pooling data from multiple low- and medium-resourced languages using novel phone/senone mapping techniques and training a multitask DNN (MT-DNN).
- Proposing a novel coactivation loss for speaker-adapting the DNN, derived from an asymptotic Bayesian approximation via the Laplace approximation, which uses the mean and covariance statistics of the activation values at all hidden layers of the speaker-independent DNN.
- Studying the use of scattering transform features in acoustic modeling and proposing a DNN architecture inspired by it to jointly perform feature extraction and acoustic modeling from raw speech signal...
Pagination: xx, 138
URI: http://hdl.handle.net/10603/426105
Appears in Departments: Electrical Engineering

Files in This Item:
File                      Description    Size       Format
01_title.pdf              Attached File  39.31 kB   Adobe PDF
02_prelim pages.pdf                      140.51 kB  Adobe PDF
03_table of content.pdf                  41.14 kB   Adobe PDF
04_abstract.pdf                          26.7 kB    Adobe PDF
05_chapter 1.pdf                         713.82 kB  Adobe PDF
06_chapter 2.pdf                         1.71 MB    Adobe PDF
07_chapter 3.pdf                         2.59 MB    Adobe PDF
08_chapter 4.pdf                         536.56 kB  Adobe PDF
09_chapter 5.pdf                         361.51 kB  Adobe PDF
10_chapter 6.pdf                         709.16 kB  Adobe PDF
11_annexure.pdf                          325.9 kB   Adobe PDF
80_recommendation.pdf                    65.06 kB   Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
