Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/246056
Title: Salient Features for Multilingual Speech Recognition in Indian Scenario
Researcher: Hari Krishna Vydana
Guide(s): Anil Kumar Vuppala
Keywords: Automatic Speech Recognition
Code-Mixing
Common Phone-set
Engineering and Technology, Engineering, Engineering Multidisciplinary
Indian Languages
Indian Scenarios
Joint Acoustic Model
Multilingual
Residual Networks
University: International Institute of Information Technology, Hyderabad
Completed Date: 08/01/2019
Abstract: Automatic Speech Recognition (ASR) systems have made considerable progress in the past decade. In high-resourced scenarios such as English, ASR systems have shown performance comparable to human parity on specific tasks. ASR systems for Indian languages are less studied than those for high-resourced languages such as English. Developing Indian language ASR systems requires addressing certain challenges that are innate to Indian languages. Often, Indian language ASR systems have to be developed for low-resourced scenarios. Apart from the multilingual nature of the population, bilingualism is very prevalent in India, which leads to frequent code-switching and word borrowing between any two languages. Operating parallel ASR systems with code-switching capabilities in Indian scenarios is a huge challenge. This motivated us to work towards multilingual ASR systems that can handle code-mixing and word borrowing efficiently. In this thesis, we address various issues related to the development of ASR systems for Indian scenarios. An integrated ASR system that can efficiently handle multilingual code-mixed speech is developed using a common phone-set. Acoustic modeling approaches such as HMM-GMM, HMM-SGMM and RNN-CTC have been studied to find the most suitable acoustic model. Residual networks have been explored to improve the performance of the joint acoustic models. Supplementing the conventional features with articulatory features has been explored for developing multilingual ASR systems. Fricative landmarks are detected, and the detected landmarks are used as features for improving the performance of the multilingual ASR system. Distinctive features from speech are modeled using a statistical approach, and their relevance for improving the performance of a multilingual ASR system is explored. In a low-resourced scenario, meta-level information about the speaker is not accessible. A speaker normalization method that can handle such scenarios is explored.
Pagination: All Pages
URI: http://hdl.handle.net/10603/246056
Appears in Departments: Department of Electronic and Communication Engineering

Files in This Item:
File              Description    Size       Format
appendix.pdf      Attached File  747.31 kB  Adobe PDF
bibliography.pdf                 133.94 kB  Adobe PDF
chapter1.pdf                     145.78 kB  Adobe PDF
chapter2.pdf                     878.15 kB  Adobe PDF
chapter3.pdf                     222.86 kB  Adobe PDF
chapter4.pdf                     277.99 kB  Adobe PDF
chapter5.pdf                     2.27 MB    Adobe PDF
chapter6.pdf                     163.32 kB  Adobe PDF
chapter7.pdf                     163.26 kB  Adobe PDF
chapter8.pdf                     128.7 kB   Adobe PDF
front_pages.pdf                  321.76 kB  Adobe PDF
publications.pdf                 118.03 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
