Salient Features for Multilingual Speech Recognition in Indian Scenario

Hari Krishna Vydana

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/246056

Title:	Salient Features for Multilingual Speech Recognition in Indian Scenario
Researcher:	Hari Krishna Vydana
Guide(s):	Anil Kumar Vuppala
Keywords:	Automatic Speech Recognition, Code-Mixing, Common Phone-set, Engineering and Technology, Engineering, Engineering Multidisciplinary, Indian Languages, Indian Scenarios, Joint Acoustic Model, Multilingual, Residual Networks
University:	International Institute of Information Technology, Hyderabad
Completed Date:	08/01/2019
Abstract:	Automatic Speech Recognition (ASR) systems have progressed considerably in the past decade. In high-resourced settings such as English, ASR systems have reached performance comparable to human level on specific tasks. ASR systems for Indian languages are less studied than those for high-resourced languages like English. Developing Indian language ASR systems requires addressing challenges that are innate to Indian languages, and such systems often have to be developed for low-resourced scenarios. Apart from the multilingual nature of the population, bilingualism is highly prevalent in India, which leads to frequent code-switching and word borrowing between any two languages. Operating parallel ASR systems with code-switching capabilities in Indian scenarios is a huge challenge. This motivated us to work towards multilingual ASR systems that can handle code-mixing and word borrowing efficiently. In this thesis, we address various issues related to the development of ASR systems for Indian scenarios. An integrated ASR system that can efficiently handle multilingual code-mixed speech is developed using a common phone-set. Acoustic modeling approaches such as HMM-GMM, HMM-SGMM and RNN-CTC are studied to find the most suitable acoustic model. Residual networks are explored to improve the performance of the joint acoustic models. Supplementing conventional features with articulatory features is explored for developing multilingual ASR systems. Fricative landmarks are detected and used as features to improve the performance of the multilingual ASR system. Distinctive features of speech are modeled using a statistical approach, and their relevance for improving the performance of multilingual ASR is explored. In a low-resourced scenario, meta-level information about the speaker is not accessible; a speaker normalization method that can handle such scenarios is explored.
Pagination:	All Pages
URI:	http://hdl.handle.net/10603/246056
Appears in Departments:	Department of Electronic and Communication Engineering

Files in This Item:

File	Description	Size	Format
appendix.pdf	Attached File	747.31 kB	Adobe PDF
bibliography.pdf		133.94 kB	Adobe PDF
chapter1.pdf		145.78 kB	Adobe PDF
chapter2.pdf		878.15 kB	Adobe PDF
chapter3.pdf		222.86 kB	Adobe PDF
chapter4.pdf		277.99 kB	Adobe PDF
chapter5.pdf		2.27 MB	Adobe PDF
chapter6.pdf		163.32 kB	Adobe PDF
chapter7.pdf		163.26 kB	Adobe PDF
chapter8.pdf		128.7 kB	Adobe PDF
front_pages.pdf		321.76 kB	Adobe PDF
publications.pdf		118.03 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET