Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/246056
Title: | Salient Features for Multilingual Speech Recognition in Indian Scenario |
Researcher: | Hari Krishna Vydana |
Guide(s): | Anil Kumar Vuppala |
Keywords: | Automatic Speech Recognition; Code-Mixing; Common Phone-set; Engineering and Technology; Engineering; Engineering Multidisciplinary; Indian Languages; Indian Scenarios; Joint Acoustic Model; Multilingual; Residual Networks |
University: | International Institute of Information Technology, Hyderabad |
Completed Date: | 08/01/2019 |
Abstract: | Automatic Speech Recognition (ASR) systems have witnessed a lot of progress in the past decade. For high-resourced languages such as English, ASR systems have reached performance comparable to human parity on specific tasks. ASR systems for Indian languages are less studied than those for high-resourced languages such as English. Developing Indian language ASR systems requires addressing certain challenges that are innate to Indian languages, and such systems often have to be developed for low-resourced scenarios. Apart from the multilingual nature of the country, bilingualism is very prevalent in the Indian population, which leads to frequent code-switching and word borrowing between any two languages. Operating parallel ASR systems with code-switching capabilities in Indian scenarios is a huge challenge. This motivated us to work towards multilingual ASR systems that can handle code-mixing and word borrowing efficiently. In this thesis, we address various issues related to the development of ASR systems for Indian scenarios. An integrated ASR system is developed using a common phone-set, which can efficiently handle multilingual code-mixed speech (an illustrative sketch of the common phone-set idea follows the item record below). Acoustic modeling approaches such as HMM-GMM, HMM-SGMM and RNN-CTC have been studied to find the most suitable acoustic model. Residual networks have been explored to improve the performance of the joint acoustic models. Studies directed towards supplementing the conventional features with articulatory features have been explored for developing multilingual ASR systems. Fricative landmarks are detected, and the detected landmarks are used as features to improve the performance of the multilingual ASR system. Distinctive features of speech are modeled using a statistical approach, and their relevance for improving the performance of a multilingual ASR system is explored. In a low-resourced scenario, meta-level information about the speaker is not accessible; a speaker normalization method that can handle such scenarios is explored. |
Pagination: | All Pages |
URI: | http://hdl.handle.net/10603/246056 |
Appears in Departments: | Department of Electronics and Communication Engineering |
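The abstract above describes a joint acoustic model trained over a common phone-set so that code-mixed speech can be recognized without running parallel per-language systems. The snippet below is only a minimal, illustrative sketch of that idea, assuming hypothetical phone symbols, language tags, and mapping tables that are not taken from the thesis.

```python
# Minimal sketch (not the thesis implementation): pool two language-specific
# phone inventories into a shared "common phone-set" so that one joint acoustic
# model, with a single output label set, can be trained on code-mixed speech.
# All phone symbols and mappings below are illustrative assumptions.

TELUGU_TO_COMMON = {"a": "a", "aa": "a:", "k": "k", "kh": "kh", "ch": "tS"}
HINDI_TO_COMMON = {"a": "a", "aA": "a:", "k": "k", "K": "kh", "c": "tS"}

def to_common_phones(phones, lang):
    """Map a language-tagged phone sequence onto the common phone-set."""
    table = {"te": TELUGU_TO_COMMON, "hi": HINDI_TO_COMMON}[lang]
    return [table[p] for p in phones]

# A code-mixed utterance: each word keeps its source-language tag, but after
# mapping, every label lives in one shared inventory.
utterance = [("kaa", "te", ["k", "aa"]), ("kA", "hi", ["k", "aA"])]
for word, lang, phones in utterance:
    print(word, "->", to_common_phones(phones, lang))
```

Once every pronunciation is expressed in the shared inventory, training data from all languages can be pooled to train a single acoustic model instead of maintaining one recognizer per language.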
Files in This Item:
File | Size | Format
---|---|---
appendix.pdf | 747.31 kB | Adobe PDF
bibliography.pdf | 133.94 kB | Adobe PDF
chapter1.pdf | 145.78 kB | Adobe PDF
chapter2.pdf | 878.15 kB | Adobe PDF
chapter3.pdf | 222.86 kB | Adobe PDF
chapter4.pdf | 277.99 kB | Adobe PDF
chapter5.pdf | 2.27 MB | Adobe PDF
chapter6.pdf | 163.32 kB | Adobe PDF
chapter7.pdf | 163.26 kB | Adobe PDF
chapter8.pdf | 128.7 kB | Adobe PDF
front_pages.pdf | 321.76 kB | Adobe PDF
publications.pdf | 118.03 kB | Adobe PDF
Items in Shodhganga are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence (CC BY-NC-SA 4.0).