Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/535770
Title: Towards Building an Automatic Speech Recognition Systems in Indian Context using Deep Learning
Researcher: Sai Ganesh, Mirishkar
Guide(s): Anil Kumar, Vuppala
Keywords: Engineering
Engineering and Technology
Engineering Electrical and Electronic
University: International Institute of Information Technology, Hyderabad
Completed Date: 2023
Abstract: Automatic Speech Recognition (ASR) systems are increasingly prevalent in our daily lives, with commercial applications such as Siri, Alexa, and Google Assistant. However, the focus of these systems has been largely oriented towards English, leaving a considerable portion of non-English speakers underserved. This is particularly evident in India, a linguistically diverse country with many languages classified as low-resource in the context of ASR due to the scarcity of annotated speech data. This thesis aims to bridge this gap, focusing on enhancing ASR systems for Indian languages using deep learning methodologies. India is a land of language diversity. Approximately 2000 languages are spoken there, of which 23 are officially registered. Of those, very few have ASR capability. This is because building an ASR system requires thousands of hours of annotated speech data, a vast amount of text, and a lexicon that can span all the words in the languages. The necessity for a comprehensive presence in the diverse Indian markets demands the development of multilingual ASR systems. It is common for ASR systems for Indian languages to be implemented in low-resource contexts. Furthermore, the complexity of the linguistic landscape is amplified by the high prevalence of bilingualism in the Indian population, leading to frequent instances of code-switching and linguistic borrowing between languages. Operating concurrent ASR systems that can handle code-switching in the Indian context presents a considerable challenge. This predicament has spurred our research endeavors, driving us to focus on constructing a large corpus for one language and leveraging its phonetic space on other language families in monolingual and multilingual ASR scenarios.
This thesis incorporates a crowd-sourcing strategy to collect an extensive speech corpus, particularly for Telugu. Using this approach, around 2000 hours of Telugu speech data, capturing regional variations through three mo
Pagination: 142
URI: http://hdl.handle.net/10603/535770
Appears in Departments: Department of Electronic and Communication Engineering

Files in This Item:
File                   Description    Size       Format
01_title.pdf           Attached File  45.4 kB    Adobe PDF
02_prelim_pages.pdf                   183.42 kB  Adobe PDF
03_content.pdf                        39.61 kB   Adobe PDF
04_abstract.pdf                       38.64 kB   Adobe PDF
05_chapter_1.pdf                      103.97 kB  Adobe PDF
06_chapter_2.pdf                      311.7 kB   Adobe PDF
07_chapter_3.pdf                      1.56 MB    Adobe PDF
08_chapter_4.pdf                      165.25 kB  Adobe PDF
09_chapter_5.pdf                      402.07 kB  Adobe PDF
10_chapter_6.pdf                      131.74 kB  Adobe PDF
11_chapter_7.pdf                      308.37 kB  Adobe PDF
12_chapter_8.pdf                      82.52 kB   Adobe PDF
13_annexures.pdf                      82.01 kB   Adobe PDF
80_recommendation.pdf                 95.28 kB   Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
