Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/324398
Title: An Aggrandized Framework for Improving Large Vocabulary Continuous Speech Recognition (LVCSR) of Lecture Speech in Indian English
Researcher: Disha Kaur Phull
Guide(s): Bharadwaja Kumar, G
Keywords: Computer Science
Computer Science Interdisciplinary Applications
Engineering and Technology
University: VIT University
Completed Date: 2020
Abstract: Automatic Speech Recognition (ASR) is concerned with converting spoken utterances in an audio signal into text. It has a wide variety of applications, such as voice user interfaces that support voice dialing, domotic appliance control and voice search. During the past few decades, drastic developments have been reported in ASR for many languages such as English, Finnish and German. However, the development of Indian English (IE) speech recognition models has been largely neglected. IE is one of the varieties of English spoken in the Indian subcontinent, showing idiosyncrasies in pronunciation, vocabulary, dialect and accent relative to English spoken in other parts of the world. There is also a dearth of resources in terms of transcribed speech data and spoken language corpora in IE. In spite of these challenges, an aggrandized framework is proposed in this thesis to improve speech recognition accuracy. In this work, (i) an Indian English acoustic model has been developed that gives a 35% lower Word Error Rate (WER) than existing English acoustic models such as HUB4; (ii) an acoustic-phonetic analysis of vowels and consonants has been carried out to understand the characteristic differences within Indian English varieties; (iii) an effective methodology has been proposed that uses a Wikipedia dump corpus along with Google search to interpolate and adapt the language models closer to the topic of the spoken lecture, which has further reduced the Word Error Rate to 14% with a major decrease in the perplexity of the language model; (iv) an effective retrieval method is proposed that uses the Elasticsearch framework to retrieve documents using the key phrases identified from the ASR output, in order to create a domain-specific language model; this reduces the search time by more than 90% in comparison to a conventional search and retrieval mechanism.
Finally, from the reported results, one can conclude that the proposed framework drastically reduces the perplexity as well as the WER and improves the performance of speech recognition of Indian English.
Pagination: i-ix, 1-111
URI: http://hdl.handle.net/10603/324398
Appears in Departments: School of Computing Science and Engineering -VIT-Chennai

Files in This Item:
File | Description | Size | Format
01_title page.pdf | Attached File | 154.86 kB | Adobe PDF
02_declaration & certificate.pdf | | 277.72 kB | Adobe PDF
03_abstract.pdf | | 85.88 kB | Adobe PDF
04_acknowledgement.pdf | | 65.22 kB | Adobe PDF
05_table of contents.pdf | | 224.4 kB | Adobe PDF
06_list of figures.pdf | | 104.08 kB | Adobe PDF
07_list of tables.pdf | | 72.29 kB | Adobe PDF
08_list of terms and abbreviations.pdf | | 108.34 kB | Adobe PDF
09_chapter_01.pdf | | 922.87 kB | Adobe PDF
10_chapter_02.pdf | | 724.37 kB | Adobe PDF
11_chapter_03.pdf | | 1.49 MB | Adobe PDF
12_chapter_04.pdf | | 1.06 MB | Adobe PDF
13_chapter_05.pdf | | 1.26 MB | Adobe PDF
14_chapter_06.pdf | | 1.34 MB | Adobe PDF
15_chapter_07.pdf | | 346.64 kB | Adobe PDF
16_references.pdf | | 1.13 MB | Adobe PDF
17_list of publications.pdf | | 64.87 kB | Adobe PDF
80_recommendation.pdf | | 501.88 kB | Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
