Robust detection of vowels in speech signal with application to children's ASR

Kumar, Avinash

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/423552

Title:	Robust detection of vowels in speech signal with application to children's ASR
Researcher:	Kumar, Avinash
Guide(s):	Pradhan, Gayadhar
Keywords:	Engineering; Engineering and Technology; Engineering, Electrical and Electronic
University:	National Institute of Technology Patna
Completed Date:	2019
Abstract:	This thesis proposes acoustic modeling and signal processing approaches for robustly detecting vowels and their corresponding onset points (VOPs) and offset points (VEPs) in a given speech signal. The VOP and VEP are defined as the instants at which a vowel starts and ends, respectively. The knowledge of vowel and non-vowel regions is then exploited for non-uniformly suppressing the pitch-induced mismatch in children's automatic speech recognition (ASR) systems. First, using mel-frequency cepstral coefficients (MFCCs) as the front-end features, three-class classifiers (vowels, non-vowels and silences) are developed using recently reported state-of-the-art acoustic modeling methods for the task of detecting vowels, VOPs and VEPs in a given speech signal. Among the explored acoustic modeling techniques, the best performance is observed for pre-trained deep neural networks (DNNs). To further enhance the performance, a novel front-end feature exploiting the temporal and spectral characteristics of the excitation source information in the speech signal is proposed. The use of the proposed feature results in the detection of vowel regions that are quite different from those obtained through the MFCCs. Exploiting those differences in the evidence obtained from the two kinds of features, a technique to combine the evidence is also proposed. The statistical learning based approaches provide significantly improved performance when compared with the explicit signal processing methods reported in the literature. However, the performance of statistical classifiers degrades significantly when the speech signal is corrupted by ambient noise. In order to enhance the robustness towards ambient noise, a signal processing approach based on non-local means (NLM) estimation is then proposed for the detection of vowels, VOPs and VEPs. In the NLM algorithm, the signal value at each sample point is estimated as the weighted sum of signal values at other sample points within a search neighborhood.
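The NLM estimate described at the end of the abstract can be illustrated with a minimal one-dimensional sketch. This is not the thesis's implementation: the function name nlm_1d, the parameters patch_half, search_half and h, the Gaussian weighting on patch distance, and the inclusion of the centre sample in the sum are all assumptions taken from the generic non-local means formulation.

```python
import numpy as np

def nlm_1d(x, patch_half=2, search_half=10, h=0.5):
    """Generic 1-D non-local means sketch (all parameter names are assumptions).

    Each output sample is a normalised weighted sum of the input samples
    inside a search neighbourhood; a neighbour's weight decays with the
    mean squared difference between its surrounding patch and the patch
    around the sample being estimated.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Reflect-pad so that every sample has a full patch around it.
    padded = np.pad(x, patch_half, mode="reflect")
    # patches[i] is the window of length 2*patch_half + 1 centred on sample i.
    patches = np.lib.stride_tricks.sliding_window_view(padded, 2 * patch_half + 1)
    out = np.empty(n)
    for i in range(n):
        lo = max(0, i - search_half)
        hi = min(n, i + search_half + 1)
        # Mean squared patch distance to every candidate in the search window.
        d2 = np.mean((patches[lo:hi] - patches[i]) ** 2, axis=1)
        # Assumed Gaussian kernel; h controls how quickly weights fall off.
        w = np.exp(-d2 / (h ** 2))
        # The centre sample is included here (its weight is 1).
        out[i] = np.sum(w * x[lo:hi]) / np.sum(w)
    return out
```

Because the weights are positive and normalised, every output sample is a convex combination of input samples from its search window, so similar-looking stretches of the signal reinforce each other while dissimilar ones contribute little. How the thesis turns such an estimate into vowel, VOP and VEP evidence is not stated in this record.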
Pagination:	xxxi, 155p.
URI:	http://hdl.handle.net/10603/423552
Appears in Departments:	Electronics and Communications Engineering

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	102.09 kB	Adobe PDF
02_prelim pages.pdf		121.95 kB	Adobe PDF
03_content.pdf		242.54 kB	Adobe PDF
04_abstract.pdf		51.43 kB	Adobe PDF
05_chapter 1.pdf		126.85 kB	Adobe PDF
06_chapter 2.pdf		181.46 kB	Adobe PDF
07_chapter 3.pdf		436.7 kB	Adobe PDF
08_chapter 4.pdf		997.74 kB	Adobe PDF
09_chapter 5.pdf		2.87 MB	Adobe PDF
10_chapter 6.pdf		351.27 kB	Adobe PDF
11_chapter 7.pdf		102.54 kB	Adobe PDF
12_annexures.pdf		144.09 kB	Adobe PDF
80_recommendation.pdf		158.35 kB	Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).


Shodhganga : a reservoir of Indian theses @ INFLIBNET