Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/423552
Title: Robust detection of vowels in speech signal with application to children's ASR
Researcher: Kumar, Avinash
Guide(s): Pradhan, Gayadhar
Keywords: Engineering
Engineering and Technology
Engineering Electrical and Electronic
University: National Institute of Technology Patna
Completed Date: 2019
Abstract: This thesis proposes acoustic modeling and signal processing approaches for robustly detecting vowels and their corresponding onset points (VOPs) and offset points (VEPs) in a given speech signal. The VOP and VEP are defined as the instants at which a vowel starts and ends, respectively. The knowledge of vowel and non-vowel regions is then exploited for non-uniformly suppressing the pitch-induced mismatch in a children's automatic speech recognition (ASR) system. First, using mel-frequency cepstral coefficients (MFCCs) as the front-end features, three-class classifiers (vowels, non-vowels and silences) are developed using recently reported state-of-the-art acoustic modeling methods for the task of detecting vowels, VOPs and VEPs in a given speech signal. Among the explored acoustic modeling techniques, the best performance is observed for pre-trained deep neural networks (DNNs). To further enhance the performance, a novel front-end feature exploiting the temporal and spectral characteristics of the excitation source information in the speech signal is proposed. The use of the proposed feature results in the detection of vowel regions that are quite different from those obtained through the MFCCs. Exploiting the differences in the evidence obtained from the two kinds of features, a technique to combine the evidence is also proposed. The statistical learning based approaches provide significantly improved performance compared with the explicit signal processing methods reported in the literature. However, the performance of statistical classifiers degrades significantly when the speech signal is corrupted by ambient noise. To enhance robustness to ambient noise, a signal processing approach based on non-local means (NLM) estimation is then proposed for the detection of vowels, VOPs and VEPs. In the NLM algorithm, the signal value at each sample point is estimated as the weighted sum of signal values at other sample points within a search neighborhood.
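The NLM estimate described in the abstract can be illustrated with a short sketch. This is not the thesis's implementation: it is a generic one-dimensional NLM smoother, assuming Gaussian weights computed from the mean squared difference between short patches around each pair of samples. The function name `nlm_1d` and the parameters `patch_half`, `search_half` and `bandwidth` are hypothetical names chosen for illustration, and the sketch also includes the centre sample itself in the weighted sum, as many NLM implementations do.

```python
import numpy as np


def nlm_1d(signal, patch_half=2, search_half=10, bandwidth=0.5):
    """Illustrative 1-D non-local means estimate (assumed formulation).

    Each output sample is a weighted average of the samples inside a
    search neighbourhood; the weight of a neighbour decays with the
    dissimilarity between the patch around it and the patch around the
    sample being estimated.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    patch_len = 2 * patch_half + 1

    # Reflect-pad so that every sample has a full patch around it.
    padded = np.pad(x, patch_half, mode="reflect")
    # patches[i] is the length-patch_len window centred on sample i.
    patches = np.lib.stride_tricks.sliding_window_view(padded, patch_len)

    out = np.empty(n)
    for i in range(n):
        # Search neighbourhood, clipped at the signal boundaries.
        lo = max(0, i - search_half)
        hi = min(n, i + search_half + 1)

        # Mean squared patch difference to every candidate sample.
        d2 = np.mean((patches[lo:hi] - patches[i]) ** 2, axis=1)
        # Gaussian weights (assumed kernel); the centre sample gets weight 1.
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))

        # Normalised weighted sum of the signal values in the neighbourhood.
        out[i] = np.sum(w * x[lo:hi]) / np.sum(w)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(400)
    clean = np.sin(2 * np.pi * t / 200)
    noisy = clean + 0.3 * rng.standard_normal(t.size)
    smoothed = nlm_1d(noisy, bandwidth=0.3)
    print("MSE noisy:   ", np.mean((noisy - clean) ** 2))
    print("MSE smoothed:", np.mean((smoothed - clean) ** 2))
```

Because the weights are non-negative and normalised, each estimate is a convex combination of nearby samples. How such an estimate is turned into vowel, VOP and VEP evidence is not stated in the abstract and is not attempted here.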
Pagination: xxxi, 155p.
URI: http://hdl.handle.net/10603/423552
Appears in Departments: Electronics and Communications Engineering

Files in This Item:
File                   Description    Size       Format
01_title.pdf           Attached File  102.09 kB  Adobe PDF
02_prelim pages.pdf                   121.95 kB  Adobe PDF
03_content.pdf                        242.54 kB  Adobe PDF
04_abstract.pdf                       51.43 kB   Adobe PDF
05_chapter 1.pdf                      126.85 kB  Adobe PDF
06_chapter 2.pdf                      181.46 kB  Adobe PDF
07_chapter 3.pdf                      436.7 kB   Adobe PDF
08_chapter 4.pdf                      997.74 kB  Adobe PDF
09_chapter 5.pdf                      2.87 MB    Adobe PDF
10_chapter 6.pdf                      351.27 kB  Adobe PDF
11_chapter 7.pdf                      102.54 kB  Adobe PDF
12_annexures.pdf                      144.09 kB  Adobe PDF
80_recommendation.pdf                 158.35 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
