Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/429864
Title: Spectrotemporal Processing of Speech Signals Using the Riesz Transform
Researcher: Dhiman, Jitendra Kumar
Guide(s): Seelamantula, Chandra Sekhar
Keywords: Engineering; Engineering and Technology; Engineering Electrical and Electronic
University: Indian Institute of Science Bangalore
Completed Date: 2021
Abstract: Speech signals possess rich time-varying spectral content, which makes their analysis a challenging signal processing problem. Accurate speech analysis directly benefits applications such as speech synthesis, speaker recognition, speech recognition, and voice morphing. A widely used tool for visualizing time-varying spectral content is the spectrogram, which represents the signal in the joint time-frequency plane. A spectrogram can be viewed as a collection of localized spectrotemporal patches. Based on the structure of the two-dimensional (2-D) patterns in the spectrogram, we propose modeling it using 2-D amplitude-modulated and frequency-modulated (AM-FM) sinusoids. The 2-D AM-FM model is justified by the physical process of speech production: the AM and FM components correspond to the smooth vocal-tract envelope and the excitation signal, respectively. We demonstrate that analyzing speech jointly in time and frequency reveals several important characteristics that are not evident in purely time-domain or purely frequency-domain analysis. The central problem in this dissertation is the 2-D demodulation of a speech spectrogram into its 2-D AM and FM components. We advocate the use of the Riesz transform, a 2-D extension of the Hilbert transform, to demodulate narrowband and pitch-adaptive spectrograms. The 2-D AM and FM components obtained by demodulation have potential benefits for speech analysis. We demonstrate the impact of the proposed modeling technique on vocal-tract filter estimation, voiced/unvoiced component separation, pitch tracking, speech synthesis, and periodic/aperiodic decomposition of speech signals. The accuracy of the estimated speech parameters is validated on the task of speech reconstruction.
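The central step described in the abstract, demodulating a 2-D patch into AM and FM parts with the Riesz transform, can be illustrated with a minimal NumPy sketch. It assumes the common Fourier-domain definition of the Riesz transform, with multipliers -j*w_k/|w| for k = 1, 2, and the monogenic-signal construction in which the AM component is sqrt(f^2 + (R1 f)^2 + (R2 f)^2). The function names `riesz_transform` and `demodulate`, and the synthetic patch (a slow cosine envelope times an oblique cosine carrier), are hypothetical illustrations, not the thesis's implementation; a real pipeline would apply this kind of demodulation to localized patches of a narrowband or pitch-adaptive spectrogram.

```python
import numpy as np


def riesz_transform(f):
    """Return the two Riesz components (R1 f, R2 f) of a real 2-D array.

    Assumed convention: Fourier multipliers -j*w1/|w| and -j*w2/|w|.
    The FFT implies periodic boundaries.
    """
    rows, cols = f.shape
    w1 = np.fft.fftfreq(rows)[:, None]  # normalized frequency along axis 0
    w2 = np.fft.fftfreq(cols)[None, :]  # normalized frequency along axis 1
    mag = np.sqrt(w1**2 + w2**2)
    mag[0, 0] = 1.0  # avoid 0/0 at DC; both numerators are zero there anyway
    spectrum = np.fft.fft2(f)
    r1 = np.fft.ifft2(-1j * w1 / mag * spectrum).real
    r2 = np.fft.ifft2(-1j * w2 / mag * spectrum).real
    return r1, r2


def demodulate(f):
    """Monogenic demodulation: local amplitude (AM), local phase, orientation."""
    r1, r2 = riesz_transform(f)
    odd = np.sqrt(r1**2 + r2**2)        # magnitude of the Riesz (odd) part
    amplitude = np.sqrt(f**2 + odd**2)  # AM component
    phase = np.arctan2(odd, f)          # local phase in [0, pi]
    orientation = np.arctan2(r2, r1)    # local carrier orientation
    return amplitude, phase, orientation


# Hypothetical synthetic 2-D AM-FM patch standing in for a spectrogram patch:
# a slowly varying envelope multiplying an oblique cosine carrier.
n = 128
x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * x / n)
carrier = np.cos(2 * np.pi * (20 * x + 12 * y) / n)
patch = envelope * carrier

amp, phase, orient = demodulate(patch)
print(np.max(np.abs(amp - envelope)))  # small: the envelope is recovered approximately
```

For the unmodulated carrier alone, the recovered amplitude is 1 to machine precision and the orientation equals atan2(12, 20) wherever the carrier's sine is positive. With the envelope present, the AM estimate matches the envelope only approximately, because the sideband frequencies point in slightly different directions than the carrier; the error shrinks as the envelope varies more slowly relative to the carrier.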
The first part of the thesis is focused on theoretical developments related to 2-D modeling. We con...
URI: http://hdl.handle.net/10603/429864
Appears in Departments: Electrical Engineering

Files in This Item:
File                     Size       Format
01_title.pdf             3.07 MB    Adobe PDF
02_prelim pages.pdf      1.12 MB    Adobe PDF
03_table of contents.pdf 279.36 kB  Adobe PDF
04_abstract.pdf          169.5 kB   Adobe PDF
05_chapter 1.pdf         5.01 MB    Adobe PDF
06_chaper 2.pdf          14.78 MB   Adobe PDF
07_chapter 3.pdf         12.05 MB   Adobe PDF
08_chapter 4.pdf         4.24 MB    Adobe PDF
09_chapter 5.pdf         3.17 MB    Adobe PDF
11_annexure.pdf          772.21 kB  Adobe PDF
80_recommendation.pdf    4.99 MB    Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
