Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/354556
Title: Analysis and Implementation of Hybrid Learning Models for Vocal Emotion Conversion
Researcher: Susmitha Vekkot
Guide(s): Deepa Gupta
Keywords: Engineering and Technology
Engineering Electrical and Electronic; Electronics and Communication Engineering; Natural Language Processing; Neutral speech; voice conversion; digital signal processing (DSP); human-computer interaction (HCI); speech-to-speech (S2S)
University: Amrita Vishwa Vidyapeetham University
Completed Date: 2020
Abstract: This thesis investigates, analyzes, designs, and implements effective feature mapping strategies for neutral-to-emotional voice conversion. The majority of existing methods in the literature synthesize one or more aspects of emotion from single-speaker, single-language training sets, but scalability is a significant issue in multi-speaker emotion conversion. Building speaker-independent models requires incorporating the widely varying nuances of expression of multiple speakers from diverse cultural backgrounds. The lack of large amounts of emotional training data with parallel text also hinders the extension of emotion conversion systems to multiple languages. The thesis addresses these problems in generating emotional speech by combining intelligent soft computing techniques with signal processing. Various methods for mapping emotionally relevant features are investigated and validated in multiple languages. The spectral features used are mel cepstral coefficients, with different training methodologies for parallel and non-parallel data scenarios. The work focuses on the modules that shape the characteristics of speech emotion and on useful ways of integrating them, which led to the following contributions. The most significant feature characterizing emotion in any language is prosody (fundamental frequency (F0), duration and intensity), or speech rhythm. As the first step in designing the emotion conversion system, a feasibility study of prosodic modification in five selected languages is carried out for a multilingual scenario; the thesis designates this as the pilot study. A dynamic prosody modification algorithm is developed and tested in languages of varying linguistic similarity. Performance is evaluated using standard metrics such as root mean square error (RMSE) and correlation.
Although their acoustic and perceptual features differ as a function of linguistic similarity, some parameters fundamental for emotional expression occurred ...
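The abstract reports evaluating prosody modification with RMSE and correlation. As a hedged illustration only, the sketch below computes both metrics on F0 contours, using a log-domain mean/variance (Gaussian-normalized) F0 transformation as a stand-in converter. That transform is a common baseline and is not the thesis's dynamic prosody modification algorithm. The function names (`rmse`, `pearson_correlation`, `log_f0_stats`, `log_gaussian_f0_transform`) and the toy contours are hypothetical, and unvoiced frames are assumed to be marked with F0 = 0.

```python
import math
from typing import List, Sequence, Tuple


def rmse(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Root mean square error between two equal-length contours."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("contours must be non-empty and of equal length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length contours."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("contours must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(vx * vy)


def log_f0_stats(f0: Sequence[float]) -> Tuple[float, float]:
    """Mean and population std of log F0 over voiced frames (F0 > 0 assumed voiced)."""
    voiced = [math.log(f) for f in f0 if f > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    mean = sum(voiced) / len(voiced)
    std = math.sqrt(sum((v - mean) ** 2 for v in voiced) / len(voiced))
    return mean, std


def log_gaussian_f0_transform(f0_source: Sequence[float], src_mean: float,
                              src_std: float, tgt_mean: float,
                              tgt_std: float) -> List[float]:
    """Hypothetical baseline: map voiced F0 (Hz) by matching log-F0 mean and std.

    Unvoiced frames (F0 <= 0) are output as 0.0.
    """
    if src_std <= 0:
        raise ValueError("source log-F0 std must be positive")
    out = []
    for f in f0_source:
        if f <= 0:
            out.append(0.0)
            continue
        z = (math.log(f) - src_mean) / src_std  # standardize in log domain
        out.append(math.exp(tgt_mean + z * tgt_std))  # rescale to target stats
    return out


if __name__ == "__main__":
    # Toy contours for illustration only; not data from the thesis.
    neutral = [120.0, 125.0, 0.0, 130.0, 128.0]
    target = [180.0, 190.0, 0.0, 200.0, 195.0]
    src_mean, src_std = log_f0_stats(neutral)
    tgt_mean, tgt_std = log_f0_stats(target)
    converted = log_gaussian_f0_transform(neutral, src_mean, src_std,
                                          tgt_mean, tgt_std)
    # Score only frames voiced in both the target and the converted contour.
    idx = [i for i in range(len(target)) if target[i] > 0 and converted[i] > 0]
    ref = [target[i] for i in idx]
    est = [converted[i] for i in idx]
    print(f"RMSE: {rmse(ref, est):.2f} Hz, "
          f"correlation: {pearson_correlation(ref, est):.3f}")
```

RMSE measures absolute deviation of the converted contour from the target in Hz, while correlation captures whether the contour shape follows the target regardless of offset, which is why the two are typically reported together.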
Pagination: xxv, 195
URI: http://hdl.handle.net/10603/354556
Appears in Departments: Department of Electronics & Communication Engineering (Amrita School of Engineering)

Files in This Item:
File                      Description    Size       Format
01_title.pdf              Attached File  135.96 kB  Adobe PDF
02_certificate.pdf                       951.11 kB  Adobe PDF
03_preliminary pages.pdf                 272.43 kB  Adobe PDF
04_chapter 1.pdf                         116.44 kB  Adobe PDF
05_chapter 2.pdf                         424.7 kB   Adobe PDF
06_chapter 3.pdf                         1.14 MB    Adobe PDF
07_chapter 4.pdf                         637.41 kB  Adobe PDF
08_chapter 5.pdf                         1.66 MB    Adobe PDF
09_chapter 6.pdf                         5.84 MB    Adobe PDF
10_chapter 7.pdf                         3.46 MB    Adobe PDF
11_chapter 8.pdf                         210.18 kB  Adobe PDF
12_bibliography.pdf                      133.44 kB  Adobe PDF
13_publications.pdf                      120.41 kB  Adobe PDF
80_recommendation.pdf                    345.69 kB  Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
