Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/354556
Title: | Analysis and Implementation of Hybrid Learning Models for Vocal Emotion Conversion |
Researcher: | Susmitha Vekkot |
Guide(s): | Deepa Gupta |
Keywords: | Engineering and Technology Engineering Electrical and Electronic; Electronics and Communication Engineering; Natural Language Processing; Neutral speech; voice conversion; Language Processing; digital signal processing (DSP); human-computer interaction (HCI); speech-to-speech; S2S |
University: | Amrita Vishwa Vidyapeetham University |
Completed Date: | 2020 |
Abstract: | This thesis focuses on investigating, analyzing, designing, and implementing effective feature mapping strategies for neutral-to-emotional voice conversion. Most existing methods in the literature synthesize one or more aspects of emotion from single-speaker/single-language training sets, but scalability is a significant issue in multispeaker emotion conversion. The widely varying nuances of expression from multiple speakers and cross-cultural backgrounds need to be incorporated to build speaker-independent models. The non-availability of a large amount of emotional training data with parallel text also challenges the expansion of emotion conversion systems to multiple languages. The thesis aims to solve the above problems in generating emotional speech by employing intelligent soft computing and signal processing techniques. Various methods for mapping emotionally relevant features are investigated and validated in multiple languages. The spectral features used are mel cepstral coefficients, with different training methodologies for parallel and non-parallel data scenarios. The thesis focuses on the various modules that shape the characteristics of speech emotion and on their useful integration, which led to the following contributions. The most significant feature characterizing emotion in any language is the prosody (fundamental frequency (F0), duration and intensity) or speech rhythm. As the first step in emotion conversion system design, a feasibility study of prosodic modification in five selected languages for a multilingual scenario is carried out, designated the pilot study in the thesis. A dynamic prosody modification algorithm is developed and tested in different languages varying in linguistic similarity. Performance evaluation is conducted using standard metrics such as root mean square error (RMSE) and correlation. 
Although their acoustic and perceptual features differ as a function of linguistic similarity, some parameters fundamental for emotional expression occurred ... |
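
The abstract names root mean square error (RMSE) and correlation as evaluation metrics. As a rough illustration only, the sketch below computes both for two equal-length F0 contours. The function names, the choice of Pearson correlation, and the sample F0 values are assumptions made for this example; they are not taken from the thesis.

```python
import math


def rmse(reference, converted):
    """Root mean square error between two equal-length sequences."""
    if len(reference) != len(converted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_error = sum((r - c) ** 2 for r, c in zip(reference, converted))
    return math.sqrt(squared_error / len(reference))


def pearson_correlation(reference, converted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(reference)
    if n != len(converted) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_r = sum(reference) / n
    mean_c = sum(converted) / n
    covariance = sum((r - mean_r) * (c - mean_c)
                     for r, c in zip(reference, converted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_c = sum((c - mean_c) ** 2 for c in converted)
    if var_r == 0 or var_c == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_r * var_c)


# Hypothetical F0 values in Hz, invented for illustration (not thesis data).
target_f0 = [210.0, 225.0, 240.0, 230.0, 215.0]
converted_f0 = [205.0, 228.0, 236.0, 233.0, 211.0]

print("RMSE (Hz):", rmse(target_f0, converted_f0))
print("Correlation:", pearson_correlation(target_f0, converted_f0))
```

A lower RMSE and a correlation closer to 1 would indicate that a converted contour tracks the target more closely; in practice the two contours would first need to be time-aligned, which this sketch does not attempt.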
Pagination: | xxv, 195 |
URI: | http://hdl.handle.net/10603/354556 |
Appears in Departments: | Department of Electronics & Communication Engineering (Amrita School of Engineering) |
Files in This Item:
File | Description | Size | Format
---|---|---|---
01_title.pdf | Attached File | 135.96 kB | Adobe PDF
02_certificate.pdf | | 951.11 kB | Adobe PDF
03_preliminary pages.pdf | | 272.43 kB | Adobe PDF
04_chapter 1.pdf | | 116.44 kB | Adobe PDF
05_chapter 2.pdf | | 424.7 kB | Adobe PDF
06_chapter 3.pdf | | 1.14 MB | Adobe PDF
07_chapter 4.pdf | | 637.41 kB | Adobe PDF
08_chapter 5.pdf | | 1.66 MB | Adobe PDF
09_chapter 6.pdf | | 5.84 MB | Adobe PDF
10_chapter 7.pdf | | 3.46 MB | Adobe PDF
11_chapter 8.pdf | | 210.18 kB | Adobe PDF
12_bibliography.pdf | | 133.44 kB | Adobe PDF
13_publications.pdf | | 120.41 kB | Adobe PDF
80_recommendation.pdf | | 345.69 kB | Adobe PDF
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).