Analysis and Implementation of Hybrid Learning Models for Vocal Emotion Conversion

Susmitha Vekkot

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/354556

Title:	Analysis and Implementation of Hybrid Learning Models for Vocal Emotion Conversion
Researcher:	Susmitha Vekkot
Guide(s):	Deepa Gupta
Keywords:	Engineering and Technology Engineering Electrical and Electronic; Electronics and Communication Engineering; Natural Language Processing; Neutral speech; voice conversion; Language Processing; digital signal processing (DSP); human-computer interaction (HCI); speech-to-speech; S2S
University:	Amrita Vishwa Vidyapeetham University
Completed Date:	2020
Abstract:	This thesis investigates, analyzes, designs, and implements effective feature mapping strategies for neutral-to-emotional voice conversion. The majority of existing methods in the literature synthesize one or more aspects of emotion from single-speaker/single-language training sets, but scalability is a significant issue in multispeaker emotion conversion. The widely varying nuances of expression from multiple speakers and cross-cultural backgrounds need to be incorporated to build speaker-independent models. The non-availability of a large amount of emotional training data with parallel text also hinders the expansion of emotion conversion systems to multiple languages. The thesis aims to solve these problems in generating emotional speech by employing intelligent soft computing techniques and signal processing. Various methods for mapping emotionally relevant features are investigated and validated in multiple languages. The spectral features used are mel cepstral coefficients, with different training methodologies for parallel and non-parallel data scenarios. The thesis focuses on the various modules shaping the characteristics of speech emotion and their useful integrations, which led to the following contributions. The most significant feature characterizing emotion in any language is prosody (fundamental frequency (F0), duration, and intensity), or speech rhythm. As the first step in emotion conversion system design, a feasibility study of prosodic modification in five selected languages for a multilingual scenario is carried out, designated the pilot study in the thesis. A dynamic prosody modification algorithm is developed and tested in different languages varying in linguistic similarity. Performance evaluation is conducted using standard metrics such as root mean square error (RMSE) and correlation. 
Although their acoustic and perceptual features differ as a function of linguistic similarity, some parameters fundamental for emotional expression occurred ...
Pagination:	xxv, 195
URI:	http://hdl.handle.net/10603/354556
Appears in Departments:	Department of Electronics & Communication Engineering (Amrita School of Engineering)
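
The abstract names root mean square error (RMSE) and correlation as the evaluation metrics for prosody modification. As a rough illustration only, the sketch below computes both between a target and a converted F0 contour. It assumes the two contours are already time-aligned and of equal length, and it uses Pearson correlation; the abstract does not say which correlation measure or alignment method the thesis uses. The function names and the F0 values are hypothetical.

```python
import math


def rmse(reference, converted):
    """Root mean square error between two equal-length contours."""
    if len(reference) != len(converted) or not reference:
        raise ValueError("contours must be non-empty and of equal length")
    squared = sum((r - c) ** 2 for r, c in zip(reference, converted))
    return math.sqrt(squared / len(reference))


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length contours."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("contours must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, already time-aligned F0 values in Hz (illustration only).
target_f0 = [200.0, 220.0, 250.0, 240.0, 210.0]
converted_f0 = [195.0, 225.0, 245.0, 238.0, 215.0]

print(f"F0 RMSE: {rmse(target_f0, converted_f0):.2f} Hz")
print(f"F0 correlation: {pearson_correlation(target_f0, converted_f0):.3f}")
```

A lower RMSE and a correlation closer to 1 would indicate that the converted contour tracks the target more closely. In practice the contours would first need alignment (for example by dynamic time warping) and handling of unvoiced frames, which this sketch omits.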

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	135.96 kB	Adobe PDF
02_certificate.pdf		951.11 kB	Adobe PDF
03_preliminary pages.pdf		272.43 kB	Adobe PDF
04_chapter 1.pdf		116.44 kB	Adobe PDF
05_chapter 2.pdf		424.7 kB	Adobe PDF
06_chapter 3.pdf		1.14 MB	Adobe PDF
07_chapter 4.pdf		637.41 kB	Adobe PDF
08_chapter 5.pdf		1.66 MB	Adobe PDF
09_chapter 6.pdf		5.84 MB	Adobe PDF
10_chapter 7.pdf		3.46 MB	Adobe PDF
11_chapter 8.pdf		210.18 kB	Adobe PDF
12_bibliography.pdf		133.44 kB	Adobe PDF
13_publications.pdf		120.41 kB	Adobe PDF
80_recommendation.pdf		345.69 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
