Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/540614
Title: | Multilingual Text to Speech Synthesis using Sequence to Sequence Neural Networks |
Researcher: | Sivanand, Achanta |
Guide(s): | Kishore, Prahallad |
Keywords: | Engineering; Engineering and Technology; Engineering Electrical and Electronic |
University: | International Institute of Information Technology, Hyderabad |
Completed Date: | 2018 |
Abstract: | Keywords: Statistical Parametric Speech Synthesis, Recurrent Neural Networks, Polyglot Synthesis, Multilingual Synthesis, Sequence-to-Sequence Learning, End-to-End Synthesis. Text-to-speech (TTS) synthesis is typically carried out in two ways: (1) by concatenating waveform segments of units (often dubbed unit selection synthesis (USS)) and (2) by predicting speech parameters from text using statistical models (also called statistical parametric speech synthesis (SPSS)). Most commercial TTS systems use the USS approach as it produces highly natural speech. However, the USS approach requires the recorded waveforms to be stored, which demands memory, whereas the statistical approach alleviates this by modeling the speech compactly in a parametric form. Also, using the waveform directly offers little scope to alter its characteristics to produce variations such as different speakers, genders, voice qualities, languages, etc. On the other hand, the parameters of a statistical model can be suitably transformed to produce the desired variations. The above advantages (compactness and flexibility) come at the cost of the speech sounding slightly more robotic than its unit-selection counterpart. A typical SPSS system has several components, namely text feature extraction, speech parameter extraction, alignment of text and speech features, a text feature-to-speech parameter regression model, and a duration prediction model. Each of these components is independently hand-engineered, making the SPSS system susceptible to errors in any one of them. The loss in naturalness of SPSS output has been largely attributed to the limitations of the regression model (also dubbed the acoustic model) in capturing the complexity of the mapping from text features to speech parameters, and to the representations used for text and speech data. In addition, the use of a separate alignment model leads to erroneous averaging in acoustic modeling. In this thesis, we address the issues of acoustic modeling, textual representation, acoustic representation, multilingual multispea |
Pagination: | 109 |
URI: | http://hdl.handle.net/10603/540614 |
Appears in Departments: | Department of Electronic and Communication Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
80_recommendation.pdf | Attached File | 88.6 kB | Adobe PDF | View/Open |
abstract.pdf | | 69.38 kB | Adobe PDF | View/Open |
annexures.pdf | | 98.14 kB | Adobe PDF | View/Open |
chapter 1.pdf | | 946.84 kB | Adobe PDF | View/Open |
chapter 2.pdf | | 382.97 kB | Adobe PDF | View/Open |
chapter 3.pdf | | 523.37 kB | Adobe PDF | View/Open |
chapter 4.pdf | | 1.56 MB | Adobe PDF | View/Open |
chapter 5.pdf | | 186.55 kB | Adobe PDF | View/Open |
chapter 6.pdf | | 373.61 kB | Adobe PDF | View/Open |
chapter 7.pdf | | 194.91 kB | Adobe PDF | View/Open |
chapter 8.pdf | | 71.77 kB | Adobe PDF | View/Open |
content.pdf | | 77.46 kB | Adobe PDF | View/Open |
preliminary pages.pdf | | 123.44 kB | Adobe PDF | View/Open |
title page.pdf | | 73.73 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).