Multilingual Text to Speech Synthesis using Sequence to Sequence Neural Networks

Sivanand, Achanta

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/540614

Title:	Multilingual Text to Speech Synthesis using Sequence to Sequence Neural Networks
Researcher:	Sivanand, Achanta
Guide(s):	Kishore, Prahallad
Keywords:	Engineering; Engineering and Technology; Engineering Electrical and Electronic
University:	International Institute of Information Technology, Hyderabad
Completed Date:	2018
Abstract:	Keywords: Statistical Parametric Speech Synthesis, Recurrent Neural Networks, Polyglot Synthesis, Multilingual Synthesis, Sequence-to-Sequence Learning, End-to-End Synthesis.

Text-to-speech (TTS) synthesis is typically carried out in one of two ways: (1) by concatenating waveform segments of units (often called unit selection synthesis, USS), or (2) by predicting speech parameters from text using statistical models (statistical parametric speech synthesis, SPSS). Most commercial TTS systems use the USS approach because it produces highly natural speech. However, USS requires the recorded waveforms to be stored, which demands considerable memory; the statistical approach alleviates this by modeling speech compactly in parametric form. Moreover, using the waveforms directly offers little scope to alter their characteristics to produce variations such as different speakers, genders, voice qualities, or languages. The parameters of a statistical model, on the other hand, can be suitably transformed to produce the desired variations.

These advantages (compactness and flexibility) come at the cost of speech that sounds slightly more robotic than its unit-selection counterpart. A typical SPSS system has several components, namely text feature extraction, speech parameter extraction, alignment of text and speech features, a regression model mapping text features to speech parameters, and a duration prediction model. Each of these components is hand-engineered independently, which makes the SPSS system susceptible to errors in any one of them. The loss in naturalness of SPSS output has largely been attributed to two factors: the limited ability of the regression model (also called the acoustic model) to capture the complex mapping from text features to speech parameters, and the representations used for text and speech data. In addition, the use of a separate alignment model leads to erroneous averaging in acoustic modeling.

In this thesis, we address the issues of acoustic modeling, textual representation, acoustic representation, multilingual multispea
Pagination:	109
URI:	http://hdl.handle.net/10603/540614
Appears in Departments:	Department of Electronic and Communication Engineering

Files in This Item:

File	Description	Size	Format
80_recommendation.pdf	Attached File	88.6 kB	Adobe PDF
abstract.pdf		69.38 kB	Adobe PDF
annexures.pdf		98.14 kB	Adobe PDF
chapter 1.pdf		946.84 kB	Adobe PDF
chapter 2.pdf		382.97 kB	Adobe PDF
chapter 3.pdf		523.37 kB	Adobe PDF
chapter 4.pdf		1.56 MB	Adobe PDF
chapter 5.pdf		186.55 kB	Adobe PDF
chapter 6.pdf		373.61 kB	Adobe PDF
chapter 7.pdf		194.91 kB	Adobe PDF
chapter 8.pdf		71.77 kB	Adobe PDF
content.pdf		77.46 kB	Adobe PDF
preliminary pages.pdf		123.44 kB	Adobe PDF
title page.pdf		73.73 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET