Query by Example Spoken Term Detection on Low Resource Languages

Gautam, Mantena

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/535775

Title:	Query by Example Spoken Term Detection on Low Resource Languages
Researcher:	Gautam, Mantena
Guide(s):	Kishore, Prahallad
Keywords:	Computer Science; Computer Science Software Engineering; Engineering and Technology
University:	International Institute of Information Technology, Hyderabad
Completed Date:	2015
Abstract:	The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within a spoken audio database. A key aspect of QbE-STD is to enable searching in multi-lingual and multi-speaker audio data. A traditional QbE-STD approach is to convert spoken audio into a sequence of symbols using automatic speech recognition (ASR) and then perform text-based search. ASR-based techniques assume the availability of labelled data for training the acoustic and language models. Such approaches do not scale to languages for which the resources needed to build an ASR system are unavailable. To overcome this limitation, zero prior knowledge is assumed about the language of the spoken audio, and thus template matching algorithms such as dynamic time warping (DTW) are exploited for QbE-STD. For QbE-STD, Gaussian posteriorgrams are a popular feature representation as they do not require labelled data. However, Gaussian posteriorgrams can represent only limited acoustic information about the spoken audio, which is a limitation. An alternative feature representation to Gaussian posteriorgrams is phone posteriorgrams. Obtaining phone posteriorgrams requires labelled data from rich resource languages to train the models. However, phone classes are not universal and thus do not perform well when there is a language mismatch. In this thesis, we address the issues of developing a DTW-based algorithm for QbE-STD search on low resource languages and deriving language independent features using rich resource languages for feature representation of speech.
The contributions of this thesis are as follows: We investigate the use of a DTW-based algorithm referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O(mn), and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. We introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that t
Pagination:	120
URI:	http://hdl.handle.net/10603/535775
Appears in Departments:	Computer Science and Engineering
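
The abstract describes searching for a query inside a longer recording with DTW at a cost of O(mn), and speeding this up by searching over reduced feature vectors. As an illustration only, the sketch below shows a generic subsequence DTW over posteriorgram-like frames; it is not the thesis's NS-DTW or FNS-DTW, whose details are not given in this record. The function names, the negative-log dot-product distance, and the averaging of every alpha consecutive frames are all assumptions made for the example.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Distance between two posterior vectors: -log of their dot product.

    This is a common choice for posteriorgrams and is assumed here.
    """
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def reduce_frames(frames, alpha):
    """Average every `alpha` consecutive frames (hypothetical reduction scheme)."""
    reduced = []
    for start in range(0, len(frames), alpha):
        chunk = frames[start:start + alpha]
        dim = len(chunk[0])
        reduced.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return reduced


def subsequence_dtw(query, reference):
    """Find the best match of `query` anywhere inside `reference`.

    Returns (accumulated_cost, end_index). The m x n table is filled once,
    so the running time is O(mn) for m query and n reference frames.
    """
    m, n = len(query), len(reference)
    inf = float("inf")
    # First query frame may align with any reference frame (free start).
    prev = [frame_distance(query[0], reference[j]) for j in range(n)]
    for i in range(1, m):
        curr = [inf] * n
        for j in range(n):
            best = prev[j]  # stay on the same reference frame
            if j > 0:
                best = min(best, prev[j - 1], curr[j - 1])
            curr[j] = frame_distance(query[i], reference[j]) + best
        prev = curr
    # The match may end at any reference frame (free end).
    end = min(range(n), key=lambda j: prev[j])
    return prev[end], end


if __name__ == "__main__":
    reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
    query = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(subsequence_dtw(query, reference))
```

Under these assumptions, calling reduce_frames on both the query and the reference with the same alpha before subsequence_dtw shrinks the table by roughly a factor of alpha squared, at the price of coarser time resolution. The returned cost is not length-normalized; a practical system would typically normalize it before thresholding.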

Files in This Item:

File	Description	Size	Format
80_recommendation.pdf	Attached File	104.76 kB	Adobe PDF
abstract.pdf		124.14 kB	Adobe PDF
annexures.pdf		279.35 kB	Adobe PDF
chapter 1.pdf		298.88 kB	Adobe PDF
chapter 2.pdf		490.92 kB	Adobe PDF
chapter 3.pdf		265.92 kB	Adobe PDF
chapter 4.pdf		330.94 kB	Adobe PDF
chapter 5.pdf		208.19 kB	Adobe PDF
chapter 6.pdf		76.34 kB	Adobe PDF
content.pdf		59.08 kB	Adobe PDF
preliminary pages.pdf		244.09 kB	Adobe PDF
title page.pdf		68.47 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET