Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/535775
Title: Query by Example Spoken Term Detection on Low Resource Languages
Researcher: Gautam, Mantena
Guide(s): Kishore, Prahallad
Keywords: Computer Science; Computer Science Software Engineering; Engineering and Technology
University: International Institute of Information Technology, Hyderabad
Completed Date: 2015
Abstract: The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within a spoken audio database. A key aspect of QbE-STD is enabling search in multilingual and multi-speaker audio data. A traditional QbE-STD approach converts spoken audio into a sequence of symbols using automatic speech recognition (ASR) and then performs a text-based search. ASR-based techniques assume that labelled data are available for training the acoustic and language models. Such approaches do not scale to languages that lack the labelled data or other resources needed to build an ASR system. To overcome this limitation, zero prior knowledge about the language of the spoken audio is assumed, and template matching algorithms such as dynamic time warping (DTW) are used for QbE-STD. Gaussian posteriorgrams are a popular feature representation for QbE-STD because they do not require labelled data. However, Gaussian posteriorgrams capture only limited acoustic information about the spoken audio, which limits their performance. An alternative feature representation is the phone posteriorgram. Obtaining phone posteriorgrams requires labelled data from rich resource languages to train the models. However, phone classes are not universal, so phone posteriorgrams do not perform well when there is a language mismatch.

In this thesis, we address two issues: developing a DTW-based algorithm for QbE-STD search on low resource languages, and deriving language-independent features from rich resource languages for the feature representation of speech.
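Gaussian posteriorgrams are commonly formed by fitting a Gaussian mixture model (GMM) to unlabelled acoustic feature frames and then representing each frame by the posterior probability of every mixture component. The sketch below shows only the second step. It assumes a diagonal-covariance GMM whose weights, means and variances were already trained (for example with EM on untranscribed speech). The function names and toy parameters are hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v
        for xi, mu, v in zip(x, mean, var)
    )


def gaussian_posteriorgram(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Hypothetical helper: P(component k | frame) for every frame and GMM component."""
    gram = []
    for x in frames:
        # log(w_k) + log N(x; mu_k, diag(var_k)) for each component k
        scores = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Normalise with the log-sum-exp trick to avoid underflow.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        gram.append([e / total for e in exps])
    return gram
```

Each row is a probability distribution over mixture components, so the representation needs no phone labels. This property is what makes it usable for languages without ASR resources.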
The contributions of this thesis are as follows:
- We investigate the use of a DTW-based algorithm referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O(mn), and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal.
- We introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that t
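The abstract does not spell out the NS-DTW recurrence, so the following is a generic subsequence DTW sketch rather than the thesis's exact algorithm. The whole query is aligned against any stretch of the reference (open begin and open end), which fills an m × n table once and therefore takes O(mn) time, assuming m query frames and n reference frames. The sketch assumes both inputs are posteriorgram frames and uses the negative log inner product as the local distance, a common choice for posteriorgrams. The `reduce_frames` helper illustrates one hypothetical way to obtain reduced feature vectors with a factor α: averaging non-overlapping blocks of α frames. The reduction actually used by FNS-DTW may differ, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def posterior_distance(q: Frame, r: Frame, eps: float = 1e-10) -> float:
    """Negative log of the inner product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(q, r))
    return -math.log(max(dot, eps))  # eps guards against log(0)


def subsequence_dtw(query: Sequence[Frame], reference: Sequence[Frame]) -> Tuple[float, int]:
    """Hypothetical sketch: generic subsequence DTW, not the thesis's exact NS-DTW.

    Returns (path-length-normalised cost, end frame in reference).
    Fills an m x n table once, so time is O(m * n).
    """
    m, n = len(query), len(reference)
    if m == 0 or n == 0:
        raise ValueError("query and reference must be non-empty")
    inf = float("inf")
    cost = [[inf] * n for _ in range(m)]   # accumulated distance
    steps = [[0] * n for _ in range(m)]    # path length, used for normalisation
    for j in range(n):
        # Open begin: the first query frame may align with any reference frame.
        cost[0][j] = posterior_distance(query[0], reference[j])
        steps[0][j] = 1
    for i in range(1, m):
        for j in range(n):
            preds = [(cost[i - 1][j], steps[i - 1][j])]                  # vertical
            if j > 0:
                preds.append((cost[i - 1][j - 1], steps[i - 1][j - 1]))  # diagonal
                preds.append((cost[i][j - 1], steps[i][j - 1]))          # horizontal
            best_cost, best_steps = min(preds)  # lowest accumulated cost wins
            cost[i][j] = best_cost + posterior_distance(query[i], reference[j])
            steps[i][j] = best_steps + 1
    # Open end: the last query frame may align with any reference frame.
    end = min(range(n), key=lambda j: cost[m - 1][j] / steps[m - 1][j])
    return cost[m - 1][end] / steps[m - 1][end], end


def reduce_frames(frames: Sequence[Frame], alpha: int) -> List[List[float]]:
    """Hypothetical reduction: average non-overlapping blocks of alpha frames."""
    if alpha < 1:
        raise ValueError("alpha must be a positive integer")
    reduced = []
    for start in range(0, len(frames), alpha):
        block = frames[start:start + alpha]  # the last block may be shorter
        reduced.append([sum(col) / len(block) for col in zip(*block)])
    return reduced
```

In this sketch, reducing both the query and the reference by α shrinks the table to about ⌈m/α⌉ × ⌈n/α⌉ cells, roughly an α² reduction in work, at the price of coarser time resolution in the alignment. Averaging posterior vectors also keeps each reduced frame a valid probability distribution.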
Pagination: 120
URI: http://hdl.handle.net/10603/535775
Appears in Departments: Computer Science and Engineering

Files in This Item:
File                    Description    Size       Format
80_recommendation.pdf   Attached File  104.76 kB  Adobe PDF
abstract.pdf                           124.14 kB  Adobe PDF
annexures.pdf                          279.35 kB  Adobe PDF
chapter 1.pdf                          298.88 kB  Adobe PDF
chapter 2.pdf                          490.92 kB  Adobe PDF
chapter 3.pdf                          265.92 kB  Adobe PDF
chapter 4.pdf                          330.94 kB  Adobe PDF
chapter 5.pdf                          208.19 kB  Adobe PDF
chapter 6.pdf                          76.34 kB   Adobe PDF
content.pdf                            59.08 kB   Adobe PDF
preliminary pages.pdf                  244.09 kB  Adobe PDF
title page.pdf                         68.47 kB   Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
