Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/535775
Title: | Query by Example Spoken Term Detection on Low Resource Languages |
Researcher: | Gautam, Mantena |
Guide(s): | Kishore, Prahallad |
Keywords: | Computer Science; Computer Science Software Engineering; Engineering and Technology |
University: | International Institute of Information Technology, Hyderabad |
Completed Date: | 2015 |
Abstract: | The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within a spoken audio database. A key aspect of QbE-STD is to enable searching in multilingual and multi-speaker audio data. A traditional QbE-STD approach is to convert spoken audio into a sequence of symbols using automatic speech recognition (ASR) and then perform text-based search. ASR-based techniques assume the availability of labelled data for training the acoustic and language models. Such approaches do not scale to languages for which the data or resources needed to build an ASR system are unavailable. To overcome this limitation, zero prior knowledge is assumed about the language of the spoken audio, and template matching algorithms such as dynamic time warping (DTW) are exploited for QbE-STD. For QbE-STD, Gaussian posteriorgrams are a popular feature representation as they do not require labelled data. However, Gaussian posteriorgrams can represent only limited acoustic information about the spoken audio, which is a limitation. An alternative feature representation is phone posteriorgrams. To obtain phone posteriorgrams, labelled data from rich-resource languages is required to train the models. However, phone classes are not universal, and thus phone posteriorgrams do not perform well when there is a language mismatch. In this thesis, we address the issues of developing a DTW-based algorithm for QbE-STD search on low-resource languages and deriving language-independent features using rich-resource languages for feature representation of speech. 
The contributions of this thesis are as follows: We investigate the use of a DTW-based algorithm referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O(mn), and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. We introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that t |
Pagination: | 120 |
URI: | http://hdl.handle.net/10603/535775 |
Appears in Departments: | Computer Science and Engineering |
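
The template-matching idea described in the abstract can be illustrated with a minimal sketch of generic subsequence DTW, in which a query of m frames may start and end anywhere in a reference of n frames. This is not the thesis's NS-DTW or FNS-DTW, whose details are not given in this record. The function names, the cosine distance, the three-way step pattern, and the lack of path-length normalization are all assumptions made for illustration. The nested loops fill an m × n table, which corresponds to the O(mn) bound mentioned in the abstract.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def subsequence_dtw(query, reference, dist=cosine_distance):
    """Hypothetical helper: locate the region of `reference` that best matches `query`.

    Both arguments are sequences of feature vectors (for example, posteriorgram
    frames). Returns (cost, start, end), where start and end are inclusive
    frame indices into `reference`.
    """
    m, n = len(query), len(reference)
    if m == 0 or n == 0:
        raise ValueError("query and reference must be non-empty")

    # cost[i][j]: lowest accumulated cost of aligning query[0..i] to a
    # reference region that ends at frame j.
    # start[i][j]: the reference frame where that region begins.
    cost = [[0.0] * n for _ in range(m)]
    start = [[0] * n for _ in range(m)]

    # The first query frame may align with any reference frame at no
    # extra penalty, so a match can begin anywhere in the reference.
    for j in range(n):
        cost[0][j] = dist(query[0], reference[j])
        start[0][j] = j

    for i in range(1, m):
        cost[i][0] = cost[i - 1][0] + dist(query[i], reference[0])
        start[i][0] = start[i - 1][0]
        for j in range(1, n):
            best_cost, best_start = min(
                (cost[i - 1][j - 1], start[i - 1][j - 1]),  # diagonal step
                (cost[i - 1][j], start[i - 1][j]),          # advance query only
                (cost[i][j - 1], start[i][j - 1]),          # advance reference only
            )
            cost[i][j] = best_cost + dist(query[i], reference[j])
            start[i][j] = best_start

    # The match may end at any reference frame: pick the cheapest end point.
    end = min(range(n), key=lambda j: cost[m - 1][j])
    return cost[m - 1][end], start[m - 1][end], end


if __name__ == "__main__":
    toy_query = [[1.0, 0.0], [0.0, 1.0]]
    toy_reference = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(subsequence_dtw(toy_query, toy_reference))
```

In the toy example the query frames occur verbatim at reference frames 2 and 3, so the search should report that region. A practical detector would typically also normalize the cost by path length and apply a detection threshold; both are omitted here.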
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
80_recommendation.pdf | Attached File | 104.76 kB | Adobe PDF |
abstract.pdf | | 124.14 kB | Adobe PDF |
annexures.pdf | | 279.35 kB | Adobe PDF |
chapter 1.pdf | | 298.88 kB | Adobe PDF |
chapter 2.pdf | | 490.92 kB | Adobe PDF |
chapter 3.pdf | | 265.92 kB | Adobe PDF |
chapter 4.pdf | | 330.94 kB | Adobe PDF |
chapter 5.pdf | | 208.19 kB | Adobe PDF |
chapter 6.pdf | | 76.34 kB | Adobe PDF |
content.pdf | | 59.08 kB | Adobe PDF |
preliminary pages.pdf | | 244.09 kB | Adobe PDF |
title page.pdf | | 68.47 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).