Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/187098
Title: | Design of QbE-STD system: audio representation and matching perspective |
Researcher: | Madhavi, Maulik C. |
Guide(s): | Hemant A. Patil |
Keywords: | Spoken Content Retrieval Systems; Keyword Spotting System; GMM Framework; Vocal Tract Length Normalization; Gaussian Mixture Model; Detection Subsystem |
University: | Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT) |
Completed Date: | |
Abstract: | "The retrieval of spoken documents and the detection of a query (keyword) within an audio document have attracted huge research interest. The problem of retrieving audio documents and detecting the query (keyword) using a spoken form of the query is widely known as Query-by-Example Spoken Term Detection (QbE-STD). This thesis presents the design of a QbE-STD system from the representation and matching perspective.

A speech spectrum is known to be affected by variations in the length of a speaker's vocal tract, due to the inverse relation between formant frequencies and vocal tract length. The process of compensating for the spectral variation caused by vocal tract length is popularly known as Vocal Tract Length Normalization (VTLN), especially in the speech recognition literature. VTLN is a very important speaker normalization technique for the speech recognition task. In this context, this thesis proposes the use of the Gaussian posteriorgram of VTL-warped spectral features for the QbE-STD task. This study presents a novel use of the Gaussian Mixture Model (GMM) framework for VTLN warping factor estimation. In particular, the presented GMM framework does not require phoneme-level transcription and hence can be useful for the unsupervised task. In addition, we also propose the use of a mixture of GMMs for posteriorgram design. Speech data exhibits acoustically similar broad phonetic structures. To capture this broad phonetic structure, we exploit supplementary knowledge of broad phoneme classes (such as vowels, semi-vowels, nasals, fricatives, and plosives) for the training of the GMM. The mixture of GMMs is tied to the GMMs of these broad phoneme classes. A GMM trained with no supervision assumes uniform priors for each Gaussian component, whereas a mixture of GMMs assigns the prior probability based on the broad phoneme class.

The novelty of our work lies in the prior probability assignments (as weights of the mixture of GMMs) for better Gaussian posteriorgram design." |
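To illustrate the representation the abstract describes, the following is a minimal NumPy sketch of computing a Gaussian posteriorgram from a diagonal-covariance GMM, where the component weights play the role of the priors discussed above (uniform for an unsupervised GMM, class-based for the proposed mixture of GMMs). The toy features, dimensions, and weight values are hypothetical; this is not the thesis' actual implementation or training pipeline.

```python
import numpy as np

def gaussian_posteriorgram(X, weights, means, variances):
    """Frame-wise posterior probability of each Gaussian component.

    X:         (T, D) spectral feature vectors, one row per frame
    weights:   (K,)   component priors (uniform, or broad-phoneme-class based)
    means:     (K, D) component means
    variances: (K, D) diagonal covariances
    Returns a (T, K) posteriorgram whose rows each sum to 1.
    """
    T, D = X.shape
    # Log-likelihood of every frame under every diagonal Gaussian.
    diff = X[:, None, :] - means[None, :, :]                       # (T, K, D)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_like = log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    # Weight by the component priors, then normalize per frame.
    log_post = np.log(weights)[None, :] + log_like
    log_post -= log_post.max(axis=1, keepdims=True)                # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical toy data: 5 frames of 2-D features, 3 Gaussian components.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
weights = np.array([0.5, 0.3, 0.2])   # non-uniform priors (illustrative only)
means = rng.normal(size=(3, 2))
variances = np.ones((3, 2))
P = gaussian_posteriorgram(X, weights, means, variances)
assert P.shape == (5, 3)
assert np.allclose(P.sum(axis=1), 1.0)
```

Each row of `P` is one posteriorgram frame; two such sequences (query and document) can then be compared with a frame-level distance for QbE-STD matching.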
Pagination: | xxiv, 199 p. |
URI: | http://hdl.handle.net/10603/187098 |
Appears in Departments: | Department of Information and Communication Technology |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_title.pdf | Attached File | 85.66 kB | Adobe PDF | View/Open |
02_declaration and certificate.pdf | | 235.4 kB | Adobe PDF | View/Open |
03_acknowledgements.pdf | | 84.7 kB | Adobe PDF | View/Open |
04_contents.pdf | | 122.38 kB | Adobe PDF | View/Open |
05_abstract.pdf | | 105.61 kB | Adobe PDF | View/Open |
06_list of symbol and accronyms.pdf | | 123.79 kB | Adobe PDF | View/Open |
07_list of tables.pdf | | 93.5 kB | Adobe PDF | View/Open |
08_list of figures.pdf | | 112.56 kB | Adobe PDF | View/Open |
09_chapter 1.pdf | | 311.04 kB | Adobe PDF | View/Open |
10_chapter 2.pdf | | 1.28 MB | Adobe PDF | View/Open |
11_chapter 3.pdf | | 413.32 kB | Adobe PDF | View/Open |
12_chapter 4.pdf | | 1.22 MB | Adobe PDF | View/Open |
13_chapter 5.pdf | | 591.08 kB | Adobe PDF | View/Open |
14_chapter 6.pdf | | 336.01 kB | Adobe PDF | View/Open |
15_chapter 7.pdf | | 846.36 kB | Adobe PDF | View/Open |
16_reference.pdf | | 176.45 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).