Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/187098
Title: | Design of QbE-STD system: audio representation and matching perspective |
Researcher: | Madhavi, Maulik C. |
Guide(s): | Hemant A. Patil |
Keywords: | Spoken Content Retrieval Systems; Keyword Spotting System; GMM Framework; Vocal Tract Length Normalization; Gaussian Mixture Model; Detection Subsystem |
University: | Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT) |
Completed Date: | |
Abstract: | "The retrieval of spoken documents and the detection of a query (keyword) within an audio document have attracted huge research interest. The problem of retrieving audio documents and detecting the query (keyword) using a spoken form of the query is widely known as Query-by-Example Spoken Term Detection (QbE-STD). This thesis presents the design of a QbE-STD system from the representation and matching perspective.

A speech spectrum is known to be affected by variations in the length of a speaker's vocal tract, due to the inverse relation between formant frequencies and vocal tract length. The process of compensating for the spectral variation caused by vocal tract length is popularly known as Vocal Tract Length Normalization (VTLN), especially in the speech recognition literature. VTLN is a very important speaker normalization technique for the speech recognition task. In this context, this thesis proposes the use of the Gaussian posteriorgram of VTL-warped spectral features for the QbE-STD task. This study presents a novel use of the Gaussian Mixture Model (GMM) framework for VTLN warping factor estimation. In particular, the presented GMM framework does not require phoneme-level transcription and hence can be useful for the unsupervised task. In addition, we also propose the use of a mixture of GMMs for posteriorgram design. Speech data exhibits acoustically similar broad phonetic structures. To capture this broad phonetic structure, we exploit supplementary knowledge of broad phoneme classes (such as vowels, semi-vowels, nasals, fricatives, and plosives) for the training of the GMM. The mixture of GMMs is tied to the GMMs of these broad phoneme classes. A GMM trained with no supervision assumes uniform priors for each Gaussian component, whereas a mixture of GMMs assigns the prior probability based on the broad phoneme class.

The novelty of our work lies in the prior probability assignments (as weights of the mixture of GMMs) for better Gaussian posteriorgram design." |
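To illustrate the representation the abstract describes, the following is a minimal NumPy sketch of computing a Gaussian posteriorgram from a diagonal-covariance GMM, where the component weights play the role of the priors discussed above (uniform for an unsupervised GMM, class-based for the proposed mixture of GMMs). The toy features, dimensions, and weight values are hypothetical; this is not the thesis' actual implementation or training pipeline.

```python
import numpy as np

def gaussian_posteriorgram(X, weights, means, variances):
    """Frame-wise posterior probability of each Gaussian component.

    X:         (T, D) spectral feature vectors, one row per frame
    weights:   (K,)   component priors (uniform, or broad-phoneme-class based)
    means:     (K, D) component means
    variances: (K, D) diagonal covariances
    Returns a (T, K) posteriorgram whose rows each sum to 1.
    """
    T, D = X.shape
    # Log-likelihood of every frame under every diagonal Gaussian.
    diff = X[:, None, :] - means[None, :, :]                       # (T, K, D)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_like = log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    # Weight by the component priors, then normalize per frame.
    log_post = np.log(weights)[None, :] + log_like
    log_post -= log_post.max(axis=1, keepdims=True)                # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical toy data: 5 frames of 2-D features, 3 Gaussian components.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
weights = np.array([0.5, 0.3, 0.2])   # non-uniform priors (illustrative only)
means = rng.normal(size=(3, 2))
variances = np.ones((3, 2))
P = gaussian_posteriorgram(X, weights, means, variances)
assert P.shape == (5, 3)
assert np.allclose(P.sum(axis=1), 1.0)
```

Each row of `P` is one posteriorgram frame; two such sequences (query and document) can then be compared with a frame-level distance for QbE-STD matching.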
Pagination: | xxiv, 199 p. |
URI: | http://hdl.handle.net/10603/187098 |
Appears in Departments: | Department of Information and Communication Technology |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_title.pdf | Attached File | 85.66 kB | Adobe PDF | View/Open |
02_declaration and certificate.pdf | | 235.4 kB | Adobe PDF | View/Open |
03_acknowledgements.pdf | | 84.7 kB | Adobe PDF | View/Open |
04_contents.pdf | | 122.38 kB | Adobe PDF | View/Open |
05_abstract.pdf | | 105.61 kB | Adobe PDF | View/Open |
06_list of symbol and accronyms.pdf | | 123.79 kB | Adobe PDF | View/Open |
07_list of tables.pdf | | 93.5 kB | Adobe PDF | View/Open |
08_list of figures.pdf | | 112.56 kB | Adobe PDF | View/Open |
09_chapter 1.pdf | | 311.04 kB | Adobe PDF | View/Open |
10_chapter 2.pdf | | 1.28 MB | Adobe PDF | View/Open |
11_chapter 3.pdf | | 413.32 kB | Adobe PDF | View/Open |
12_chapter 4.pdf | | 1.22 MB | Adobe PDF | View/Open |
13_chapter 5.pdf | | 591.08 kB | Adobe PDF | View/Open |
14_chapter 6.pdf | | 336.01 kB | Adobe PDF | View/Open |
15_chapter 7.pdf | | 846.36 kB | Adobe PDF | View/Open |
16_reference.pdf | | 176.45 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).