Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/187098
Full metadata record
DC Field | Value | Language
dc.coverage.spatial | |
dc.date.accessioned | 2018-01-08T12:12:00Z | -
dc.date.available | 2018-01-08T12:12:00Z | -
dc.identifier.uri | http://hdl.handle.net/10603/187098 | -
dc.description.abstract | Retrieving spoken documents and detecting a query (keyword) within audio documents have attracted considerable research interest. The problem of retrieving audio documents and detecting the query (keyword) using a spoken form of the query is widely known as Query-by-Example Spoken Term Detection (QbE-STD). This thesis presents the design of a QbE-STD system from the representation and matching perspectives. The speech spectrum is known to be affected by variations in the length of a speaker's vocal tract, owing to the inverse relation between formants and vocal tract length. The process of compensating for the spectral variation caused by vocal tract length is popularly known as Vocal Tract Length Normalization (VTLN), especially in the speech recognition literature. VTLN is a very important speaker normalization technique for the speech recognition task. In this context, this thesis proposes the use of a Gaussian posteriorgram of VTL-warped spectral features for the QbE-STD task. This study presents a novel use of a Gaussian Mixture Model (GMM) framework for VTLN warping factor estimation. In particular, the presented GMM framework does not require phoneme-level transcription and hence can be useful for unsupervised tasks. In addition, we propose the use of a mixture of GMMs for posteriorgram design. Speech data exhibits acoustically similar broad phonetic structures. To capture this broad phonetic structure, we exploit supplementary knowledge of broad phoneme classes (such as vowels, semi-vowels, nasals, fricatives, and plosives) for training the GMMs. The mixture of GMMs is tied to the GMMs of these broad phoneme classes. A GMM trained without supervision assumes uniform priors for each Gaussian component, whereas a mixture of GMMs assigns the prior probability based on the broad phoneme class. The novelty of our work lies in the prior probability assignments (as weights of the mixture of GMMs) for better Gaussian posteriorgram design. |
dc.format.extent | xxiv, 199 p. |
dc.language | English US |
dc.relation | |
dc.rights | university |
dc.title | Design of QbE STD system audio representation and matching perspective |
dc.title.alternative | |
dc.creator.researcher | Madhavi, Maulik C. |
dc.subject.keyword | Spoken Content Retrieval Systems |
dc.subject.keyword | Keyword Spotting System |
dc.subject.keyword | GMM Framework |
dc.subject.keyword | Vocal Tract Length Normalization |
dc.subject.keyword | Gaussian Mixture Model |
dc.subject.keyword | Detection Subsystem |
dc.description.note | |
dc.contributor.guide | Hemant A. Patil |
dc.publisher.place | Gandhinagar |
dc.publisher.university | Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT) |
dc.publisher.institution | Department of Information and Communication Technology |
dc.date.registered | 2017 |
dc.date.completed | |
dc.date.awarded | |
dc.format.dimensions | 30 cm. |
dc.format.accompanyingmaterial | DVD |
dc.source.university | University |
dc.type.degree | Ph.D. |
Appears in Departments:Department of Information and Communication Technology
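
Illustrative note on the abstract: a Gaussian posteriorgram represents each speech frame by the vector of posterior probabilities of the GMM components given that frame. The sketch below is a minimal illustration of that general idea and of how the component priors (weights) change the result. It is not the thesis's implementation. The function names, the diagonal-covariance assumption, and all numeric values are hypothetical and chosen only for the example.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gaussian_posteriorgram(frames, weights, means, variances):
    """Return one posterior vector P(component k | frame) per frame.

    weights must be strictly positive and sum to 1 (they act as priors).
    """
    posteriorgram = []
    for x in frames:
        # Log of prior times likelihood for every component.
        log_joint = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(log_joint)
        unnorm = [math.exp(lj - peak) for lj in log_joint]
        total = sum(unnorm)
        posteriorgram.append([u / total for u in unnorm])
    return posteriorgram


# Toy two-component model over 2-D "features" (made-up values).
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [1.5, 1.5], [2.9, 3.1]]

# Uniform priors versus hypothetical non-uniform priors.
uniform = gaussian_posteriorgram(frames, [0.5, 0.5], means, variances)
skewed = gaussian_posteriorgram(frames, [0.9, 0.1], means, variances)

for u_row, s_row in zip(uniform, skewed):
    print([round(p, 3) for p in u_row], [round(p, 3) for p in s_row])
```

The middle frame lies at equal distance from both component means, so its posterior is decided entirely by the priors. This is the kind of effect the abstract attributes to assigning priors from broad phoneme classes instead of assuming them uniform.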

Files in This Item:
File | Description | Size | Format
01_title.pdf | Attached File | 85.66 kB | Adobe PDF
02_declaration and certificate.pdf | | 235.4 kB | Adobe PDF
03_acknowledgements.pdf | | 84.7 kB | Adobe PDF
04_contents.pdf | | 122.38 kB | Adobe PDF
05_abstract.pdf | | 105.61 kB | Adobe PDF
06_list of symbol and accronyms.pdf | | 123.79 kB | Adobe PDF
07_list of tables.pdf | | 93.5 kB | Adobe PDF
08_list of figures.pdf | | 112.56 kB | Adobe PDF
09_chapter 1.pdf | | 311.04 kB | Adobe PDF
10_chapter 2.pdf | | 1.28 MB | Adobe PDF
11_chapter 3.pdf | | 413.32 kB | Adobe PDF
12_chapter 4.pdf | | 1.22 MB | Adobe PDF
13_chapter 5.pdf | | 591.08 kB | Adobe PDF
14_chapter 6.pdf | | 336.01 kB | Adobe PDF
15_chapter 7.pdf | | 846.36 kB | Adobe PDF
16_reference.pdf | | 176.45 kB | Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
