Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/472964
Title: Multistage Spoken Term Detection for Searching Speech Databases
Researcher: Deekshitha G
Guide(s): Leena Mary
Keywords: Engineering; Engineering and Technology; Engineering Electrical and Electronic
University: APJ Abdul Kalam Technological University, Thiruvananthapuram
Completed Date: 2021
Abstract: The availability of high-speed, low-cost Internet enables the use of multimedia files for a variety of applications in our day-to-day activities. A considerable volume of audio archives is available on different websites. Such vast audio resources are useful only if the needed file can be retrieved accurately and efficiently. Audio search refers to the search and retrieval of a particular audio file from a large audio database. Since most audio archives are not well indexed or labelled, this remains a challenging task.
Spoken Term Detection (STD) refers to the process of locating the occurrences of spoken queries in a large speech database. Generally, two methods have been adopted for STD: Automatic Speech Recognition (ASR) based label sequence matching and feature-based template matching. ASR-based techniques utilize phoneme models of a language, which require a considerable amount of labelled training data in the selected language. Hence such techniques are considered language-dependent, and it is not feasible to develop ASR for each language. The feature-based template matching techniques address this task in a language-independent manner, but they are computationally complex. This work combines the positive aspects of both methods by introducing a multistage architecture to address the task of STD for low-resourced languages. Two different approaches have been proposed, one for Language-Dependent (LD) and one for Language-Independent (LI) STD.
For Language-Dependent STD (LD-STD), a Phoneme Recognizer (PR) trained for a particular language is used to convert the input speech into a corresponding phoneme sequence. After the input speech is converted to the corresponding text sequences, two stages, named the coarse and fine search stages, help to spot the query locations. In coarse search, label-level matching is performed using a sequence matching technique to shortlist the probable query locations. A customized local alignment technique is proposed for sequence matching.
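As an illustration of the coarse search idea described in the abstract, the following is a minimal sketch of label-level matching with a standard Smith-Waterman local alignment over phoneme labels. It is not the customized alignment proposed in the thesis, whose details are not given in this record. The function names, the scoring values (`match`, `mismatch`, `gap`), the `threshold` parameter, and the example phoneme labels are all assumptions made for illustration.

```python
from typing import List, Tuple


def local_align(query: List[str], doc: List[str],
                match: int = 2, mismatch: int = -1, gap: int = -1) -> List[List[int]]:
    """Fill a Smith-Waterman score matrix: query labels on rows, document labels on columns."""
    rows, cols = len(query) + 1, len(doc) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == doc[j - 1] else mismatch
            score[i][j] = max(
                0,                          # restart the alignment (local, not global)
                score[i - 1][j - 1] + sub,  # align the two labels
                score[i - 1][j] + gap,      # skip a query label
                score[i][j - 1] + gap,      # skip a document label
            )
    return score


def shortlist(query: List[str], doc: List[str], threshold: float = 0.6,
              match: int = 2, mismatch: int = -1, gap: int = -1) -> List[Tuple[int, int]]:
    """Return (end_index, score) for document positions whose best local
    alignment score reaches `threshold` times the perfect-match score.
    Neighbouring hits are not merged in this sketch."""
    score = local_align(query, doc, match, mismatch, gap)
    cutoff = threshold * match * len(query)
    hits = []
    for j in range(1, len(doc) + 1):
        best = max(score[i][j] for i in range(1, len(query) + 1))
        if best >= cutoff:
            hits.append((j - 1, best))
    return hits


# Hypothetical phoneme labels, for illustration only.
query = ["k", "ae", "t"]
doc = ["dh", "ax", "k", "ae", "t", "s", "ae", "t"]
exact_hits = shortlist(query, doc, threshold=1.0)
loose_hits = shortlist(query, doc, threshold=0.6)
```

With `threshold=1.0` only positions where the full query aligns without error are kept. A lower threshold also admits partial matches, which a later fine search stage could then verify.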
URI: http://hdl.handle.net/10603/472964
Appears in Departments: Rajiv Gandhi Institute of Technology

Files in This Item:
File                       Description      Size        Format
01_title.pdf               Attached File    57.77 kB    Adobe PDF
02_preliminary pages.pdf                    160.96 kB   Adobe PDF
03_contents.pdf                             54.39 kB    Adobe PDF
04_abstract.pdf                             47.97 kB    Adobe PDF
05_chapter 1.pdf                            589.86 kB   Adobe PDF
06_chapter 2.pdf                            452.27 kB   Adobe PDF
07_chapter 3.pdf                            530.99 kB   Adobe PDF
08_chapter 4.pdf                            2.03 MB     Adobe PDF
09_chapter 5.pdf                            332.23 kB   Adobe PDF
80_recommendation.pdf                       63.85 kB    Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
