Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/299275
Title: Generative model driven representation learning with discriminative classifier for environmental audio scene and sound event recognition
Researcher: Jayalakshmi S L
Guide(s): Chandrakala S
Keywords: Engineering and Technology
Engineering
Engineering Electrical and Electronic
audio scene
sound event recognition
University: Anna University
Completed Date: 2019
Abstract: The analysis of sound information is very helpful in multimedia information retrieval, audio surveillance, audio tagging, and forensic applications. Environmental Audio Scene Recognition (EASR) and Sound Event Recognition (SER) are the principal tasks related to audio surveillance systems. Environmental Audio Scene Recognition refers to the process of recognizing the context or environment of an audio stream, with applications in devices requiring contextual awareness. Sound Event Recognition aims to recognize the occurrence of a monophonic event in a specific environment. Both tasks are challenging due to the presence of multiple sound sources, background noise, and overlapping or polyphonic contexts. In the environmental audio scene recognition task, a scene is typically long, ranging from a few seconds to a few tens of seconds, and different scenes of the same class may have different durations. An important issue is that when the data of different classes are highly confusable, generative model-based classifiers such as the Gaussian Mixture Model (GMM) are not appropriate, since a model is built for each class using samples of that class only. This motivates a discriminative model-based approach for recognizing examples of environmental audio scenes. Discriminative model-based classifiers such as Support Vector Machines (SVMs) focus on modeling the decision boundaries between classes. Another issue is that SVMs can handle only fixed-dimensional data. In this thesis, these issues are addressed by proposing a hybrid framework that learns model-driven representations for environmental audio scenes and sound events with the help of generative models.
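To make the hybrid idea in the abstract concrete, the sketch below shows one common way such a pipeline can be wired: per-class GMMs map variable-length audio feature sequences to fixed-length log-likelihood vectors, which an SVM then classifies. This is a minimal illustration assuming scikit-learn; the scene labels, MFCC dimensionality, and synthetic data are placeholders, not the thesis's actual configuration.

    # Minimal sketch of a hybrid generative-discriminative pipeline.
    # Assumptions: scikit-learn is available; scene labels, MFCC size,
    # and the random "audio" data are illustrative placeholders only.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    classes = ["street", "park", "office"]   # hypothetical scene labels
    n_mfcc = 13                              # assumed MFCC dimensionality

    def random_scene(label_idx, n_frames):
        """Stand-in for an MFCC sequence of one scene (variable length)."""
        return rng.normal(loc=label_idx, scale=1.0, size=(n_frames, n_mfcc))

    # Variable-length training scenes, a few per class.
    train = [(random_scene(i, rng.integers(100, 300)), i)
             for i in range(len(classes)) for _ in range(5)]

    # 1. Generative step: fit one GMM per class on that class's frames only.
    gmms = []
    for i in range(len(classes)):
        frames = np.vstack([x for x, y in train if y == i])
        gmms.append(GaussianMixture(n_components=4, random_state=0).fit(frames))

    # 2. Representation: score each scene against every class GMM, yielding
    # a fixed-length vector regardless of the scene's duration.
    def represent(x):
        return np.array([g.score(x) for g in gmms])  # mean log-likelihoods

    X = np.array([represent(x) for x, _ in train])
    y = np.array([lbl for _, lbl in train])

    # 3. Discriminative step: SVM learns decision boundaries between classes.
    clf = SVC(kernel="rbf").fit(X, y)

    test_scene = random_scene(1, 180)         # unseen scene, different length
    print(classes[clf.predict(represent(test_scene)[None, :])[0]])

The design point this illustrates is the division of labor: the generative models absorb the variable duration of the scenes, while the discriminative classifier handles the confusable class boundaries.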
Pagination: xxi, 126 p.
URI: http://hdl.handle.net/10603/299275
Appears in Departments:Faculty of Information and Communication Engineering

Files in This Item:
File | Size | Format
01_title.pdf | 40.38 kB | Adobe PDF
02_certificates.pdf | 359.12 kB | Adobe PDF
03_abstracts.pdf | 44.65 kB | Adobe PDF
04_acknowledgements.pdf | 41.91 kB | Adobe PDF
05_contents.pdf | 30.27 kB | Adobe PDF
06_listofabbreviations.pdf | 49.83 kB | Adobe PDF
07_chapter1.pdf | 149.79 kB | Adobe PDF
08_chapter2.pdf | 166.41 kB | Adobe PDF
09_chapter3.pdf | 120.78 kB | Adobe PDF
10_chapter4.pdf | 267.18 kB | Adobe PDF
11_chapter5.pdf | 468.4 kB | Adobe PDF
12_chapter6.pdf | 441.53 kB | Adobe PDF
13_conclusion.pdf | 61.47 kB | Adobe PDF
14_references.pdf | 100.71 kB | Adobe PDF
15_listofpublications.pdf | 58 kB | Adobe PDF
80_recommendation.pdf | 94.03 kB | Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
