Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/528343
Title: Deep Learning Methods for Automatic Identification of Environmental Sounds and Acoustic Scenes
Researcher: Aswathy Madhu
Guide(s): Suresh K
Keywords: Engineering
Engineering and Technology
Engineering Electrical and Electronic
University: APJ Abdul Kalam Technological University, Thiruvananthapuram
Completed Date: 2023
Abstract: Computer audition has garnered attention from the audio and acoustic signal processing community during the past decade. This growing interest is due to its attractive applications in audio surveillance and healthcare. Two fundamental problems in computer audition are automatic Environmental Sound Classification (ESC) and Acoustic Scene Classification (ASC). Despite promising application prospects, they are overshadowed by popular research areas such as Automatic Speech Recognition and Music Information Retrieval. This is due to the challenges posed by environmental sounds and acoustic scenes: their complex nature, the lack of the high-level structures usually observed in speech and music, and the large degree of intra-class and inter-class variability. Recently, deep learning approaches have been gaining popularity for both ESC and ASC. However, the robustness of deep learning approaches depends mainly on the amount of available data and on the audio signal representation. Moreover, ASC designers have shifted their focus from improving accuracy to incorporating real-world considerations.
Therefore, the research work embodied in this thesis aims to develop robust deep learning frameworks to identify environmental sounds and acoustic scenes. Firstly, the influence of data augmentation in the context of ESC using a deep Convolutional Neural Network (CNN) is studied. Then, the possibility of using Generative Adversarial Networks (GAN) for data augmentation is investigated, and audio data augmentation is implemented using an existing GAN. Next, a GAN framework (EnvGAN) is implemented to generate sounds similar to those in three benchmark datasets. In addition, a quantitative similarity metric based on a Siamese Neural Network is presented to evaluate the perceptual similarity of the synthetic samples generated by EnvGAN.
Finally, two efficient signal representation techniques that address the variability present in acoustic scenes are proposed to obtain a robust ASC framework.
Pagination: 
URI: http://hdl.handle.net/10603/528343
Appears in Departments:College of Engineering Trivandrum

Files in This Item:
File                      Size       Format
01_title.pdf              447.18 kB  Adobe PDF
02_preliminary pages.pdf  469.62 kB  Adobe PDF
03_contents.pdf           104.89 kB  Adobe PDF
04_abstract.pdf           73.51 kB   Adobe PDF
05_chapter 1.pdf          119.51 kB  Adobe PDF
06_chapter 2.pdf          162.12 kB  Adobe PDF
07_chapter 3.pdf          902.38 kB  Adobe PDF
08_chapter 4.pdf          1.85 MB    Adobe PDF
09_chapter 5.pdf          776.48 kB  Adobe PDF
10_chapter 6.pdf          225.57 kB  Adobe PDF
11_chapter 7.pdf          1.47 MB    Adobe PDF
12_chapter 8.pdf          3.31 MB    Adobe PDF
13_chapter 9.pdf          116.04 kB  Adobe PDF
17_annexture.pdf          196.97 kB  Adobe PDF
80_recommendation.pdf     563.86 kB  Adobe PDF


Items in Shodhganga are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence (CC BY-NC-SA 4.0).