Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/528343
Title: Deep Learning Methods for Automatic Identification of Environmental Sounds and Acoustic Scenes
Researcher: Aswathy Madhu
Guide(s): Suresh K
Keywords: Engineering; Engineering and Technology; Engineering Electrical and Electronic
University: APJ Abdul Kalam Technological University, Thiruvananthapuram
Completed Date: 2023
Abstract: Computer audition has garnered attention from the audio and acoustic signal processing community over the past decade, driven by attractive applications in audio surveillance and healthcare. Two fundamental problems in computer audition are automatic Environmental Sound Classification (ESC) and Acoustic Scene Classification (ASC). Despite promising application prospects, they are overshadowed by popular research areas like Automatic Speech Recognition and Music Information Retrieval. This is due to the challenges posed by environmental sounds and acoustic scenes: their complex nature, the lack of the high-level structures usually observed in speech and music, and the large degree of intra-class and inter-class variability. Recently, deep learning approaches have been gaining popularity for both ESC and ASC, but their robustness depends mainly on the amount of available data and the audio signal representation. Moreover, ASC designers have shifted their focus from improving accuracy alone to incorporating real-world considerations.

Therefore, the research work embodied in this thesis aims to develop robust deep learning frameworks to identify environmental sounds and acoustic scenes. First, the influence of data augmentation on ESC using a deep Convolutional Neural Network (CNN) is studied. Then, the possibility of using Generative Adversarial Networks (GANs) for data augmentation is investigated, and audio data augmentation is implemented using an existing GAN. Next, a GAN framework (EnvGAN) is implemented to generate sounds similar to those in three benchmark datasets. In addition, a quantitative similarity metric based on a Siamese Neural Network is presented to evaluate the perceptual similarity of the synthetic samples generated by EnvGAN. Finally, two efficient signal representation techniques that address the variabilities present in acoustic scenes are proposed to obtain a robust ASC framework.
Pagination:
URI: http://hdl.handle.net/10603/528343
Appears in Departments: College of Engineering Trivandrum
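The abstract above studies data augmentation for ESC before moving to GAN-based generation. The thesis's actual pipeline is detailed in its chapters; purely as a minimal illustrative sketch, classical waveform-level augmentation (noise injection at a target SNR and a circular time shift, both standard techniques not specific to this work) might look like:

```python
import numpy as np

def add_noise(x, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    if rng is None:
        rng = np.random.default_rng(0)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

def time_shift(x, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(x, shift)

# Example: augment a synthetic 1-second clip standing in for a real
# environmental-sound recording (hypothetical data, not from the thesis).
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clip = np.sin(2 * np.pi * 440 * t)
augmented = time_shift(add_noise(clip, snr_db=20.0), shift=sr // 10)
print(augmented.shape)  # (16000,)
```

Such label-preserving transforms enlarge the training set cheaply; the thesis then investigates GANs (EnvGAN) as a learned alternative that synthesizes new samples rather than perturbing existing ones.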
Files in This Item:
| File | Size | Format |
|---|---|---|
| 01_title.pdf | 447.18 kB | Adobe PDF |
| 02_preliminary pages.pdf | 469.62 kB | Adobe PDF |
| 03_contents.pdf | 104.89 kB | Adobe PDF |
| 04_abstract.pdf | 73.51 kB | Adobe PDF |
| 05_chapter 1.pdf | 119.51 kB | Adobe PDF |
| 06_chapter 2.pdf | 162.12 kB | Adobe PDF |
| 07_chapter 3.pdf | 902.38 kB | Adobe PDF |
| 08_chapter 4.pdf | 1.85 MB | Adobe PDF |
| 09_chapter 5.pdf | 776.48 kB | Adobe PDF |
| 10_chapter 6.pdf | 225.57 kB | Adobe PDF |
| 11_chapter 7.pdf | 1.47 MB | Adobe PDF |
| 12_chapter 8.pdf | 3.31 MB | Adobe PDF |
| 13_chapter 9.pdf | 116.04 kB | Adobe PDF |
| 17_annexture.pdf | 196.97 kB | Adobe PDF |
| 80_recommendation.pdf | 563.86 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).