Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/528343
Title: Deep Learning Methods for Automatic Identification of Environmental Sounds and Acoustic Scenes
Researcher: Aswathy Madhu
Guide(s): Suresh K
Keywords: Engineering; Engineering and Technology; Engineering Electrical and Electronic
University: APJ Abdul Kalam Technological University, Thiruvananthapuram
Completed Date: 2023
Abstract: Computer audition has garnered attention from the audio and acoustic signal processing community over the past decade, driven by attractive applications in audio surveillance and healthcare. Two fundamental problems in computer audition are automatic Environmental Sound Classification (ESC) and Acoustic Scene Classification (ASC). Despite promising application prospects, they are overshadowed by popular research areas like Automatic Speech Recognition and Music Information Retrieval. This is due to the challenges posed by environmental sounds and acoustic scenes: their complex nature, the lack of the high-level structures usually observed in speech and music, and the large degree of intra-class and inter-class variability. Recently, deep learning approaches have been gaining popularity for both ESC and ASC, but their robustness depends mainly on the amount of available data and the audio signal representation. Moreover, ASC designers have shifted their focus from improving accuracy alone to incorporating real-world considerations.

Therefore, the research work embodied in this thesis aims to develop robust deep learning frameworks to identify environmental sounds and acoustic scenes. First, the influence of data augmentation on ESC using a deep Convolutional Neural Network (CNN) is studied. Then, the possibility of using Generative Adversarial Networks (GANs) for data augmentation is investigated, and audio data augmentation is implemented using an existing GAN. Next, a GAN framework (EnvGAN) is implemented to generate sounds similar to those in three benchmark datasets. In addition, a quantitative similarity metric based on a Siamese Neural Network is presented to evaluate the perceptual similarity of the synthetic samples generated by EnvGAN. Finally, two efficient signal representation techniques that address the variabilities present in acoustic scenes are proposed to obtain a robust ASC framework.
Pagination:
URI: http://hdl.handle.net/10603/528343
Appears in Departments: College of Engineering Trivandrum
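The abstract above studies data augmentation for ESC before moving to GAN-based generation. The thesis's actual pipeline is detailed in its chapters; purely as a minimal illustrative sketch, classical waveform-level augmentation (noise injection at a target SNR and a circular time shift, both standard techniques not specific to this work) might look like:

```python
import numpy as np

def add_noise(x, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    if rng is None:
        rng = np.random.default_rng(0)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

def time_shift(x, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(x, shift)

# Example: augment a synthetic 1-second clip standing in for a real
# environmental-sound recording (hypothetical data, not from the thesis).
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clip = np.sin(2 * np.pi * 440 * t)
augmented = time_shift(add_noise(clip, snr_db=20.0), shift=sr // 10)
print(augmented.shape)  # (16000,)
```

Such label-preserving transforms enlarge the training set cheaply; the thesis then investigates GANs (EnvGAN) as a learned alternative that synthesizes new samples rather than perturbing existing ones.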
Files in This Item:
| File | Size | Format |
|---|---|---|
| 01_title.pdf | 447.18 kB | Adobe PDF |
| 02_preliminary pages.pdf | 469.62 kB | Adobe PDF |
| 03_contents.pdf | 104.89 kB | Adobe PDF |
| 04_abstract.pdf | 73.51 kB | Adobe PDF |
| 05_chapter 1.pdf | 119.51 kB | Adobe PDF |
| 06_chapter 2.pdf | 162.12 kB | Adobe PDF |
| 07_chapter 3.pdf | 902.38 kB | Adobe PDF |
| 08_chapter 4.pdf | 1.85 MB | Adobe PDF |
| 09_chapter 5.pdf | 776.48 kB | Adobe PDF |
| 10_chapter 6.pdf | 225.57 kB | Adobe PDF |
| 11_chapter 7.pdf | 1.47 MB | Adobe PDF |
| 12_chapter 8.pdf | 3.31 MB | Adobe PDF |
| 13_chapter 9.pdf | 116.04 kB | Adobe PDF |
| 17_annexture.pdf | 196.97 kB | Adobe PDF |
| 80_recommendation.pdf | 563.86 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).