Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/547934
Title: Some Investigations on Attention Mechanism based Deep Learning Models for Speech Enhancement
Researcher: Sivaramakrishna, Yechuri
Guide(s): Dayal, V Sunny
Keywords: Adaptive Wiener Gain; NMF; Speech Enhancement
University: Vellore Institute of Technology (VIT-AP)
Completed Date: 2024
Abstract: Noise is all around us. When individuals speak, excessive environmental noise creates transmission issues and severely degrades speech intelligibility and quality. To address this, speech enhancement methods are used to extract clean speech from environmental disturbances.

In the first part, we propose a novel single-channel speech enhancement algorithm using iterative constrained non-negative matrix factorization (NMF) based adaptive Wiener gain for non-stationary noise. The Wiener filter's performance depends on the value of the adaptive gain factor (α); keeping it constant regardless of noise type and signal-to-noise ratio (SNR) degrades enhancement performance. To overcome this, the adaptive factor is computed with a genetic algorithm (GA), which adjusts the adaptive Wiener gain according to the noise type and SNR level. The GA-based adaptive Wiener gain minimizes Wiener filter estimation errors and improves speech quality by adjusting the basis-vector weights of noise and speech. Additionally, the iterative constrained NMF (IC-NMF) method is used to calculate the priors from noisy speech magnitudes. We select the Erlang, Inverse Gamma, Student's-t, and Inverse Nakagami distributions for the speech priors and Gaussian distributions for the noise priors, since noise and speech samples are well correlated with those distributions.

In the second part, we propose a U-Net with a gated recurrent unit (GRU) and an efficient channel attention (ECA) mechanism for real-time speech enhancement. The proposed U-Net model uses skip connections to improve information flow. A novel cross-channel interaction is implemented via the ECA module without dimensionality reduction. In module testing, choosing an adaptive kernel size for the ECA improved network performance significantly. Additionally, the U-Net architecture uses gated recurrent units, which yields a causal system suitable for real-world use; the GRU is used to learn long-range dependencies.

In the third part, the advanced improv
Pagination: xvi, 151
URI: http://hdl.handle.net/10603/547934
Appears in Departments: Department of Electronics Engineering
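The GA-tuned adaptive Wiener gain described in the first part of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the parametric gain form G = S/(S + αN), the MSE-based fitness, the toy elitist GA, and all names and hyperparameters (`wiener_gain`, `ga_tune_alpha`, population size, mutation scale, α bounds) are choices made for the sketch. The thesis additionally estimates the speech and noise spectra via IC-NMF with the listed priors, which is omitted here.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, alpha):
    # Parametric Wiener gain: alpha scales the noise power estimate.
    # alpha = 1 recovers the classical Wiener filter.
    return speech_psd / (speech_psd + alpha * noise_psd)

def fitness(alpha, noisy_psd, clean_psd, noise_psd):
    # Negative mean squared error between the filtered spectrum and the
    # clean reference (higher is better, so the GA maximizes it).
    est = wiener_gain(clean_psd, noise_psd, alpha) * noisy_psd
    return -np.mean((est - clean_psd) ** 2)

def ga_tune_alpha(noisy_psd, clean_psd, noise_psd,
                  pop_size=20, generations=30, seed=0):
    # Toy genetic algorithm over the scalar adaptive factor alpha:
    # elitist selection of the best half plus Gaussian mutation.
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.1, 4.0, pop_size)  # initial population of alphas
    for _ in range(generations):
        scores = np.array([fitness(a, noisy_psd, clean_psd, noise_psd)
                           for a in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2:]]      # keep best half
        children = elite + rng.normal(0.0, 0.1, elite.size)   # mutate copies
        pop = np.concatenate([elite, np.clip(children, 0.05, 8.0)])
    scores = np.array([fitness(a, noisy_psd, clean_psd, noise_psd) for a in pop])
    return float(pop[np.argmax(scores)])
```

In this toy setting the GA simply searches a bounded scalar; the point it illustrates is the abstract's claim that α should adapt to the noise/SNR condition rather than stay fixed.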
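The ECA module from the second part can likewise be sketched in plain NumPy. This is a hedged illustration, not the thesis's network: the adaptive kernel-size rule follows the published ECA-Net heuristic (kernel size derived from log2 of the channel count, rounded to an odd integer), the uniform 1-D kernel stands in for trained convolution weights, and the function names and parameters here are hypothetical.

```python
import numpy as np

def adaptive_kernel_size(channels, gamma=2, b=1):
    # ECA-Net heuristic: kernel size grows with log2 of the channel count,
    # forced to the next odd integer so the 1-D conv stays centered.
    k = int(abs((np.log2(channels) + b) / gamma))
    return k if k % 2 else k + 1

def eca(feature_map, gamma=2, b=1):
    # feature_map: array of shape (channels, height, width).
    # 1) Squeeze: global average pool per channel.
    c = feature_map.shape[0]
    squeezed = feature_map.mean(axis=(1, 2))
    # 2) Local cross-channel interaction: 1-D convolution along the channel
    #    axis with an adaptive kernel size -- no dimensionality reduction,
    #    which is the point of ECA versus SE-style bottleneck attention.
    k = adaptive_kernel_size(c, gamma, b)
    kernel = np.full(k, 1.0 / k)  # untrained stand-in for learned weights
    mixed = np.convolve(np.pad(squeezed, k // 2, mode="edge"),
                        kernel, mode="valid")
    # 3) Excitation: sigmoid gate, broadcast back over the spatial dims.
    gate = 1.0 / (1.0 + np.exp(-mixed))
    return feature_map * gate[:, None, None]
```

The sigmoid gate rescales each channel by a weight in (0, 1), and because the kernel only mixes neighboring channels, the interaction stays cheap regardless of channel count, which is what makes ECA attractive for the real-time setting the abstract targets.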
Files in This Item:
File | Size | Format
---|---|---
01_title.pdf | 69.86 kB | Adobe PDF
02_prelim pages.pdf | 202.31 kB | Adobe PDF
03_content.pdf | 51.52 kB | Adobe PDF
04_abstract.pdf | 87.25 kB | Adobe PDF
05_chapter_1.pdf | 344.19 kB | Adobe PDF
06_chapter_2.pdf | 1.06 MB | Adobe PDF
07_chapter_3.pdf | 663.26 kB | Adobe PDF
08_chapter_4.pdf | 659.03 kB | Adobe PDF
09_chapter_5.pdf | 960.1 kB | Adobe PDF
10_chapter_6.pdf | 1.42 MB | Adobe PDF
11_chapter_7.pdf | 1.44 MB | Adobe PDF
12_references and publications.pdf | 221.13 kB | Adobe PDF
80_recommendation.pdf | 46.48 kB | Adobe PDF
Items in Shodhganga are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence (CC BY-NC-SA 4.0).