Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/551005
Title: Dysarthria severity classification and speech transcription using deep learning frameworks
Researcher: AMLU ANNA JOSHY
Guide(s): Rajeev Rajan
Keywords: Engineering
Engineering and Technology
Engineering Electrical and Electronic
University: APJ Abdul Kalam Technological University, Thiruvananthapuram
Completed Date: 2023
Abstract: Dysarthria is a motor speech disorder that can lead to imprecise articulation, atypical prosody, and variable speech rate. It results in poor comprehension of dysarthric speech by listeners and, hence, reduced social activity for people with dysarthria. This research aims to assess the severity of dysarthria and to improve the transcription of dysarthric speech. Dysarthria severity assessment can serve as a diagnostic step for medication and speech therapy. Different deep learning architectures are implemented to automate it using acoustic features, namely mel frequency cepstral coefficients, constant-Q cepstral coefficients, i-vectors, and speech-disorder-specific features. Deep neural networks, convolutional neural networks (CNNs), gated recurrent units, and long short-term memory networks are evaluated for their classification performance. As a further step, a detailed investigation ranks these features by their efficacy in highlighting the pathological aspects of dysarthric speech, using the technique of paraconsistent feature engineering. The potency of mel spectrograms in characterizing dysarthric speech is analysed using deep CNN models. The effectiveness of residual connections, squeeze-and-excitation modules, the multi-head attention mechanism, and a multi-task learning approach that uses gender, age, and disorder-type identification as auxiliary tasks is studied in this regard. Comparison against existing approaches in the literature and a number of baseline classifiers shows that the proposed models achieve good margins in classification accuracy. A dysarthric speech recognizer can serve as an interface for assistive technologies, but the scarcity of dysarthric speech databases prevents advances in deep learning from being used effectively in building one. Fine-tuning of the wav2vec 2.0 model is carried out to address this, and the utility of severity information in identifying the common deficits affecting transcription is analysed alongside.
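
The abstract names the feature-classifier pipeline only at a high level. As an illustrative sketch of how one stage might be wired up (not the thesis's actual configuration; the feature dimensions, class count, and network shape below are assumptions), MFCC extraction with librosa feeding a small CNN severity classifier in PyTorch could look like this:

    import librosa
    import torch
    import torch.nn as nn

    def extract_mfcc(path, sr=16000, n_mfcc=13):
        # Load the utterance and compute an (n_mfcc, frames) MFCC matrix.
        y, sr = librosa.load(path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    class SeverityCNN(nn.Module):
        # Minimal CNN over the MFCC "image". Four output classes assume a
        # very-low/low/medium/high severity scale, a common setup for
        # dysarthric corpora such as UA-Speech.
        def __init__(self, n_classes=4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),   # collapse time/frequency axes
            )
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):              # x: (batch, 1, n_mfcc, frames)
            return self.classifier(self.features(x).flatten(1))

    # Smoke test with random input standing in for real features.
    logits = SeverityCNN()(torch.randn(2, 1, 13, 200))

The same skeleton extends to the other features the abstract lists (CQCCs, i-vectors) by swapping the front end.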
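The squeeze-and-excitation module and the multi-task setup mentioned in the abstract can likewise be sketched in a few lines; the head sizes and the auxiliary-loss weight here are placeholders, not the thesis's values:

    import torch
    import torch.nn as nn

    class SEBlock(nn.Module):
        # Squeeze-and-excitation: learn per-channel weights from globally
        # pooled statistics, then rescale the feature maps.
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid(),
            )

        def forward(self, x):                    # x: (batch, C, H, W)
            w = self.fc(x.mean(dim=(2, 3)))      # squeeze, then excite
            return x * w[:, :, None, None]       # channel-wise reweighting

    class MultiTaskHeads(nn.Module):
        # A shared embedding feeds the main severity head plus auxiliary
        # gender, age-group, and disorder-type heads (sizes assumed).
        def __init__(self, dim=128, n_severity=4, n_gender=2, n_age=3, n_type=3):
            super().__init__()
            self.severity = nn.Linear(dim, n_severity)
            self.gender = nn.Linear(dim, n_gender)
            self.age = nn.Linear(dim, n_age)
            self.dtype = nn.Linear(dim, n_type)

        def forward(self, z):
            return self.severity(z), self.gender(z), self.age(z), self.dtype(z)

    # Joint loss: auxiliary tasks weighted below the main task.
    ce = nn.CrossEntropyLoss()
    z = torch.randn(8, 128)
    s, g, a, t = MultiTaskHeads()(z)
    y_s, y_g = torch.randint(0, 4, (8,)), torch.randint(0, 2, (8,))
    y_a, y_t = torch.randint(0, 3, (8,)), torch.randint(0, 3, (8,))
    loss = ce(s, y_s) + 0.3 * (ce(g, y_g) + ce(a, y_a) + ce(t, y_t))

The intuition behind the auxiliary heads is that gender, age, and disorder type all shape the acoustics of dysarthric speech, so forcing the shared embedding to predict them can regularize the severity classifier.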
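For the transcription part, fine-tuning wav2vec 2.0 typically follows the standard CTC recipe from the Hugging Face transformers library. The checkpoint name and the single synthetic training step below are illustrative only; a real run would iterate over a dysarthric corpus with batching and padding:

    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
    model.freeze_feature_encoder()   # keep the convolutional front end fixed

    # One illustrative step on a (waveform, transcript) pair.
    waveform = torch.randn(16000)    # stand-in for 1 s of 16 kHz speech
    inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    labels = processor.tokenizer("HELLO WORLD", return_tensors="pt").input_ids

    loss = model(inputs.input_values, labels=labels).loss
    loss.backward()                  # gradients flow into the fine-tuned layers

Per-speaker severity labels could then stratify the error analysis, in the spirit of the deficit identification the abstract describes.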
Pagination: 
URI: http://hdl.handle.net/10603/551005
Appears in Departments: College of Engineering Trivandrum

Files in This Item:
File                       Size       Format
01_title.pdf               126.35 kB  Adobe PDF
02_preliminary pages.pdf   317.48 kB  Adobe PDF
03_contents.pdf            53.84 kB   Adobe PDF
04_abstract.pdf            39.8 kB    Adobe PDF
05_chapter 1.pdf           85.95 kB   Adobe PDF
06_chapter 2.pdf           642.57 kB  Adobe PDF
07_chapter 3.pdf           6.96 MB    Adobe PDF
08_chapter 4.pdf           790.66 kB  Adobe PDF
80_recommendation.pdf      192.56 kB  Adobe PDF


Items in Shodhganga are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence (CC BY-NC-SA 4.0).
