Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/551005
Title: Dysarthria severity classification and speech transcription using deep learning frameworks
Researcher: AMLU ANNA JOSHY
Guide(s): Rajeev Rajan
Keywords: Engineering
Engineering and Technology
Engineering Electrical and Electronic
University: APJ Abdul Kalam Technological University, Thiruvananthapuram
Completed Date: 2023
Abstract: Dysarthria is a motor speech disorder that can lead to imprecise articulation, atypical prosody, and variable speech rate. It results in poor comprehension of dysarthric speech by listeners and, hence, reduced social activity for people with dysarthria. This research aims to assess the severity of dysarthria and to improve the transcription of dysarthric speech. Dysarthria severity assessment can serve as a diagnostic step for medication and speech therapy. Different deep learning architectures are implemented to automate it using acoustic features, namely mel frequency cepstral coefficients, constant-Q cepstral coefficients, i-vectors, and speech-disorder-specific features. Deep neural networks, convolutional neural networks (CNNs), gated recurrent units, and long short-term memory networks are evaluated for their classification performance. As a further step, a detailed investigation ranks these features by their efficacy in highlighting the pathological aspects of dysarthric speech, using the technique of paraconsistent feature engineering. The potency of mel spectrograms in characterizing dysarthric speech is analysed using deep CNN models. The effectiveness of residual connections, squeeze-and-excitation modules, the multi-head attention mechanism, and a multi-task learning approach that uses gender, age, and disorder-type identification as auxiliary tasks is studied in this regard. Comparison against existing approaches in the literature and a number of baseline classifiers shows that the proposed models achieve good margins in classification accuracy. A dysarthric speech recognizer can serve as an interface for assistive technologies, but the scarcity of dysarthric speech databases prevents advances in deep learning from being used effectively in building one. Fine-tuning of the wav2vec 2.0 model is carried out to address this, and the utility of severity information in identifying the common deficits affecting transcription is analysed alongside.
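
The abstract names the feature-classifier pipeline only at a high level. As an illustrative sketch of how one stage might be wired up (not the thesis's actual configuration; the feature dimensions, class count, and network shape below are assumptions), MFCC extraction with librosa feeding a small CNN severity classifier in PyTorch could look like this:

    import librosa
    import torch
    import torch.nn as nn

    def extract_mfcc(path, sr=16000, n_mfcc=13):
        # Load the utterance and compute an (n_mfcc, frames) MFCC matrix.
        y, sr = librosa.load(path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    class SeverityCNN(nn.Module):
        # Minimal CNN over the MFCC "image". Four output classes assume a
        # very-low/low/medium/high severity scale, a common setup for
        # dysarthric corpora such as UA-Speech.
        def __init__(self, n_classes=4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),   # collapse time/frequency axes
            )
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):              # x: (batch, 1, n_mfcc, frames)
            return self.classifier(self.features(x).flatten(1))

    # Smoke test with random input standing in for real features.
    logits = SeverityCNN()(torch.randn(2, 1, 13, 200))

The same skeleton extends to the other features the abstract lists (CQCCs, i-vectors) by swapping the front end.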
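The squeeze-and-excitation module and the multi-task setup mentioned in the abstract can likewise be sketched in a few lines; the head sizes and the auxiliary-loss weight here are placeholders, not the thesis's values:

    import torch
    import torch.nn as nn

    class SEBlock(nn.Module):
        # Squeeze-and-excitation: learn per-channel weights from globally
        # pooled statistics, then rescale the feature maps.
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid(),
            )

        def forward(self, x):                    # x: (batch, C, H, W)
            w = self.fc(x.mean(dim=(2, 3)))      # squeeze, then excite
            return x * w[:, :, None, None]       # channel-wise reweighting

    class MultiTaskHeads(nn.Module):
        # A shared embedding feeds the main severity head plus auxiliary
        # gender, age-group, and disorder-type heads (sizes assumed).
        def __init__(self, dim=128, n_severity=4, n_gender=2, n_age=3, n_type=3):
            super().__init__()
            self.severity = nn.Linear(dim, n_severity)
            self.gender = nn.Linear(dim, n_gender)
            self.age = nn.Linear(dim, n_age)
            self.dtype = nn.Linear(dim, n_type)

        def forward(self, z):
            return self.severity(z), self.gender(z), self.age(z), self.dtype(z)

    # Joint loss: auxiliary tasks weighted below the main task.
    ce = nn.CrossEntropyLoss()
    z = torch.randn(8, 128)
    s, g, a, t = MultiTaskHeads()(z)
    y_s, y_g = torch.randint(0, 4, (8,)), torch.randint(0, 2, (8,))
    y_a, y_t = torch.randint(0, 3, (8,)), torch.randint(0, 3, (8,))
    loss = ce(s, y_s) + 0.3 * (ce(g, y_g) + ce(a, y_a) + ce(t, y_t))

The intuition behind the auxiliary heads is that gender, age, and disorder type all shape the acoustics of dysarthric speech, so forcing the shared embedding to predict them can regularize the severity classifier.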
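For the transcription part, fine-tuning wav2vec 2.0 typically follows the standard CTC recipe from the Hugging Face transformers library. The checkpoint name and the single synthetic training step below are illustrative only; a real run would iterate over a dysarthric corpus with batching and padding:

    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
    model.freeze_feature_encoder()   # keep the convolutional front end fixed

    # One illustrative step on a (waveform, transcript) pair.
    waveform = torch.randn(16000)    # stand-in for 1 s of 16 kHz speech
    inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    labels = processor.tokenizer("HELLO WORLD", return_tensors="pt").input_ids

    loss = model(inputs.input_values, labels=labels).loss
    loss.backward()                  # gradients flow into the fine-tuned layers

Per-speaker severity labels could then stratify the error analysis, in the spirit of the deficit identification the abstract describes.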
Pagination: 
URI: http://hdl.handle.net/10603/551005
Appears in Departments: College of Engineering Trivandrum

Files in This Item:
File                       Size       Format
01_title.pdf               126.35 kB  Adobe PDF
02_preliminary pages.pdf   317.48 kB  Adobe PDF
03_contents.pdf            53.84 kB   Adobe PDF
04_abstract.pdf            39.8 kB    Adobe PDF
05_chapter 1.pdf           85.95 kB   Adobe PDF
06_chapter 2.pdf           642.57 kB  Adobe PDF
07_chapter 3.pdf           6.96 MB    Adobe PDF
08_chapter 4.pdf           790.66 kB  Adobe PDF
80_recommendation.pdf      192.56 kB  Adobe PDF


Items in Shodhganga are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence (CC BY-NC-SA 4.0).
