Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/551005
Title: Dysarthria severity classification and speech transcription using deep learning frameworks
Researcher: AMLU ANNA JOSHY
Guide(s): Rajeev Rajan
Keywords: Engineering; Engineering and Technology; Engineering Electrical and Electronic
University: APJ Abdul Kalam Technological University, Thiruvananthapuram
Completed Date: 2023
Abstract: Dysarthria is a motor speech disorder that can lead to imprecise articulation, atypical prosody, and variable speech rate. It results in poor comprehension of dysarthric speech by listeners and, hence, reduced social participation for people with dysarthria. This research aims to assess the severity of dysarthria and to improve the transcription of dysarthric speech. Severity assessment can serve as a diagnostic step for medication and speech therapy. Several deep learning architectures are implemented to automate it, using acoustic features, namely mel frequency cepstral coefficients, constant-Q cepstral coefficients, i-vectors, and speech-disorder-specific features. Deep neural networks, convolutional neural networks (CNNs), gated recurrent units, and long short-term memory networks are evaluated for their classification performance. As a further step, a detailed investigation ranks these features by their efficacy in highlighting the pathological aspects of dysarthric speech, using the technique of paraconsistent feature engineering. The potency of mel spectrograms in characterising dysarthric speech is analysed using deep CNN models. The effectiveness of residual connections, squeeze-and-excitation modules, the multi-head attention mechanism, and a multi-task learning approach with gender, age, and disorder-type identification as auxiliary tasks is studied in this regard. Comparisons against existing approaches in the literature and a number of baseline classifiers show that the proposed models achieve good margins in classification accuracy. A dysarthric speech recogniser can serve as an interface for assistive technologies, but the scarcity of dysarthric speech databases prevents advances in deep learning from being applied effectively to building one. To address this, the wav2vec 2.0 model is fine-tuned, and the utility of severity information in identifying the common deficits affecting transcription is analysed alongside. (Two of the techniques named here, squeeze-and-excitation and wav2vec 2.0 fine-tuning, are illustrated in hedged code sketches after the record fields below.)
Pagination:
URI: http://hdl.handle.net/10603/551005
Appears in Departments: College of Engineering Trivandrum
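The abstract names the building blocks of the proposed models, but the record itself contains no code. Purely as a hedged illustration, the sketch below shows what a squeeze-and-excitation (SE) module inside a small CNN severity classifier over log-mel spectrograms can look like in PyTorch. The layer sizes, the four-class severity scale, and the 16 kHz input are assumptions made for the sketch, not details taken from the thesis.

```python
# Minimal SE-augmented CNN severity classifier: a sketch, NOT the thesis code.
# Assumptions: 16 kHz audio, 64-band log-mel input, four severity classes.
import torch
import torch.nn as nn
import torchaudio

class SEBlock(nn.Module):
    """Squeeze-and-excitation: re-weight channels using global context."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                           # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                      # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)  # excitation: channel weights
        return x * w                                # channel-wise re-scaling

class SeverityCNN(nn.Module):
    """Two conv stages with one SE module, then a linear severity head."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            SEBlock(32),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, x):
        return self.head(self.features(x))

melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
wave = torch.randn(1, 16000)                  # stand-in for a real utterance
logmel = torch.log(melspec(wave) + 1e-6).unsqueeze(1)  # (B, 1, mels, frames)
logits = SeverityCNN()(logmel)                # (B, n_classes) severity scores
```

For the transcription part, the abstract says wav2vec 2.0 is fine-tuned for dysarthric speech. A generic CTC fine-tuning step with the Hugging Face transformers library might look as follows; the checkpoint name, learning rate, and dummy data are placeholders, and the thesis recipe may differ in every one of these choices.

```python
# Generic wav2vec 2.0 CTC fine-tuning step: a sketch, NOT the thesis recipe.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

ckpt = "facebook/wav2vec2-base-960h"          # assumed base checkpoint
processor = Wav2Vec2Processor.from_pretrained(ckpt)
model = Wav2Vec2ForCTC.from_pretrained(ckpt)
model.freeze_feature_encoder()                # common practice: fix the conv frontend
model.train()

wave = torch.randn(16000)                     # stand-in for one 16 kHz utterance
inputs = processor(wave.numpy(), sampling_rate=16000, return_tensors="pt")
labels = processor(text="HELLO WORLD", return_tensors="pt").input_ids  # dummy target

optim = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(inputs.input_values, labels=labels).loss  # CTC loss against labels
loss.backward()                               # one illustrative gradient step
optim.step()
```

The residual connections, multi-head attention, and multi-task auxiliary heads mentioned in the abstract would be additional layers or losses on top of skeletons like these; the thesis chapters listed below hold the actual designs.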
Files in This Item:
File | Size | Format
---|---|---
01_title.pdf | 126.35 kB | Adobe PDF
02_preliminary pages.pdf | 317.48 kB | Adobe PDF
03_contents.pdf | 53.84 kB | Adobe PDF
04_abstract.pdf | 39.8 kB | Adobe PDF
05_chapter 1.pdf | 85.95 kB | Adobe PDF
06_chapter 2.pdf | 642.57 kB | Adobe PDF
07_chapter 3.pdf | 6.96 MB | Adobe PDF
08_chapter 4.pdf | 790.66 kB | Adobe PDF
80_recommendation.pdf | 192.56 kB | Adobe PDF
Items in Shodhganga are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence (CC BY-NC-SA 4.0).