Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/552876
Title: | Predominant instrument recognition and separation in polyphonic music using deep learning frameworks |
Researcher: | LEKSHMI C R |
Guide(s): | Rajeev Rajan |
Keywords: | Engineering; Engineering and Technology; Engineering Electrical and Electronic |
University: | APJ Abdul Kalam Technological University, Thiruvananthapuram |
Completed Date: | 2023 |
Abstract: | Music plays an indispensable role in our lives as a way of expressing our emotions and feelings. The interplay of several instruments can be thought of as the basis of music, and automatic recognition of the prominent instruments in a mixture is a highly challenging task. This research focuses primarily on predominant instrument recognition and separation in polyphonic music. Information about which instruments are present may improve recommender system performance and enable indexing and retrieval operations for managing huge multimedia archives. Other beneficial applications include source separation, genre recognition, music transcription, and instrument-specific equalization.
In the first task, the automatic identification of multiple predominant instruments in polyphonic music is addressed using convolutional neural networks (CNN), convolutional recurrent neural networks (CRNN), and transformers such as the Vision Transformer (ViT) and the Shifted window transformer (Swin-T), with the Mel-spectrogram, the modgdgram, and their fusion as input representations. The modgdgram, a visual representation, is obtained by stacking the modified group delay functions of consecutive frames. The CNN and transformer architectures learn distinctive local characteristics from this visual representation and classify the instrument into the group to which it belongs. We also experimented with a CNN with multi-head attention for the proposed task. The proposed system is systematically evaluated on the IRMAS dataset with eleven classes. A wave generative adversarial network (WaveGAN) architecture is also employed for data augmentation, and the efficacy of the proposed WaveGAN augmentation is compared with that of traditional augmentations.
We attempted three fusion schemes: early fusion (a multi-channel input framework), mid-level fusion, and late fusion (a multi-stream framework), and found that late fusion is a promising paradigm for estimating multiple predominant instruments in polyphonic music. |
Pagination: | |
URI: | http://hdl.handle.net/10603/552876 |
Appears in Departments: | College of Engineering Trivandrum |
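The abstract describes the modgdgram as a stack of modified group delay functions computed on consecutive frames. The sketch below is a minimal NumPy illustration of that idea, assuming the commonly used formulation τ(k) = (X_R(k)Y_R(k) + X_I(k)Y_I(k)) / S(k)^(2γ) and τ_m(k) = sign(τ(k))·|τ(k)|^α, where X is the DFT of the frame x[n], Y is the DFT of n·x[n], and S is a cepstrally smoothed magnitude spectrum. The function names, frame length, hop size, lifter length, and the values of α and γ are hypothetical choices for illustration and are not taken from the thesis.

```python
import numpy as np


def modified_group_delay(frame, n_fft=1024, alpha=0.4, gamma=0.9, lifter=30, eps=1e-10):
    """Modified group delay function of one (already windowed) frame.

    alpha, gamma and lifter are illustrative assumptions, not the thesis settings.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of n * x[n]

    # Cepstrally smoothed magnitude spectrum S(k): keep only low-quefrency coefficients
    cep = np.fft.irfft(np.log(np.abs(X) + eps), n_fft)
    cep[lifter:n_fft - lifter + 1] = 0.0  # zero everything except the first `lifter` terms and their mirror
    S = np.exp(np.fft.rfft(cep, n_fft).real)

    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + eps)
    return np.sign(tau) * np.abs(tau) ** alpha  # compress the dynamic range, keep the sign


def modgdgram(signal, frame_len=1024, hop=512, **mgd_kwargs):
    """Stack modified group delay functions of consecutive frames as columns."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    columns = [
        modified_group_delay(signal[i * hop:i * hop + frame_len] * window, **mgd_kwargs)
        for i in range(n_frames)
    ]
    return np.stack(columns, axis=1)  # shape: (n_fft // 2 + 1, n_frames)
```

With the defaults above, a one-second signal at 22050 Hz yields a 513 × 42 image that a CNN or transformer could consume like a spectrogram. The exponent α compresses large values while preserving their sign, and the cepstral smoothing in the denominator suppresses the spiky behaviour that the plain group delay shows near spectral zeros.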
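The abstract reports that late fusion (a multi-stream framework) worked best for estimating multiple predominant instruments. One simple way to realize late fusion is sketched below: each independently trained stream (for example, one on Mel-spectrograms and one on modgdgrams) outputs a per-class probability for the eleven IRMAS classes, the scores are combined by a weighted average, and every class above a threshold is reported. The equal weights, the 0.5 threshold, and the function name `late_fusion` are assumptions for illustration; the thesis may combine streams differently.

```python
from typing import List, Optional, Sequence, Tuple

# IRMAS instrument labels (11 classes)
IRMAS_CLASSES = ["cel", "cla", "flu", "gac", "gel", "org", "pia", "sax", "tru", "vio", "voi"]


def late_fusion(
    stream_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> Tuple[List[float], List[str]]:
    """Average per-class probabilities from several streams and pick predominant instruments."""
    if weights is None:
        # Equal weighting of streams (assumption)
        weights = [1.0 / len(stream_probs)] * len(stream_probs)
    n_classes = len(IRMAS_CLASSES)
    if any(len(p) != n_classes for p in stream_probs):
        raise ValueError("each stream must give one score per IRMAS class")
    fused = [sum(w * p[c] for w, p in zip(weights, stream_probs)) for c in range(n_classes)]
    # Multi-label decision: every class whose fused score clears the threshold is predicted
    labels = [IRMAS_CLASSES[c] for c, score in enumerate(fused) if score >= threshold]
    return fused, labels
```

Because each stream is trained and run separately, a weak decision in one representation can be outvoted by the other; for instance, if one stream scores piano at 0.8 and the other at 0.7, the fused score is 0.75 and piano is predicted.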
Files in This Item:
File | Description | Size | Format
---|---|---|---
01_title.pdf | Attached File | 19.14 kB | Adobe PDF
02_preliminaries.pdf | | 461.56 kB | Adobe PDF
03_contents.pdf | | 80.15 kB | Adobe PDF
04_abstract.pdf | | 76.38 kB | Adobe PDF
05_chapter 1.pdf | | 397.07 kB | Adobe PDF
06_chapter 2.pdf | | 193.72 kB | Adobe PDF
07_chapter 3.pdf | | 7.35 MB | Adobe PDF
08_chapter 4.pdf | | 2.11 MB | Adobe PDF
09_annexure.pdf | | 189.57 kB | Adobe PDF
80_recommendation.pdf | | 100.76 kB | Adobe PDF
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).