Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/552876
Title: Predominant instrument recognition and separation in polyphonic music using deep learning frameworks
Researcher: LEKSHMI C R
Guide(s): Rajeev Rajan
Keywords: Engineering
Engineering and Technology
Engineering Electrical and Electronic
University: APJ Abdul Kalam Technological University, Thiruvananthapuram
Completed Date: 2023
Abstract: Music plays an essential role in our lives as a way of expressing emotions and feelings. The interplay of several instruments can be thought of as the basis of music, and automatic recognition of the predominant instruments is a highly challenging task in this field. This research focuses on predominant instrument recognition and separation in polyphonic music. The availability of this instrument information may improve recommender system performance and enable indexing and retrieval operations for managing large multimedia archives. Other beneficial applications include source separation, genre recognition, music transcription, and instrument-specific equalization.
In the first task, the automatic identification of multiple predominant instruments in polyphonic music is addressed using convolutional neural networks (CNN), convolutional recurrent neural networks (CRNN), and transformers such as the Vision Transformer (Vi-T) and the Shifted window transformer (Swin-T), with the Mel-spectrogram, the modgdgram, and their fusion as inputs. The modgdgram is a visual representation obtained by stacking the modified group delay functions of consecutive frames. The CNN and transformer architectures learn distinctive local characteristics from the visual representation and assign the instrument to the class to which it belongs. We also experimented with a CNN with multi-head attention for the proposed task. The proposed system is systematically evaluated on the IRMAS dataset, which has eleven classes. A wave generative adversarial network (WaveGAN) architecture is also employed for data augmentation, and the efficacy of the proposed WaveGAN augmentation is compared with that of traditional augmentations.
We attempted three fusion schemes: early fusion (a multi-channel input framework), mid-level fusion, and late fusion (a multi-stream framework). We found that late fusion is a promising paradigm for estimating multiple predominant instruments in polyphonic music.
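The late fusion scheme mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the streams are combined, so the code below assumes simple score averaging followed by a threshold, which is only one common way to do late fusion and may differ from the thesis. The function name `late_fusion`, the threshold of 0.5, the three class labels, and all score values are hypothetical and chosen for illustration only.

```python
from typing import Dict, List


def late_fusion(stream_scores: List[Dict[str, float]],
                threshold: float = 0.5) -> List[str]:
    """Average per-class scores across streams; keep classes at or above threshold.

    Assumption: every stream reports a score in [0, 1] for the same set of classes.
    """
    if not stream_scores:
        raise ValueError("at least one stream is required")
    classes = stream_scores[0].keys()
    # Fuse at the decision level: mean of the per-stream scores for each class.
    fused = {
        c: sum(scores[c] for scores in stream_scores) / len(stream_scores)
        for c in classes
    }
    # Multi-label decision: several instruments may exceed the threshold.
    return sorted(c for c, p in fused.items() if p >= threshold)


# Hypothetical outputs of two independent streams for one audio excerpt,
# e.g. one model fed a Mel-spectrogram and one fed a modgdgram.
mel_scores = {"piano": 0.9, "violin": 0.4, "flute": 0.1}
modgd_scores = {"piano": 0.7, "violin": 0.7, "flute": 0.2}

predicted = late_fusion([mel_scores, modgd_scores])
print(predicted)
```

In this toy case the violin is rejected by the first stream alone but accepted after fusion, which shows how a second representation can contribute to a multi-label decision.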
URI: http://hdl.handle.net/10603/552876
Appears in Departments: College of Engineering Trivandrum

Files in This Item:
File                    Description     Size        Format
01_title.pdf            Attached File   19.14 kB    Adobe PDF
02_preliminaries.pdf                    461.56 kB   Adobe PDF
03_contents.pdf                         80.15 kB    Adobe PDF
04_abstract.pdf                         76.38 kB    Adobe PDF
05_chapter 1.pdf                        397.07 kB   Adobe PDF
06_chapter 2.pdf                        193.72 kB   Adobe PDF
07_chapter 3.pdf                        7.35 MB     Adobe PDF
08_chapter 4.pdf                        2.11 MB     Adobe PDF
09_annexure.pdf                         189.57 kB   Adobe PDF
80_recommendation.pdf                   100.76 kB   Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
