Predominant instrument recognition and separation in polyphonic music using deep learning frameworks

LEKSHMI C R

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/552876

Title:	Predominant instrument recognition and separation in polyphonic music using deep learning frameworks
Researcher:	LEKSHMI C R
Guide(s):	Rajeev Rajan
Keywords:	Engineering; Engineering and Technology; Engineering Electrical and Electronic
University:	APJ Abdul Kalam Technological University, Thiruvananthapuram
Completed Date:	2023
Abstract:	Music plays an indispensable role in our lives as a way of expressing emotions and feelings. The interplay of several instruments can be thought of as the basis of music. Automatic recognition of the prominent instruments is a highly challenging task in this field. This research focuses primarily on predominant instrument recognition and separation in polyphonic music. Access to this instrument information may improve recommender system performance and enable indexing and retrieval operations for managing large multimedia archives. Other beneficial applications include source separation, genre recognition, music transcription, and instrument-specific equalization.
In the first task, the automatic identification of multiple predominant instruments in polyphonic music is addressed using convolutional neural networks (CNN), convolutional recurrent neural networks (CRNN), and transformers such as the Vision Transformer (Vi-T) and the Shifted window transformer (Swin-T), with the Mel-spectrogram, the modgdgram, and their fusion as input representations. The modgdgram is a visual representation obtained by stacking the modified group delay functions of consecutive frames. The CNN and transformer architectures learn distinctive local characteristics from the visual representation and assign each instrument to its class. We also experimented with a CNN with multi-head attention for the proposed task. The proposed system is systematically evaluated on the IRMAS dataset, which has eleven classes. A wave generative adversarial network (WaveGAN) architecture is also employed for data augmentation, and the efficacy of the proposed WaveGAN augmentation technique is compared with that of traditional augmentations.
We evaluated three fusion schemes: early fusion (a multi-channel input framework), mid-level fusion, and late fusion (a multi-stream framework). We found that late fusion is a promising paradigm for estimating multiple predominant instruments in polyphonic music.
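The difference between the early (multi-channel) and late (multi-stream) fusion schemes named in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names, the weighted-average fusion rule, the 0.5 threshold, and the example scores are all assumptions made for illustration, and the class abbreviations are the ones commonly used for the eleven IRMAS classes.

```python
import numpy as np

# Abbreviations of the eleven IRMAS instrument classes.
CLASSES = ["cel", "cla", "flu", "gac", "gel", "org", "pia", "sax", "tru", "vio", "voi"]


def early_fusion(mel, modgd):
    """Stack two equally sized 2-D representations as channels: (2, H, W)."""
    mel = np.asarray(mel, dtype=float)
    modgd = np.asarray(modgd, dtype=float)
    if mel.shape != modgd.shape:
        raise ValueError("both representations must have the same shape")
    return np.stack([mel, modgd], axis=0)


def late_fusion(p_mel, p_modgd, weight=0.5, threshold=0.5):
    """Weighted average of per-class scores from two streams (assumed rule)."""
    p_mel = np.asarray(p_mel, dtype=float)
    p_modgd = np.asarray(p_modgd, dtype=float)
    if p_mel.shape != (len(CLASSES),) or p_modgd.shape != (len(CLASSES),):
        raise ValueError("expected one score per class from each stream")
    fused = weight * p_mel + (1.0 - weight) * p_modgd
    # Multi-label decision: keep every class whose fused score passes the threshold.
    predicted = [c for c, p in zip(CLASSES, fused) if p >= threshold]
    return fused, predicted


# Made-up scores for illustration only; these are not results from the thesis.
p_mel = np.full(len(CLASSES), 0.1)
p_modgd = np.full(len(CLASSES), 0.1)
p_mel[CLASSES.index("pia")], p_modgd[CLASSES.index("pia")] = 0.8, 0.6
p_mel[CLASSES.index("voi")], p_modgd[CLASSES.index("voi")] = 0.4, 0.8
fused, predicted = late_fusion(p_mel, p_modgd)
print(predicted)
```

In this sketch, early fusion combines the two representations before any model sees them, while late fusion combines the outputs of two separately trained models. Mid-level fusion, which merges intermediate feature maps, is not shown.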
URI:	http://hdl.handle.net/10603/552876
Appears in Departments:	College of Engineering Trivandrum

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	19.14 kB	Adobe PDF
02_preliminaries.pdf		461.56 kB	Adobe PDF
03_contents.pdf		80.15 kB	Adobe PDF
04_abstract.pdf		76.38 kB	Adobe PDF
05_chapter 1.pdf		397.07 kB	Adobe PDF
06_chapter 2.pdf		193.72 kB	Adobe PDF
07_chapter 3.pdf		7.35 MB	Adobe PDF
08_chapter 4.pdf		2.11 MB	Adobe PDF
09_annexure.pdf		189.57 kB	Adobe PDF
80_recommendation.pdf		100.76 kB	Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).


Shodhganga : a reservoir of Indian theses @ INFLIBNET