Improvement of Deep Cross Modal Retrieval using Common Sub Space Generation

BHATT NIKITA MAHESHKUMAR

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/368815

Title:	Improvement of Deep Cross Modal Retrieval using Common Sub Space Generation
Researcher:	BHATT NIKITA MAHESHKUMAR
Guide(s):	Ganatra Amit P
Keywords:	Computer Science; Computer Science Artificial Intelligence; Engineering and Technology
University:	Charotar University of Science and Technology
Completed Date:	2021
Abstract:	Cross-Modal Retrieval (CMR) has drawn significant attention due to the rapid proliferation of multi-modal data and users' need for flexibility. CMR can perform image-sketch matching, text-image matching, audio-video matching, and near infrared-visual image matching, which is useful in applications such as criminal investigation, hot topic detection, and personalized recommendation. The first challenge in CMR is selecting an appropriate distributional representation model for the text modality, one that preserves semantic similarities between words. The second challenge is the modality gap: each modality (e.g., text, image, video, audio) has different statistical properties, which prevents direct comparison for retrieval. This gap is resolved by generating a common sub-space that preserves semantic similarities between heterogeneous modalities.
Traditional CMR methods perform feature learning and correlation learning independently to generate a common sub-space, which does not achieve satisfactory performance. Recently, deep networks have achieved considerable success in CMR by generating binary-valued or real-valued representations in the common sub-space. Existing deep learning-based approaches use pairwise labels to generate binary-valued representations, which offer low storage requirements and fast retrieval. However, the relative similarity between heterogeneous data is ignored.
• In the first phase, a framework called "Deep Cross-Modal Retrieval (DCMR)" is proposed, which adopts the GloVe model and a Multi-Layer Perceptron (MLP) for the text modality and the VGG-F network for the image modality. To preserve relative similarities between multi-modal data, triplet labels are given as input to DCMR, which projects similar (positive) data points nearer to, and dissimilar (negative) data points farther from, the given query in the vector space.
Extensive experiments are performed, which show that the performance of binary-valued DCMR increases a
URI:	http://hdl.handle.net/10603/368815
Appears in Departments:	Faculty of Technology and Engineering
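
The abstract describes two ideas that a small example may make more concrete: triplet labels that pull a similar (positive) item toward the query and push a dissimilar (negative) item away, and binary-valued codes that allow fast retrieval. The sketch below is illustrative only and is not taken from the thesis. The hinge-style margin loss, the sign-threshold binarization, the margin value, and all embedding values and names are assumptions chosen for the example.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_margin_loss(query: Sequence[float],
                        positive: Sequence[float],
                        negative: Sequence[float],
                        margin: float = 1.0) -> float:
    """Assumed hinge form: the loss is zero once the negative is at least
    `margin` farther from the query than the positive is."""
    return max(0.0, squared_distance(query, positive)
               - squared_distance(query, negative) + margin)


def binarize(embedding: Sequence[float]) -> List[int]:
    """Assumed binarization: threshold each dimension at zero."""
    return [1 if x > 0 else 0 for x in embedding]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical embeddings already projected into a common sub-space.
text_query = [0.9, -0.4, 0.7, -0.8]
images: Dict[str, List[float]] = {
    "image_dissimilar": [-0.7, 0.6, -0.2, 0.5],
    "image_similar": [0.8, -0.5, 0.6, -0.9],
}

# A well-ordered triplet incurs no loss; swapping the roles is penalized.
loss_good = triplet_margin_loss(
    text_query, images["image_similar"], images["image_dissimilar"])
loss_bad = triplet_margin_loss(
    text_query, images["image_dissimilar"], images["image_similar"])

# Rank images for the text query by Hamming distance between binary codes.
query_code = binarize(text_query)
ranked = sorted(
    images,
    key=lambda name: hamming_distance(query_code, binarize(images[name])),
)
print(loss_good, loss_bad, ranked)
```

In this toy setup the similar image shares the query's binary code, so it ranks first. Comparing short binary codes with Hamming distance is the reason the abstract gives for low storage and fast retrieval with binary-valued representations.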

Files in This Item:

File	Description	Size	Format
17drce009-full thesis.pdf	Attached File	3.08 MB	Adobe PDF
80_recommendation.pdf		298.69 kB	Adobe PDF
certificate.pdf		118.9 kB	Adobe PDF
chapter 1.pdf		773.72 kB	Adobe PDF
chapter 2.pdf		983.89 kB	Adobe PDF
chapter 3.pdf		787.26 kB	Adobe PDF
chapter 4.pdf		1.06 MB	Adobe PDF
chapter 5.pdf		656.49 kB	Adobe PDF
chapter 6.pdf		278.75 kB	Adobe PDF
chapter 7.pdf		350.8 kB	Adobe PDF
preliminary pages.pdf		503.32 kB	Adobe PDF
title page.pdf		192.15 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET